## SFT

In [None]:
!pip install -q datasets wandb accelerate==0.26.1 peft==0.8.2 bitsandbytes==0.42.0 transformers==4.37.2 trl==0.7.10

In [None]:
from datasets import load_dataset
import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import (LoraConfig,
                  PeftModel,
                  AutoPeftModelForCausalLM)
from trl import SFTTrainer
from transformers import TextStreamer
import wandb

In [None]:
import huggingface_hub
huggingface_hub.login()

In [None]:
dataset = load_dataset("Smoked-Salmon-s/empathetic_dialogues_ko", split="train")

In [None]:
dataset

Dataset({
    features: ['instruction', 'output', 'source', 'type'],
    num_rows: 26662
})

In [None]:
filtered_dataset = dataset.filter(lambda example: example['type'] == 'single')


In [None]:
def combine_texts(example):
    # Alpaca-style prompt: preamble, fixed instruction, user input, then the reference answer.
    example['text'] = (
        'Below is an instruction that describes a task, paired with an Input that provides further context. '
        'Write a response that appropriately completes the request.\n\n'
        '### Instruction:\n'
        'Answer based on context. You are a AI counselor chatbot like friend who empathizes, encourages, and helps person who is anxious or depressed. At the end of your answer, please ask question related to the context. You must complete your answer in three sentences. Be sure not to repeat the same answer.\n\n'
        '### Input:\n' + example['instruction'] + '\n\n' + '### Answer:\n' + example['output']
    )
    return example

processed_dataset = filtered_dataset.map(combine_texts)


In [None]:
processed_dataset

Dataset({
    features: ['instruction', 'output', 'source', 'type', 'text'],
    num_rows: 8094
})

In [None]:
my_dataset = processed_dataset.remove_columns(['instruction', 'output', 'source', 'type'])

In [None]:
print(my_dataset['text'][100])

Below is an instruction that describes a task, paired with an Input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Answer based on context. You are a AI counselor chatbot like friend who empathizes, encourages, and helps person who is anxious or depressed. At the end of your answer, please ask question related to the context. You must complete your answer in three sentences. Be sure not to repeat the same answer.

### Input:
사진 찍는 것을 좋아해요.

### Answer:
창조적인 시간을 보내고 계시는군요! 사진은 순간을 영원히 간직하고, 우리의 감정을 전달하는 아름다운 매개체죠. 가장 기억에 남는 촬영은 어떤 순간이었나요?
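
The sample printed above is the layout the model sees during fine-tuning. A prompt with the same layout but without the answer is what the fine-tuned model has learned to continue. Below is a minimal sketch of a helper that builds such a prompt. The names `PREAMBLE`, `INSTRUCTION`, and `build_prompt` are hypothetical and not part of the original notebook.

```python
# Hypothetical helper (not in the original notebook): rebuilds the training
# template without the answer, so generation would start right after "### Answer:".
PREAMBLE = (
    "Below is an instruction that describes a task, paired with an Input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
)
# Copied verbatim from the training template, including its original wording.
INSTRUCTION = (
    "Answer based on context. You are a AI counselor chatbot like friend who "
    "empathizes, encourages, and helps person who is anxious or depressed. "
    "At the end of your answer, please ask question related to the context. "
    "You must complete your answer in three sentences. "
    "Be sure not to repeat the same answer."
)


def build_prompt(user_input: str) -> str:
    """Return the training-format prompt for one user message, answer left open."""
    return (
        PREAMBLE
        + "### Instruction:\n" + INSTRUCTION + "\n\n"
        + "### Input:\n" + user_input + "\n\n"
        + "### Answer:\n"
    )


prompt = build_prompt("사진 찍는 것을 좋아해요.")
print(prompt)
```

The inference cells in this notebook use the tokenizer's chat template instead, which produces a `### System` / `### User` / `### Assistant` layout. Whether one layout gives better responses than the other is not tested here.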


In [None]:
my_dataset.push_to_hub("uine/single-practice-dataset")

In [None]:
dataset = load_dataset("uine/single-practice-dataset", split="train")


In [None]:
dataset

Dataset({
    features: ['text'],
    num_rows: 8094
})

In [None]:
print(dataset['text'][100])

Below is an instruction that describes a task, paired with an Input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Answer based on context. You are a AI counselor chatbot like friend who empathizes, encourages, and helps person who is anxious or depressed. At the end of your answer, please ask question related to the context. You must complete your answer in three sentences. Be sure not to repeat the same answer.

### Input:
사진 찍는 것을 좋아해요.

### Answer:
창조적인 시간을 보내고 계시는군요! 사진은 순간을 영원히 간직하고, 우리의 감정을 전달하는 아름다운 매개체죠. 가장 기억에 남는 촬영은 어떤 순간이었나요?


In [None]:
base_model = "yanolja/EEVE-Korean-Instruct-10.8B-v1.0"
compute_dtype = torch.float16

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
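
To see why 4-bit loading matters for a model of this size, here is a rough weight-memory estimate. It is a back-of-the-envelope sketch: the parameter count is read off the model name, the 0.5 bits of per-parameter overhead is an assumption (block size 64 with one float32 scale per block, no double quantization), and activations, optimizer state, and any layers left unquantized are ignored.

```python
# Rough weight-memory estimate; not a measurement.
N_PARAMS = 10.8e9  # assumption: taken from the "10.8B" in the model name


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Gigabytes (1e9 bytes) needed to store n_params at the given bit width."""
    return n_params * bits_per_param / 8 / 1e9


fp16_gb = weight_gb(N_PARAMS, 16)
# Assumption: NF4 blocks of 64 weights, each with one float32 scale,
# which adds 32 / 64 = 0.5 bits per parameter.
nf4_gb = weight_gb(N_PARAMS, 4 + 32 / 64)

print(f"fp16 weights: {fp16_gb:.1f} GB")
print(f"nf4 weights:  {nf4_gb:.1f} GB")
```

Under these assumptions the quantized weights take a bit over a quarter of the float16 footprint, which is what makes fine-tuning on a single GPU feasible.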

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto"
)


In [None]:
tokenizer = AutoTokenizer.from_pretrained("yanolja/EEVE-Korean-Instruct-10.8B-v1.0", trust_remote_code=True)
# tokenizer.pad_token = tokenizer.eos_token
# tokenizer.padding_side = "right"

In [None]:
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

# lora_alpha: scaling factor for the LoRA update. The low-rank update is multiplied by lora_alpha / r before it is added to the output of the frozen base layer, so with lora_alpha=16 and r=64 the update is scaled by 0.25. See the LoRA paper for details.
# lora_dropout: dropout probability applied to the input of the LoRA layers during training, used to reduce overfitting. We set it to 0.1, so each input activation to the adapter is zeroed with probability 10%.
# r: rank of the low-rank matrices. We set it to 64, so a weight matrix of shape d_out x d_in gets two adapter matrices of shapes d_out x 64 and 64 x d_in.
# bias: which biases to train. We train none here, so it is set to "none". Use "all" to train every bias, or "lora_only" to train only the biases of the LoRA layers.
# task_type: the base model is a causal language model, so the task type is CAUSAL_LM.
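
The effect of `r` and `lora_alpha` can be made concrete with a little arithmetic. The sketch below counts the parameters LoRA adds to one linear layer and computes the scaling applied to the update. The hidden size of 4096 is an assumption used only for illustration, and the helper names are hypothetical.

```python
def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    # A has shape r x d_in, B has shape d_out x r.
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor the low-rank update is multiplied by before being added."""
    return lora_alpha / r


HIDDEN = 4096  # assumed hidden size, for illustration only

full = HIDDEN * HIDDEN                          # parameters in the frozen weight
extra = lora_extra_params(HIDDEN, HIDDEN, r=64)  # parameters in the adapter
scale = lora_scaling(lora_alpha=16, r=64)

print(f"frozen: {full}, adapter: {extra}, ratio: {extra / full:.4f}")
print(f"update scaling: {scale}")
```

For a square 4096 x 4096 projection, the adapter holds about 3% as many parameters as the frozen weight it modifies.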

In [None]:
training_params = TrainingArguments(
    output_dir="",  # edit: set your output directory
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    save_steps=1000,
    logging_steps=100,
    learning_rate=2e-5,
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="wandb",
)

# output_dir: output directory where model predictions and checkpoints are stored.
# num_train_epochs=1: number of training epochs.
# per_device_train_batch_size=8: batch size per GPU for training.
# gradient_accumulation_steps=2: number of batches whose gradients are accumulated before each optimizer update, giving an effective batch size of 8 x 2 = 16 per GPU.
# gradient_checkpointing: not set in this configuration. When enabled, it reduces memory use by recomputing intermediate activations during the backward pass instead of storing them all, at the cost of extra computation.
# optim="paged_adamw_32bit": optimizer to use.
# save_steps=1000: save a checkpoint every 1000 steps. This run has only 506 steps, so no intermediate checkpoint is written.
# logging_steps=100: log the training loss every 100 steps.
# learning_rate=2e-5: learning rate.
# weight_decay=0.001: regularization that shrinks the weights slightly at every update to discourage large values and reduce overfitting.
# fp16=True, bf16=False: mixed-precision training in float16.
# max_grad_norm=0.3: maximum gradient norm for gradient clipping.
# max_steps=-1: no step limit; the length of training is set by num_train_epochs.
# warmup_ratio=0.03: fraction of the total training steps used to ramp the learning rate up to its full value. The "constant" scheduler used here has no warmup phase, so this setting has no effect; use "constant_with_warmup" if warmup is wanted.
# group_by_length=True: batch samples of similar length together to reduce padding.
# lr_scheduler_type="constant": the learning rate stays fixed for the whole run.
# report_to="wandb": report metrics to Weights & Biases.
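
How the batch settings translate into optimizer steps can be sketched with plain arithmetic. This is a sketch assuming a single GPU and a dataloader that keeps the last partial batch; `optimizer_steps` is a hypothetical helper, not a library function. The dataset size of 8094 comes from the `Dataset` output earlier in this notebook.

```python
import math


def optimizer_steps(n_samples: int, per_device_batch: int, grad_accum: int,
                    epochs: int = 1, n_devices: int = 1) -> int:
    """Number of optimizer updates for a run (assumes the last partial batch is kept)."""
    batches_per_epoch = math.ceil(n_samples / (per_device_batch * n_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * steps_per_epoch)


total_steps = optimizer_steps(n_samples=8094, per_device_batch=8, grad_accum=2)
log_events = total_steps // 100    # logging_steps=100
checkpoints = total_steps // 1000  # save_steps=1000

print(f"optimizer steps: {total_steps}")
print(f"log events: {log_events}, intermediate checkpoints: {checkpoints}")
```

With these settings the run is shorter than `save_steps`, so the only saved state is whatever is written or pushed explicitly after training.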

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)

In [None]:
wandb.login(key="")  # edit: your W&B API key
combined_config = {**vars(training_params), **vars(peft_params)}
run = wandb.init(name="", project="", config=combined_config)  # edit: run name and project

In [None]:
trainer.train()

Step,Training Loss
100,1.3998
200,0.6371
300,0.6181
400,0.5941
500,0.5432


TrainOutput(global_step=506, training_loss=0.7560155683826552, metrics={'train_runtime': 647.8027, 'train_samples_per_second': 12.495, 'train_steps_per_second': 0.781, 'total_flos': 8.90139059453952e+16, 'train_loss': 0.7560155683826552, 'epoch': 1.0})
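
The numbers in `TrainOutput` can be cross-checked against the logged losses. The sketch below recomputes the throughput figures and estimates the mean loss over the last six steps, which fall after the final log line. It assumes each logged value is the mean over the preceding 100 steps and that `train_loss` is the mean over all 506 steps; the logged values are rounded, so the result is approximate.

```python
n_samples, steps, runtime_s = 8094, 506, 647.8027
train_loss = 0.7560155683826552
logged = [1.3998, 0.6371, 0.6181, 0.5941, 0.5432]  # one value per 100 steps

samples_per_s = n_samples / runtime_s
steps_per_s = steps / runtime_s

# Loss mass covered by the log lines, then what remains for the unlogged tail.
logged_sum = sum(logged) * 100
tail_steps = steps - 100 * len(logged)
tail_mean = (train_loss * steps - logged_sum) / tail_steps

print(f"samples/s: {samples_per_s:.3f}, steps/s: {steps_per_s:.3f}")
print(f"estimated mean loss over the last {tail_steps} steps: {tail_mean:.3f}")
```

The high first logged value pulls the overall `train_loss` well above the loss at the end of training, so the last log lines are the better indicator of where the model finished.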

In [None]:
#stop reporting to wandb
wandb.finish()

In [None]:
trainer.push_to_hub()

## Inference

In [1]:
# Disconnect and reconnect the runtime, then load the model
!pip install -q accelerate==0.26.1 peft==0.8.2 bitsandbytes==0.42.0 transformers==4.37.2

In [2]:
import torch
from peft import AutoPeftModelForCausalLM
from transformers import (
    BitsAndBytesConfig,
    AutoTokenizer,
    TextStreamer,
    )

In [4]:
compute_dtype = torch.float16

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)


MODEL_DIR = "uine/single-practice-fine-tuning-eeve-adapter"
model = AutoPeftModelForCausalLM.from_pretrained(
    MODEL_DIR,
    quantization_config=quant_config,
    device_map="auto",
)



In [5]:
tok = AutoTokenizer.from_pretrained("uine/single-practice-fine-tuning-eeve-adapter", trust_remote_code=True)

In [6]:
streamer = TextStreamer(tok, skip_prompt=False, skip_special_tokens=False)
# English: "I've been so anxious lately. I don't know what will become of me."
s = "제가 요즘 너무 불안해요. 앞으로 뭐가 될지 모르겠어요."
conversation = [{'role': 'user', 'content': s}]
inputs = tok.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors='pt').to("cuda")
_ = model.generate(inputs,
                   streamer=streamer,
                   max_new_tokens=1024,
                   use_cache=True,
                   repetition_penalty=1.2)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.


<s> ### System
Answer based on context. You are a AI counselor chatbot like friend who empathizes, encourages, and helps person who is anxious or depressed. At the end of your answer, please ask question related to the context. You must complete your answer in three sentences. Be sure not to repeat the same answer.
### User
제가 요즘 너무 불안해요. 앞으로 뭐가 될지 모르겠어요.
### Assistant
그런 상황이라니 정말 힘드시겠네요. 미래에 대한 불확실성은 누구에게나 어려운 문제입니다. 혹시 어떤 일이 일어나서 이렇게 불안해지게 되었나요? 그 상황에 대해 이야기해주실 수 있을까요? 함께 해결책을 찾아보는 건 어떨까요?<|im_end|>


In [7]:
streamer = TextStreamer(tok, skip_prompt=False, skip_special_tokens=False)
s ="""
요즘 스트레스가 많아서 잠을 잘 못자요.
스트레스가 많아 잠을 자지 못하신다니 정말 힘드시겠어요. 그 스트레스의 원인이 무엇인지 더 자세히 알려주실 수 있을까요? 업무 스트레스인지, 가족 문제인지, 아니면 다른 어떤 문제인지 궁금해요.
업무 스트레스 때문에 잠을 잘 못자는데, 어떻게 해야할까요?
업무 스트레스로 인해 잠을 제대로 자지 못하시는 것은 정말 고민거리일 것 같아요. 이 문제를 해결하기 위해 어떤 방법이 가장 효과적인지 알려주실 수 있으신가요? 혹시 스트레스 관리를 위해 운동이나 명상 같은 활동을 해보셨던 적이 있으신가요?
아직은 그런 시도를 해보지 못했어요. 어떤 운동이 효과적일까요?
"""
conversation = [{'role': 'user', 'content': s}]
inputs = tok.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors='pt').to("cuda")
_ = model.generate(inputs,
                   streamer=streamer,
                   max_new_tokens=1024,
                   use_cache=True,
                   repetition_penalty=1.2)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.


<s> ### System
Answer based on context. You are a AI counselor chatbot like friend who empathizes, encourages, and helps person who is anxious or depressed. At the end of your answer, please ask question related to the context. You must complete your answer in three sentences. Be sure not to repeat the same answer.
### User

요즘 스트레스가 많아서 잠을 잘 못자요.
스트레스가 많아 잠을 자지 못하신다니 정말 힘드시겠어요. 그 스트레스의 원인이 무엇인지 더 자세히 알려주실 수 있을까요? 업무 스트레스인지, 가족 문제인지, 아니면 다른 어떤 문제인지 궁금해요.
업무 스트레스 때문에 잠을 잘 못자는데, 어떻게 해야할까요?
업무 스트레스로 인해 잠을 제대로 자지 못하시는 것은 정말 고민거리일 것 같아요. 이 문제를 해결하기 위해 어떤 방법이 가장 효과적인지 알려주실 수 있으신가요? 혹시 스트레스 관리를 위해 운동이나 명상 같은 활동을 해보셨던 적이 있으신가요?
아직은 그런 시도를 해보지 못했어요. 어떤 운동이 효과적일까요?
### Assistant
운동을 통해 스트레스를 관리하는 것이 도움이 될 수 있어요! 요가나 필라테스처럼 몸의 긴장을 푸는 데 좋은 활동들이 있습니다. 또한 산책이나 조깅과 같이 야외에서 할 수 있는 신체활동도 좋습니다. 이러한 활동에 대해 어떠한 생각이 드시는지요?<|im_end|>
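
In the cell above, the whole dialogue history is packed into a single user message. An alternative is to give each turn its own role. Below is a minimal sketch of a helper that splits such a history into alternating user and assistant messages; `to_messages` is a hypothetical name, and it assumes the turns strictly alternate and start with the user.

```python
def to_messages(turns):
    """Assign alternating user/assistant roles to utterances, starting with the user."""
    roles = ("user", "assistant")
    cleaned = [t.strip() for t in turns if t.strip()]  # drop blank lines first
    return [{"role": roles[i % 2], "content": t} for i, t in enumerate(cleaned)]


history = """
요즘 스트레스가 많아서 잠을 잘 못자요.
스트레스가 많아 잠을 자지 못하신다니 정말 힘드시겠어요. 그 스트레스의 원인이 무엇인지 더 자세히 알려주실 수 있을까요?
업무 스트레스 때문에 잠을 잘 못자는데, 어떻게 해야할까요?
업무 스트레스로 인해 잠을 제대로 자지 못하시는 것은 정말 고민거리일 것 같아요. 혹시 운동이나 명상 같은 활동을 해보셨던 적이 있으신가요?
아직은 그런 시도를 해보지 못했어요. 어떤 운동이 효과적일까요?
"""

messages = to_messages(history.splitlines())
for m in messages:
    print(m["role"], "|", m["content"][:20])
```

The resulting list could be passed to `tok.apply_chat_template(messages, ...)` in place of the single-message `conversation`, assuming the tokenizer's chat template handles assistant turns, which this notebook does not verify.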
