To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (eg for Llama.cpp).

**[NEW] Llama-3 8b is trained on a crazy 15 trillion tokens! Llama-2 was 2 trillion.**

Use our [Llama-3 8b Instruct](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing) notebook for conversational style finetunes.

# 1. Load Model, Google Drive and Predefined Data

## Load model

In [None]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)
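
The RoPE scaling mentioned in the list above can be illustrated with a small sketch. It assumes the linked method is linear position interpolation, where position indices are multiplied by a factor below 1 so that a longer sequence maps into the position range seen during pretraining. The function name `rope_angles`, the base of 10000, and the tiny head dimension are illustrative assumptions, not Unsloth's implementation.

```python
import math

def rope_angles(position, head_dim=8, base=10000.0, scale=1.0):
    """Rotary angles for one position; scale < 1 interpolates positions (assumed linear scaling)."""
    scaled_pos = position * scale
    return [scaled_pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]

# Doubling the context (2048 -> 4096) with scale 0.5 maps position 4096
# onto the angles that position 2048 had without scaling.
original = rope_angles(2048)
interpolated = rope_angles(4096, scale=0.5)
print(all(math.isclose(a, b) for a, b in zip(original, interpolated)))
```

With this kind of scaling the model never sees rotation angles larger than those it was trained on, which is why `max_seq_length` can exceed the pretraining context.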

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth



🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [None]:
# load default model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/198 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
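
As a rough check on how many parameters LoRA actually trains, the count can be worked out by hand. The sketch below assumes the publicly documented Llama-3 8B layer shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers); those shapes are not stated in this notebook. Each adapted linear layer gains two low-rank matrices with `r * (in_features + out_features)` weights in total.

```python
# Assumed Llama-3 8B shapes (from the public model config, not from this notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
RANK = 16            # the r passed to get_peft_model

# (in_features, out_features) for each targeted projection
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(total)  # 41943040
```

This matches the `Number of trainable parameters = 41,943,040` line printed by the trainer later in the notebook. For an 8-billion-parameter model that is roughly 0.5% of the weights, so with `r = 16` the trained fraction is below the 1 to 10% range quoted above.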

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


## Load Google Drive and Predefined Data

In [None]:
from google.colab import drive
import json

# Mount Google Drive
drive.mount('/content/drive')

# Adjust the file path as necessary
fq_direct= '/content/drive/My Drive/thesis_vu/colab/Data/train_data/clean_fq/' #change path
conver_direct= '/content/drive/My Drive/thesis_vu/colab/Data/train_data/split_conversations/' #change path

with open("/content/drive/My Drive/thesis_vu/colab/Data/predefined_data/icf_cate_comb.json", 'r', encoding='utf-8') as file:
    ICF_cate_comb = json.load(file)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


<a name="Data"></a>
# 2. Training Data Preparation


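This section builds a custom dataset in the Alpaca format (`instruction`, `input`, `output`) from caretaker-patient conversations and their follow-up questions stored on Google Drive. It replaces the data prep of the original template, which used the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), a filtered version of the 52K-example [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.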

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).
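
The EOS note above can be made concrete with a minimal sketch. The shortened template, the helper name `build_example`, and the literal `"<|end_of_text|>"` string are assumptions for illustration; in the notebook the full `alpaca_prompt` is used and the token comes from `tokenizer.eos_token`.

```python
ALPACA_TEMPLATE = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS = "<|end_of_text|>"  # assumed stand-in for tokenizer.eos_token

def build_example(instruction, context, response, eos=EOS):
    # The EOS token marks where the response ends, so the model learns to stop.
    return ALPACA_TEMPLATE.format(instruction, context, response) + eos

text = build_example(
    "Ask a follow-up question.",
    "P: I walked to the shop.",
    "C: How did the walk feel?",
)
print(text.endswith(EOS))
```

Without the trailing token, every training example ends mid-stream from the model's point of view, which is what leads to generations that never terminate.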

In [None]:
def format_query(data):
    formatted_string = "; ".join([f"role: {item['role']}, content: {item['content']}" for item in data])
    return formatted_string

def generate_prompt(activity, ICF_func, conversation, fq_type):
  """
  Generate the prompt (instruction) for one training example.

  Para::
  activity: activity name, used as a key of ICF_func
  ICF_func: ICF category dictionary of one function
  conversation: conversation history (currently not used in the prompt text)
  fq_type: question type, being "emotion" or "function"
  """

  if "sub_activities" in ICF_func[activity]:
      sub_activity_str = f"""can ask more in details about {ICF_func[activity]["sub_activities"]} and """
  else:
      sub_activity_str = ""

  # for con_ls in convern_data[activity]:
  #   for n,conversation in enumerate(con_ls):
      # query about function level
  if fq_type == "function":
                query_0 = [
                    {"role": "system", "content": "You need to play the roles of a caretaker (C) and an elderly patient (P)."},  # roles to play
                    {"role": "system", "content": f"""You will be given a conversation history between a caretaker and a patient about one activity. You need to ask follow-up questions to continue the conversation. The questions you ask should have around 2 to 6 utterances, and each utterance should be completed and have less than 20 tokens.

                The format is as follows:
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                ..."""
                    },
                    {"role": "system", "content": f"""Follow-up questions {sub_activity_str} must be able to evoke answers informing about the function level, such as questions evoking answers about mild, moderate, severe, or complete performance difficulty."""
                    },

                    {"role": "user", "content": f"The patients can answer naturally and describe the performance difficulty of performing {activity}. The performance difficulty can be slight, fair, severe, or complete."},  # shifting details in the conversations

                ]

            # query about emotional feedback
  elif fq_type == "emotion":
                query_0 = [
                    {"role": "system", "content": "You need to play the roles of a caretaker (C) and an elderly patient (P)."},  # roles to play
                    {"role": "system", "content": f"""You will be given a conversation history between a caretaker and a patient about one activity. You need to ask follow-up questions to continue the conversation. The questions you ask should have around 2 to 6 utterances, and each utterance should be completed and have less than 20 tokens.

                The format is as follows:
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                ..."""
                    },
                    {"role": "system", "content": f"""Follow-up questions evoke answers informing about emotional feedback, such as questions evoking answers about positive or negative feelings about the activity."""
                    },

                    {"role": "user", "content": f"The patients can answer naturally and describe their feeling about performing {activity}. The feeling can be positive or negative."},  # shifting details in the conversations


                ]


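  else:
      # Fail early instead of hitting an undefined query_0 below
      raise ValueError(f"Unknown fq_type: {fq_type!r}; expected 'function' or 'emotion'")

  query_str = format_query(query_0)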

  return query_str
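
To see what `format_query` does to the chat-style message list, here is a small self-contained sketch with two made-up messages (the message contents are hypothetical). The role structure is flattened into one string, which later fills the `### Instruction:` slot of the Alpaca prompt.

```python
def format_query(data):
    # Same logic as above: join "role: ..., content: ..." segments with "; ".
    return "; ".join(f"role: {item['role']}, content: {item['content']}" for item in data)

messages = [
    {"role": "system", "content": "You play a caretaker (C)."},
    {"role": "user", "content": "Ask about walking."},
]
flat = format_query(messages)
print(flat)
# role: system, content: You play a caretaker (C).; role: user, content: Ask about walking.
```

Because the result is plain text, the model sees the roles only as literal words in the instruction, not as separate chat turns.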



In [None]:
def training_func(function, fq_type, prompt_method, fq_direct, conver_direct, ICF_cate_comb):
  """
  Collect training data for one function, question type, and prompt method.

  Return::
  dictionary of training data with keys "instruction", "input", "output"
  Para::
  function: function type, being "communication", "self-care", "mobility"
  fq_type: question type, being "emotion" or "function"
  prompt_method: "zeroshot" or "fewshot"
  fq_direct: directory to fq (follow-up question) data
  conver_direct: directory to conversation data
  ICF_cate_comb: ICF categories with examples and definitions
  """

  training_func = {"instruction": [], "input": [], "output": []}
  # Construct file paths
  fq_path0 = fq_direct + prompt_method + f"/{function}_{fq_type}.json"
  conver_path0 = conver_direct + f"{function}/train_1.json"  # CHANGE TO TRAIN!!!

  # Load JSON files
  with open(fq_path0) as fqfile, open(conver_path0) as converfile:
      d_fq = json.load(fqfile)
      d_conver = json.load(converfile)

  # Check the number of conversations
  count = 0
  for (act, con_ls), (k, conver_dict) in zip(d_conver.items(), d_fq.items()):
      count += 1
      if not (len(con_ls) == len(list(conver_dict.keys()))):
          print(f"The number of conversations in conver data ({len(con_ls)}) and fq data ({len(list(conver_dict.keys()))}) do not match.")
          break

      # for con_ind, fq_ls in conver_dict.items():
      #   if not fq_ls:



  # Assuming ICF_cate_comb is a predefined dictionary containing some mappings
  ICF_func = ICF_cate_comb[f"{function}"]

  # Iterate through the conver data
  for act, conver_ls in d_conver.items():
      fq_act = d_fq[act]
      for n, conversation in enumerate(conver_ls):
          # ori_conver_num = len(conver_ls)

          # print((len(fq_act[str(n)])))
          # break
      # break

          # Number of fq items for each conversation

          rep_num = len(fq_act[str(n)])
          training_func["input"].extend([conversation] * rep_num)
          query_0 = generate_prompt(act, ICF_func, conversation, fq_type)
          training_func["instruction"].extend([query_0] * rep_num)
          training_func["output"].extend(fq_act[str(n)])
          # print(training_data["input"][0])

      # check amount
          if not len(training_func["input"]) == len(training_func["instruction"]) == len(training_func["output"]):
            print(f"amount not match of {act}, {n}th conversation")

          # else:
          #   print(f"Original conversation count: {ori_conver_num}, revised conversation count: {len(training_func['input'])}, deleted conversations: {ori_conver_num - len(training_func['input'])}")




  print(f"finish training data of {function}, {fq_type}, {prompt_method}")

  return training_func
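
The core of `training_func` is the expansion step: each conversation is repeated once per follow-up question set, so the three columns stay the same length. The sketch below reproduces that step on toy data; the helper name `expand_examples` and the toy strings are assumptions for illustration.

```python
def expand_examples(conversations, fq_sets, instruction):
    """conversations: list of str; fq_sets: dict mapping str(index) -> list of follow-up strings."""
    rows = {"instruction": [], "input": [], "output": []}
    for n, conversation in enumerate(conversations):
        followups = fq_sets[str(n)]
        # One training row per follow-up set, all sharing the same conversation.
        rows["input"].extend([conversation] * len(followups))
        rows["instruction"].extend([instruction] * len(followups))
        rows["output"].extend(followups)
    return rows

toy_conversations = ["C: Hi\nP: Hello", "C: Walk today?\nP: Yes"]
toy_fq = {"0": ["C: a?", "C: b?"], "1": ["C: c?"]}
rows = expand_examples(toy_conversations, toy_fq, "Ask follow-up questions.")
print(len(rows["input"]), len(rows["instruction"]), len(rows["output"]))  # 3 3 3
```

A conversation with an empty follow-up list contributes no rows at all, which is one way the number of training rows can differ from the number of conversations.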





### format training data

In [None]:
# format all of the instruction, input, output of each function and each type and save all the data as json
import json

# "zero-shot", multiple fq types, multiple functions

training_0 = {"instruction": [], "input": [], "output": []}

prompt_method = "zeroshot"
functions = ["communication", "self-care", "mobility"]
types = ["function", "emotion"]
# functions = [ "mobility"]
# types = ["function", "emotion"]


for func in functions:
  for ty in types:
    training_function = training_func(func, ty, prompt_method, fq_direct, conver_direct, ICF_cate_comb)
    for k in list(training_function.keys()):
       training_0[k].extend(training_function[k])


print(len(training_0["input"]), len(training_0["instruction"]),len(training_0["output"]))



train_path_0 = '/content/drive/My Drive/thesis_vu/colab/Data/train_data/format_data/training_zeroshot.json'
with open (train_path_0, 'w') as f:
    json.dump(training_0, f)
    print("training data of zeroshot has been saved successfully!")


finish training data of communication, function, zeroshot
finish training data of communication, emotion, zeroshot
finish training data of self-care, function, zeroshot
finish training data of self-care, emotion, zeroshot
finish training data of mobility, function, zeroshot
finish training data of mobility, emotion, zeroshot
7926 7926 7926
training data of zeroshot has been saved successfully!


In [None]:
# save training_0 as csv
import pandas as pd

with open(train_path_0, 'r') as file:
    data = json.load(file)

# check
print(len(data["input"]))

df = pd.DataFrame(data)
print(df)

out_csv0='/content/drive/My Drive/thesis_vu/colab/Data/train_data/format_data/training_zeroshot.csv'
df.to_csv(out_csv0, index=False)
print("training data as csv has been saved successfully!")

7926
                                            instruction  \
0     role: system, content: You need to play the ro...   
1     role: system, content: You need to play the ro...   
2     role: system, content: You need to play the ro...   
3     role: system, content: You need to play the ro...   
4     role: system, content: You need to play the ro...   
...                                                 ...   
7921  role: system, content: You need to play the ro...   
7922  role: system, content: You need to play the ro...   
7923  role: system, content: You need to play the ro...   
7924  role: system, content: You need to play the ro...   
7925  role: system, content: You need to play the ro...   

                                                  input  \
0     C: How did your conversation with your daughte...   
1     C: How did your conversation with your daughte...   
2     C: How did your conversation with your daughte...   
3     C: How did your conversation with your daugh

In [None]:
# load csv and format prompts as training for llama
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    # print(f"the length of instruction is {len(instructions)}")
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        # (input_text avoids shadowing Python's built-in input)
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

# out_csv0= '/content/drive/My Drive/thesis_vu/colab/Data/training_zeroshot.csv'
from datasets import load_dataset
dataset0 = load_dataset('csv', data_files=out_csv0)
dataset0 = dataset0["train"]
dataset0 = dataset0.map(formatting_prompts_func, batched = True,)
print(dataset0)

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/7927 [00:00<?, ? examples/s]

Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 7927
})
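
The counts printed in this section differ by one: 7926 examples were saved, while the loaded dataset reports 7927 rows. The cause is not evident from the notebook. One way to investigate is a round-trip check like the sketch below, which uses the standard-library `csv` module as a stand-in for pandas and `datasets`; the helper name `csv_round_trip_count` and the toy data are assumptions for illustration.

```python
import csv
import io

def csv_round_trip_count(columns):
    """Write a dict of equal-length columns to CSV in memory and count the rows read back."""
    names = list(columns)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(names)
    writer.writerows(zip(*(columns[name] for name in names)))
    buffer.seek(0)
    return sum(1 for _ in csv.DictReader(buffer))

toy = {
    "instruction": ["Ask follow-ups.", "Ask follow-ups."],
    "input": ["C: Hi\n\nP: Hello", "C: Walk today?\n\nP: Yes, a bit"],
    "output": ["C: a?", "C: b?"],
}
written = len(toy["input"])
read_back = csv_round_trip_count(toy)
print(written, read_back)
```

Running the same comparison on the real file, and inspecting any row whose `output` is empty or whose `instruction` does not start with `role:`, would show whether an extra row was introduced when the CSV was parsed.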


In [None]:
print(dataset0['instruction'][0])

role: system, content: You need to play the roles of a caretaker (C) and an elderly patient (P).; role: system, content: You will be given a conversation history between a caretaker and a patient about one activity. You need to ask follow-up questions to continue the conversation. The questions you ask should have around 2 to 6 utterances, and each utterance should be completed and have less than 20 tokens.

                The format is as follows:
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                ...; role: system, content: Follow-up questions can ask more in details about Comprehending the literal meaning conveyed by simple spoken messages;
Comprehending the literal and implied meaning conveyed by complex spoken messages;
Comprehending the human voice or sound without literal meaning. and  must be able to 

In [None]:
print(dataset0['input'][0])

C: How did your conversation with your daughter go yesterday?

P: It was lovely, we chatted about her new job plans and her latest adventures.

C: Did you enjoy catching up with her?

P: Yes, it brightened up my day to hear her voice and share her life updates.

C: Communication must be important to you.

P: Absolutely, it keeps me connected and brings me joy to share in their lives.


In [None]:
print(dataset0['output'][0])

C: Do you sometimes have difficulty understanding what your daughter is saying during your conversations?  P: Fortunately, I find it quite easy to comprehend her messages.  C: Have there been any instances where you struggled to hear or process her spoken messages clearly?  P: No, her voice is clear to me, and I can follow along easily.  C: Do you ever face any challenges when trying to respond or engage actively in conversations with her?  P: Not really, I can respond promptly, and we have engaging conversations without any difficulty.


<a name="Train"></a>
# 3. Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up; for a full run, set `num_train_epochs=1` and remove `max_steps`, because a given `max_steps` overrides `num_train_epochs`. We also support TRL's `DPOTrainer`!
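
To put the 60-step run in perspective, the arithmetic below estimates how much of the dataset it covers and how long a full epoch might take. The numbers are taken from the trainer configuration and the logs printed in this notebook; the time estimate is a linear extrapolation on a single GPU and only a rough guide.

```python
import math

num_examples = 7927          # rows reported for the loaded dataset
per_device_batch = 2
grad_accum = 4
max_steps = 60
runtime_seconds = 742.7573   # train_runtime reported after the 60-step run

effective_batch = per_device_batch * grad_accum
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_examples
steps_per_epoch = math.ceil(num_examples / effective_batch)
est_epoch_hours = runtime_seconds / max_steps * steps_per_epoch / 3600

print(effective_batch, examples_seen, steps_per_epoch)  # 8 480 991
print(f"{epoch_fraction:.1%} of one epoch, ~{est_epoch_hours:.1f} h for a full epoch")
```

So the quick run touches only a small slice of the data, and a full epoch on the same hardware would take a few hours rather than minutes.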

In [None]:
# learning_rate = 2e-4
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset0, #CHANGE DATASET!!!
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4, #TRY DIFFERENT LEARNING RATE!!!
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Map (num_proc=2):   0%|          | 0/7927 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
5.613 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 7,927 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.1149
2,2.109
3,2.1893
4,2.0417
5,1.9608
6,1.7119
7,1.4975
8,1.3336
9,1.0677
10,0.9123


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

742.7573 seconds used for training.
12.38 minutes used for training.
Peak reserved memory = 7.656 GB.
Peak reserved memory for training = 2.043 GB.
Peak reserved memory % of max memory = 51.912 %.
Peak reserved memory for training % of max memory = 13.853 %.


<a name="Generate Output"></a>
# 4. Generate Output

Inference: let's run the model! You can change the instruction and input, but leave the output blank.

##  Test Data Preparation

Build the instruction and input for each test conversation.

In [None]:
def format_query(data):
    formatted_string = "; ".join([f"role: {item['role']}, content: {item['content']}" for item in data])
    return formatted_string

def generate_prompt4test(activity, ICF_func, conversation, fq_type):
  """
  Generate the prompt (instruction) for one test example.

  Para::
  activity: activity name, used as a key of ICF_func
  ICF_func: ICF category dictionary of one function
  conversation: conversation history (currently not used in the prompt text)
  fq_type: question type, being "emotion" or "function"
  """

  if "sub_activities" in ICF_func[activity]:
      sub_activity_str = f"""can ask more in details about {ICF_func[activity]["sub_activities"]} and """
  else:
      sub_activity_str = ""

  # for con_ls in convern_data[activity]:
  #   for n,conversation in enumerate(con_ls):
      # query about function level
  if fq_type == "function":
                query_0 =  """You will be given a conversation history between a caretaker and a patient about one activity. You need to ask follow-up questions to evoke answers informing about the function level. The questions you ask should have around 2 to 6 utterances, and each utterance should be completed and have less than 20 tokens.

                The format is as follows:
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                ..."""


            # query about emotional feedback
  elif fq_type == "emotion":
                query_0 ="""You will be given a conversation history between a caretaker and a patient about one activity. You need to ask follow-up questions to evoke answers about positive or negative feelings about the activity. The questions you ask should have around 2 to 6 utterances, and each utterance should be completed and have less than 20 tokens.

                The format is as follows:
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                ..."""

  # query_str = format_query(query_0)

  return query_0


In [None]:
def test_func(function, fq_type, prompt_method, conver_direct, ICF_cate_comb):
  """
  collect test data based on function, fq_type, prompt_method

  Return::
  dictionary of test data of instruction and input
  Para::
  function:function type, being "communication", "self-care", "mobility"
  fq_type: question type, being "emotion" or "function"
  prompt_method: "zeroshot" or "fewshot"
  conver_direct: directory to conversation data
  ICF_cate_comb:ICF category of ICF and examples and definition
  """

  test_func = {"instruction": [], "input": [], "output": []}

  conver_path0 = f"{conver_direct}{function}.json" #CHANGE TO TEST DATA!!!

  # Load JSON files
  with open(conver_path0) as converfile:

      d_conver = json.load(converfile)


  ICF_func = ICF_cate_comb[f"{function}"]

  # Iterate through the conver data
  for act, conver_ls in d_conver.items():

      for n, conversation in enumerate(conver_ls):
          test_func["input"].append(conversation)
          query_0 = generate_prompt4test(act, ICF_func, conversation, fq_type)
          test_func["instruction"].append(query_0)

      # check amount
          if not len(test_func["input"]) == len(test_func["instruction"]):
            print(f"amount not match of {act}, {n}th conversation")



  print(f"finish test data of {function}, {fq_type}, {prompt_method}")

  return test_func

In [None]:
prompt_method = "zeroshot"
test_conver= '/content/drive/My Drive/thesis_vu/colab/Data/test_data/in_conver/'

result = {}
functions = ["communication", "self-care", "mobility"]
types = ["function", "emotion"]


for function in functions:
  result[function]={}
  for fq_type in types:
    test_func_dic = test_func(function, fq_type, prompt_method, test_conver, ICF_cate_comb)
    result[function][fq_type]=test_func_dic

finish test data of communication, function, zeroshot
finish test data of communication, emotion, zeroshot
finish test data of self-care, function, zeroshot
finish test data of self-care, emotion, zeroshot
finish test data of mobility, function, zeroshot
finish test data of mobility, emotion, zeroshot


In [None]:
# check structure
print(result.keys())
print(result["communication"].keys())
print(result["communication"]["function"].keys())


dict_keys(['communication', 'self-care', 'mobility'])
dict_keys(['function', 'emotion'])
dict_keys(['instruction', 'input', 'output'])


In [None]:
# check conversation amount

## from results dict
instru_num = 0
for k,v in result.items():
    instru_num += len(v["function"]["instruction"])
    # print(k1)
    # print(len(v1["instruction"]))
print(f"amount of all instruction is: {instru_num}")
print()

## from dataset
al_conver = 0
for function in functions:
  conver_path0 = f"{test_conver}{function}.json" #CHANGE TO TEST DATA!!!

  with open(conver_path0) as converfile:
    d_conver = json.load(converfile)
  count = 0
  for act, conver_ls in d_conver.items():
    count += len(conver_ls)
  print(f"amount of {function} is: {count}")

  al_conver += count

print(f"amount of all conversations are: {al_conver}")

amount of all instruction is: 113

amount of communication is: 36
amount of self-care is: 25
amount of mobility is: 52
amount of all conversations are: 113


## Generate output


In [None]:
# #uncomment if you need to load the fine-tuned model
# ## load model


# alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

# ### Instruction:
# {}

# ### Input:
# {}

# ### Response:
# {}"""

# if True:
#     from unsloth import FastLanguageModel
#     model, tokenizer = FastLanguageModel.from_pretrained(
#         model_name = "YijingZZZ/thesis_llama3_train_zero", # YOUR MODEL YOU USED FOR TRAINING (learning rate: 2e-4)
#         max_seq_length = max_seq_length,
#         dtype = dtype,
#         load_in_4bit = load_in_4bit,
#     )
#     # FastLanguageModel.for_inference(model) # Enable native 2x faster inference

In [None]:
def extract_response(text):
    response_marker = "### Response:"
    start_index = text.find(response_marker)

    # If the marker is found, extract the text after it
    if start_index != -1:
        # Adjust the start_index to get the text after the marker
        start_index += len(response_marker)
        # Extract and return the text after the marker, trimming any leading/trailing whitespace
        return text[start_index:].strip()
    else:
      return False

def generate_response(instruction, input, model):
  """
  generate response with given instruction and input

  Para::
  - instruction: prompt instruction
  - input:conversation history
  - model: fine-tuned model
  """


  FastLanguageModel.for_inference(model) # Enable native 2x faster inference
  inputs = tokenizer(
  [
      alpaca_prompt.format(
          instruction, # instruction
          input, # input
          "", # output - leave this blank for generation!
      )
  ], return_tensors = "pt").to("cuda")

  outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
  output = tokenizer.batch_decode(outputs)
  # print(output)
  # print(f"the type of output is {type(output)}")
  if len(output) == 1:
    cleaned_output = extract_response(output[0])
    if not cleaned_output:
      print( f"Marker '### Response:' not found in the resp of {input}.")
      return False
  else:
    print("output list has more than one responses")
    return False

  return cleaned_output
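
The marker-based extraction can be tried on its own with a hypothetical decoded string. The variant below returns `None` instead of `False` when the marker is missing, which is an assumed design choice, and the example text is made up. Because `batch_decode` is called without skipping special tokens, the end-of-text token stays in the extracted response and is removed later by `clean_str`.

```python
def extract_response(text, marker="### Response:"):
    """Return the text after the first marker, or None when the marker is absent."""
    start = text.find(marker)
    if start == -1:
        return None
    return text[start + len(marker):].strip()

decoded = (
    "### Instruction:\nAsk follow-up questions.\n\n"
    "### Input:\nP: I walked to the shop.\n\n"
    "### Response:\nC: How did the walk feel?<|end_of_text|>"
)
print(extract_response(decoded))
# C: How did the walk feel?<|end_of_text|>
print(extract_response("no marker here"))
# None
```

Since `find` returns the first occurrence, this approach relies on the marker not appearing inside the instruction or the conversation itself.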



In [None]:
# generate whole fq conversations

from tqdm import tqdm

for func in functions:
  for ty in types:
    test_func_dic = result[func][ty]
    if test_func_dic["output"]:
      print(len(test_func_dic["output"]))
      test_func_dic["output"] = []


    for instruction, input_text in tqdm(zip(test_func_dic["instruction"], test_func_dic["input"]), total=len(test_func_dic["instruction"])):
      output = generate_response(instruction, input_text, model)
      if output:
        test_func_dic["output"].append(output)
      else:
        # Look up the current example; the bare name `input` is Python's built-in function
        num = test_func_dic["input"].index(input_text)
        print( f"No valid response was generated:{func}_{ty}_{num}.")
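
In the loop above, a failed generation is skipped, so `output` can end up shorter than `input` and later positions no longer line up. One possible alternative, sketched here under assumptions, keeps a `None` placeholder for failures and records their indices with `enumerate`. The names `collect_outputs` and `fake_generate` are hypothetical; `fake_generate` only stands in for `generate_response`.

```python
def collect_outputs(instructions, inputs, generate):
    """Run generate per example; keep a None placeholder so outputs stay aligned with inputs."""
    outputs, failed = [], []
    for index, (instruction, input_text) in enumerate(zip(instructions, inputs)):
        result = generate(instruction, input_text)
        if result:
            outputs.append(result)
        else:
            outputs.append(None)
            failed.append(index)
    return outputs, failed

# Hypothetical stand-in for generate_response: fails on empty input.
def fake_generate(instruction, input_text):
    return f"C: Tell me more about {input_text}?" if input_text else False

outputs, failed = collect_outputs(["ask", "ask", "ask"], ["walking", "", "bathing"], fake_generate)
print(len(outputs), failed)  # 3 [1]
```

If placeholders are used, any later step that processes the outputs as strings would need to skip the `None` entries.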



100%|██████████| 36/36 [04:27<00:00,  7.44s/it]
100%|██████████| 36/36 [03:56<00:00,  6.57s/it]
100%|██████████| 25/25 [02:57<00:00,  7.09s/it]
100%|██████████| 25/25 [02:53<00:00,  6.95s/it]
100%|██████████| 52/52 [06:36<00:00,  7.62s/it]
100%|██████████| 52/52 [06:08<00:00,  7.09s/it]


In [None]:
output_raw= '/content/drive/My Drive/thesis_vu/colab/Data/test_data/out_fq/out_raw/ftoutput_zero_raw.json'
with open (output_raw, 'w') as f:
    json.dump(result, f)
    print("raw ft output data of zeroshot has been saved successfully!")

raw ft output data of zeroshot has been saved successfully!


In [None]:
# extract questions only

import re

def clean_str(string):
    """
    Extract only the questions in a list.

    Parameters:
    string (str): The input string containing multiple questions.

    Returns:
    list: A list of each question string [Q1, Q2,...] in the FQ set.
    """

    # Regular expression to find sentences starting with 'C:'
    pattern = r'(?:\n*\n*C: (.*?))(?=\n*\n*P:|$)'

    # Find all matches using the pattern
    matches = re.findall(pattern, string, re.DOTALL)

    # Clean up each match: strip leading/trailing whitespace and replace newlines with spaces
    cleaned_matches = [re.sub(r'<\|end_of_text\|>', '', match).strip().replace('\n', ' ') for match in matches]

    return cleaned_matches
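
A quick way to see what `clean_str` keeps is to run it on two made-up outputs: one where turns are separated by spaces, and one where the last caretaker question is followed directly by the end-of-text token. The function body is copied from above; the sample strings are hypothetical.

```python
import re

def clean_str(string):
    # Capture the text after each "C: " up to the next "P:" or the end of the string.
    pattern = r'(?:\n*\n*C: (.*?))(?=\n*\n*P:|$)'
    matches = re.findall(pattern, string, re.DOTALL)
    return [re.sub(r'<\|end_of_text\|>', '', match).strip().replace('\n', ' ') for match in matches]

inline = "C: Do you walk daily?  P: Yes, every morning.  C: Is it hard for you?  P: Not at all.<|end_of_text|>"
trailing = "C: How are you?\n\nP: Fine.\n\nC: Any pain today?<|end_of_text|>"

print(clean_str(inline))
# ['Do you walk daily?', 'Is it hard for you?']
print(clean_str(trailing))
# ['How are you?', 'Any pain today?']
```

Only the caretaker questions survive; the patient replies are dropped, and an output with no `C: ` turn yields an empty list.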


In [None]:

for func in functions:
  for ty in types:
    test_func_dic = result[func][ty]
    splited_output = []
    for string in test_func_dic["output"]:
      # print(type(string))

      new_str  = clean_str(string)
      splited_output.append(new_str)

    # check amount
    if not len(splited_output) == len(test_func_dic["output"]):
      print("number of split outputs does not match the number of raw outputs")

    else:
      test_func_dic["output"] = splited_output

    # # check type
    for i in test_func_dic["output"]:
      if not isinstance(i, list):
        print(i)
        break

In [None]:
# check one trial

# test_trial_in = result["communication"]["emotion"]["input"][0]
# print(f"input is: {test_trial_in}")
# print()

# test_trial_out = result["communication"]["emotion"]["output"][0]
# print(f"output is: {test_trial_out}")

In [None]:
# save cleaned data
output_clean = '/content/drive/My Drive/thesis_vu/colab/Data/test_data/out_fq/out_clean/ftoutput_zero_clean.json'
with open (output_clean, 'w') as f:
    json.dump(result, f)
    print("Clean ft output data of zeroshot has been saved successfully!")

Clean ft output data of zeroshot has been saved successfully!


<a name="Save"></a>
# 5. Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. Saving to 16bit or GGUF is not covered in this notebook.

### save model

In [None]:
# model.save_pretrained("lora_model") # Local saving
# tokenizer.save_pretrained("lora_model")
model.push_to_hub(" ", token = " ") # Online saving
tokenizer.push_to_hub(" ", token = " ") # Online saving

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Saved model to https://huggingface.co/YijingZZZ/thesis_llama3_train_zero


README.md:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

## END