To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save) (e.g. for Llama.cpp).

**[NEW] Llama-3 8b is trained on a crazy 15 trillion tokens! Llama-2 was 2 trillion.**

Use our [Llama-3 8b Instruct](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing) notebook for conversational style finetunes.

# 1. Load Model, Google Drive and Predefined Data

## Load model

In [None]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth



🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [None]:
# load default model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.1.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/198 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
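
The adapter size for the settings above can be checked by hand. The sketch below counts the LoRA weights for `r = 16` over the seven target modules, assuming the published Llama-3 8B layer shapes (hidden size 4096, MLP size 14336, key/value width 1024, 32 layers). The helper names are illustrative and not part of Unsloth or PEFT.

```python
# Sketch: count the LoRA parameters added for r = 16.
# Assumption: Llama-3 8B layer shapes (hidden 4096, MLP 14336, KV width 1024, 32 layers).

def lora_params(in_features, out_features, r):
    """A LoRA adapter adds A (r x in) and B (out x r) next to a frozen weight."""
    return r * (in_features + out_features)

HIDDEN, MLP, KV, LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) of each target module in one decoder layer
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def total_lora_params(r):
    """Sum the adapter sizes of one layer and multiply by the layer count."""
    per_layer = sum(lora_params(i, o, r) for i, o in module_shapes.values())
    return per_layer * LAYERS

trainable = total_lora_params(16)
print(f"{trainable:,} trainable LoRA parameters")  # 41,943,040
print(f"{100 * trainable / 8.03e9:.2f}% of roughly 8.03B base parameters")
```

Under these assumptions the adapters come to a fraction of a percent of the base model. The count grows linearly with `r`, so `r = 128` would give eight times as many trainable parameters.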


## Load Google Drive and Predefined Data

In [None]:
from google.colab import drive
import json

# Mount Google Drive
drive.mount('/content/drive')

# Adjust the file path as necessary
fq_direct= '/content/drive/My Drive/thesis_vu/colab/Data/train_data/clean_fq/' #change path
conver_direct= '/content/drive/My Drive/thesis_vu/colab/Data/train_data/split_conversations/' #change path

with open("/content/drive/My Drive/thesis_vu/colab/Data/predefined_data/icf_cate_comb.json", 'r', encoding='utf-8') as cfile, \
     open("/content/drive/My Drive/thesis_vu/colab/Data/predefined_data/examples.json", 'r', encoding='utf-8') as efile:
    ICF_cate_comb = json.load(cfile)
    examples = json.load(efile)


Mounted at /content/drive


# 2. Generate Output

## Test data Preparation

In [None]:
def format_query(data):
    formatted_string = "; ".join([f"role: {item['role']}, content: {item['content']}" for item in data])
    return formatted_string

def generate_prompt(activity, ICF_func, conversation, fq_type):
  """
  Build the prompt (instruction) for one activity.

  Para::
  activity: activity name, a key of ICF_func
  ICF_func: ICF category dictionary of one function
  conversation: conversation history (not used here; it is passed to the model as the input)
  fq_type: question type, being "emotion" or "function"
  """

  if "sub_activities" in ICF_func[activity]:
      sub_activity_str = f"""can ask more in details about {ICF_func[activity]["sub_activities"]} and """
  else:
      sub_activity_str = ""

  # for con_ls in convern_data[activity]:
  #   for n,conversation in enumerate(con_ls):
      # query about function level
  if fq_type == "function":
                query_0 = [
                    {"role": "system", "content": "You need to play the roles of a caretaker (C) and an elderly patient (P)."},  # roles to play
                    {"role": "system", "content": f"""You will be given a conversation history between a caretaker and a patient about one activity. You need to ask follow-up questions to continue the conversation. The questions you ask should have around 2 to 6 utterances, and each utterance should be completed and have less than 20 tokens.

                The format is as follows:
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                ..."""
                    },
                    {"role": "system", "content": f"""Follow-up questions {sub_activity_str} must be able to evoke answers informing about the function level, such as questions evoking answers about mild, moderate, severe, or complete performance difficulty."""
                    },

                    {"role": "user", "content": f"The patients can answer naturally and describe the performance difficulty of performing {activity}. The performance difficulty can be slight, fair, severe, or complete."},  # shifting details in the conversations

                ]

            # query about emotional feedback
  elif fq_type == "emotion":
                query_0 = [
                    {"role": "system", "content": "You need to play the roles of a caretaker (C) and an elderly patient (P)."},  # roles to play
                    {"role": "system", "content": f"""You will be given a conversation history between a caretaker and a patient about one activity. You need to ask follow-up questions to continue the conversation. The questions you ask should have around 2 to 6 utterances, and each utterance should be completed and have less than 20 tokens.

                The format is as follows:
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                C: utterance (asking follow-up questions)
                P: utterance (respond naturally)
                ..."""
                    },
                    {"role": "system", "content": f"""Follow-up questions evoke answers informing about emotional feedback, such as questions evoking answers about positive or negative feelings about the activity."""
                    },

                    {"role": "user", "content": f"The patients can answer naturally and describe their feeling about performing {activity}. The feeling can be positive or negative."},  # shifting details in the conversations


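                ]

  else:
      # fail early instead of hitting a NameError on query_0 below
      raise ValueError(f"fq_type must be 'function' or 'emotion', got {fq_type!r}")

  query_str = format_query(query_0)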

  return query_str
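
To make the prompt layout concrete, here is a minimal sketch of what `format_query` returns. The two messages are hypothetical and much shorter than the real prompt, and the function is copied so the snippet runs on its own.

```python
# Copy of format_query from the cell above, so this snippet is self-contained.
def format_query(data):
    return "; ".join(f"role: {item['role']}, content: {item['content']}" for item in data)

# Hypothetical two-message prompt, much shorter than the real one.
messages = [
    {"role": "system", "content": "You play a caretaker (C) and a patient (P)."},
    {"role": "user", "content": "Ask about walking."},
]

flat = format_query(messages)
print(flat)
# role: system, content: You play a caretaker (C) and a patient (P).; role: user, content: Ask about walking.
```

The chat messages are flattened into a single string joined by `"; "`, and that string is what ends up in the instruction slot of the Alpaca prompt.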


In [None]:
def test_func(function, fq_type, prompt_method, conver_direct, ICF_cate_comb):
  """
  Collect test data for one function, question type and prompt method.

  Return::
  dictionary of test data with instruction, input and (still empty) output lists
  Para::
  function: function type, being "communication", "self-care", "mobility"
  fq_type: question type, being "emotion" or "function"
  prompt_method: "zeroshot" or "fewshot" (only used in the log message)
  conver_direct: directory to conversation data
  ICF_cate_comb: ICF categories with their examples and definitions
  """

  test_func = {"instruction": [], "input": [], "output": []}

  conver_path0 = f"{conver_direct}{function}.json" #CHANGE TO TEST DATA!!!

  # Load JSON files
  with open(conver_path0) as converfile:

      d_conver = json.load(converfile)


  ICF_func = ICF_cate_comb[f"{function}"]

  # Iterate through the conver data
  for act, conver_ls in d_conver.items():

      for n, conversation in enumerate(conver_ls):
          test_func["input"].append(conversation)
          query_0 = generate_prompt(act, ICF_func, conversation, fq_type)
          test_func["instruction"].append(query_0)

      # check amount
          if not len(test_func["input"]) == len(test_func["instruction"]):
            print(f"amount not match of {act}, {n}th conversation")



  print(f"finish test data of {function}, {fq_type}, {prompt_method}")

  return test_func

In [None]:
prompt_method = "fewshot"
test_conver= '/content/drive/My Drive/thesis_vu/colab/Data/test_data/in_conver/'

result = {}
functions = ["communication", "self-care", "mobility"]
types = ["function", "emotion"]


for function in functions:
  result[function]={}
  for fq_type in types:
    test_func_dic = test_func(function, fq_type, prompt_method, test_conver, ICF_cate_comb)
    result[function][fq_type]=test_func_dic

finish test data of communication, function, fewshot
finish test data of communication, emotion, fewshot
finish test data of self-care, function, fewshot
finish test data of self-care, emotion, fewshot
finish test data of mobility, function, fewshot
finish test data of mobility, emotion, fewshot


In [None]:
print(result.keys())
print(result["communication"].keys())
print(result["communication"]["function"].keys())


dict_keys(['communication', 'self-care', 'mobility'])
dict_keys(['function', 'emotion'])
dict_keys(['instruction', 'input', 'output'])
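
The nested layout printed above follows from how `test_func` walks the conversation files. The sketch below mirrors that loop on in-memory data. The activity names, conversation strings and the `make_prompt` placeholder are hypothetical, and real conversations may be stored in a different form than plain strings.

```python
# Hypothetical stand-in for one test file such as mobility.json:
# each activity maps to a list of conversation histories.
d_conver = {
    "walking": ["C: Did you go for a walk today? P: Only to the mailbox."],
    "climbing stairs": [
        "C: How are the stairs? P: I avoid them.",
        "C: Did you go upstairs? P: Yes, slowly.",
    ],
}

def build_test_data(d_conver, make_prompt):
    """Mirror of the loop in test_func: one instruction per conversation."""
    data = {"instruction": [], "input": [], "output": []}
    for act, conver_ls in d_conver.items():
        for conversation in conver_ls:
            data["input"].append(conversation)
            data["instruction"].append(make_prompt(act))
    return data

# make_prompt stands in for generate_prompt(act, ICF_func, conversation, fq_type)
test_data = build_test_data(d_conver, lambda act: f"Ask follow-up questions about {act}.")

print(len(test_data["instruction"]), len(test_data["input"]), len(test_data["output"]))  # 3 3 0
```

`instruction` and `input` are parallel lists, so entry `i` of one belongs to entry `i` of the other. `output` stays empty until generation has run.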


In [None]:
# check conversation amount

## from results dict
instru_num = 0
for k,v in result.items():
    instru_num += len(v["function"]["instruction"])
    # print(k1)
    # print(len(v1["instruction"]))
print(f"amount of all instruction is: {instru_num}")
print()

## from dataset
al_conver = 0
for function in functions:
  conver_path0 = f"{test_conver}{function}.json" #CHANGE TO TEST DATA!!!

  with open(conver_path0) as converfile:
    d_conver = json.load(converfile)
  count = 0
  for act, conver_ls in d_conver.items():
    count += len(conver_ls)
  print(f"amount of {function} is: {count}")

  al_conver += count

print(f"amount of all conversations are: {al_conver}")

amount of all instruction is: 113

amount of communication is: 36
amount of self-care is: 25
amount of mobility is: 52
amount of all conversations are: 113


## Output Generation

In [None]:
def extract_response(text):
    response_marker = "### Response:"
    start_index = text.find(response_marker)

    # If the marker is found, extract the text after it
    if start_index != -1:
        # Adjust the start_index to get the text after the marker
        start_index += len(response_marker)
        # Extract and return the text after the marker, trimming any leading/trailing whitespace
        return text[start_index:].strip()
    else:
        return False

def generate_response(instruction, input_text, model):
  """
  Generate a response for the given instruction and input.

  Para::
  - instruction: prompt instruction
  - input_text: conversation history
  - model: fine-tuned model

  Return::
  the text after "### Response:", or False if no valid response was produced
  """

  FastLanguageModel.for_inference(model) # Enable native 2x faster inference
  # `tokenizer` and `alpaca_prompt` are globals defined in other cells
  inputs = tokenizer(
  [
      alpaca_prompt.format(
          instruction, # instruction
          input_text, # input
          "", # output - leave this blank for generation!
      )
  ], return_tensors = "pt").to("cuda")

  outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
  output = tokenizer.batch_decode(outputs)
  if len(output) != 1:
    print("output list has more than one response")
    return False

  cleaned_output = extract_response(output[0])
  if not cleaned_output:
    print(f"Marker '### Response:' not found, or response empty, for input {input_text}.")
    return False

  return cleaned_output
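
The decoded output contains the full prompt followed by the generated text, which is why the response has to be cut out after the marker. A minimal sketch of that step, using a shortened template and a hypothetical decoded string (no model is needed to run it):

```python
# Shortened, hypothetical version of the Alpaca template used in this notebook.
ALPACA_TEMPLATE = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

def extract_response(text, marker="### Response:"):
    """Return the text after the marker, or False when the marker is missing."""
    start = text.find(marker)
    if start == -1:
        return False
    return text[start + len(marker):].strip()

# Hypothetical decoded output: the prompt followed by the generated turns.
decoded = ALPACA_TEMPLATE.format(
    "Ask follow-up questions.",
    "C: Did you walk today? P: A little.",
    "C: How far did you get?\nP: Just to the gate.<|end_of_text|>",
)

response = extract_response(decoded)
print(response)
print(extract_response("no marker here"))  # False
```

Two edge cases are worth keeping in mind. `find` returns the first occurrence, so a conversation that itself contains the marker would cut the text too early. An empty generation yields `""`, which the caller also treats as a failure.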



In [None]:
# generate whole fq conversations

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

from tqdm import tqdm

for func in functions:
  for ty in types:
    test_func_dic = result[func][ty]
    if test_func_dic["output"]:
      print(len(test_func_dic["output"]))
      test_func_dic["output"] = []


    for instruction, input_text in tqdm(zip(test_func_dic["instruction"], test_func_dic["input"]), total=len(test_func_dic["instruction"])):
      output = generate_response(instruction, input_text, model)
      if output:
        test_func_dic["output"].append(output)
      else:
        num = test_func_dic["input"].index(input_text)
        print( f"No valid response was generated:{func}_{ty}_{num}.")



100%|██████████| 36/36 [05:26<00:00,  9.08s/it]
100%|██████████| 36/36 [05:13<00:00,  8.71s/it]
100%|██████████| 25/25 [03:42<00:00,  8.88s/it]
100%|██████████| 25/25 [03:26<00:00,  8.27s/it]
100%|██████████| 52/52 [07:24<00:00,  8.55s/it]
100%|██████████| 52/52 [07:25<00:00,  8.56s/it]


In [None]:
output_raw= '/content/drive/My Drive/thesis_vu/colab/Data/test_data/out_fq/out_raw/output_zero_raw.json'
with open (output_raw, 'w') as f:
    json.dump(result, f)
    print("raw output data of zeroshot has been saved successfully!")

raw output data of zeroshot has been saved successfully!


## Data Process

In [None]:
# extract questions only

import re

def clean_str(string):
    """
    Extract only the questions in a list.

    Parameters:
    string (str): The input string containing multiple questions.

    Returns:
    list: A list of each question string [Q1, Q2,...] in the FQ set.
    """

    # Regular expression to find sentences starting with 'C:'
    pattern = r'(?:\n*\n*C: (.*?))(?=\n*\n*P:|$)'

    # Find all matches using the pattern
    matches = re.findall(pattern, string, re.DOTALL)

    # Clean up each match: remove the <|end_of_text|> token, strip leading/trailing whitespace and replace newlines with spaces
    cleaned_matches = [re.sub(r'<\|end_of_text\|>', '', match).strip().replace('\n', ' ') for match in matches]

    return cleaned_matches
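
A quick check of the pattern on a hypothetical model output shows what is kept and what is dropped. The function is repeated in compact form so the snippet runs on its own.

```python
import re

def clean_str(string):
    # Same pattern and cleanup as in the cell above
    pattern = r'(?:\n*\n*C: (.*?))(?=\n*\n*P:|$)'
    matches = re.findall(pattern, string, re.DOTALL)
    return [re.sub(r'<\|end_of_text\|>', '', m).strip().replace('\n', ' ') for m in matches]

# Hypothetical generated text: two caretaker turns, one patient turn, end token.
sample = (
    "C: How do you feel when walking?\n"
    "P: I feel tired.\n\n"
    "C: Does it worry you?<|end_of_text|>"
)

questions = clean_str(sample)
print(questions)  # ['How do you feel when walking?', 'Does it worry you?']
print(clean_str("P: I am fine."))  # []
```

Only the caretaker (`C:`) utterances survive. Patient turns and the end-of-text token are discarded, and an output without any `C:` turn gives an empty list.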

In [None]:
for func in functions:
  for ty in types:
    test_func_dic = result[func][ty]
    splited_output = []
    for string in test_func_dic["output"]:
      # print(type(string))

      new_str  = clean_str(string)
      splited_output.append(new_str)

    # check amount
    if not len(splited_output) == len(test_func_dic["output"]):
      print("number of split outputs does not match the number of raw outputs")

    else:
      test_func_dic["output"] = splited_output

    # # check type
    for i in test_func_dic["output"]:
      if not isinstance(i, list):
        print(i)
        break

In [None]:
# save cleaned data
output_clean = '/content/drive/My Drive/thesis_vu/colab/Data/test_data/out_fq/out_clean/output_zero_clean.json'
with open (output_clean, 'w') as f:
    json.dump(result, f)
    print("Clean output data of zeroshot has been saved successfully!")

Clean output data of zeroshot has been saved successfully!


## END

And we're done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/u54VK8m8tk) channel!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>