## Mistral 7B Fine Tuning

### Installing and importing necessary packages

In [None]:
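%%capture
import torch

# The GPU's compute capability decides which extras to install:
# flash-attn requires Ampere or newer GPUs (major version >= 8).
major_version, minor_version = torch.cuda.get_device_capability()

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Newer GPUs (Ampere, Ada, Hopper): include flash-attn and its build tools.
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Older GPUs (e.g. V100, T4): skip flash-attn.
    !pip install --no-deps xformers trl peft accelerate bitsandbytes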

In [None]:
!pip install weave
!pip install wandb

In [None]:
# Set up Weights & Biases (W&B) for experiment tracking.
import wandb
wandb.login()  # Create a W&B account, then paste your API key when prompted.

True

In [None]:
wandb.init(project="soap-comparision", name = "mistral-7b")

### Model Fine Tuning

In [None]:
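# Import packages
from unsloth import FastLanguageModel
import torch

# Model configuration: maximum sequence length, dtype, and quantization.
max_seq_length = 2048
dtype = None          # None auto-detects: bfloat16 on GPUs that support it, otherwise float16
load_in_4bit = True   # Load 4-bit quantized weights to reduce memory use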


# Pre-quantized 4-bit checkpoints published by Unsloth; only the first one is used below.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",
    "unsloth/gemma-2b-bnb-4bit",
]

# Load the pretrained Mistral 7B base model through Unsloth, using 4-bit quantized
# weights to reduce memory use.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)



==((====))==  Unsloth: Fast Mistral patching release 2024.4
   \\   /|    GPU: NVIDIA L4. Max memory: 21.964 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.22.post7. FA = True.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Unused kwargs: ['quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


In [None]:
# Attach LoRA adapters to the model for parameter-efficient fine-tuning (PEFT).
# Only the adapter weights are trained; the 4-bit base weights stay frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r = 8,                                   # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,                         # LoRA scaling factor
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Unsloth's gradient checkpointing, which saves memory
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
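
The LoRA settings above determine how many parameters are trained. As a rough check, the sketch below counts them by hand. The layer dimensions are assumptions taken from the public Mistral 7B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); they do not come from this notebook. Each adapted projection adds two small matrices with `r * (in_features + out_features)` parameters in total.

```python
# Assumed Mistral 7B dimensions (from the public model config, not from this notebook).
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
KV_DIM = 1024          # 8 key/value heads x 128 head_dim (grouped-query attention)
NUM_LAYERS = 32
RANK = 8               # r passed to get_peft_model

# (in_features, out_features) of each targeted projection in one decoder layer.
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Under these assumptions the total is 20,971,520, which matches the "Number of trainable parameters" that Unsloth reports when training starts.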


In [None]:
# Define a structured prompt template for generating AI responses based on a given task.
soap_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


EOS_TOKEN = tokenizer.eos_token  # Appended to every example so the model learns when to stop generating

# Format dataset examples into prompt text for training (tokenization happens later, inside the trainer).
# Takes a batch with 'Instruction', 'context', and 'response' fields, fills in the
# soap_prompt template, and appends the end-of-sequence token to each example.
def formatting_prompts_func(examples):
    instructions = examples["Instruction"]
    contexts     = examples["context"]
    responses    = examples["response"]
    texts = []
    # Avoid naming a loop variable `input`, which would shadow the Python built-in.
    for instruction, context, response in zip(instructions, contexts, responses):
        text = soap_prompt.format(instruction, context, response) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

# Load the dataset from the Hugging Face Hub and add the formatted "text" column.
from datasets import load_dataset
dataset = load_dataset("m-masood/synthetic-soap-dataset", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True)

Map: 100%|████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 3638.03 examples/s]
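
To see what one formatted training example looks like without loading the model or the dataset, the sketch below applies the same template to a single made-up row. The toy row and the `"</s>"` end-of-sequence string are assumptions for illustration; in the notebook the token comes from `tokenizer.eos_token` and the rows come from the dataset.

```python
# Stand-ins so the sketch runs on its own (assumptions, not notebook values).
EOS_TOKEN = "</s>"  # assumed end-of-sequence string for Mistral

soap_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


def formatting_prompts_func(examples):
    """Fill the template for every row in a batch and append the EOS token."""
    texts = []
    for instruction, context, response in zip(
        examples["Instruction"], examples["context"], examples["response"]
    ):
        texts.append(soap_prompt.format(instruction, context, response) + EOS_TOKEN)
    return {"text": texts}


# A hypothetical one-row batch in the same column layout as the dataset.
toy_batch = {
    "Instruction": ["Convert the dialogue into a clinical SOAP note."],
    "context": ["D: How may I help you? P: My left knee has been hurting."],
    "response": ["Subjective: Patient reports left knee pain."],
}

formatted = formatting_prompts_func(toy_batch)["text"][0]
print(formatted)
```

The printed text ends with the EOS token directly after the response, which is what the model is trained to reproduce.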


In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

# Initialize the SFTTrainer for fine-tuning the language model with specific training configurations.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        report_to = "wandb",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 20,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Map (num_proc=2): 100%|████████████████████████████████████████████| 1000/1000 [00:03<00:00, 279.45 examples/s]
max_steps is given, it will override any value given in num_train_epochs
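
The trainer is configured with `lr_scheduler_type = "linear"`, `warmup_steps = 5`, `max_steps = 20`, and `learning_rate = 2e-4`. The sketch below shows the shape of such a schedule: a linear ramp from zero to the peak rate, then a linear decay back to zero. The function `linear_schedule_lr` is a hypothetical helper written for illustration, not a library call, and the values the trainer logs may be offset by one step from it.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, max_steps=20):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        scale = step / max(1, warmup_steps)
    else:
        scale = max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))
    return base_lr * scale


schedule = [linear_schedule_lr(step) for step in range(21)]
for step in (0, 1, 5, 10, 20):
    print(f"step {step:2d}: lr = {schedule[step]:.2e}")
```

With only 20 steps, a quarter of the run is spent warming up, so the model trains at the peak rate for just a moment before the decay begins.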


In [None]:
# Record the GPU name, its total memory, and the memory already reserved before training.
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
wandb.log({"GPU Stats": gpu_stats.name,
           "Max Memory": max_memory,
           "Reserved Memory": start_gpu_memory})

GPU = NVIDIA L4. Max memory = 21.964 GB.
4.547 GB of memory reserved.


In [None]:
# Start fine-tuning
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 20
 "-____-"     Number of trainable parameters = 20,971,520


Step,Training Loss
1,1.4795
2,1.5363
3,1.4746
4,1.4486
5,1.3639
6,1.3622
7,1.4582
8,1.3864
9,1.2242
10,1.3614
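
The banner above reports "Num Epochs = 1", but `max_steps = 20` stops training well before a full pass over the data. A short calculation using only the values printed in the banner makes this concrete:

```python
# Values taken from the training banner above.
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 20
num_examples = 1000

# Examples consumed per optimizer step, over the whole run, and as a fraction of the dataset.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch_size * max_steps
epochs_completed = examples_seen / num_examples

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples seen: {examples_seen}")
print(f"Epochs completed: {epochs_completed}")
```

The run therefore covers 160 of the 1,000 examples, or 0.16 epochs, which agrees with the `train/epoch` value in the W&B summary at the end of this notebook.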


In [None]:
# Report training time and peak GPU memory use.
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)  # Memory added by training itself
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

wandb.log({"Training Time(Seconds)": trainer_stats.metrics['train_runtime'],
           "Max Memory": max_memory,
           "Memory Reserved for Training(GB)": used_memory_for_lora,
           "Percentage of Max Memory Reserved for Training(%)": lora_percentage})

390.6233 seconds used for training.
6.51 minutes used for training.
Peak reserved memory = 6.076 GB.
Peak reserved memory for training = 1.529 GB.
Peak reserved memory % of max memory = 27.663 %.
Peak reserved memory for training % of max memory = 6.961 %.


### Model Inference

In [None]:
# Test the fine-tuned model by running inference on a sample dialogue.

FastLanguageModel.for_inference(model)  # Switch the model to Unsloth's faster inference mode

inputs = tokenizer(
[
    soap_prompt.format(
       "Below is an dialogue that shows conversation between doctor and patient, convert it into clinical soap format.", # instruction
        """D: How may I help you? P: I'm here because um I've been having some pain in my left knee for the past two months and it's not getting better. It feels stiff and um I just haven't been able to uh, you know, use it as well, as well as I was using it before um and it's just limited some of my daily activities. D: OK, um, and where, uh so it's, the pain is in your left knee. Where are you feeling this pain specifically? Is it at the front of the knee, the sides, or or the the back? Could you point to it? P: It feels like it's mostly on the front. D: OK. P: Like deep within that um kneecap. D: OK, and you said the pain started two months ago? P: Yes, well, it's always been a little like tender. Um but now it's more painful. D: OK. And so, so has it been getting worse? P: I would say so, slowly getting worse. D: OK. Uh and when you get uh pain in the left knee, how long does it typically last for? P: It usually hurts while I'm doing, while I'm moving it, or just after, but if I if I rest, the pain eventually goes away. Um but when I first wake up in the morning, that joint feels stiff. And then when I start using it, using it more, it's less stiff, but it becomes painful. D: OK, so you have some stiffness in the morning? P: I do. D: OK, and how long does it last for? Like 30 minutes, 60 minutes or or longer? P: The stiffness or pain? D: Yeah, the the stiffness. P: Uh the stiffness goes away in like yeah 15 to 30 minutes. D: OK, and how would you describe the pain, um in terms of its character? P: It feels, it feels uh, I guess most of the time it's like it's like a dull kind of pain, but it can be sharp. D: OK, and is there anything that makes the pain worse? P: Just with a lot of activity it gets worse. D: And you feel it radiate anywhere else? P: No. D: OK, and how would you describe the severity of your pain on a scale of 10 being the worst pain you've ever felt, and 1 being kind of very minimal pain. P: Uhm, I would give it maybe uh 7. D: OK. 
And have you had any injuries to your knee before? P: No, not that I can think of. D: No, OK. Um and have you been having any uh any weight loss recently? P: Uh no, weight gain. D: Weight gain, OK. How much weight have you gained over the last uh several months? P: Over the past six months, I'd say I've gained about 20 pounds. D: OK, have you had changes in your diet and or exercise? P: Um I guess I've been eating a little bit more, um but no changes in exercise. D: OK. Um have you been having any fevers or chills? P: No. D: OK, how about any night sweats? P: Uh, no night sweats. D: OK, um have you had any changes to your vision or hearing? P: No. D: OK. Have you had any changes to your uh sense of smell or sense of taste? P: No. D: OK, have you had a runny nose or or a sore throat? P: No. D: Have you had a cough or or any shortness of breath? P: Uh no nothing like that. D: OK, how about any uh wheezing? P: No wheezing. D: Alright, any chest pain or heart palpitations? P: No. D: Alright have you had any lightheadedness or dizziness? P: No. D: Alright, and any confusion or memory loss? P: No. D: Alright, and have you had any changes in appetite, like a loss of appetite? P: Uh no, I, if anything, had a gain in appetite. D: Alright, uh have you had any nausea or vomiting? P: No. D: How about any abdominal pain? P: No. D: Alright. Um and how about any urinary problems? P: Uh no urinary problems. D: Um any changes to your bowel habits, like diarrhea or blood in the stool? P: No. D: Alright, and have you had any rashes or skin changes or changes to your hair or nails? P: No, nothing like that. D: OK, and any other joint pains? Or have you or do you have any joint swelling? P: Uh I have some joint um swelling. It's it's not very visible, but it's it appears slightly more swollen than my left, sorry, than my right knee. D: OK. So just a little bit of swelling in your left knee, but no other joints? P: No. 
D: OK, um and have you been diagnosed with any medical conditions before? P: Um I have um diabetes, high blood pressure and high cholesterol. And I'm overweight. D: OK, do you take any medications for any of those conditions? P: I'm on insulin for diabetes. Um I'm on Ramipril for high blood pressure and I'm also on a statin for the cholesterol. D: OK, and do you have any allergies to medications? P: No. D: Alright, and uh, have you had any surgeries in the past? P: No. D: Alright, and um are your immunizations up to date? P: I think so. D: Alright, uh, and could you tell me a little bit about your living situation currently, like like who you're living with and whereabouts? P: Um I live with my husband in a house downtown. D: OK, um and are you working currently? P: No, I retired early. D: OK, um do you drink alcohol? P: Um I'll have a glass of wine every night. D: OK, so about 7 drinks per week? P: Yes. D: OK, and um, do you smoke cigarettes? P: Uh no, I don't. D: Alright, how about the use of any recreational drugs like cannabis or uh or anything else? P: No. D: Alright. Um and is there any, uh, like musculoskeletal or like autoimmune conditions that run in the family? P: Uhm no musculoskeletal issues uh that, I know diabetes runs in the family. D: OK, um alright, so that was everything I wanted to ask on history. So next I just wanted to do a physical exam, and just looking at the left knee, are you seeing any um swelling or redness uh on the knee? P: Um it appears slightly more swollen than my right knee. D: OK, but are you seeing any redness? P: No redness. D: OK, and are there any temperature changes? Like does the knee feel hot or or warm? P: No. D: OK, and if you um press along the uh joint line, do you feel any pain? P: Uhm yeah, it feels a little tender. D: OK. And how about pain over any of the muscles, like the thigh muscles or the hamstrings or the calf muscles? P: No, those are fine. D: OK, and then how about any pain over the patella or kneecap? 
P: That that's a little bit more painful. D: OK, and are you able to bend your knee uh like like flex it, straighten it? P: Uh I can do that, it's just painful when I do that. D: OK. Um and uh and then how about walking, are you um, do you have an antalgic gait or or or a limp? P: Uhm no, no, well, I guess I'm I'm putting more weight on my right knee so it might appear to some people that I do have a slight limp. D: OK. And any issues with moving your hip or your ankle? P: Uh no. D: OK, um so I think that was everything I wanted to ask and check today. Did did you have any questions? P: Um yeah, so um how do you, how can I treat my knee? D: Yeah, certainly, so it it sounds like um this could be most likely osteoarthritis um of the left knee, which uh is something that would be best treated with uh trying to stay as active as you can, but but also pacing your activities as well, so that you're not um doing so much that the next day you're in significant amount of pain, but it's important to remember that um activity will help um with keeping your knee mobile, but also um actually help with reducing pain as well, and then also uh you could use Tylenol um as a medication for it. I wouldn't recommend ibuprofen or NSAIDs on a long term basis, although you could take those if you're having like an acute flare up of your pain. Um and then also like topical agents such as um like Voltaren or something like that you could put on your knee and um if it's really bad and it keeps um continuing on, we can consider something like a steroid injection as well. P: OK, yeah, that sounds good. Thank you. D: Alright, thank you.""", # input
        "",
    )
], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 700, use_cache = True)
tokenizer.batch_decode(outputs)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


["<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nBelow is an dialogue that shows conversation between doctor and patient, convert it into clinical soap format.\n\n### Input:\nD: How may I help you? P: I'm here because um I've been having some pain in my left knee for the past two months and it's not getting better. It feels stiff and um I just haven't been able to uh, you know, use it as well, as well as I was using it before um and it's just limited some of my daily activities. D: OK, um, and where, uh so it's, the pain is in your left knee. Where are you feeling this pain specifically? Is it at the front of the knee, the sides, or or the the back? Could you point to it? P: It feels like it's mostly on the front. D: OK. P: Like deep within that um kneecap. D: OK, and you said the pain started two months ago? P: Yes, well, it's always been a little lik
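
The decoded output repeats the whole prompt, including the full dialogue, before the generated note. A minimal sketch for keeping only the model's answer is to split on the `### Response:` marker from the template. The helper `extract_response`, the `"</s>"` end-of-sequence string, and the sample text are assumptions for illustration, not part of the original notebook.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded, eos_token="</s>"):
    """Return the text after the last response marker, with the EOS token removed."""
    # rpartition returns the whole string as the last element if the marker is missing.
    _, _, response = decoded.rpartition(RESPONSE_MARKER)
    return response.replace(eos_token, "").strip()


# A shortened, made-up stand-in for one decoded generation.
sample = (
    "<s> Below is an instruction that describes a task...\n\n"
    "### Instruction:\nConvert the dialogue into a SOAP note.\n\n"
    "### Input:\nD: How may I help you? P: My left knee hurts.\n\n"
    "### Response:\nSubjective: Patient reports left knee pain.</s>"
)
print(extract_response(sample))
```

In the notebook this could be applied as `extract_response(tokenizer.batch_decode(outputs)[0])`.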

In [None]:
# Finish the W&B run; this prints the run history and summary.
wandb.finish()

Run history:
Max Memory,▁▁
Memory Reserved for Training(GB),▁
Percentage of Max Memory Reserved for Training(%),▁
Reserved Memory,▁
Training Time(Seconds),▁
train/epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇███
train/global_step,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇████
train/grad_norm,▆▆▆▃▃▅█▃▁▁▃▆▃▃▂▂▃▂▂▃
train/learning_rate,▂▄▅▇██▇▇▆▆▅▅▄▄▃▃▂▂▁▁
train/loss,▇█▇▆▅▄▆▅▂▄▃▅▃▅▁▃▃▁▃▂

Run summary:
GPU Stats,NVIDIA L4
Max Memory,21.964
Memory Reserved for Training(GB),1.529
Percentage of Max Memory Reserved for Training(%),6.961
Reserved Memory,4.547
Training Time(Seconds),390.6233
total_flos,1.402135828758528e+16
train/epoch,0.16
train/global_step,20
train/grad_norm,0.43518
