<a href="https://colab.research.google.com/github/tahirzaman303/arch-genai-task-1/blob/main/ARCH_TECH_TASK_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Library Installations
# Install Unsloth for fast finetuning of large language models.
!pip install unsloth
# Install additional libraries required for data handling, acceleration, quantization, and PEFT.
!pip install datasets accelerate bitsandbytes trl peft



In [2]:
# Import necessary libraries
# Import torch for tensor operations.
import torch
# Import load_dataset from datasets library for easy data loading.
from datasets import load_dataset
# Import FastLanguageModel from unsloth for efficient model loading and finetuning.
from unsloth import FastLanguageModel

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
# Load the pre-trained Llama-3-8b model
# Define the model name from Unsloth's pre-trained models.
model_name = "unsloth/llama-3-8b-bnb-4bit"

# Load the model and tokenizer using FastLanguageModel with 4-bit quantization and a specified sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = 2048,
    load_in_4bit = True,
)

==((====))==  Unsloth 2026.1.4: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [4]:
# Configure the model for Parameter-Efficient Finetuning (PEFT)
# Apply PEFT to the model with specified LoRA parameters (rank, alpha, dropout) and target modules.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0.05,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing = False, # Changed to False to prevent device management conflicts.
)

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2026.1.4 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
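The LoRA settings above determine how many parameters are actually trained. The sketch below is a back-of-the-envelope count, not something the notebook runs. It assumes the usual Llama-3-8B shapes (hidden size 4096, key/value projection width 1024, 32 decoder layers), which are not stated in this notebook. Under those assumptions the result matches the "Trainable parameters = 13,631,488" figure that Unsloth prints when training starts.

```python
# Back-of-the-envelope count of LoRA trainable parameters.
# Assumed Llama-3-8B shapes (not stated in the notebook): hidden size 4096,
# key/value projection width 1024 (grouped-query attention), 32 decoder layers.
HIDDEN = 4096
KV_DIM = 1024
NUM_LAYERS = 32
RANK = 16  # r passed to get_peft_model

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}  total: {total:,}")
# per layer: 425,984  total: 13,631,488
```

Doubling `r` doubles this count, so the rank is the main lever for trading adapter capacity against memory.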


In [5]:
# Load and format the dataset
# Load the 'medical_meadow_medqa' dataset from Hugging Face.
dataset = load_dataset("medalpaca/medical_meadow_medqa")
# Print the dataset structure.
print(dataset)
# Print the first example from the training set.
print(dataset['train'][0])
# Define a function that formats each example into a "### Question / ### Answer" prompt template.
# Note: the dataset's 'instruction' column is not included in the prompt.
def format_prompt(example):
    return {
        "text": f"""### Question:
{example['input']}

### Answer:
{example['output']}"""
    }

# Apply the formatting function to the dataset.
dataset = dataset.map(format_prompt)

DatasetDict({
    train: Dataset({
        features: ['input', 'instruction', 'output'],
        num_rows: 10178
    })
})
{'input': "Q:A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?? \n{'A': 'Ampicillin', 'B': 'Ceftriaxone', 'C': 'Ciprofloxacin', 'D': 'Doxycycline', 'E': 'Nitrofurantoin'},", 'instruction': 'Please answer with one of the option in the bracket', 'output': 'E: Nitrofurantoin'}
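The `format_prompt` function in this cell drops the `instruction` column and does not end the text with an end-of-sequence token, so the model gets no explicit signal for where an answer stops. The sketch below shows one possible variant that keeps the instruction and appends an EOS string. The function name `format_prompt_with_eos`, the `### Instruction:` heading, and the shortened sample row are illustrative assumptions. The notebook does not use this variant, and its effect on the results here has not been measured.

```python
def format_prompt_with_eos(example: dict, eos_token: str) -> dict:
    """Hypothetical variant of format_prompt: keeps the instruction, appends EOS."""
    text = (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Question:\n{example['input']}\n\n"
        f"### Answer:\n{example['output']}{eos_token}"
    )
    return {"text": text}


# Shortened stand-in for a dataset row (same columns as the real dataset).
sample = {
    "instruction": "Please answer with one of the option in the bracket",
    "input": "Q: Which of the following is the best treatment for this patient?",
    "output": "E: Nitrofurantoin",
}

# "<EOS>" is a placeholder; in the notebook you would pass tokenizer.eos_token.
print(format_prompt_with_eos(sample, "<EOS>")["text"])
```

In the notebook this could be applied with `dataset.map(lambda ex: format_prompt_with_eos(ex, tokenizer.eos_token))`. The inference prompt would then need the same `### Instruction:` section so that it matches the training format.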


In [6]:
# Set up the Supervised Finetuning (SFT) Trainer
# Import SFTTrainer for supervised finetuning and TrainingArguments for configuration.
from trl import SFTTrainer
from transformers import TrainingArguments

# Initialize the SFTTrainer with the model, tokenizer, and training data.
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    # Shuffle and select a subset of the training dataset for demonstration.
    train_dataset = dataset["train"].shuffle(seed=42).select(range(2000)),
    dataset_text_field = "text",
    max_seq_length = 2048,
    # Configure training arguments such as batch size, learning rate, and logging.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 200,
        learning_rate = 2e-4,
        fp16 = True,
        logging_steps = 10,
        output_dir = "outputs",
        optim = "adamw_8bit",
    ),
)
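The training arguments set `warmup_steps`, `max_steps`, and `learning_rate` but no scheduler, so the `transformers` default applies. To my knowledge that default is a linear schedule: the rate rises linearly to its peak during warmup and then decays linearly to zero. The sketch below reimplements that rule in plain Python for illustration. It is not the library's code, and `linear_schedule_lr` is a hypothetical helper name. It agrees with the `train/learning_rate` of 0.0 reported at step 200 in the run summary.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 2e-4,
    warmup_steps: int = 10,
    total_steps: int = 200,
) -> float:
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 5, 10, 105, 200):
    print(f"step {step:>3}: lr = {linear_schedule_lr(step):.2e}")
```

With only 200 steps, the 10 warmup steps cover 5% of training, and the rate is already down to half its peak by step 105.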

In [7]:
# Train the model
# Start the training process for the configured model.
trainer.train()

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,000 | Num Epochs = 1 | Total steps = 200
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 13,631,488 of 8,043,892,736 (0.17% trained)
wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from /root/.netrc.
wandb: Currently logged in as: tahirzaman22487 (tahirzaman22487-uet-mardan-pakistan-) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin




Step,Training Loss
10,1.63
20,1.3631
30,1.3037
40,1.2854
50,1.2702
60,1.2536
70,1.2661
80,1.2539
90,1.246
100,1.2087


Run history:
train/epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇███
train/global_step,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇███
train/grad_norm,█▃▂▂▂▁▁▂▂▃▂▁▂▁▂▅▁▂▁▁
train/learning_rate,███▇▇▆▆▆▅▅▄▄▄▃▃▃▂▂▁▁
train/loss,█▄▃▃▂▂▂▂▂▁▂▁▂▂▂▂▁▂▁▂

Run summary:
total_flos,2.010846867259392e+16
train/epoch,0.8
train/global_step,200.0
train/grad_norm,0.37469
train/learning_rate,0.0
train/loss,1.2129
train_loss,1.26377
train_runtime,938.1863
train_samples_per_second,1.705
train_steps_per_second,0.213


TrainOutput(global_step=200, training_loss=1.2637697792053222, metrics={'train_runtime': 938.1863, 'train_samples_per_second': 1.705, 'train_steps_per_second': 0.213, 'total_flos': 2.010846867259392e+16, 'train_loss': 1.2637697792053222, 'epoch': 0.8})
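The Unsloth banner reports "Num Epochs = 1", while the metrics report `epoch` as 0.8. The difference comes from `max_steps = 200`, which stops training before one full pass over the 2,000 selected examples. The arithmetic below is a sketch that uses only numbers printed in the outputs above. It reproduces the effective batch size, the epoch fraction, and the reported throughput.

```python
# Values taken from the TrainingArguments and the printed training metrics.
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 200
num_examples = 2000
train_runtime_s = 938.1863

# One optimizer step consumes batch * accumulation * GPUs examples.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
samples_seen = effective_batch * max_steps
epochs = samples_seen / num_examples

samples_per_s = samples_seen / train_runtime_s
steps_per_s = max_steps / train_runtime_s

print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen} ({epochs} epochs)")
print(f"throughput: {samples_per_s:.3f} samples/s, {steps_per_s:.3f} steps/s")
```

Covering the full 2,000-example subset once would take 250 steps at this batch size.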

In [8]:
# Save the finetuned model and tokenizer
# Save the finetuned LoRA adapter weights.
model.save_pretrained("medical_lora_adapter")
# Save the tokenizer to ensure consistency with the finetuned model.
tokenizer.save_pretrained("medical_lora_adapter")

('medical_lora_adapter/tokenizer_config.json',
 'medical_lora_adapter/special_tokens_map.json',
 'medical_lora_adapter/tokenizer.json')
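The output above lists only tokenizer files because a notebook cell displays the return value of its last expression, here `tokenizer.save_pretrained(...)`. The adapter itself is written by `model.save_pretrained(...)`. PEFT normally stores it as `adapter_config.json` plus an `adapter_model.safetensors` weights file. The sketch below is a sanity check that reads the saved config and lists the directory. `describe_adapter` is a hypothetical helper, not part of any library.

```python
import json
from pathlib import Path


def describe_adapter(adapter_dir: str) -> dict:
    """Summarise a saved LoRA adapter directory (hypothetical helper)."""
    path = Path(adapter_dir)
    config_file = path / "adapter_config.json"
    if not config_file.is_file():
        raise FileNotFoundError(f"No adapter_config.json found in {path}")
    config = json.loads(config_file.read_text())
    return {
        # Every regular file in the directory, e.g. weights and tokenizer files.
        "files": sorted(p.name for p in path.iterdir() if p.is_file()),
        # LoRA hyperparameters as recorded by PEFT.
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": sorted(config.get("target_modules") or []),
    }


# Usage in the notebook, after the save cell has run:
# print(describe_adapter("medical_lora_adapter"))
```

If the reported `r`, `lora_alpha`, and `target_modules` match the values passed to `get_peft_model`, the adapter on disk corresponds to the configuration trained here.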

In [9]:
# Perform inference with the finetuned model
# Prepare the model for inference, optimizing it for generation.
FastLanguageModel.for_inference(model)

# Define a sample prompt for generating a response.
prompt = """### Question:
What is diabetes?

### Answer:"""

# Tokenize the prompt and move it to the GPU.
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Clear CUDA cache to free up memory before generation.
torch.cuda.empty_cache()

# Generate a response, capped at 50 new tokens to limit memory use and runtime (longer answers are cut off at the cap).
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode and print the generated output, skipping special tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

### Question:
What is diabetes?

### Answer: 
Diabetes is a condition where the body is unable to use sugar (glucose) properly. This causes the sugar to build up in the blood instead of being used by the cells. This can lead to serious complications. It is important to manage
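The printed text repeats the prompt because `generate` on a decoder-only model returns the prompt tokens followed by the continuation. To keep only the answer, one option is to split the decoded string on the `### Answer:` marker from the prompt template. This is a string-level sketch, and `extract_answer` is a hypothetical helper. It also cuts the text at a following `### Question:` in case the model starts a new question.

```python
ANSWER_MARKER = "### Answer:"
QUESTION_MARKER = "### Question:"


def extract_answer(decoded: str) -> str:
    """Return the text after the first answer marker, without any follow-on question."""
    _, found, answer = decoded.partition(ANSWER_MARKER)
    if not found:
        # Marker missing: fall back to the whole text.
        return decoded.strip()
    # Drop anything the model generated after starting a new question.
    answer = answer.split(QUESTION_MARKER, 1)[0]
    return answer.strip()


decoded = "### Question:\nWhat is diabetes?\n\n### Answer: \nDiabetes is a condition."
print(extract_answer(decoded))
# Diabetes is a condition.
```

An alternative is to slice the generated token IDs at the prompt length before decoding, which avoids relying on the marker text.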
