# Medical Finetuning With QLoRA Using Unsloth in Colab
This notebook demonstrates how to fine-tune a language model on medical data using QLoRA and Unsloth in Google Colab. We will use a domain-specific medical dataset, configure Unsloth for QLoRA (low-rank adapters trained on top of a 4-bit quantized base model), and run the full training workflow.

## 1. Install Unsloth and Dependencies
Install Unsloth and the required libraries for training and dataset handling.

In [None]:
!pip install --upgrade unsloth
!pip install datasets trl

## 2. Load a Medical Dataset
Load a domain-specific medical dataset. You can use a public dataset from the Hugging Face Hub or upload your own clinical Q&A pairs.

In [None]:
from datasets import load_dataset

# Example ID only: replace it with a dataset you have confirmed on the Hub.
# The later cells expect "question" and "answer" columns.
medical_dataset = load_dataset("medal/medical-qa", split="train")
print(medical_dataset[0])
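
If you upload your own clinical Q&A pairs, one simple option is a JSON Lines file with one record per line. The sketch below uses only the standard library; the helper name `load_qa_pairs`, the file layout, and the sample records are assumptions for illustration, not part of the original notebook.

```python
import json
import os
import tempfile

def load_qa_pairs(path):
    """Read a JSON Lines file: one {"question": ..., "answer": ...} object per line."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                pairs.append(json.loads(line))
    return pairs

# Write a tiny illustrative file, then read it back
sample = [
    {"question": "What is hypertension?", "answer": "Persistently elevated blood pressure."},
    {"question": "What does BMI stand for?", "answer": "Body mass index."},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    for row in sample:
        tmp.write(json.dumps(row) + "\n")

pairs = load_qa_pairs(tmp.name)
os.remove(tmp.name)
print(len(pairs), "pairs loaded")
```

A list like `pairs` can be turned into a Hugging Face dataset with `Dataset.from_list(pairs)` if you want the same interface as `load_dataset`.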

## 3. Load the Base Model with Unsloth
Load a base model such as Llama 3 or DeepSeek-R1 in 4-bit mode using Unsloth.

In [None]:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # or "unsloth/deepseek-llm-7b-bnb-4bit"
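    max_seq_length = 2048,  # longest sequence, in tokens, used for training
    dtype = None,           # None lets Unsloth pick float16 or bfloat16 for the GPU
    load_in_4bit = True,    # load 4-bit quantized weights to cut memory use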
)
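
To see why 4-bit loading matters on a Colab GPU, the arithmetic below estimates the memory taken by the weights alone. It is a rough sketch: the 8-billion parameter count is a round figure assumed for an 8B model, and the estimate ignores quantization metadata, activations, adapter weights, and optimizer state.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 8e9  # assumed round figure for an 8B-parameter model

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{int4_gb:.0f} GB")
```

Under these assumptions the 16-bit weights alone would roughly fill a 16 GB GPU, while the 4-bit weights leave room for activations and adapter training.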

## 4. Prepare Data for Training
Format your dataset as instruction/response pairs for supervised fine-tuning.

In [None]:
def format_example(example):
    return {
        "instruction": example["question"],
        "output": example["answer"]
    }

train_data = [format_example(x) for x in medical_dataset]

## 5. Tokenize Data
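Render each instruction/response pair as a single training string in a `text` column. A supervised fine-tuning trainer such as TRL's `SFTTrainer` tokenizes this column itself and truncates at the maximum sequence length.

In [None]:
from datasets import Dataset

EOS_TOKEN = tokenizer.eos_token  # marks where each answer ends

def to_text(example):
    # Simple illustrative template; any consistent layout works
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
    return {"text": prompt + example["output"] + EOS_TOKEN}

# Wrap the list of dicts in a Dataset and add the text column
data_module = {"train_dataset": Dataset.from_list(train_data).map(to_text)}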
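
Examples longer than `max_seq_length` tokens are typically cut off during training, so it can help to check how many are affected before starting. The sketch below is a hedged illustration: `truncation_report` is a hypothetical helper, and the token counts are made-up values that in practice could come from `len(tokenizer(text)["input_ids"])` for each training string.

```python
def truncation_report(token_counts, max_seq_length=2048):
    """Count examples whose token length exceeds the sequence limit."""
    too_long = [n for n in token_counts if n > max_seq_length]
    return {
        "total": len(token_counts),
        "truncated": len(too_long),
        "longest": max(token_counts, default=0),
    }

# Hypothetical token counts for five training examples
token_counts = [312, 1980, 2048, 2500, 4096]
report = truncation_report(token_counts)
print(report)
```

If a large share of examples would be truncated, consider shortening them or raising `max_seq_length`, at the cost of more GPU memory.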

## 6. Configure QLoRA Training
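Attach LoRA adapters to the 4-bit base model, then set up training: epochs, batch size, and learning rate. The rank and target modules below are common starting values, not tuned settings.

In [None]:
from trl import SFTConfig, SFTTrainer

# Add trainable low-rank adapters; the 4-bit base weights stay frozen
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,            # adapter rank (illustrative)
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Keyword names follow Unsloth's example notebooks and may differ across TRL versions
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = data_module["train_dataset"],
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = SFTConfig(
        num_train_epochs = 2,  # Adjust as needed
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        learning_rate = 2e-4,
        save_steps = 100,
        output_dir = "qlora-medical-adapter",
    ),
)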
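
The "low-rank" part of QLoRA is what keeps training cheap: instead of updating a full weight matrix, LoRA learns two thin matrices whose product has the same shape. The sketch below counts parameters for one square projection layer. The hidden size of 4096 and the rank of 16 are illustrative assumptions, not values taken from this notebook.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

D_MODEL = 4096  # assumed hidden size, illustrative
RANK = 16       # assumed LoRA rank, illustrative

full = D_MODEL * D_MODEL
lora = lora_param_count(D_MODEL, D_MODEL, RANK)
print(f"full matrix: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

Under these assumptions the adapter for one layer trains under 1% of the parameters of the matrix it modifies, which is why the optimizer state stays small.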

## 7. Train the Model
Start the training process. Monitor GPU memory and performance using Colab's resource panel.

In [None]:
trainer.train()
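
To get a feel for how long training will run, you can estimate the number of optimizer steps from the settings above (batch size 2, gradient accumulation 8, 2 epochs, checkpoint every 100 steps). This is an approximate sketch: the dataset size of 10,000 is hypothetical, and the exact count depends on how the trainer rounds the final partial batch.

```python
import math

def training_steps(n_examples, batch_size, grad_accum, epochs):
    """Return (effective batch size, steps per epoch, total optimizer steps)."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

N_EXAMPLES = 10_000  # hypothetical dataset size
SAVE_STEPS = 100     # matches save_steps in the training configuration

effective_batch, steps_per_epoch, total_steps = training_steps(N_EXAMPLES, 2, 8, 2)
checkpoints = total_steps // SAVE_STEPS
print(f"effective batch size: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"checkpoints written: {checkpoints}")
```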

## 8. Save the Fine-Tuned Adapter
Save the trained adapter for later use or deployment.

In [None]:
trainer.save_model("qlora-medical-adapter")
tokenizer.save_pretrained("qlora-medical-adapter")  # keep the tokenizer alongside the adapter

## 9. Test the Fine-Tuned Model
Test the model's response to new medical queries.

In [None]:
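FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode before generating

# If you trained with a prompt template, wrap the question in that same template
prompt = "What are the symptoms of diabetes?"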
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
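
For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the decoded string repeats the question. The sketch below shows one way to keep only the answer. `strip_prompt` is a hypothetical helper and the decoded string is a made-up example; if decoding does not reproduce the prompt exactly, the helper falls back to returning the full text.

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Remove the echoed prompt from the start of a decoded generation."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()  # prompt not reproduced verbatim; keep everything

prompt = "What are the symptoms of diabetes?"
# Made-up decoded output standing in for tokenizer.decode(outputs[0], ...)
decoded = prompt + " Common symptoms include increased thirst and frequent urination."

answer = strip_prompt(decoded, prompt)
print(answer)
```

An alternative is to slice the token tensor before decoding, for example `outputs[0][inputs["input_ids"].shape[1]:]`, which avoids string matching altogether.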

---
**Tips:**
- If you hit out-of-memory errors, lower the batch size and raise gradient accumulation so the effective batch size (their product) stays the same.
- Use `!nvidia-smi` in a code cell to monitor GPU memory.
- For custom datasets, ensure your data is in instruction/output format.
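
A quick check for the last tip can be sketched as follows. The helper `invalid_indices` and the sample records are illustrative assumptions; the check only confirms that each record has non-blank string values under the `instruction` and `output` keys.

```python
REQUIRED_KEYS = ("instruction", "output")

def invalid_indices(records, required=REQUIRED_KEYS):
    """Return indices of records missing a required key or holding a blank or non-string value."""
    bad = []
    for i, rec in enumerate(records):
        ok = all(isinstance(rec.get(k), str) and rec[k].strip() for k in required)
        if not ok:
            bad.append(i)
    return bad

records = [
    {"instruction": "List common signs of anemia.", "output": "Fatigue, pallor, shortness of breath."},
    {"instruction": "Define tachycardia."},                  # missing "output"
    {"instruction": "   ", "output": "Blank instruction."},  # whitespace only
]
bad = invalid_indices(records)
print(bad)
```

Dropping or repairing the flagged records before training avoids key errors in the formatting step and empty training strings.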