# Build your MedBot
© 2023, Zaka AI, Inc. All Rights Reserved.

---
The goal of this colab is to get you more familiar with LLM fine-tuning by creating a simple question-answering (QA) LLM that can answer medical questions. By the end of it, you will be able to customize this LLM with any dataset.

**Just to give you a heads up:** We won't end up with a model that performs like ChatGPT or Bard, but we will get an idea of how to create our own smaller versions of such powerful LLMs.

## Importing and Installing Libraries/Packages
We will start by installing our necessary packages.

**bitsandbytes**: This package allows us to run 4-bit quantization on our model (not needed for the tiny/CPU path installed below)

**transformers**: This Hugging Face package will allow us to load state-of-the-art models easily into our notebook

**peft**: This package allows us to easily apply parameter-efficient fine-tuning (PEFT) techniques, such as LoRA, to our model

**accelerate**: This handy package handles device placement and other training boilerplate in a few lines of code

**datasets**: This package allows us to easily import datasets from the Hugging Face platform to be directly used

Install (tiny/CPU path)

In [1]:
# Remove conflicting packages first
!pip uninstall -y torchvision torchaudio fastai

# Install torch 2.6.0 and NLP stack (no bitsandbytes needed)
!pip install -q "torch==2.6.0" transformers peft accelerate datasets

Found existing installation: torchvision 0.21.0+cu124
Uninstalling torchvision-0.21.0+cu124:
  Successfully uninstalled torchvision-0.21.0+cu124
Found existing installation: torchaudio 2.6.0+cu124
Uninstalling torchaudio-2.6.0+cu124:
  Successfully uninstalled torchaudio-2.6.0+cu124
Found existing installation: fastai 2.7.19
Uninstalling fastai-2.7.19:
  Successfully uninstalled fastai-2.7.19

In [2]:
import os, json, torch, transformers
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model

## Loading our model

Load tiny base model

In [None]:
#hf_model = "EleutherAI/gpt-neox-20b"

In [3]:
TINY_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
print("CUDA available?", torch.cuda.is_available())

CUDA available? True


In [4]:
tokenizer = AutoTokenizer.from_pretrained(TINY_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    TINY_ID,
    torch_dtype=torch.float32,
    device_map="auto"   # GPU if present, else CPU
)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


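To see why this tiny path can skip quantization, here is a back-of-the-envelope estimate of weight memory at different precisions. The base parameter count is derived from the `print_trainable_parameters` output reported later in this notebook ("All" minus the LoRA "Trainable" weights). The estimate ignores activations, gradients, optimizer state, and the small overhead that real quantization schemes add, so treat it as a rough lower bound.

```python
# Rough weight-memory estimate; ignores activations, gradients and optimizer state.
# "All" minus LoRA "Trainable", both taken from this notebook's printed output.
BASE_PARAMS = 1_102_301_184 - 2_252_800

# Idealized storage cost per parameter (real 4-bit formats add a little overhead).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_memory_gb(n_params, bytes_per_param):
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>8}: {weight_memory_gb(BASE_PARAMS, size):.2f} GB")
```

At float32 the weights take about 4.4 GB, which is manageable on a small GPU. A model with roughly 20 billion parameters, like the commented-out `EleutherAI/gpt-neox-20b`, would need on the order of 80 GB at the same precision, which is where 4-bit loading becomes relevant.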

Add LoRA (lightweight fine-tuning)

In [5]:
def print_trainable_parameters(m):
    """Print how many of the model's parameters will be updated during training."""
    t = sum(p.numel() for p in m.parameters() if p.requires_grad)  # trainable
    a = sum(p.numel() for p in m.parameters())                     # all
    print(f"Trainable: {t} || All: {a} || Trainable %: {100*t/a:.4f}%")

lora_cfg = LoraConfig(
    r=8, lora_alpha=32,
    target_modules=["q_proj","k_proj","v_proj","o_proj"],
    lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_cfg)
print_trainable_parameters(model)


Trainable: 2252800 || All: 1102301184 || Trainable %: 0.2044%
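
The trainable count above can be reproduced by hand. Each LoRA-wrapped linear layer with `in` inputs and `out` outputs gains two matrices, `A` (`r x in`) and `B` (`out x r`), so it adds `r * (in + out)` parameters. The layer dimensions below are assumptions based on TinyLlama's published config (hidden size 2048, 22 layers, 4 key/value heads of size 64). They are not printed anywhere in this notebook, so check `model.config` in your own session if in doubt.

```python
# Assumed TinyLlama-1.1B dimensions (verify against model.config in your session).
HIDDEN = 2048          # hidden_size
KV_DIM = 4 * 64        # num_key_value_heads * head_dim (grouped-query attention)
N_LAYERS = 22          # num_hidden_layers
R = 8                  # LoRA rank from lora_cfg

# (in_features, out_features) of each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * N_LAYERS
print(per_layer, total)  # 102400 2252800
```

The total matches the `Trainable: 2252800` line above. With `lora_alpha=32` and `r=8`, the low-rank update is scaled by `lora_alpha / r = 4` before being added to the frozen weights.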


Dataset & tokenization


In [6]:
ds = load_dataset("medalpaca/medical_meadow_wikidoc_patient_information")

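def tok(batch):
    # Train on the answer text only ("output" column); truncate long answers to 512 tokens
    return tokenizer(batch["output"], padding=False, truncation=True, max_length=512)

# Tokenize in batches and drop every original column except "output"
tok_ds = ds.map(tok, batched=True, remove_columns=[c for c in ds["train"].column_names if c != "output"])
# mlm=False selects causal (next-token) language modeling; padding happens per batch
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)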


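
For intuition, here is a plain-Python sketch of what a causal-LM collator does with `mlm=False`: it pads each batch to its longest sequence, copies `input_ids` into `labels`, and sets the label at every pad position to `-100` so the loss ignores it. This mirrors the idea rather than the library's actual code. The function name, the right-side padding, and the pad id of 2 are illustrative assumptions.

```python
IGNORE_INDEX = -100  # label value that the loss function skips

def collate_causal_lm(sequences, pad_token_id):
    """Pad token-id lists to equal length and build labels for next-token training."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids = seq + [pad_token_id] * n_pad  # assumed right-side padding
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Labels are a copy of the inputs; the model shifts them internally.
        # Every position equal to the pad id is masked, not only the padding.
        batch["labels"].append(
            [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]
        )
    return batch

example = collate_causal_lm([[1, 15, 27, 9], [1, 42]], pad_token_id=2)
print(example["labels"])  # [[1, 15, 27, 9], [1, 42, -100, -100]]
```

One consequence worth knowing: when the pad token is set to the EOS token, as this notebook does if no pad token exists, any genuine EOS tokens in the training text are masked as well, so the model gets no training signal from them.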

Training

In [8]:
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    warmup_steps=1,
    max_steps=5,
    learning_rate=2e-4,
    fp16=torch.cuda.is_available(),
    logging_steps=1,
    output_dir="outputs",
    optim="adamw_torch",
    report_to="none"
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=tok_ds["train"],
    args=training_args,
    data_collator=data_collator
)

model.config.use_cache = False
trainer.train()


Step,Training Loss
1,2.0886
2,2.6455
3,1.7736
4,1.3846
5,2.5245


TrainOutput(global_step=5, training_loss=2.0833667278289796, metrics={'train_runtime': 4.2325, 'train_samples_per_second': 2.363, 'train_steps_per_second': 1.181, 'total_flos': 9809872171008.0, 'train_loss': 2.0833667278289796, 'epoch': 0.001682935038707506})
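
The numbers in `TrainOutput` follow directly from the training arguments. As a quick sanity check, the sketch below recomputes the epoch fraction and the mean loss from values shown above. It assumes a single device, so the effective batch size is `per_device_train_batch_size * gradient_accumulation_steps`.

```python
# Values copied from the training arguments and outputs above.
per_device_batch = 1
grad_accum_steps = 2
max_steps = 5
train_examples = 5942            # size of the train split
step_losses = [2.0886, 2.6455, 1.7736, 1.3846, 2.5245]

# Assumes one device, so there is no extra multiplication by a device count.
effective_batch = per_device_batch * grad_accum_steps
samples_seen = effective_batch * max_steps
epoch_fraction = samples_seen / train_examples
mean_loss = sum(step_losses) / len(step_losses)

print(f"samples seen: {samples_seen}")            # 10
print(f"epoch fraction: {epoch_fraction:.6f}")    # 0.001683
print(f"mean logged loss: {mean_loss:.4f}")       # 2.0834
```

Ten samples out of 5,942 is far too few to change the model noticeably, which is why the loss bounces around from step to step. For a real run, raise `max_steps` or replace it with `num_train_epochs`.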

Quick generation

In [17]:
prompt = "What are the common causes of allergies?"

if hasattr(tokenizer, "apply_chat_template"):
    formatted_prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": "You are a helpful medical assistant."},
            {"role": "user", "content": prompt}
        ],
        tokenize=False,
        add_generation_prompt=True
    )
else:
    formatted_prompt = f"<|system|>\nYou are a helpful medical assistant.\n<|user|>\n{prompt}\n<|assistant|>"

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        min_new_tokens=120,
        max_new_tokens=300,
        temperature=0.5,
        top_p=0.9,
        repetition_penalty=1.15,  # discourages repeating phrases
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id
    )

# Slice off the prompt
gen_tokens = out[0, inputs["input_ids"].shape[-1]:]
generated_text = tokenizer.decode(gen_tokens, skip_special_tokens=True).strip()
# Trim to the last full stop or newline if the end is partial
for stop in [". ", "\n\n", "\n"]:
    if stop in generated_text[-60:]:
        last = generated_text.rfind(stop)
        if last != -1:
            # Reassign generated_text so the trimmed version is what gets printed
            generated_text = generated_text[: last + len(stop.strip())]
            break

print("Model output:", generated_text if generated_text else "[No output generated]")


Model output: There are several common causes of allergies, including:

1. IgE antibodies: These are produced by the immune system and bind to specific types of allergen (such as pollen or dust mite) in the skin, airways, or gut. This triggers an inflammatory response that leads to symptoms such as hives, itching, swelling, and difficulty breathing.

2. IgG antibodies: These are produced by the body's natural defense system and neutralize the effects of IgE antibodies. However, if too many IgG antibodies are present, they can trigger allergic reactions in some people.

3. Environmental factors: Allergens like pollen, pet dander, and dust mites are often found in homes, workplaces, and outdoor environments. Exposure to these substances can cause allergic responses.

4. Genetics: Some individuals may be predisposed to developing allergies due to genetic mutations or environmental factors.

5. Immune system dysfunction: Allergies can also be caused by immune system dysfunction, which can 
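
To make the `temperature` and `top_p` arguments concrete, here is a small pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy set of logits. It illustrates the general idea and is not the exact procedure `generate` runs internally. The function name and the logits are made up for the example.

```python
import math

def top_p_distribution(logits, temperature=1.0, top_p=1.0):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

toy_logits = [2.0, 1.0, 0.0, -1.0]
print(sorted(top_p_distribution(toy_logits, temperature=1.0, top_p=0.9)))  # [0, 1, 2]
print(sorted(top_p_distribution(toy_logits, temperature=0.5, top_p=0.9)))  # [0, 1]
```

Lowering the temperature sharpens the distribution, so fewer tokens survive the same `top_p` cutoff and the output becomes more conservative.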

EXPORT (LoRA + tokenizer + metadata) and ZIP

In [18]:
import pathlib, shutil
from peft import PeftModel

export_dir = "/content/medbot_model"
pathlib.Path(export_dir).mkdir(parents=True, exist_ok=True)

# Save LoRA adapter + tokenizer
peft_model = trainer.model
peft_model.save_pretrained(export_dir)
tokenizer.save_pretrained(export_dir)

# Generation defaults
gen_cfg = {"max_new_tokens": 128, "temperature": 0.7, "top_p": 0.9}
with open(os.path.join(export_dir, "generation_config.json"), "w") as f:
    json.dump(gen_cfg, f, indent=2)

# Base model reference (tiny CPU model)
with open(os.path.join(export_dir, "BASE_MODEL.txt"), "w") as f:
    f.write(TINY_ID)

# Zip for download
zip_path = "/content/medbot_model.zip"
if os.path.exists(zip_path): os.remove(zip_path)
shutil.make_archive("/content/medbot_model", "zip", export_dir)
print("Export complete:", zip_path)


Export complete: /content/medbot_model.zip
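
To use the exported adapter elsewhere, the base model has to be loaded first and the LoRA weights attached on top. The sketch below is one way to do that, assuming the zip has been extracted to a directory with the layout written by the export cell. `read_export_metadata` and `load_medbot` are hypothetical helper names, not part of any library.

```python
import json
import os

def read_export_metadata(export_dir):
    """Read the base model id and generation defaults written by the export cell."""
    with open(os.path.join(export_dir, "BASE_MODEL.txt")) as f:
        base_id = f.read().strip()
    with open(os.path.join(export_dir, "generation_config.json")) as f:
        gen_cfg = json.load(f)
    return base_id, gen_cfg

def load_medbot(export_dir):
    """Rebuild the fine-tuned model: base weights plus the saved LoRA adapter."""
    # Imported here so the metadata helper works without the heavy dependencies.
    from transformers import AutoTokenizer, AutoModelForCausalLM
    from peft import PeftModel

    base_id, gen_cfg = read_export_metadata(export_dir)
    tokenizer = AutoTokenizer.from_pretrained(export_dir)   # saved next to the adapter
    base = AutoModelForCausalLM.from_pretrained(base_id)    # downloads the base weights
    model = PeftModel.from_pretrained(base, export_dir)     # attaches the LoRA adapter
    model.eval()
    return model, tokenizer, gen_cfg
```

After `model, tokenizer, gen_cfg = load_medbot("/content/medbot_model")`, the saved defaults can be passed along as `model.generate(**inputs, do_sample=True, **gen_cfg)`. The `temperature` and `top_p` values only take effect when sampling is enabled.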
