# Colab 5: Continued Pretraining (Medical Domain) with SmolLM2-135M using Unsloth

## Overview
This notebook demonstrates **Continued Pretraining (CPT)** - adapting a language model to a new domain by continuing pretraining on domain-specific text.

### What is Continued Pretraining?
- **Domain Adaptation**: Teaching the model new vocabulary, concepts, and writing styles
- **Knowledge Infusion**: Adding specialized knowledge (medical, legal, code, etc.)
- **Language Learning**: Can even teach new languages!
- **Raw Text**: Uses unstructured text without instruction format

### CPT vs Fine-tuning:

| Aspect | Fine-tuning (SFT) | Continued Pretraining (CPT) |
|--------|-------------------|-----------------------------|
| Goal | Learn task format | Learn domain knowledge |
| Data | Instruction pairs | Raw text corpus |
| Format | Structured | Unstructured |
| When | After CPT | Before SFT |
| Example | Q&A pairs | Medical articles |
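
To make the data contrast in the table concrete, the sketch below renders one record of each kind into the string a trainer would see. The instruction template and the field names (`question`, `answer`, `text`) are illustrative assumptions, not a format this notebook uses; the CPT record follows the raw-text-plus-EOS style used later in Step 7.

```python
# Illustrative only: the template and field names are assumptions.
EOS = "<|endoftext|>"  # SmolLM2's EOS token, as printed later in this notebook


def render_sft(record: dict) -> str:
    """Instruction pair -> structured prompt/response text."""
    return (
        f"### Question:\n{record['question']}\n\n"
        f"### Answer:\n{record['answer']}{EOS}"
    )


def render_cpt(record: dict) -> str:
    """Raw document -> plain text; only an EOS marker is appended."""
    return f"{record['text']}{EOS}"


sft_example = render_sft({
    "question": "What is hypertension?",
    "answer": "Persistently elevated arterial blood pressure.",
})
cpt_example = render_cpt({
    "text": "Hypertension, also known as high blood pressure, is a chronic medical condition.",
})

print(sft_example)
print(cpt_example)
```

The SFT string carries role markers the model must learn to follow, while the CPT string is just prose with a boundary token.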

### Training Pipeline:
```
Base Model → CPT (domain) → SFT (instruction) → DPO/GRPO (alignment)
```

### Key Features:
- Model: `unsloth/SmolLM2-135M` (base model)
- Domain: Medical/Healthcare
- Dataset: Synthetic medical text corpus (480 documents: 8 passages repeated 60 times)
- Training time: ~3-4 minutes on free Colab T4 GPU
- Technique: LoRA with embedding layers

### What You'll Learn:
1. Difference between CPT and fine-tuning
2. Preparing raw text corpora
3. Including embedding layers in training
4. Importance of EOS tokens
5. Testing domain knowledge acquisition

## Step 1: Install Unsloth

In [None]:
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

## Step 2: Set CPT Environment Variable

### CRITICAL for CPT:
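Set `UNSLOTH_RETURN_LOGITS=1` before importing Unsloth. The flag makes Unsloth return full logits from the forward pass instead of skipping them to save memory, and this notebook uses it as its CPT switch.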

In [None]:
import os

# CRITICAL: Enable CPT mode
os.environ["UNSLOTH_RETURN_LOGITS"] = "1"

print("✅ CPT mode enabled!")
print("This allows the model to learn new tokens and domain knowledge.")

✅ CPT mode enabled!
This allows the model to learn new tokens and domain knowledge.


## Step 3: Verify GPU and Setup

In [None]:
import torch
from unsloth import FastLanguageModel

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
PyTorch version: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4
CUDA version: 12.6


## Step 4: Load Base Model

### Important: Use BASE model for CPT
- CPT works best when it starts from a base (non-instruct) model
- We'll adapt it to the medical domain
- SFT on medical instructions can follow later

In [None]:
# Configuration
max_seq_length = 2048
dtype = None
load_in_4bit = True

# Load base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/SmolLM2-135M",  # Base model
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

print(f"\n✅ Base model loaded!")
print(f"Model: {model.config._name_or_path}")
print(f"This model will be adapted to medical domain.")

==((====))==  Unsloth 2025.11.2: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



✅ Base model loaded!
Model: unsloth/SmolLM2-135M
This model will be adapted to medical domain.


## Step 5: Configure LoRA with Embedding Layers

### CRITICAL for CPT: Include embedding layers!

**Why embeddings matter**:
- New domain has specialized vocabulary (medical terms)
- Base model's embeddings don't know these terms well
- Training embeddings adapts them to domain vocabulary

**target_modules MUST include**:
- `embed_tokens`: Input embeddings
- `lm_head`: Output layer
- Plus regular attention/MLP layers
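
To see how much of the training budget the embedding layers take, the sketch below counts trainable parameters by hand. The model dimensions are assumptions about SmolLM2-135M that this notebook does not print, the vocabulary size of 49,153 assumes one padding token added to a 49,152-token vocabulary, and the sketch assumes `embed_tokens` and `lm_head` are trained in full rather than through low-rank adapters. Under those assumptions the total matches the 61,508,736 trainable parameters reported in the Step 10 training log.

```python
# Assumed SmolLM2-135M dimensions (not printed anywhere in this notebook).
HIDDEN = 576          # hidden size
KV_DIM = 192          # output width of k_proj / v_proj (grouped-query attention)
INTERMEDIATE = 1536   # MLP inner width
NUM_LAYERS = 30       # matches "patched 30 layers" in the Unsloth log
VOCAB = 49_153        # assumption: 49,152 tokens plus one added padding token
RANK = 16             # LoRA rank r used in this notebook


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """LoRA adds A (rank x in) and B (out x rank) for each wrapped linear layer."""
    return rank * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, HIDDEN) * 2            # q_proj, o_proj
    + lora_params(HIDDEN, KV_DIM) * 2          # k_proj, v_proj
    + lora_params(HIDDEN, INTERMEDIATE) * 2    # gate_proj, up_proj
    + lora_params(INTERMEDIATE, HIDDEN)        # down_proj
)
adapter_total = per_layer * NUM_LAYERS

# Assumption: both embedding matrices are fully trainable copies.
embedding_total = 2 * VOCAB * HIDDEN

total = adapter_total + embedding_total
print(f"LoRA adapters:      {adapter_total:,}")
print(f"Embedding matrices: {embedding_total:,}")
print(f"Total trainable:    {total:,}")
print(f"Embedding share:    {embedding_total / total:.1%}")
```

The embedding matrices account for the large majority of trainable parameters here, which is why CPT guidance usually treats their learning rate separately.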

In [None]:
# Configure LoRA for CPT
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens",  # CRITICAL for CPT!
        "lm_head",       # CRITICAL for CPT!
    ],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

print("✅ LoRA configured with embedding layers!")
print("\n💡 Key difference from fine-tuning:")
print("   - Regular fine-tuning: Only attention/MLP layers")
print("   - CPT: ALSO includes embed_tokens and lm_head")
print("   - This allows learning new vocabulary and concepts")

Unsloth: Offloading input_embeddings to disk to save VRAM
Unsloth: Offloading output_embeddings to disk to save VRAM


Unsloth 2025.11.2 patched 30 layers with 30 QKV layers, 30 O layers and 30 MLP layers.


Unsloth: Training embed_tokens in mixed precision to save VRAM
Unsloth: Training lm_head in mixed precision to save VRAM
✅ LoRA configured with embedding layers!

💡 Key difference from fine-tuning:
   - Regular fine-tuning: Only attention/MLP layers
   - CPT: ALSO includes embed_tokens and lm_head
   - This allows learning new vocabulary and concepts


## Step 6: Load Medical Text Dataset

### Medical Text Sources:
For this demo, we'll build a small synthetic corpus from eight hand-written passages. In practice, you could use:
- PubMed abstracts
- Medical textbooks
- Clinical notes (de-identified)
- Medical Wikipedia
- Drug information databases

In [None]:
from datasets import Dataset

# Create synthetic medical corpus
# In real use, load from PubMed, medical databases, etc.
medical_texts = [
    """Hypertension, also known as high blood pressure, is a chronic medical condition in which the blood pressure in the arteries is persistently elevated. Blood pressure is expressed by two measurements: systolic and diastolic pressures. Normal blood pressure at rest is within the range of 100-140 mmHg systolic and 60-90 mmHg diastolic. Hypertension is present if the resting blood pressure is persistently at or above 140/90 mmHg. Risk factors include obesity, excessive salt intake, lack of physical activity, stress, and genetic predisposition.""",

    """Diabetes mellitus is a group of metabolic disorders characterized by high blood sugar levels over a prolonged period. Type 1 diabetes results from the pancreas's failure to produce enough insulin. Type 2 diabetes begins with insulin resistance, a condition in which cells fail to respond properly to insulin. Symptoms include frequent urination, increased thirst, increased hunger, and weight loss. Complications can include diabetic ketoacidosis, hyperosmolar hyperglycemic state, cardiovascular disease, stroke, chronic kidney disease, and foot ulcers.""",

    """Myocardial infarction, commonly known as a heart attack, occurs when blood flow decreases or stops to a part of the heart, causing damage to the heart muscle. The most common symptom is chest pain or discomfort which may travel into the shoulder, arm, back, neck, or jaw. The primary cause is coronary artery disease with plaque buildup. Risk factors include high blood pressure, smoking, diabetes, lack of exercise, obesity, high cholesterol, and family history. Treatment includes medications such as aspirin, thrombolytics, anticoagulants, and beta blockers.""",

    """Pneumonia is an inflammatory condition of the lung affecting primarily the alveoli. Common symptoms include cough with phlegm, fever, chills, and difficulty breathing. Bacteria, viruses, and fungi can cause pneumonia. Streptococcus pneumoniae is the most common bacterial cause. Diagnosis is often based on symptoms and confirmed by chest X-ray. Treatment depends on the underlying cause and severity. Bacterial pneumonia is treated with antibiotics. Vaccination can prevent certain types of pneumonia.""",

    """Alzheimer's disease is a progressive neurodegenerative disease that typically begins slowly and worsens over time. The most common early symptom is difficulty remembering recent events. As the disease advances, symptoms include language problems, disorientation, mood swings, loss of motivation, and behavioral issues. The cause remains poorly understood, with genetics playing a role. Diagnosis is based on history, cognitive tests, and brain imaging. No cure exists, but treatments may improve symptoms. Exercise, mental stimulation, and a healthy diet may decrease risk.""",

    """Asthma is a chronic inflammatory disease of the airways characterized by variable and recurring symptoms, reversible airflow obstruction, and bronchospasm. Symptoms include wheezing, coughing, chest tightness, and shortness of breath. These episodes may occur multiple times per day or week depending on severity. Asthma is caused by genetic and environmental factors. Environmental triggers include air pollution, allergens, and infections. Treatment includes avoiding triggers and using medications such as inhaled corticosteroids and bronchodilators.""",

    """Osteoporosis is a systemic skeletal disorder characterized by low bone mass and deterioration of bone tissue, leading to increased bone fragility and susceptibility to fractures. The most common sites of fractures are the wrist, spine, shoulder, and hip. Risk factors include advanced age, female gender, low body weight, smoking, excessive alcohol consumption, and family history. Diagnosis is typically made using bone density scans. Treatment includes lifestyle changes such as exercise and calcium supplementation, along with medications like bisphosphonates.""",

    """Chronic obstructive pulmonary disease (COPD) is a progressive lung disease characterized by increasing breathlessness. Main symptoms include shortness of breath and cough with sputum production. COPD is a progressive disease, meaning it typically worsens over time. The primary cause is tobacco smoking, with occupational exposure and pollution being other significant causes. Diagnosis is based on spirometry. Treatment includes smoking cessation, vaccination, rehabilitation, and medications such as inhaled bronchodilators and steroids.""",
]

# Expand the dataset by repeating the 8 passages verbatim (no variations are generated)
expanded_texts = medical_texts * 60  # 8 passages x 60 = 480 examples

# Create dataset
dataset = Dataset.from_dict({"text": expanded_texts[:500]})  # Cap at 500; only 480 exist, so all are kept

print(f"Medical corpus loaded: {len(dataset)} documents")
print(f"\nSample text (first 300 chars):\n{dataset[0]['text'][:300]}...")

Medical corpus loaded: 480 documents

Sample text (first 300 chars):
Hypertension, also known as high blood pressure, is a chronic medical condition in which the blood pressure in the arteries is persistently elevated. Blood pressure is expressed by two measurements: systolic and diastolic pressures. Normal blood pressure at rest is within the range of 100-140 mmHg s...


## Step 7: Format Dataset with EOS Tokens

### CRITICAL: Always add EOS tokens!

**Why EOS tokens matter**:
- Without EOS: Model doesn't know when to stop generating
- Result: Infinite generation, repetitive text
- With EOS: Model learns document boundaries
- Result: Clean, finite responses

### Raw Text Format:
- No instruction template needed
- Just add structure and EOS token
- Model learns from text patterns

In [None]:
EOS_TOKEN = tokenizer.eos_token

def format_medical_text(examples):
    """Format medical text with clear structure and EOS token"""
    formatted_texts = []

    for text in examples["text"]:
        # Add document structure
        formatted = f"""Medical Knowledge Document

{text}{EOS_TOKEN}"""
        formatted_texts.append(formatted)

    return {"text": formatted_texts}

# Apply formatting
dataset = dataset.map(format_medical_text, batched=True)

print("✅ Dataset formatted with EOS tokens!")
print(f"\nFormatted example (first 400 chars):\n{dataset[0]['text'][:400]}...")
print(f"\n⚠️ CRITICAL: Every document ends with {repr(EOS_TOKEN)}")
print("Without EOS tokens, the model may generate infinitely!")

✅ Dataset formatted with EOS tokens!

Formatted example (first 400 chars):
Medical Knowledge Document

Hypertension, also known as high blood pressure, is a chronic medical condition in which the blood pressure in the arteries is persistently elevated. Blood pressure is expressed by two measurements: systolic and diastolic pressures. Normal blood pressure at rest is within the range of 100-140 mmHg systolic and 60-90 mmHg diastolic. Hypertension is present if the resting...

⚠️ CRITICAL: Every document ends with '<|endoftext|>'
Without EOS tokens, the model may generate infinitely!
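
A quick sanity check on the formatted corpus can catch missing or doubled EOS tokens before training starts. The helper below is a minimal sketch, not part of the notebook: `eos_problems` is a hypothetical name, and the three sample strings stand in for the real dataset.

```python
EOS = "<|endoftext|>"  # value printed above for tokenizer.eos_token


def eos_problems(texts, eos=EOS):
    """Return indices of documents that do not end with exactly one EOS token."""
    bad = []
    for i, text in enumerate(texts):
        if not text.endswith(eos) or text.count(eos) != 1:
            bad.append(i)
    return bad


sample = [
    "Medical Knowledge Document\n\nAsthma is a chronic inflammatory disease." + EOS,
    "Medical Knowledge Document\n\nPneumonia affects the alveoli.",       # missing EOS
    "Medical Knowledge Document\n\nCOPD is progressive." + EOS + EOS,     # doubled EOS
]

print(eos_problems(sample))  # → [1, 2]
```

In the notebook, the same function could be called on the formatted `text` column; an empty list would mean every document ends with a single EOS token.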


## Step 8: Configure CPT Training

### CPT Training Guidelines:

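**Learning Rate**:
- Main layers: 5e-5 (lower than typical fine-tuning rates)
- Embeddings: 1e-5 to 5e-6 (5-10x smaller than the main rate)
- Why? Embeddings are sensitive and need gentle updates
- Note: the `TrainingArguments` below set a single `learning_rate`, so this run trains the embeddings at 5e-5 as well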
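
The guideline above implies two optimizer parameter groups, one for the embedding matrices and one for everything else. The sketch below shows only the grouping logic in plain Python; `assign_learning_rates` and the parameter names are hypothetical, and the cell in this step does not do this. Unsloth's `UnslothTrainingArguments` is understood to offer an `embedding_learning_rate` option for the same purpose, but check the current Unsloth documentation before relying on it.

```python
def assign_learning_rates(param_names, base_lr=5e-5, embedding_lr=5e-6,
                          embedding_keys=("embed_tokens", "lm_head")):
    """Split parameter names into an embedding group and a main group,
    each tagged with its own learning rate."""
    groups = {
        "embeddings": {"lr": embedding_lr, "params": []},
        "main": {"lr": base_lr, "params": []},
    }
    for name in param_names:
        is_embedding = any(key in name for key in embedding_keys)
        groups["embeddings" if is_embedding else "main"]["params"].append(name)
    return groups


# Hypothetical parameter names, for illustration only.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.lora_A.weight",
    "model.layers.0.mlp.down_proj.lora_B.weight",
    "lm_head.weight",
]

groups = assign_learning_rates(names)
for label, group in groups.items():
    print(label, group["lr"], group["params"])
```

With a real model, the same split would be applied to the parameters themselves and passed to the optimizer as two parameter groups.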

**Training Length**:
- More steps than fine-tuning
- Need to learn domain thoroughly
- Monitor loss convergence
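
One way to read the loss while monitoring convergence is to convert it to perplexity. The sketch below assumes the logged value is a mean per-token cross-entropy in nats, and uses two numbers from this notebook's training log: the step 1 loss (2.1855) and the mean training loss over the run (1.0835).

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (nats)."""
    return math.exp(mean_cross_entropy)


# Values copied from the training log later in this notebook.
logged = {"step 1": 2.1855, "mean over run": 1.0835}

for label, loss in logged.items():
    print(f"{label}: loss={loss:.4f} -> perplexity={perplexity(loss):.2f}")
```

Because the corpus is eight passages repeated 60 times, a low training perplexity here mostly reflects memorization of those passages; a held-out set would be needed to judge generalization.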

**Batch Size**:
- Similar to fine-tuning
- Effective batch size: 8-16
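
The effective batch size and the number of passes over the data follow directly from the settings used in the next cell. A minimal arithmetic sketch, using this notebook's values and assuming a single GPU doing data-parallel work:

```python
per_device_batch = 2   # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
num_devices = 1        # assumption: one data-parallel GPU
max_steps = 120
num_documents = 480

effective_batch = per_device_batch * grad_accum * num_devices
documents_seen = effective_batch * max_steps
epochs = documents_seen / num_documents

print(f"Effective batch size: {effective_batch}")
print(f"Documents seen:       {documents_seen}")
print(f"Epochs:               {epochs}")
```

These figures agree with the training banner later in the notebook, which reports a total batch size of 8 and 2 epochs.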

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 120,  # More steps for domain learning
        learning_rate = 5e-5,  # Lower than SFT
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.001,  # Lighter regularization
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "cpt_outputs",
        report_to = "none",
    ),
)

print("✅ CPT Trainer configured!")
print(f"\n📋 Configuration:")
print(f"   Learning rate: 5e-5 (lower than fine-tuning)")
print(f"   Training steps: 120 (more than fine-tuning)")
print(f"   Domain: Medical/Healthcare")
print(f"   Documents: {len(dataset)}")

✅ CPT Trainer configured!

📋 Configuration:
   Learning rate: 5e-5 (lower than fine-tuning)
   Training steps: 120 (more than fine-tuning)
   Domain: Medical/Healthcare
   Documents: 480


## Step 9: Test Model BEFORE CPT

Let's see the base model's knowledge of medical terms before domain adaptation.

In [None]:
FastLanguageModel.for_inference(model)

medical_prompts = [
    "Hypertension is",
    "Diabetes mellitus refers to",
    "Common symptoms of pneumonia include"
]

print("🔬 Testing BEFORE CPT (Base Model Knowledge):\n")
print("="*70)

for prompt in medical_prompts:
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    print(f"\nPrompt: {prompt}")
    print(f"Response: {response[len(prompt):].strip()}")
    print("-"*70)

print("\n💡 Base model may have limited or generic medical knowledge")
print("Let's see how CPT improves this!\n")

🔬 Testing BEFORE CPT (Base Model Knowledge):


Prompt: Hypertension is
Response: a common condition that is called hypertension because your body over-reacts to the body’s lack of oxygen supply. When your blood vessels become too narrow, they leak. When they leak, the blood pressure rises. When it’s too high
----------------------------------------------------------------------

Prompt: Diabetes mellitus refers to
Response: an abnormal state of diabetes mellitus, which is characterized by excessive production of glucose by the body's cells. This state is most often found in the form of hyperglycemia (high blood sugar level) and is characterized by excessive storage of fat. Patients
----------------------------------------------------------------------

Prompt: Common symptoms of pneumonia include
Response: a fever (fever), chills, a cough, a painless sore throat, a lump in the throat, fever and fatigue. If the infection is in your lungs it can be life threatening.

What do you n

## Step 10: Run CPT Training

Now we'll adapt the model to the medical domain through continued pretraining!

In [None]:
# Show GPU stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"Memory used before training: {start_gpu_memory} GB.\n")

print("🚀 Starting Continued Pretraining on Medical Domain...\n")
print("The model will learn:")
print("  - Medical terminology and vocabulary")
print("  - Relationships between medical concepts")
print("  - Medical writing style and structure")
print("  - Domain-specific knowledge patterns")
print("\n" + "="*70)

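# Step 9 put the model in inference mode; switch back to training mode first
FastLanguageModel.for_training(model)

# Train!
trainer_stats = trainer.train()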

# Show final stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_training = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)

print(f"\n{'='*70}")
print(f"✅ Continued Pretraining completed!")
print(f"Peak memory used: {used_memory} GB ({used_percentage}% of {max_memory} GB)")
print(f"Memory used for training: {used_memory_for_training} GB")
print(f"Training time: {trainer_stats.metrics['train_runtime']:.2f} seconds")
# Note: train_loss is the mean loss over all training steps, not the last step's loss
print(f"Final loss: {trainer_stats.metrics['train_loss']:.4f}")
print(f"\n💡 Model has now been adapted to medical domain!")
print(f"{'='*70}")

The model is already on multiple devices. Skipping the move to device specified in `args`.


GPU = Tesla T4. Max memory = 14.741 GB.
Memory used before training: 0.451 GB.

🚀 Starting Continued Pretraining on Medical Domain...

The model will learn:
  - Medical terminology and vocabulary
  - Relationships between medical concepts
  - Medical writing style and structure
  - Domain-specific knowledge patterns



==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 2
   \\   /|    Num examples = 480 | Num Epochs = 2 | Total steps = 120
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 61,508,736 of 224,336,448 (27.42% trained)


Step,Training Loss
1,2.1855
2,1.9302
3,2.0184
4,1.9404
5,2.0335
6,2.0132
7,2.0626
8,1.7748
9,2.0199
10,1.8863



✅ Continued Pretraining completed!
Peak memory used: 1.217 GB (8.256% of 14.741 GB)
Memory used for training: 0.766 GB
Training time: 209.27 seconds
Final loss: 1.0835

💡 Model has now been adapted to medical domain!


## Step 11: Test Model AFTER CPT

Let's compare the model's medical knowledge after domain adaptation!

In [None]:
FastLanguageModel.for_inference(model)

print("🔬 Testing AFTER CPT (Medical Domain Adapted):\n")
print("="*70)

for prompt in medical_prompts:
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=80, temperature=0.7, do_sample=True)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    print(f"\nPrompt: {prompt}")
    print(f"Response: {response[len(prompt):].strip()}")
    print("-"*70)

print("\n💡 Compare with BEFORE results:")
print("   - More accurate medical terminology")
print("   - Better understanding of medical concepts")
print("   - More coherent medical explanations")
print("   - Domain-appropriate writing style")

🔬 Testing AFTER CPT (Medical Domain Adapted):


Prompt: Hypertension is
Response: a chronic medical condition characterized by high blood pressure in the arteries. At first, most people may have low blood pressure. As the condition worsens, blood pressure may reach high levels, causing damage to the blood vessels. Hypertension is caused by excessive pressure against artery walls. Risk factors include being over age 50, having a family history of high blood pressure, excessive weight, excessive alcohol consumption, and
----------------------------------------------------------------------

Prompt: Diabetes mellitus refers to
Response: diabetes mellitus. Type 1 diabetes mellitus is the most common cause of diabetes mellitus. Type 2 diabetes mellitus is the most common cause of diabetes mellitus. Type 1 diabetes mellitus is caused by genetic and environmental factors. Type 2 diabetes mellitus is the most common cause of diabetes mellitus. The most common cause of diabetes mellitus is i
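
The diabetes completion above loops on near-identical sentences. A distinct n-gram ratio is one rough way to put a number on that kind of repetition when comparing outputs. The function below is a sketch that is not part of the notebook: `distinct_ngram_ratio` is a hypothetical helper, it splits on whitespace instead of using the model's tokenizer, and the two sample strings are illustrative.

```python
def distinct_ngram_ratio(text: str, n: int = 3) -> float:
    """Share of n-grams that are unique; values near 1.0 mean little repetition."""
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 1.0  # too short to contain any n-gram
    return len(set(ngrams)) / len(ngrams)


# Illustrative strings: one sentence repeated three times versus a single sentence.
looping = (
    "Type 2 diabetes mellitus is the most common cause of diabetes mellitus. " * 3
).strip()
varied = "Pneumonia is an inflammatory condition of the lung affecting primarily the alveoli."

print(f"looping: {distinct_ngram_ratio(looping):.2f}")
print(f"varied:  {distinct_ngram_ratio(varied):.2f}")
```

A low ratio flags degenerate loops but says nothing about factual accuracy, so it complements reading the outputs and does not replace it.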

## Step 12: Test with Medical Questions

Let's ask some medical questions to see domain knowledge!

In [None]:
medical_questions = [
    "What are the main risk factors for cardiovascular disease?",
    "Describe the difference between Type 1 and Type 2 diabetes.",
    "What are common treatments for hypertension?"
]

print("🏥 Testing Medical Domain Knowledge:\n")
print("="*70)

for question in medical_questions:
    inputs = tokenizer([question], return_tensors="pt").to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=120,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    print(f"\n❓ Question: {question}")
    print(f"🤖 Response: {response[len(question):].strip()}")
    print("="*70)

🏥 Testing Medical Domain Knowledge:


❓ Question: What are the main risk factors for cardiovascular disease?
🤖 Response: Many risk factors are known to increase the risk of cardiovascular disease. Risk factors include:

- high blood pressure
- smoking
- lack of exercise
- diabetes
- obesity
- stress

The most important lifestyle changes are:

- reducing stress
- losing weight
- avoiding smoking
- eating a healthy diet

What are the most important lifestyle changes to prevent cardiovascular disease?

- reduction in smoking
- reducing stress
- exercise
- limiting alcohol consumption
- avoiding junk food

The most important lifestyle changes that can prevent cardiovascular disease include

❓ Question: Describe the difference between Type 1 and Type 2 diabetes.
🤖 Response: Diabetes is a group of metabolic disorders characterized by high blood sugar levels. Type 1 diabetes is caused by an abnormality in the pancreas's cells that produce insulin. Type 2 diabetes is caused by ins

## Step 13: Save Medical Domain Model

In [None]:
# Save CPT model
model.save_pretrained("smollm2_medical_cpt")
tokenizer.save_pretrained("smollm2_medical_cpt")
print("✅ Medical CPT model saved to: smollm2_medical_cpt/\n")

# Save merged
model.save_pretrained_merged("smollm2_medical_merged", tokenizer, save_method="merged_16bit")
print("✅ Merged model saved to: smollm2_medical_merged/\n")

# Export to GGUF
model.save_pretrained_gguf("smollm2_medical_gguf", tokenizer, quantization_method="q4_k_m")
print("✅ GGUF model saved to: smollm2_medical_gguf/\n")

print("🏥 Medical domain model ready for:")
print("   1. Medical instruction fine-tuning (SFT)")
print("   2. Medical chatbot development")
print("   3. Clinical note analysis")
print("   4. Medical information extraction")

✅ Medical CPT model saved to: smollm2_medical_cpt/

Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...


Unsloth: Copying 1 files from cache to `smollm2_medical_merged`: 100%|██████████| 1/1 [00:00<00:00,  5.77it/s]


Successfully copied all 1 files from cache to `smollm2_medical_merged`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|██████████| 1/1 [00:00<00:00, 10699.76it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 1/1 [00:02<00:00,  2.21s/it]


Unsloth: Merge process complete. Saved to `/content/smollm2_medical_merged`
✅ Merged model saved to: smollm2_medical_merged/

Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...


Unsloth: Copying 1 files from cache to `smollm2_medical_gguf`: 100%|██████████| 1/1 [00:00<00:00,  4.57it/s]


Successfully copied all 1 files from cache to `smollm2_medical_gguf`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|██████████| 1/1 [00:00<00:00, 7825.19it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 1/1 [00:01<00:00,  1.35s/it]


Unsloth: Merge process complete. Saved to `/content/smollm2_medical_gguf`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF f16 might take 3 minutes.
\        /    [2] Converting GGUF f16 to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: Updating system package directories
Unsloth: All required system packages already installed!
Unsloth: Install llama.cpp and building - please wait 1 to 3 minutes
Unsloth: Cloning llama.cpp repository
Unsloth: Install GGUF and other packages
Unsloth: Successfully installed llama.cpp!
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['SmolLM2-135M.F16.gguf']
Unslot