# Lightweight Fine-tuning: Gemma-2B on TinyStories

This notebook demonstrates fine-tuning Google's Gemma-2B model on the TinyStories dataset using QLoRA. It is a lightweight example: the training step is expected to take about 1-2 hours, depending on your GPU.

## 1. Setup and Installation

First, check GPU availability and install dependencies.

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Clone the repository
!git clone https://github.com/vmm/llm-trainer.git
%cd llm-trainer

In [None]:
# Install dependencies
!pip install -r requirements.txt

In [None]:
# Fix module import issues
import os
import sys

# Check and fix the working directory
if not os.path.exists('src'):
    # If we're not in the repo root, try to find it
    if os.path.exists('llm-trainer'):
        %cd llm-trainer
    else:
        # If we can't find it, raise an error
        raise FileNotFoundError("Cannot find repository root directory with 'src' folder")

# Add the repository root to Python's path (only once, as an absolute path)
repo_root = os.getcwd()
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)
print(f"Working directory: {repo_root}")
print(f"Python path includes repository root: {repo_root in sys.path}")

## 2. Authenticate with Hugging Face

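Authenticate with Hugging Face so the notebook can download the Gemma weights. Gemma is a gated model, so the account that owns the token must also have accepted the model's license on its Hugging Face page.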

In [None]:
# Authenticate with Hugging Face
import os
from huggingface_hub import login

# Replace with your actual token (never commit a real token with the notebook)
HF_TOKEN = "your_huggingface_token_here"

# Log in to Hugging Face
login(token=HF_TOKEN)

# Set environment variable for other libraries
os.environ["HUGGING_FACE_HUB_TOKEN"] = HF_TOKEN
os.environ["HF_TOKEN"] = HF_TOKEN
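
To avoid pasting a token into the notebook source, one option is to read it from the environment and prompt for it only when it is missing. The sketch below uses only the standard library; the helper name `resolve_hf_token` is hypothetical and is not part of `huggingface_hub` or this repository.

```python
import os
from getpass import getpass


def resolve_hf_token(env_var="HF_TOKEN"):
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # getpass hides the input so the token is not echoed into the cell output
        token = getpass("Hugging Face token: ").strip()
    if not token:
        raise ValueError("No Hugging Face token provided")
    return token
```

With this in place, `HF_TOKEN = resolve_hf_token()` could replace the hard-coded assignment, and the rest of the cell would stay the same.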

## 3. Process the TinyStories Dataset

Run the repository's TinyStories processor, then load the processed dataset from disk to confirm that it was written correctly.

In [None]:
# Process the dataset
!python -m src.data_processors.tinystories_processor --config configs/gemma_tinystories.yaml

# Verify the processed dataset
from datasets import load_from_disk

# Load the processed dataset
try:
    dataset = load_from_disk("data/TinyStories_processed")
    
    # Print info about the dataset
    print(f"Dataset splits: {dataset.keys()}")
    if 'train' in dataset:
        print(f"Train size: {len(dataset['train'])}")
    if 'validation' in dataset:
        print(f"Validation size: {len(dataset['validation'])}")
    
    # See the first example
    print("\nExample data:")
    print(dataset[list(dataset.keys())[0]][0])
except Exception as e:
    print(f"Error loading dataset: {e}")

## 4. Fine-tune with QLoRA

Free any GPU memory left over from earlier cells, then fine-tune the Gemma-2B model using QLoRA.
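
QLoRA keeps the base weights frozen in 4-bit precision and trains only small low-rank adapter matrices. As a rough illustration of why this is cheap, the sketch below counts the parameters that one LoRA adapter adds to a single linear layer. The layer size and rank are illustrative assumptions; the actual rank and target modules come from `configs/gemma_tinystories.yaml`, which is not shown here.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative values only, not read from the training config
d_in, d_out, rank = 2048, 2048, 8

full = d_in * d_out
added = lora_trainable_params(d_in, d_out, rank)
print(f"Frozen weights in the layer: {full:,}")
print(f"Trainable LoRA weights:      {added:,} ({added / full:.2%} of the layer)")
```

Under these assumptions the adapter trains well under 1% as many parameters as the layer it wraps, which is why the optimizer state and gradients stay small.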

In [None]:
# Clean up memory before training
import gc
import torch

# Run garbage collection first so unreferenced tensors are actually released
gc.collect()
print("Garbage collection completed")

# Then return the freed cached blocks to the GPU
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("CUDA cache cleared")

# Show current GPU memory usage
if torch.cuda.is_available():
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**2:.2f} MB")
    print(f"GPU memory reserved: {torch.cuda.memory_reserved() / 1024**2:.2f} MB")
    
# Print current GPU usage
!nvidia-smi | grep MiB
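
To put the numbers printed above in context, a back-of-the-envelope estimate of the memory needed for the model weights alone can help. The sketch below is only an approximation: the parameter count is an assumed round figure, and the estimate ignores activations, optimizer state, quantization overhead, and the CUDA context.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


N_PARAMS = 2.5e9  # assumed round figure for a "2B"-class model

for label, bits in [("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gib(N_PARAMS, bits):.2f} GiB")
```

If the free memory reported by `nvidia-smi` is well below the estimate for the precision you plan to use, training is likely to run out of memory.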

In [None]:
# Start the training process
print("Starting fine-tuning process (this should take 1-2 hours)...")
print("Model will be saved to ./output/gemma_tinystories")

!python -m src.trainers.qlora_trainer configs/gemma_tinystories.yaml

## 5. Test the Fine-tuned Model

Try out the fine-tuned model by generating some stories.

In [None]:
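from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from peft import PeftModel, PeftConfig

# Load the adapter config (it records which base model the adapter was trained on)
config = PeftConfig.from_pretrained("./output/gemma_tinystories")

# Load the base model in 8-bit with authentication.
# Passing a BitsAndBytesConfig replaces the deprecated load_in_8bit=True argument.
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
    token=HF_TOKEN
)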

# Load adapter model
model = PeftModel.from_pretrained(base_model, "./output/gemma_tinystories", is_trainable=False)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path, 
    trust_remote_code=True,
    token=HF_TOKEN
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
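
The pipeline above samples with `temperature=0.7` and `top_p=0.9`. As a rough illustration of what those two settings do, the following sketch applies temperature scaling and nucleus (top-p) filtering to a toy list of logits. It is a simplified pure-Python model of the idea, not the code path that `transformers` runs; the function name `top_p_filter` and the example logits are made up for illustration.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} for the tokens that remain sampleable."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for four tokens
print(top_p_filter(toy_logits, temperature=1.0))
print(top_p_filter(toy_logits, temperature=0.7))
```

On this toy input, lowering the temperature concentrates probability on the top tokens, so fewer candidates survive the top-p cut.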

In [None]:
# Test the model with some story starters
story_starters = [
    "Once upon a time, there was a little rabbit who",
    "The small dog was very happy because",
    "In a tiny house at the edge of the forest"
]

for starter in story_starters:
    print(f"Prompt: {starter}")
    result = pipe(starter, return_full_text=True)[0]["generated_text"]
    print(f"Generated story:\n{result}")
    print("-" * 80)

## 6. Compare with Base Model

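Load the base model without the adapter and generate from the same prompt with both models. This loads a second copy of the base model, so it needs additional GPU memory. Because sampling is enabled, the outputs vary from run to run.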

In [None]:
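# Load the same base model the adapter was trained on, without the adapter
from transformers import BitsAndBytesConfig

base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    trust_remote_code=True,
    token=HF_TOKEN
)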

# Create a pipeline for the base model
base_pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

In [None]:
# Compare base model and fine-tuned model
test_prompt = "Once upon a time, there was a little rabbit who"

print("BASE MODEL OUTPUT:")
base_result = base_pipe(test_prompt, return_full_text=True)[0]["generated_text"]
print(base_result)

print("\n" + "-"*80 + "\n")

print("FINE-TUNED MODEL OUTPUT:")
ft_result = pipe(test_prompt, return_full_text=True)[0]["generated_text"]
print(ft_result)
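
An alternative that avoids holding two models in memory is to generate from the single PEFT-wrapped model twice, once with the adapter temporarily disabled. PEFT models expose a `disable_adapter()` context manager for this. The sketch below is a minimal helper built on that idea; the function name `compare_with_adapter_toggle` and its `generate` argument are hypothetical, and it assumes the generation callable uses the same model object that is passed in.

```python
def compare_with_adapter_toggle(peft_model, generate, prompt):
    """Generate twice from one loaded model: adapter disabled, then enabled.

    peft_model: object exposing PEFT's disable_adapter() context manager
    generate:   callable mapping a prompt string to generated text
    """
    with peft_model.disable_adapter():
        base_text = generate(prompt)  # adapter layers are bypassed here
    tuned_text = generate(prompt)     # adapter is active again
    return base_text, tuned_text
```

In this notebook, `generate` could be `lambda p: pipe(p, return_full_text=True)[0]["generated_text"]` and `peft_model` would be `model`, so the call becomes `compare_with_adapter_toggle(model, generate, test_prompt)`.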

## 7. Save the Fine-tuned Model

Package the fine-tuned model for reuse.

In [None]:
# Create a zip file of the adapter
import os
import shutil

# Directory the trainer is expected to write the adapter to
adapter_path = "./output/gemma_tinystories/adapter_model"
if os.path.isdir(adapter_path):
    shutil.make_archive("gemma_tinystories_adapter", "zip", adapter_path)
    print("Adapter packaged as gemma_tinystories_adapter.zip")
else:
    print(f"Adapter directory not found: {adapter_path}")
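
If the adapter files sit directly in the output directory (PEFT's `save_pretrained` writes `adapter_config.json` next to an `adapter_model.*` weights file), a more targeted option is to zip only those files and leave out checkpoints or optimizer state. The sketch below assumes that layout; the helper name `zip_adapter_files` is hypothetical.

```python
import glob
import os
import zipfile


def zip_adapter_files(adapter_dir, zip_path):
    """Zip adapter_config.json and adapter_model.* from adapter_dir.

    Returns the list of file names stored in the archive.
    """
    config_file = os.path.join(adapter_dir, "adapter_config.json")
    if not os.path.isfile(config_file):
        raise FileNotFoundError(f"No adapter_config.json in {adapter_dir}")

    # Collect the config plus any adapter weight files (.safetensors or .bin)
    weight_files = sorted(glob.glob(os.path.join(adapter_dir, "adapter_model.*")))
    files = [config_file] + weight_files

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store files flat, without the directory prefix
            archive.write(path, arcname=os.path.basename(path))

    # Re-open the archive to report what was actually written
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

For this notebook, a call might look like `zip_adapter_files("./output/gemma_tinystories", "gemma_tinystories_adapter.zip")`, assuming the adapter was saved to that directory.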

## 8. Summary

This notebook demonstrated lightweight fine-tuning of Gemma-2B on the TinyStories dataset. Key highlights:

1. Successfully fine-tuned Gemma-2B using QLoRA in 1-2 hours
2. Used a small dataset subset for quick training
3. Enabled proper validation during training
4. Compared base model vs fine-tuned model outputs
5. Packaged the adapter for reuse

This approach can be adapted for other lightweight fine-tuning tasks.