<a href="https://colab.research.google.com/github/zmuhls/jeopardy-lm/blob/main/jeopardy_finetune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Jeopardy LM: TinyLlama Fine-tuning with LoRA
# Google Colab Implementation

This notebook demonstrates how to:
1. Download Jeopardy questions from a public dataset
2. Fine-tune TinyLlama-1.1B using PEFT/LoRA
3. Evaluate the model's performance on Jeopardy questions
4. Create a simple API server to serve the model

This implementation is designed for Google Colab with GPU acceleration.

In [1]:
# @title Check for GPU and Colab Environment
# @markdown Ensure we're running in Colab with GPU acceleration

import os
import sys

# Check if running in Colab
IN_COLAB = 'google.colab' in sys.modules
print(f"Running in Google Colab: {IN_COLAB}")

if not IN_COLAB:
    print("Warning: This notebook is optimized for Google Colab. Some features may not work elsewhere.")

Running in Google Colab: True


In [None]:
# @title Install required packages
# @markdown Run this cell to install all necessary dependencies

!pip install torch==2.0.1 transformers==4.30.2 peft==0.4.0 bitsandbytes==0.40.2 accelerate==0.20.3
!pip install datasets==2.13.1 tqdm pandas flask ipywidgets matplotlib tensorboard

# Check for GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("WARNING: No GPU detected. Training will be slow.")
    print("Go to Runtime > Change runtime type > Hardware accelerator > GPU to enable GPU.")

In [None]:
# @title Mount Google Drive (Optional)
# @markdown Run this to save/load data and models from Google Drive

use_drive = False  # Set to True to use Google Drive for storage

if use_drive:
    # Import here so the cell still runs outside Colab when Drive is not used
    from google.colab import drive
    drive.mount('/content/drive')
    output_dir = "/content/drive/MyDrive/jeopardy-lm"
else:
    # Create directory for output in Colab local storage
    output_dir = "/content/jeopardy-lm"

import os
os.makedirs(output_dir, exist_ok=True)
print(f"Model and data will be saved to: {output_dir}")

In [None]:
# @title Download Jeopardy Dataset
# @markdown This cell downloads a Jeopardy dataset from Hugging Face Datasets

import pandas as pd
from datasets import load_dataset

print("Downloading Jeopardy dataset from Hugging Face...")
try:
    jeopardy_dataset = load_dataset("jeopardy", split="train")
    print(f"Downloaded {len(jeopardy_dataset)} Jeopardy questions")
except Exception as e:
    print(f"Error downloading dataset: {e}")
    print("Attempting alternative download method...")
    # Alternative: direct download from GitHub
    !wget -q https://raw.githubusercontent.com/dw/scratch/master/jeopardy/j.json -O jeopardy.json
    import json
    with open('jeopardy.json', 'r') as f:
        data = json.load(f)
    jeopardy_dataset = pd.DataFrame(data)
    print(f"Downloaded {len(jeopardy_dataset)} Jeopardy questions from alternative source")

# Convert to DataFrame and save to CSV
if not isinstance(jeopardy_dataset, pd.DataFrame):
    # Hugging Face Dataset objects convert directly to pandas
    df = jeopardy_dataset.to_pandas()
else:
    df = jeopardy_dataset

jeopardy_csv_path = os.path.join(output_dir, "jeopardy_data.csv")
df.to_csv(jeopardy_csv_path, index=False)

print(f"Saved data to {jeopardy_csv_path}")
print("Sample data:")
df.head()

In [None]:
# @title Prepare Jeopardy Dataset
# @markdown Convert Jeopardy dataset to instruction tuning format

import json

def prepare_jeopardy_dataset(jeopardy_df, output_jsonl_path, split_ratio=0.9, max_samples=20000):
    """
    Convert Jeopardy DataFrame to instruction-tuning format
    """
    print(f"Preparing Jeopardy data for fine-tuning")

    # Format the data as instruction-following examples
    formatted_data = []

    for _, row in jeopardy_df.iterrows():
        category = row['category']
        question = row['question']
        answer = row['answer']

        # Format as prompt-completion pair
        prompt = f"Category: {category}\nClue: {question}\nAnswer in the form of a question:"

        # Check if answer already has "what is" format
        answer_lower = answer.lower()
        if answer_lower.startswith("what is") or answer_lower.startswith("who is"):
            completion = answer
        else:
            # Decide between "What is" and "Who is" based on simple heuristics
            if any(keyword in answer_lower for keyword in ['person', 'actor', 'actress', 'director', 'author', 'president', 'king', 'queen']):
                completion = f"Who is {answer}?"
            else:
                completion = f"What is {answer}?"

        # Format for instruction tuning
        formatted_data.append({
            "prompt": prompt,
            "completion": completion
        })

    # Take a subset of the data to speed up training (adjust as needed)
    if len(formatted_data) > max_samples:
        print(f"Limiting dataset to {max_samples} samples for faster training")
        formatted_data = formatted_data[:max_samples]

    # Split into train and validation
    train_size = int(len(formatted_data) * split_ratio)
    train_data = formatted_data[:train_size]
    val_data = formatted_data[train_size:]

    # Save to JSONL files
    os.makedirs(os.path.dirname(output_jsonl_path), exist_ok=True)

    train_path = output_jsonl_path.replace('.jsonl', '_train.jsonl')
    val_path = output_jsonl_path.replace('.jsonl', '_val.jsonl')

    with open(train_path, 'w') as f:
        for item in train_data:
            f.write(json.dumps(item) + '\n')

    with open(val_path, 'w') as f:
        for item in val_data:
            f.write(json.dumps(item) + '\n')

    print(f"Saved {len(train_data)} training examples to {train_path}")
    print(f"Saved {len(val_data)} validation examples to {val_path}")

    return train_path, val_path

# Create data directory
data_dir = os.path.join(output_dir, "data")
os.makedirs(data_dir, exist_ok=True)
output_jsonl_path = os.path.join(data_dir, "jeopardy.jsonl")

# Prepare dataset
train_path, val_path = prepare_jeopardy_dataset(df, output_jsonl_path)

In [None]:
# @title Load Dataset for Fine-tuning
# @markdown Load the prepared datasets

from datasets import load_dataset

# Load datasets
train_dataset = load_dataset('json', data_files=train_path, split='train')
val_dataset = load_dataset('json', data_files=val_path, split='train')

print(f"Loaded {len(train_dataset)} training examples")
print(f"Loaded {len(val_dataset)} validation examples")

# Show a sample example
print("\nSample training example:")
print(f"Prompt: {train_dataset[0]['prompt']}")
print(f"Completion: {train_dataset[0]['completion']}")

In [None]:
# @title Setup Model for Fine-tuning
# @markdown Configure TinyLlama with LoRA adapters

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType

# Model parameters
base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
max_length = 256

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Check if GPU is available, adjust settings accordingly
if torch.cuda.is_available():
    print("Loading model with 8-bit quantization on GPU...")
    load_in_8bit = True
    device_map = "auto"
else:
    print("GPU not available. Loading the model in full precision on CPU...")
    # Without a GPU, skip 8-bit quantization and load the model on CPU
    load_in_8bit = False
    device_map = None

# Load base model with appropriate settings
try:
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        load_in_8bit=load_in_8bit,
        device_map=device_map,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    )
except Exception as e:
    print(f"Error loading model with 8-bit quantization: {e}")
    print("Falling back to standard loading...")
    # The fallback model is not quantized, so skip the k-bit preparation step below
    load_in_8bit = False
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.float32,
    )

# Prepare for PEFT fine-tuning (only if loaded with quantization)
if load_in_8bit:
    model = prepare_model_for_kbit_training(model)

# Configure LoRA
lora_config = LoraConfig(
    r=16,               # Rank of the update matrices
    lora_alpha=32,      # Parameter for scaling
    lora_dropout=0.05,  # Dropout probability
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    # Target the attention and MLP projection layers of TinyLlama
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
)

# Get PEFT model
model = get_peft_model(model, lora_config)

# Print trainable parameters info
def print_trainable_parameters(model):
    trainable_params = 0
    all_params = 0
    for _, param in model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params:,d} || "
        f"all params: {all_params:,d} || "
        f"trainable%: {100 * trainable_params / all_params:.2f}%"
    )

print_trainable_parameters(model)

In [None]:
# @title Prepare Data for Training
# @markdown Tokenize and format data for the trainer

from transformers import DataCollatorForLanguageModeling

# Define data preprocessing function
def preprocess_function(examples):
    # Combine prompt and completion for training
    texts = [
        f"{prompt}\n{completion}"
        for prompt, completion in zip(examples['prompt'], examples['completion'])
    ]

    # Tokenize
    result = tokenizer(
        texts,
        truncation=True,
        max_length=max_length,
        padding="max_length",
    )

    # Create labels (for causal LM, labels mirror input_ids; the data collator
    # defined later in this cell rebuilds them and masks padding tokens)
    result["labels"] = result["input_ids"].copy()

    return result

# Preprocess datasets
tokenized_train = train_dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=train_dataset.column_names,
)

tokenized_val = val_dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=val_dataset.column_names,
)

# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # We're not doing masked language modeling
)

print(f"Prepared {len(tokenized_train)} training examples")
print(f"Prepared {len(tokenized_val)} validation examples")

In [None]:
# @title Fine-tune the Model
# @markdown Start the training process

from transformers import TrainingArguments, Trainer

# Training parameters - adjust based on available resources
num_epochs = 3

# Adjust batch size based on GPU availability
if torch.cuda.is_available():
    batch_size = 8
    gradient_accumulation_steps = 4
else:
    # Use smaller batches on CPU
    batch_size = 2
    gradient_accumulation_steps = 8
    print("WARNING: Training on CPU will be very slow. Consider using a GPU runtime.")

learning_rate = 2e-4

# Training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_epochs,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    evaluation_strategy="steps",
    eval_steps=100,
    logging_steps=50,
    learning_rate=learning_rate,
    weight_decay=0.01,
    fp16=torch.cuda.is_available(),  # Only use fp16 if GPU is available
    bf16=False,
    optim="adamw_torch",
    warmup_steps=100,
    save_steps=100,
    save_total_limit=3,
    load_best_model_at_end=True,
    report_to="tensorboard",  # Enable tensorboard for visualization
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=data_collator,
)

# Launch TensorBoard to monitor training (Colab-specific)
try:
    %load_ext tensorboard
    %tensorboard --logdir {output_dir}
except Exception as e:
    print(f"TensorBoard could not be started: {e}")
    print("You can still view training progress via the logs.")

# Train model
print("Starting training...")
trainer.train()

# Save the fine-tuned model
model_path = f"{output_dir}/final"
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

print(f"Model fine-tuning complete. Saved to {model_path}")

In [None]:
# @title Evaluate the Model
# @markdown Test the model on Jeopardy questions

import matplotlib.pyplot as plt
from peft import PeftModel
from tqdm.notebook import tqdm

# Create test questions from validation set
test_questions = [
    {
        "category": item["prompt"].split("\n")[0].replace("Category: ", ""),
        "question": item["prompt"].split("\n")[1].replace("Clue: ", ""),
        "answer": item["completion"]
    }
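    # Slicing a Dataset returns a dict of columns, so select rows instead
    for item in val_dataset.select(range(min(100, len(val_dataset))))  # Evaluate on up to 100 questions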
]

# Load fine-tuned model for evaluation
try:
    # Reload the base model and apply the fine-tuned LoRA adapters
    print("Loading base model...")
    # Use a separate name so the `base_model` string from the setup cell is not overwritten
    eval_base_model = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        device_map="auto" if torch.cuda.is_available() else None,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    )

    print(f"Loading fine-tuned adapters from {model_path}...")
    model = PeftModel.from_pretrained(eval_base_model, model_path)
    model.eval()  # Disable dropout for evaluation
    print("Model loaded successfully")
except Exception as e:
    print(f"Error loading model: {e}")
    print("Please make sure training was completed successfully.")
    # Exit the evaluation if model loading failed
    raise e

correct = 0
results = []

print("Evaluating model on test questions...")
for question in tqdm(test_questions):
    category = question["category"]
    clue = question["question"]
    correct_answer = question["answer"].lower()

    # Format the prompt
    prompt = f"Category: {category}\nClue: {clue}\nAnswer in the form of a question:"

    # Tokenize and generate
    inputs = tokenizer(prompt, return_tensors="pt")
    # Move inputs to the same device as model
    if torch.cuda.is_available():
        inputs = inputs.to("cuda")

    # Generate with error handling
    try:
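        outputs = model.generate(
            # Pass inputs by keyword: older PEFT releases reject positional arguments here
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=50,
            temperature=0.7,
            top_p=0.9,
            do_sample=True
        )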

        # Decode and extract answer
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        answer = response.replace(prompt, "").strip().lower()

        # Check if answer is correct (simple substring matching).
        # An empty generation is never correct; without this guard, "" would match everything.
        is_correct = bool(answer) and (correct_answer in answer or answer in correct_answer)
        if is_correct:
            correct += 1

        results.append({
            "category": category,
            "question": clue,
            "correct_answer": correct_answer,
            "model_answer": answer,
            "is_correct": is_correct
        })
    except Exception as e:
        print(f"Error generating answer for question '{clue}': {e}")
        continue

accuracy = correct / len(test_questions) if test_questions else 0
print(f"Evaluation complete. Accuracy: {accuracy:.2%} ({correct}/{len(test_questions)})")

# Visualize results
plt.figure(figsize=(10, 5))
plt.bar(['Correct', 'Incorrect'], [correct, len(test_questions) - correct])
plt.title('Model Evaluation Results')
plt.ylabel('Number of Questions')
plt.show()

# Show some example results
print("\nSample results:")
import random
sample_results = random.sample(results, min(5, len(results)))
for i, result in enumerate(sample_results):
    print(f"\nExample {i+1}:")
    print(f"Category: {result['category']}")
    print(f"Question: {result['question']}")
    print(f"Correct answer: {result['correct_answer']}")
    print(f"Model answer: {result['model_answer']}")
    print(f"Correct: {'✓' if result['is_correct'] else '✗'}")

In [None]:
# @title Create Demo Interface
# @markdown Create a simple demo interface in Colab

from IPython.display import display, HTML
import ipywidgets as widgets

# Check if we have a model loaded
try:
    model
except NameError:
    print("ERROR: Model not found. Please run the training and evaluation cells first.")
    model = None

if model is not None:
    # Create input widgets
    category_input = widgets.Text(
        value='',
        placeholder='Enter category',
        description='Category:',
        disabled=False
    )

    clue_input = widgets.Textarea(
        value='',
        placeholder='Enter clue',
        description='Clue:',
        disabled=False
    )

    submit_button = widgets.Button(
        description='Get Answer',
        disabled=False,
        button_style='primary',
        tooltip='Click to get the answer',
        icon='check'
    )

    output_area = widgets.Output()

    # Define submit function
    def on_submit_clicked(b):
        with output_area:
            output_area.clear_output()

            category = category_input.value
            clue = clue_input.value

            if not category or not clue:
                print("Please enter both category and clue.")
                return

            print("Generating answer...")

            # Format the prompt
            prompt = f"Category: {category}\nClue: {clue}\nAnswer in the form of a question:"

            try:
                # Tokenize and generate
                inputs = tokenizer(prompt, return_tensors="pt")
                if torch.cuda.is_available():
                    inputs = inputs.to("cuda")

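                outputs = model.generate(
                    # Pass inputs by keyword: older PEFT releases reject positional arguments here
                    input_ids=inputs["input_ids"],
                    attention_mask=inputs["attention_mask"],
                    max_new_tokens=50,
                    temperature=0.7,
                    top_p=0.9,
                    do_sample=True
                )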

                # Decode and extract answer
                response = tokenizer.decode(outputs[0], skip_special_tokens=True)
                answer = response.replace(prompt, "").strip()

                print(f"\nAnswer: {answer}")
            except Exception as e:
                print(f"Error generating answer: {e}")

    # Connect button to function
    submit_button.on_click(on_submit_clicked)

    # Display widgets
    print("Jeopardy Question Answering Demo")
    display(category_input)
    display(clue_input)
    display(submit_button)
    display(output_area)

In [None]:
# @title Download Fine-tuned Model (Optional)
# @markdown Download the fine-tuned model to your local machine

from google.colab import files
import shutil

try:
    # Check if model_path exists
    if 'model_path' not in globals() or not os.path.exists(model_path):
        print("Model not found. Please complete the training process first.")
    else:
        # Compress the contents of the model directory into a ZIP archive
        print(f"Compressing model from {model_path}...")
        shutil.make_archive(os.path.join(output_dir, "jeopardy_model"), "zip", model_path)

        # Download the compressed file
        print("Downloading model zip file...")
        files.download(f"{output_dir}/jeopardy_model.zip")

        print("Model downloaded. To use this model locally:")
        print("1. Extract the ZIP file")
        print("2. Load it with PeftModel.from_pretrained()")
        print("3. Use the provided API server code in the next cell")
except Exception as e:
    print(f"Error downloading model: {e}")
    print("If you're not in Colab, you can manually copy the model files from the output directory.")

## API Server Code (For Local Use)

This code can be used to create a simple API server for the fine-tuned model when running locally. Save this to a file named `api_server.py`:

```python
# Flask API Server Implementation for Local Use
# Save this to a file named "api_server.py" on your local machine

from flask import Flask, request, jsonify
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

app = Flask(__name__)

# Update these paths to match your local setup
base_model_path = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_path = "./jeopardy_model"  # Path to extracted model adapters

# Load the tokenizer and model
print("Loading tokenizer and model...")
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Check for GPU and load model accordingly
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto" if device == "cuda" else None,
)

# Load fine-tuned model
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()  # Set to evaluation mode

print("Model loaded successfully")

@app.route('/predict', methods=['POST'])
def predict():
    # Parse the JSON body; silent=True returns None instead of raising on a bad body
    data = request.get_json(silent=True)
    if not data or 'category' not in data or 'clue' not in data:
        return jsonify({'error': 'Please provide both category and clue fields'}), 400
    
    category = data['category']
    clue = data['clue']
    
    # Format the prompt
    prompt = f"Category: {category}\nClue: {clue}\nAnswer in the form of a question:"
    
    # Generate answer
    try:
        inputs = tokenizer(prompt, return_tensors="pt")
        if device == "cuda":
            inputs = inputs.to("cuda")
        
        with torch.no_grad():
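            outputs = model.generate(
                # Pass inputs by keyword: older PEFT releases reject positional arguments here
                input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                max_new_tokens=50,
                temperature=0.7,
                top_p=0.9,
                do_sample=True
            )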
        
        # Decode the output
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        answer = response.replace(prompt, "").strip()
        
        return jsonify({
            'category': category,
            'clue': clue,
            'answer': answer
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 500

@app.route('/', methods=['GET'])
def home():
    return """
    <h1>Jeopardy API Server</h1>
    <p>Make a POST request to /predict with JSON data containing 'category' and 'clue'.</p>
    <p>Example:</p>
    <pre>{
  "category": "HISTORY",
  "clue": "This document, signed in 1776, announced that the 13 American colonies were no longer part of Great Britain"
}</pre>
    """

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=False)
```

### Usage Instructions:

1. Install the required packages: `pip install flask torch transformers peft accelerate` (`accelerate` is required when the model is loaded with `device_map="auto"`)
2. Update the `adapter_path` to point to your extracted model
3. Run the server: `python api_server.py`
4. Send POST requests to the API at `http://localhost:5000/predict`

### Example cURL request:

```bash
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"category":"SCIENCE", "clue":"This force keeps planets in orbit around the sun"}'
```
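
### Example Python request:

The same request can be sent from Python. The sketch below uses only the standard library and assumes the server from `api_server.py` is running on `localhost:5000`. The helper names `build_predict_request` and `ask` are illustrative and not part of the notebook.

```python
import json
import urllib.request


def build_predict_request(category, clue, base_url="http://localhost:5000"):
    """Build a POST request for the /predict endpoint (hypothetical helper)."""
    payload = json.dumps({"category": category, "clue": clue}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(category, clue, base_url="http://localhost:5000", timeout=60):
    """Send the request and return the 'answer' field of the JSON response."""
    request = build_predict_request(category, clue, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["answer"]


# Usage (requires the API server to be running):
# print(ask("SCIENCE", "This force keeps planets in orbit around the sun"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.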

## Final Notes

This notebook demonstrates a complete pipeline for fine-tuning TinyLlama-1.1B on Jeopardy questions using LoRA. Here's a summary of what we've accomplished:

1. **Data Preparation**: Downloaded and processed Jeopardy questions into an instruction tuning format
2. **Model Fine-tuning**: Used PEFT/LoRA to efficiently fine-tune TinyLlama without requiring extensive computational resources
3. **Evaluation**: Tested the model's performance on Jeopardy questions
4. **Interactive Demo**: Created a simple interface to interact with the model
5. **Deployment**: Provided code for deploying the model as an API server
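
To see why LoRA keeps the trainable parameter count small, here is a rough back-of-the-envelope calculation for the `r=16` configuration used above. The layer dimensions are assumptions based on the published TinyLlama-1.1B configuration as best I recall it; check them against `model.config` before relying on the numbers.

```python
# Assumed TinyLlama-1.1B dimensions (verify against model.config)
HIDDEN = 2048        # hidden_size
INTERMEDIATE = 5632  # intermediate_size of the MLP
NUM_LAYERS = 22      # num_hidden_layers
KV_DIM = 256         # 4 key/value heads x 64 head dim (grouped-query attention)
R = 16               # LoRA rank from the LoraConfig above

# (in_features, out_features) for each targeted projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    # LoRA adds two matrices per layer: A (r x in) and B (out x r)
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} adapter parameters per layer, {total:,} in total")
```

If the assumed dimensions are right, the total should agree with the trainable count reported by `print_trainable_parameters(model)`, which is roughly 1% of the base model's parameters.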

### Improving the Model

To improve the model's performance, you could try:

- Using a larger base model (if you have more computational resources)
- Fine-tuning for more epochs
- Adjusting hyperparameters (learning rate, batch size, etc.)
- Enhancing the data preprocessing (better handling of question format, more data)
- Using a more sophisticated evaluation metric
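
As one possible starting point for a stricter metric, the sketch below normalizes both answers before comparing them: it keeps only the first generated line, strips the "What is" / "Who is" lead-in, punctuation, and leading articles, and then requires an exact match. The function names and normalization rules are illustrative assumptions, not part of the notebook's evaluation cell.

```python
import re
import string

# Lead-ins such as "what is", "who are", "where was"
_LEAD_IN = re.compile(r"^(what|who|where|when)\s+(is|are|was|were)\s+")
_ARTICLES = re.compile(r"^(a|an|the)\s+")
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize_answer(text):
    """Reduce an answer to its core phrase for comparison."""
    text = text.strip().lower()
    # Keep only the first line in case the model keeps generating
    text = text.splitlines()[0] if text else ""
    text = _LEAD_IN.sub("", text)
    text = text.translate(_PUNCTUATION)
    text = _ARTICLES.sub("", text.strip())
    return " ".join(text.split())


def is_match(model_answer, reference):
    """Exact match on normalized answers; empty predictions never match."""
    prediction = normalize_answer(model_answer)
    return bool(prediction) and prediction == normalize_answer(reference)
```

Unlike substring matching, this does not credit an answer that merely contains the reference somewhere in a longer generation, so reported accuracy will usually be lower but easier to interpret.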

### Next Steps

- Try creating a more comprehensive interactive application
- Experiment with different prompt formats
- Apply the same technique to other trivia or question-answering datasets
- Deploy the model to a production environment
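
For prompt-format experiments, it may help to keep the templates in one place so that data preparation, evaluation, and the demo all build prompts the same way. The sketch below keeps the notebook's existing format as `"original"`; the `"instruction"` variant and the helper name `build_prompt` are hypothetical examples.

```python
PROMPT_TEMPLATES = {
    # The format used throughout this notebook
    "original": "Category: {category}\nClue: {clue}\nAnswer in the form of a question:",
    # A hypothetical alternative with an explicit instruction line
    "instruction": (
        "You are a Jeopardy! contestant. Respond to the clue in the form of a question.\n"
        "Category: {category}\nClue: {clue}\nResponse:"
    ),
}


def build_prompt(category, clue, style="original"):
    """Fill the chosen template; raises KeyError for an unknown style."""
    return PROMPT_TEMPLATES[style].format(category=category, clue=clue)


print(build_prompt("SCIENCE", "This force keeps planets in orbit around the sun"))
```

Whichever template you choose, use the same one for fine-tuning and for inference, since the model is trained to continue the exact prompt text it saw during training.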

Happy fine-tuning!