# Fine-tune Llama-3.1-8B on AO3 Fragments

This notebook fine-tunes Llama-3.1-8B on 133k emotionally intense story fragments using LoRA.

**Requirements:**
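- Google Colab Pro with a GPU runtime. An A100 is recommended: the notebook trains in bfloat16, which the T4 does not support natively, so on a T4 switch the compute dtype and training precision to fp16.
- `train.jsonl`, `val.jsonl`, and `test.jsonl`, available to the Colab runtime (the loading cell reads them from Google Drive)
- Hugging Face account with access to the gated `meta-llama/Meta-Llama-3.1-8B` model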

## Setup and Installation

In [None]:
# Install required packages
!pip install -q -U transformers datasets accelerate peft bitsandbytes trl wandb

In [None]:
# Login to Hugging Face (needed for Llama-3.1 access)
from huggingface_hub import notebook_login
notebook_login()

In [None]:
# # Login to Weights & Biases for experiment tracking (optional)
# import wandb
# wandb.login()

## Upload Training Data

Upload these files to Colab:
1. `train.jsonl` (~106k fragments)
2. `val.jsonl` (~13k fragments)
3. `test.jsonl` (~13k fragments)

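Click the folder icon on the left sidebar, then upload the files. Note that the loading cell below reads from a Google Drive folder, so either copy the files to that folder or change `path` to the upload location (for example `/content/`).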

## Load Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
from datasets import load_dataset

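# Folder in the mounted Drive that holds the JSONL files; adjust to your own layout.
# An absolute path works regardless of the current working directory.
path = '/content/drive/MyDrive/NEW APP/'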

# Load JSONL files
dataset = load_dataset(
    "json",
    data_files={
        "train": path + "train.jsonl",
        "validation": path + "val.jsonl",
        "test": path + "test.jsonl"
    }
)

print(f"Train: {len(dataset['train']):,} examples")
print(f"Validation: {len(dataset['validation']):,} examples")
print(f"Test: {len(dataset['test']):,} examples")

## Load Model with 4-bit Quantization

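We load the base model with 4-bit NF4 quantization (plus double quantization) so that the Llama-3.1-8B weights fit in the memory of a single Colab GPU. Computation still runs in bfloat16.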

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "meta-llama/Meta-Llama-3.1-8B"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

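model.config.use_cache = False  # the KV cache only helps generation; disable it for training
model.config.pretraining_tp = 1  # use the standard (non tensor-parallel) linear-layer computation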

print("✓ Model loaded")
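
As a rough sanity check on why 4-bit loading matters, the weight storage can be estimated with simple arithmetic. This is only a sketch: the parameter count is a round assumption for an "8B" model, `weight_memory_gb` is a hypothetical helper, and the real footprint is higher because of quantization constants, layers that stay unquantized, activations, and optimizer state.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: ~8.0e9 parameters, a round figure for an "8B" model.
N_PARAMS = 8.0e9

for label, bits in [("bf16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB of weights")
```

Under these assumptions the weights shrink from roughly 16 GB to roughly 4 GB, which is the difference between not fitting and fitting on a 16 GB GPU.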

## Configure LoRA

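LoRA keeps the quantized base weights frozen and trains small low-rank adapter matrices attached to the attention and MLP projections, so only a small fraction of the parameters receive gradients.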

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare model for LoRA training
model = prepare_model_for_kbit_training(model)

# LoRA config
lora_config = LoraConfig(
    r=16,  # Rank of LoRA matrices
    lora_alpha=32,  # Scaling factor
    target_modules=[  # Which layers to apply LoRA to
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

print("✓ LoRA configured")
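
The trainable-parameter count printed above can be cross-checked by hand. Each adapted linear layer with shape `in_features x out_features` gains `r * (in_features + out_features)` LoRA parameters. The sketch below assumes the layer shapes from the public Llama-3.1-8B config (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); these values and the helper `lora_param_count` are not taken from this notebook.

```python
# Assumed Llama-3.1-8B shapes (from the public model config, not from this notebook)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads, head dimension 128
NUM_LAYERS = 32

# (in_features, out_features) for each projection listed in target_modules
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int, projections=PROJECTIONS, num_layers=NUM_LAYERS) -> int:
    """LoRA adds an (in x rank) and a (rank x out) matrix to every adapted layer."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in projections.values())
    return per_layer * num_layers


print(f"r=16 -> {lora_param_count(16):,} trainable parameters")  # 41,943,040
```

If the assumed shapes are right, this should agree with `print_trainable_parameters()`, and it shows that the count scales linearly with `r`.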

## Tokenize Data

In [None]:
def tokenize_function(examples):
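    # Truncate or pad every fragment to exactly 80 tokens
    outputs = tokenizer(
        examples["text"],
        truncation=True,
        max_length=80,
        padding="max_length",
    )

    # For causal LM, labels start as a copy of input_ids (the model shifts them
    # internally). DataCollatorForLanguageModeling(mlm=False) later rebuilds the
    # labels and sets every pad-token position to -100 so padding is ignored by
    # the loss; since pad_token is eos_token here, EOS positions are masked too.
    outputs["labels"] = outputs["input_ids"].copy()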

    return outputs

# Tokenize all splits
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=dataset["train"].column_names,
    desc="Tokenizing"
)

print("✓ Data tokenized")
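
Because every fragment is padded to a fixed length, the padded positions must not contribute to the loss. The following is a minimal, library-free sketch of that idea using the attention mask; the helper `mask_padding` and the toy token ids are illustrative assumptions, and the collator used in this notebook masks by pad-token id rather than by attention mask.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Toy example: made-up token ids, right-padded to length 6 with pad id 0
input_ids = [101, 7, 42, 9, 0, 0]
attention_mask = [1, 1, 1, 1, 0, 0]
labels = mask_padding(input_ids, attention_mask)
print(labels)  # [101, 7, 42, 9, -100, -100]
```

Masking by attention mask rather than by token id keeps a genuine end-of-sequence token in the labels even when the pad token and the EOS token share an id.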

## Training Configuration

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

# Training arguments tuned for a faster run
training_args = TrainingArguments(
    output_dir="./llama-3.1-ao3-fragments",
    num_train_epochs=2,  # reduced from 3
    per_device_train_batch_size=8,  # raised from 4
    per_device_eval_batch_size=8,  # raised from 4
    gradient_accumulation_steps=2,  # lowered from 4; effective batch size stays at 8 * 2 = 16

    # Learning rate
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,

    # Optimization
    optim="paged_adamw_32bit",
    weight_decay=0.01,
    max_grad_norm=0.3,

    # Logging and evaluation, reduced to cut overhead
    logging_steps=200,  # raised from 50
    eval_strategy="epoch",  # previously "steps"; evaluate once at the end of each epoch
    save_strategy="epoch",  # previously "steps"
    save_total_limit=2,  # reduced from 3

    # Performance
    fp16=False,
    bf16=True,  # requires a GPU with bfloat16 support (e.g. A100)
    group_by_length=True,  # little effect here: every example is padded to the same length
    dataloader_num_workers=2,  # added for faster data loading
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)
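
To get a feel for what these settings imply, the number of optimizer steps can be estimated from the dataset size. This is a back-of-the-envelope sketch assuming a single GPU and the approximate 106k training fragments mentioned earlier; `schedule_summary` is a hypothetical helper, and the Trainer's own rounding may differ by a step.

```python
import math


def schedule_summary(n_examples, per_device_batch, grad_accum, epochs,
                     warmup_ratio, n_devices=1):
    """Estimate effective batch size, optimizer steps, and warmup steps."""
    effective_batch = per_device_batch * grad_accum * n_devices
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Assumption: ~106,000 training fragments on one GPU
summary = schedule_summary(
    n_examples=106_000,
    per_device_batch=8,
    grad_accum=2,
    epochs=2,
    warmup_ratio=0.03,
)
for name, value in summary.items():
    print(f"{name}: {value:,}")
```

With these numbers the run is about 13,250 optimizer steps, of which roughly the first 400 are learning-rate warmup, so `logging_steps=200` yields about 66 log entries.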

## Train Model

In [None]:
# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
)

# Train
print("Starting training...")
trainer.train()

print("✓ Training complete!")

## Save Model

In [None]:
# Save LoRA adapter
model.save_pretrained("./llama-3.1-ao3-lora-adapter")
tokenizer.save_pretrained("./llama-3.1-ao3-lora-adapter")

print("✓ Model saved to ./llama-3.1-ao3-lora-adapter")

## Evaluate on Test Set

In [None]:
# Evaluate on test set
test_results = trainer.evaluate(tokenized_dataset["test"])

print("Test results:")
for key, value in test_results.items():
    print(f"  {key}: {value:.4f}")
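
The reported loss is a mean cross-entropy in nats, which is often easier to interpret as perplexity. A minimal sketch of the conversion, using a made-up loss value rather than a result from this notebook:

```python
import math


def perplexity(mean_cross_entropy_loss: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy_loss)


# Hypothetical loss value, for illustration only
example_loss = 2.0
print(f"loss={example_loss:.2f} -> perplexity={perplexity(example_loss):.2f}")
```

In this notebook the same function could be applied as `perplexity(test_results["eval_loss"])`, since `trainer.evaluate` reports the loss under the `eval_loss` key by default.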

## Test Generation

Generate some sample stories to see how the model performs.

In [None]:
# Generate samples
model.eval()

prompts = [
    "Sarah and Emma",
    "The moment when",
    "In the darkness,",
    "Alex couldn't believe",
    "They finally"
]

print("Generated stories:\n" + "="*80)

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.9,
        top_p=0.95,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id
    )
    
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated_text}")
    print("-" * 80)

## Download Model

Download the LoRA adapter to use locally.

In [None]:
# Zip the adapter folder
!zip -r llama-3.1-ao3-lora-adapter.zip llama-3.1-ao3-lora-adapter

# Download the zip file
from google.colab import files
files.download('llama-3.1-ao3-lora-adapter.zip')

print("✓ Download started")

## Usage Instructions

After downloading and unzipping the adapter, you can load it locally on top of the base model:

```python
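import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model in bfloat16 on the available GPU (device_map needs `accelerate`);
# drop both arguments to load in full precision on CPU
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "./llama-3.1-ao3-lora-adapter")
model.eval()

# Generate (inputs must live on the same device as the model)
inputs = tokenizer("Sarah and Emma", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))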
```