# Fine-tuning Llama 3 for Reasoning with QLoRA

This notebook fine-tunes Llama 3 8B with QLoRA to improve its reasoning, evaluates the result on the LogiQA benchmark, and exports the LoRA adapter.

## 1. Setup and Installation

First, let's check if we have a GPU available and install the required dependencies.

In [None]:
# Check GPU availability
!nvidia-smi
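
If you want later cells to react to a missing GPU, the same check can be done from Python. This is a minimal sketch using only the standard library; the helper name `gpu_names` is made up for this example, and it assumes an NVIDIA driver that ships `nvidia-smi`.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        # nvidia-smi exists but failed (for example, no driver loaded)
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


names = gpu_names()
print(names if names else "No NVIDIA GPU detected")
```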

In [None]:
# Clone the repository
!git clone https://github.com/yourusername/llm-trainer.git
%cd llm-trainer

In [None]:
# Install dependencies
!pip install -r requirements.txt

## 2. Data Processing

Process the Natural Reasoning dataset for fine-tuning.

In [None]:
# Process the reasoning dataset
!python -m src.data_processors.reasoning_processor --config configs/llama3_reasoning.yaml
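
To make the processing step less of a black box, here is a rough sketch of how one record might be turned into a training string. The field names `question` and `answer` and the helper `format_example` are assumptions for illustration, not the repository's actual schema; the template simply mirrors the `Question: ... Answer: ...` prompt used in the testing section of this notebook.

```python
def format_example(record, question_key="question", answer_key="answer"):
    """Build one training string from a dataset record (hypothetical schema)."""
    question = record[question_key].strip()
    answer = record[answer_key].strip()
    return f"Question: {question}\n\nAnswer: {answer}"


sample = {"question": "If all A are B and all B are C, are all A also C?", "answer": "Yes."}
print(format_example(sample))
```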

## 3. Fine-tuning with QLoRA

Fine-tune the Llama 3 model using QLoRA.

In [None]:
# Set up Hugging Face access token
import os

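# Llama 3 is a gated model on the Hugging Face Hub, so the token must belong to
# an account that has been granted access. Avoid committing a real token.
os.environ["HF_TOKEN"] = "your_huggingface_token_here"  # Replace with your actual token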
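
If you would rather not paste the token into the notebook, one option is to prompt for it only when `HF_TOKEN` is not already set. The sketch below is an illustration; `ensure_hf_token` is a hypothetical helper, not part of the repository or of any Hugging Face library.

```python
import os
from getpass import getpass


def ensure_hf_token(env=os.environ, prompt=getpass):
    """Return the token from env["HF_TOKEN"], asking for it if it is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        token = prompt("Hugging Face token: ").strip()
        if not token:
            raise ValueError("No Hugging Face token provided")
        env["HF_TOKEN"] = token  # make it visible to later cells and subprocesses
    return token


# In the notebook, call ensure_hf_token() once before starting training.
```

The input is hidden while you type, so the token does not end up in the saved notebook output.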

In [None]:
# Fine-tune the model
!python -m src.trainers.qlora_trainer configs/llama3_reasoning.yaml
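
For a sense of how little QLoRA actually trains: the base weights stay frozen (in 4-bit), and each adapted weight matrix of shape `d_out x d_in` gains two low-rank factors with `r * (d_in + d_out)` trainable parameters in total. The sketch below applies that formula to the attention projections of Llama 3 8B. The rank of 16 and the list of target modules are assumptions for illustration; the real values live in `configs/llama3_reasoning.yaml`.

```python
# Llama 3 8B attention shapes: 32 layers, hidden size 4096, and 1024-wide
# key/value projections (8 KV heads x 128 dims, grouped-query attention).
NUM_LAYERS = 32
ATTENTION_SHAPES = {  # module name -> (in_features, out_features)
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}


def lora_trainable_params(rank, target_modules, shapes=ATTENTION_SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA parameters: rank * (in_features + out_features) per adapted matrix."""
    per_layer = sum(rank * (shapes[m][0] + shapes[m][1]) for m in target_modules)
    return per_layer * num_layers


# Assumed settings for illustration only, not read from the repository config.
count = lora_trainable_params(rank=16, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
print(f"{count:,} trainable LoRA parameters")
```

Under these assumptions the adapter holds roughly 13.6 million parameters, a small fraction of the 8 billion in the base model.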

## 4. Evaluation

Evaluate the fine-tuned model on the LogiQA benchmark.

In [None]:
# Set output directory for evaluation results
output_dir = "./evaluation_results"
model_path = "./output/llama3_reasoning"  # Path to the trained model

# Evaluate the model
!python -m src.evaluators.reasoning_evaluator --config configs/llama3_reasoning.yaml \
    --model_path {model_path} \
    --output_dir {output_dir}
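
LogiQA is a multiple-choice benchmark, so the headline metric is plain accuracy: the share of questions where the predicted option matches the gold option. The sketch below shows that computation on made-up predictions; it is not the evaluator's actual code, and the letter labels are an assumption about how options are encoded.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return correct / len(labels)


# Made-up example: three of four answers are correct.
print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
```

With four options per question, random guessing lands near 0.25, which is a useful floor when reading the results.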

## 5. Compare Models

Compare the performance of the base model vs. the fine-tuned model.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder accuracies; replace both values with the numbers reported by the evaluator
base_model_results = {"accuracy": 0.65}  # Replace with actual base model results
finetuned_results = {"accuracy": 0.75}  # Replace with actual fine-tuned results

# Create comparison dataframe
df = pd.DataFrame({
    "Model": ["Base Llama 3 8B", "Fine-tuned Llama 3 8B"],
    "Accuracy": [base_model_results["accuracy"], finetuned_results["accuracy"]]
})

# Plot comparison (draw on an explicit axes so no empty extra figure is created)
fig, ax = plt.subplots(figsize=(10, 6))
df.plot.bar(x="Model", y="Accuracy", rot=0, ax=ax)
ax.set_ylim(0, 1.0)
ax.set_title("Reasoning Performance Comparison")

for i, v in enumerate(df["Accuracy"]):
    ax.text(i, v + 0.02, f"{v:.2f}", ha="center")

plt.tight_layout()
plt.savefig("model_comparison.png")
plt.show()
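
The accuracies in the cell above are hard-coded placeholders. If the evaluator writes its metrics to a JSON file, they could be read in instead. The sketch below assumes a file containing an `"accuracy"` key; both the file layout and the helper name `load_accuracy` are assumptions, so check what the evaluator actually writes to `output_dir`.

```python
import json
from pathlib import Path


def load_accuracy(results_file):
    """Read the "accuracy" value from a JSON results file (assumed layout)."""
    path = Path(results_file)
    if not path.is_file():
        raise FileNotFoundError(f"No results file at {path}")
    with path.open() as f:
        results = json.load(f)
    value = float(results["accuracy"])
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"Accuracy out of range: {value}")
    return value


# Hypothetical usage; the file name is a guess, not the evaluator's real output:
# finetuned_results = {"accuracy": load_accuracy("./evaluation_results/results.json")}
```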

## 6. Export LoRA Adapter

Export the fine-tuned LoRA adapter for later use.

In [None]:
import os
import shutil

# Copy adapter weights to a specific directory for easy download
adapter_path = "./output/llama3_reasoning/adapter_model"
export_path = "./lora_adapter"

if os.path.exists(adapter_path):
    # Create export directory
    os.makedirs(export_path, exist_ok=True)
    
    # Copy adapter files (skip subdirectories, which shutil.copy cannot handle)
    for file in os.listdir(adapter_path):
        src = os.path.join(adapter_path, file)
        if os.path.isfile(src):
            shutil.copy(src, export_path)
    
    print(f"Adapter exported to {export_path}")
else:
    print(f"Adapter not found at {adapter_path}")
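
On hosted notebooks it is often easier to download one archive than a folder of files. Here is a small sketch that zips the export directory with the standard library; `archive_adapter` and the archive name are made up for this example.

```python
import os
import shutil


def archive_adapter(export_path="./lora_adapter", archive_name="lora_adapter"):
    """Zip the contents of export_path and return the path of the archive."""
    if not os.path.isdir(export_path):
        raise FileNotFoundError(f"No exported adapter at {export_path}")
    # make_archive appends ".zip" to archive_name
    return shutil.make_archive(archive_name, "zip", root_dir=export_path)


if os.path.isdir("./lora_adapter"):
    print(f"Archive written to {archive_adapter()}")
```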

## 7. Test the Fine-tuned Model

Try out the fine-tuned model on a few custom reasoning questions.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from peft import PeftModel, PeftConfig

# Load the base model and adapter
model_path = "./output/llama3_reasoning"

# Load adapter config
config = PeftConfig.from_pretrained(model_path)

# Load base model in 8-bit; recent transformers releases expect quantization
# settings in a BitsAndBytesConfig rather than a bare load_in_8bit argument
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True
)

# Load adapter model
model = PeftModel.from_pretrained(base_model, model_path, is_trainable=False)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

In [None]:
# Test on some custom questions
test_questions = [
    "If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly?",
    "If no mammals can fly, and all bats can fly, what can we conclude about bats?",
    "If all A are B, and all B are C, what can we conclude about the relationship between A and C?"
]

for question in test_questions:
    prompt = f"Question: {question}\n\nAnswer: "
    result = pipe(prompt, return_full_text=False)[0]["generated_text"]
    print(f"Question: {question}")
    print(f"Answer: {result}")
    print("-" * 80)
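
With a plain `Question: ... Answer:` prompt, the model may keep going after its answer and invent a follow-up question. A light post-processing step can cut the output at the first such marker. This is only a sketch; `trim_answer` and the stop markers are assumptions chosen to match the prompt format used above.

```python
def trim_answer(generated, stop_markers=("\nQuestion:", "\n\n\n")):
    """Cut generated text at the earliest stop marker and strip whitespace."""
    cut = len(generated)
    for marker in stop_markers:
        idx = generated.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut].strip()


raw = " We can conclude that bats are not mammals.\nQuestion: What about birds?"
print(trim_answer(raw))
```

In the loop above, this could wrap the pipeline output before it is printed.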