[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/09_peft_lora_qlora.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/09_peft_lora_qlora.ipynb)

# 09 - PEFT LoRA QLoRA: Parameter Efficient Fine-Tuning

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- Parameter Efficient Fine-Tuning (PEFT) concepts
- Low-Rank Adaptation (LoRA) technique and implementation
- Quantized LoRA (QLoRA) for memory-efficient training
- How full fine-tuning compares with PEFT approaches
- Practical implementation with the PEFT library
- Performance and memory optimization strategies

## 📋 Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python and PyTorch
- Knowledge of transformers and fine-tuning (refer to [Notebook 05](05_fine_tuning_trainer.ipynb))
- Understanding of model architectures

## 📚 What We'll Cover
1. **PEFT Introduction**: Concepts and motivation
2. **LoRA Theory**: Low-rank adaptation mathematics
3. **LoRA Implementation**: Using PEFT library
4. **QLoRA Technique**: Quantization + LoRA
5. **Model Comparison**: Full vs PEFT fine-tuning
6. **Memory Analysis**: Efficiency measurements
7. **Advanced Techniques**: Different PEFT methods
8. **Production Usage**: Best practices and deployment

## Introduction to Parameter Efficient Fine-Tuning

Parameter Efficient Fine-Tuning (PEFT) addresses the challenge of fine-tuning large language models efficiently:

### The Problem:
- **Full Fine-tuning**: Updates all model parameters (expensive, memory-intensive)
- **Large Models**: Billions of parameters require massive computational resources
- **Storage**: Each fine-tuned model requires full parameter storage

### PEFT Solutions:
- **LoRA (Low-Rank Adaptation)**: Learns low-rank decomposition of weight updates
- **QLoRA**: Combines quantization with LoRA for even greater efficiency
- **AdaLoRA**: Adaptively allocates the LoRA rank budget across weight matrices during training
- **Prefix Tuning**: Trains a small set of continuous prefix vectors while the base model stays frozen
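
The low-rank idea behind LoRA can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the PEFT library's implementation: the class name `LoRALinear`, its arguments, and the 512-dimensional layer are hypothetical choices made for this example. It freezes a base `nn.Linear` and adds a trainable update `B @ A`, scaled by `alpha / r`.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper (hypothetical class, not part of PEFT)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the low-rank factors are trained
        for param in self.base.parameters():
            param.requires_grad = False
        # A starts small and random, B starts at zero, so the update is zero at init
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to applying (W + scaling * B @ A) without materializing the sum
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + self.scaling * update


layer = LoRALinear(nn.Linear(512, 512), r=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"Trainable: {trainable:,} of {total:,} parameters ({100 * trainable / total:.2f}%)")
```

Because `lora_B` starts at zero, the wrapped layer initially produces the same output as the frozen base layer, and training only moves the two small factors.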

### Benefits:
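- **Memory Efficient**: Gradients and optimizer states are stored only for the adapter parameters, which substantially reduces training memory
- **Storage Efficient**: Only the adapter weights are saved per task (a few MB instead of several GB)
- **Fast Training**: Far fewer parameters to update
- **Modularity**: Multiple adapters can be combined or swapped on the same frozen base model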
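
To make the storage claim concrete, here is a rough back-of-the-envelope calculation. All model dimensions below (hidden size, layer count, number of adapted matrices, a 7B-parameter base) are assumptions picked for illustration rather than measurements of any particular model, and `lora_param_count` is a helper defined only for this sketch.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Hypothetical model shape, chosen only for illustration
hidden_size = 4096
num_layers = 32
adapted_per_layer = 2      # e.g. two square projection matrices per layer
rank = 8
bytes_per_param = 2        # 16-bit storage

adapter_params = (
    num_layers * adapted_per_layer * lora_param_count(hidden_size, hidden_size, rank)
)
adapter_bytes = adapter_params * bytes_per_param

base_params = 7_000_000_000  # assumed 7B-parameter base model
base_bytes = base_params * bytes_per_param

print(f"Adapter: {adapter_params:,} params, {adapter_bytes / 1e6:.1f} MB")
print(f"Base model: {base_params:,} params, {base_bytes / 1e9:.1f} GB")
print(f"Adapter share of parameters: {100 * adapter_params / base_params:.3f}%")
```

Under these assumptions the adapter holds about 4.2 million parameters (roughly 8 MB in 16-bit precision), compared with about 14 GB for the base weights.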

In [None]:
# Import necessary libraries
from transformers import (
    AutoTokenizer, AutoModelForCausalLM,
    TrainingArguments, Trainer,
    DataCollatorForLanguageModeling
)
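from peft import (
    LoraConfig, TaskType, get_peft_model,
    # prepare_model_for_int8_training was deprecated in favor of
    # prepare_model_for_kbit_training and is absent from recent PEFT releases
    prepare_model_for_kbit_training
)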
from datasets import load_dataset
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
import time
import os
from typing import Dict, List, Optional
import warnings
warnings.filterwarnings('ignore')

# Device detection
def get_device() -> torch.device:
    """
    Automatically detect and return the best available device.
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
        print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    elif torch.backends.mps.is_available():
        device = torch.device("mps") 
        print("🍎 Using Apple MPS (Apple Silicon)")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU (consider GPU for PEFT efficiency)")
    
    return device

device = get_device()

print("\n📚 Libraries loaded successfully!")
print(f"PyTorch version: {torch.__version__}")
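# `from peft import ...` does not bind the name `peft`, so check an imported symbol instead
print(f"PEFT available: {'✅' if 'get_peft_model' in globals() else '❌'}")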

## Summary

In this comprehensive notebook, we explored Parameter Efficient Fine-Tuning techniques:

### 🎯 **What We Accomplished**
1. **PEFT Concepts**: Understanding the motivation and benefits
2. **LoRA Theory**: Low-rank adaptation mathematics and intuition
3. **Implementation**: Practical LoRA usage with PEFT library
4. **QLoRA**: Combining quantization with LoRA for efficiency
5. **Comparisons**: Full fine-tuning vs PEFT approaches
6. **Memory Analysis**: Understanding resource requirements
7. **Advanced Methods**: Different PEFT techniques and trade-offs

### 🔑 **Key Concepts Mastered**
- **Low-Rank Adaptation**: Decomposing weight updates into smaller matrices
- **Quantization**: Reducing precision for memory efficiency
- **Adapter Modules**: Modular approach to fine-tuning
- **Memory Efficiency**: Training far fewer parameters sharply lowers memory and compute requirements
- **Task-Specific Adaptation**: Tailoring models for specific tasks
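
The quantization idea can be illustrated with a deliberately simple scheme. QLoRA itself uses a 4-bit NormalFloat (NF4) data type with block-wise quantization; the sketch below swaps in plain absmax rounding to signed 4-bit integers, which is easier to follow but is not the same algorithm. The function names and sample weights are hypothetical.

```python
def quantize_absmax(values, bits=4):
    """Map floats to signed integer codes using a single absmax scale.

    Assumes at least one value is nonzero.
    """
    levels = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = max(abs(v) for v in values) / levels
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


weights = [0.4, -1.0, 0.1, 0.8]
codes, scale = quantize_absmax(weights, bits=4)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print(f"max error: {max_error:.4f} (bound: {scale / 2:.4f})")
```

Each weight is stored as a small integer plus one shared scale, and the rounding error per weight is at most half the scale. That loss of precision is the price paid for the memory savings.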

### 📈 **Best Practices Learned**
- **Rank Selection**: Choose appropriate rank based on model size and task
- **Target Modules**: Select which layers to adapt for optimal performance
- **Quantization Strategy**: Balance efficiency and performance
- **Monitoring**: Track memory usage and training metrics
- **Modular Design**: Design adapters for reusability and composability
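
As a starting point, the rank, target-module, and quantization choices above might be expressed as follows. This is a sketch with assumed values, not a recommended recipe: the rank, alpha, dropout, and especially the `target_modules` names are placeholders, since module names depend on the model architecture.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig, TaskType

# LoRA settings (values are illustrative assumptions, tune per task)
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update
    lora_alpha=16,                         # scaling factor (update is scaled by alpha / r)
    target_modules=["q_proj", "v_proj"],   # assumed names; inspect your model's modules
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# 4-bit quantization settings in the style of QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # dtype used for matrix multiplications
)
```

In a typical workflow, `bnb_config` is passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, the loaded model goes through `prepare_model_for_kbit_training`, and `get_peft_model(model, lora_config)` attaches the adapters. Loading in 4-bit requires the `bitsandbytes` package and a supported GPU.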

### 🚀 **Next Steps**
- **Notebook 10**: LLMs and Reinforcement Learning from Human Feedback
- **Documentation**: [PEFT Best Practices](../docs/peft-best-practices.md)
- **External Resources**: [PEFT Documentation](https://huggingface.co/docs/peft/index)

PEFT techniques like LoRA and QLoRA have democratized fine-tuning of large language models, making it accessible to researchers and practitioners with limited computational resources!

---

*Ready for the final notebook? Head to **Notebook 10: LLMs RLHF** to learn about aligning models with human feedback!*

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*