# LLM Fine-tuning Methods Comparison

This notebook demonstrates various parameter-efficient fine-tuning (PEFT) methods for large language models.

**Note:** All implementations share a `CoreFineTuner` base class that handles the common training, validation, and evaluation logic. Each method implements only its own configuration and setup.
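The shared-base-class design described above can be pictured as a template method. The sketch below is hypothetical: the source of `src.finetuner` is not shown in this notebook, so the method names `setup_model`, `train`, and `evaluate`, and the body of `run`, are assumptions. Only the class name `CoreFineTuner` and the `run(save_model=...)` call come from the notebook itself.

```python
from abc import ABC, abstractmethod


class CoreFineTuner(ABC):
    """Hypothetical base class that owns the shared pipeline."""

    name = "base"

    @abstractmethod
    def setup_model(self):
        """Method-specific step: build the model and attach adapters."""

    def train(self, model):
        # Placeholder for the shared training and validation loop.
        return {"trained": True, "model": model}

    def evaluate(self, state):
        # Placeholder for the shared evaluation logic.
        return {"method": self.name, "evaluated": state["trained"]}

    def run(self, save_model=False):
        model = self.setup_model()      # the only step that differs per method
        state = self.train(model)
        results = self.evaluate(state)
        results["saved"] = save_model
        return results


class ToyLoRAFineTuner(CoreFineTuner):
    """Toy subclass: supplies only its own setup."""

    name = "lora"

    def setup_model(self):
        return {"adapter": "lora", "rank": 16}  # stand-in for a real model


results = ToyLoRAFineTuner().run(save_model=True)
print(results)
```

With this layout, adding a new method means writing one subclass with its own `setup_model`, while the training and evaluation code stays in one place.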

**Methods covered:**
1. **Full Fine-tuning** - Updates all parameters (baseline)
2. **LoRA** - Low-rank adaptation with trainable A and B matrices
3. **LoRA-FA** - LoRA with frozen A matrix (only B trainable)
4. **LoRA+** - LoRA with different learning rates for A and B
5. **Delta-LoRA** - LoRA with base weight updates via approximation
6. **AdaLoRA** - Adaptive LoRA with dynamic rank allocation
7. **QLoRA** - Quantized LoRA with 4-bit base weights
8. **VeRA** - Vector-based random matrix adaptation
9. **Prompt Tuning** - Simple learnable prompt tokens
10. **P-Tuning** - Prompt tuning with encoder



In [2]:
from src.finetuner import (
    FullFineTuner,
    LoRAFineTuner,
    LoRAFAFineTuner,
    LoRAPlusFineTuner,
    DeltaLoRAFineTuner,
    AdaLoRAFineTuner,
    QLoRAFineTuner,
    VeRAFineTuner,
    PromptTuningFineTuner,
    PTuningFineTuner
)

## Full Fine-tuning

Updates all model parameters during training.

- Achieves the highest accuracy of the methods compared here (93.32%) but requires the most memory
- All weights are trainable
- Best suited to settings with ample compute and memory
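To make the memory claim concrete, here is a rough back-of-the-envelope estimate. It assumes fp32 values and an Adam-style optimizer that keeps two moment tensors per trainable parameter; the notebook does not state which optimizer or precision is used, and activation memory is ignored. The figure of 67 million parameters is a round number in line with the "all params" totals printed in later cells.

```python
def training_memory_gb(n_trainable, n_frozen=0, bytes_per_value=4):
    """Rough memory for weights, gradients, and Adam moments (no activations)."""
    weights = (n_trainable + n_frozen) * bytes_per_value
    grads = n_trainable * bytes_per_value
    adam_moments = 2 * n_trainable * bytes_per_value  # first and second moment
    return (weights + grads + adam_moments) / 1024**3


# Full fine-tuning: every parameter is trainable (approximate count).
full = training_memory_gb(67_000_000)

# LoRA run from this notebook: 887,042 trainable out of 67,842,052 total.
lora = training_memory_gb(887_042, n_frozen=67_842_052 - 887_042)

print(f"full fine-tuning: {full:.2f} GB, LoRA: {lora:.2f} GB")
```

Under these assumptions, gradients and optimizer state dominate the cost of full fine-tuning, which is the part that parameter-efficient methods avoid.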

In [3]:
full_finetuner = FullFineTuner()
full_finetuner.run(save_model=True)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Full Fine-tuning

Loading IMDB dataset...

Starting training...


Epoch 1/3: 100%|██████████| 1563/1563 [04:36<00:00,  5.66it/s]


Epoch 1 - Training Loss: 0.3088
Epoch 1 - Validation Loss: 0.1774


Epoch 2/3: 100%|██████████| 1563/1563 [04:37<00:00,  5.64it/s]


Epoch 2 - Training Loss: 0.1617
Epoch 2 - Validation Loss: 0.2169


Epoch 3/3: 100%|██████████| 1563/1563 [04:37<00:00,  5.64it/s]


Epoch 3 - Training Loss: 0.0895
Epoch 3 - Validation Loss: 0.3429
Model saved to models/full_finetuner

Evaluating the model...


Evaluating: 100%|██████████| 782/782 [01:37<00:00,  7.99it/s]


Evaluation Results
Loss: 0.2871
Accuracy: 0.9332 (93.32%)
Precision: 0.9333
Recall: 0.9332
F1 Score: 0.9332






## LoRA (Low-Rank Adaptation)

Adds trainable low-rank matrices (A and B) to specific layers while keeping base weights frozen.

- Weight update: W' = W + BA
- Only ~1-2% of parameters are trainable (1.31% in this run)
- Memory efficient with competitive performance
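The update W' = W + BA can be illustrated with a few lines of NumPy. This is a minimal sketch with toy dimensions, not the notebook's implementation; the zero initialization of B and the `scale` factor follow common LoRA practice and are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2                      # toy sizes; real layers are much larger

W = rng.normal(size=(d_out, d_in))            # frozen base weight
A = rng.normal(scale=0.01, size=(r, d_in))    # trainable, random init
B = np.zeros((d_out, r))                      # trainable, zero init so BA = 0 at the start
scale = 1.0                                   # stands in for the usual alpha / r factor


def lora_forward(x):
    # Equivalent to (W + scale * B @ A) @ x, without forming the full update.
    return W @ x + scale * (B @ (A @ x))


x = rng.normal(size=d_in)
y0 = lora_forward(x)                          # identical to the base model at step 0

# Stand-in for training: B is no longer zero.
B = rng.normal(scale=0.01, size=(d_out, r))
merged = W + scale * B @ A                    # adapters can be merged for inference
```

Because BA has rank at most `r`, the adapter stores `r * (d_in + d_out)` values instead of `d_in * d_out`.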

In [4]:
lora = LoRAFineTuner()
lora.run(save_model=True)



trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075
LoRA Fine-tuning

Loading IMDB dataset...

Starting training...


Epoch 1/3: 100%|██████████| 1563/1563 [03:52<00:00,  6.74it/s]


Epoch 1 - Training Loss: 0.2986
Epoch 1 - Validation Loss: 0.2036


Epoch 2/3: 100%|██████████| 1563/1563 [03:52<00:00,  6.72it/s]


Epoch 2 - Training Loss: 0.2103
Epoch 2 - Validation Loss: 0.1836


Evaluating: 100%|██████████| 782/782 [01:50<00:00,  7.05it/s]


Evaluation Results
Loss: 0.1996
Accuracy: 0.9262 (92.62%)
Precision: 0.9263
Recall: 0.9262
F1 Score: 0.9262






## LoRA-FA (LoRA with Frozen-A)

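Trains only the B matrix while keeping A frozen, which halves the trainable adapter parameters relative to standard LoRA. The total reported below falls by less than half (739,586 vs. 887,042), which suggests the count also includes parameters outside the adapters, such as the classification head.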
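The trainable-parameter counts printed for LoRA (887,042) and LoRA-FA (739,586) can be reproduced with simple arithmetic. The configuration below is an inference from those numbers, not something the notebook states: rank-16 adapters on two 768x768 projections in each of DistilBERT's six layers, plus a trainable classification head. Other configurations could give the same totals.

```python
HIDDEN = 768             # DistilBERT hidden size
LAYERS = 6               # DistilBERT transformer layers
RANK = 16                # assumed LoRA rank
TARGETS_PER_LAYER = 2    # assumed: two adapted projections per layer

# Classification head: pre_classifier (768 -> 768) and classifier (768 -> 2), with biases.
head = (HIDDEN * HIDDEN + HIDDEN) + (HIDDEN * 2 + 2)

n_modules = LAYERS * TARGETS_PER_LAYER
a_params = n_modules * RANK * HIDDEN     # A has shape (r, d_in)
b_params = n_modules * HIDDEN * RANK     # B has shape (d_out, r)

lora_total = a_params + b_params + head
lora_fa_total = b_params + head          # A is frozen, so only B and the head train

print(lora_total, lora_fa_total)
```

Freezing A removes exactly half of the adapter parameters here because the adapted projections are square, so A and B have the same size.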

In [5]:
lora_fa = LoRAFAFineTuner()
lora_fa.run(save_model=True)



trainable params: 739,586 || all params: 67,842,052 || trainable%: 1.0902
LoRA-FA Fine-tuning
Key Feature: Only matrix B is trained, matrix A is frozen

Loading IMDB dataset...

Starting LoRA-FA training...


Epoch 1/3: 100%|██████████| 1563/1563 [03:47<00:00,  6.86it/s]


Epoch 1 - Training Loss: 0.3292
Epoch 1 - Validation Loss: 0.2582


Epoch 2/3: 100%|██████████| 1563/1563 [03:47<00:00,  6.86it/s]


Epoch 2 - Training Loss: 0.2412
Epoch 2 - Validation Loss: 0.2750


Epoch 3/3: 100%|██████████| 1563/1563 [03:47<00:00,  6.86it/s]


Epoch 3 - Training Loss: 0.2206
Epoch 3 - Validation Loss: 0.2216
Model saved to models/lora_fa

Evaluating the model...


Evaluating: 100%|██████████| 782/782 [01:50<00:00,  7.06it/s]


Evaluation Results
Loss: 0.2178
Accuracy: 0.9120 (91.20%)
Precision: 0.9121
Recall: 0.9120
F1 Score: 0.9120






## LoRA+

- Different learning rates for A and B matrices
- The B matrix uses a much higher learning rate than A (16-32x)
- Improved training stability and performance
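One way to give A and B different learning rates is to build separate optimizer parameter groups. The helper below is a hypothetical sketch: the function name, the default ratio of 16, and the reliance on `lora_A` / `lora_B` appearing in parameter names (the naming used by the PEFT library) are assumptions, not the notebook's code.

```python
def lora_plus_param_groups(named_params, base_lr=1e-4, lr_ratio=16.0):
    """Split parameters into groups so B matrices get lr_ratio x the base rate."""
    a_group, b_group, other = [], [], []
    for name, param in named_params:
        # Skip frozen parameters; plain objects without the flag count as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        if "lora_B" in name:
            b_group.append(param)
        elif "lora_A" in name:
            a_group.append(param)
        else:
            other.append(param)
    groups = [
        {"params": a_group, "lr": base_lr},
        {"params": b_group, "lr": base_lr * lr_ratio},
        {"params": other, "lr": base_lr},
    ]
    return [g for g in groups if g["params"]]  # drop empty groups


# Toy demonstration with strings standing in for tensors.
toy = [
    ("layer.0.q_lin.lora_A.weight", "A0"),
    ("layer.0.q_lin.lora_B.weight", "B0"),
    ("classifier.weight", "C"),
]
groups = lora_plus_param_groups(toy)
print([(g["lr"], g["params"]) for g in groups])
```

With a real model, the returned list can be passed directly to a PyTorch optimizer, for example `torch.optim.AdamW(lora_plus_param_groups(model.named_parameters()))`.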

In [14]:
lora_plus = LoRAPlusFineTuner()
lora_plus.run(save_model=True)



trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075
LoRA+ Fine-tuning

Loading IMDB dataset...

Starting LoRA+ training...


Epoch 1/3: 100%|██████████| 1563/1563 [03:52<00:00,  6.72it/s]


Epoch 1 - Training Loss: 0.3006
Epoch 1 - Validation Loss: 0.2805


Epoch 2/3: 100%|██████████| 1563/1563 [03:53<00:00,  6.70it/s]


Epoch 2 - Training Loss: 0.2118
Epoch 2 - Validation Loss: 0.2317


Epoch 3/3: 100%|██████████| 1563/1563 [03:53<00:00,  6.70it/s]


Epoch 3 - Training Loss: 0.1647
Epoch 3 - Validation Loss: 0.2372
Model saved to models/lora_plus

Evaluating the model...


Evaluating: 100%|██████████| 782/782 [01:51<00:00,  7.04it/s]


Evaluation Results
Loss: 0.2103
Accuracy: 0.9280 (92.80%)
Precision: 0.9281
Recall: 0.9280
F1 Score: 0.9280






## Delta-LoRA - Approximation

Updates both the LoRA adapters and the base weights. In the original method, the difference between the products of the low-rank matrices B and A at two consecutive steps is added to W. This implementation instead approximates that mechanism: it trains the base weights with a separate learning rate rather than computing the delta directly.

- Combines LoRA efficiency with direct base weight updates
- Base weights are trainable
- Approximates delta mechanism using a smaller learning rate for base weights
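For reference, the exact delta update that this implementation approximates can be sketched in NumPy. The scale `lam`, the toy sizes, and the random stand-in for an optimizer step are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 2                      # toy sizes
W = rng.normal(size=(d, d))      # base weight
A = rng.normal(size=(r, d))
B = rng.normal(size=(d, r))
lam = 0.5                        # hypothetical scale on the delta


def delta_lora_step(W, A_old, B_old, A_new, B_new, lam):
    """Add the change in the low-rank product to the base weight."""
    return W + lam * (B_new @ A_new - B_old @ A_old)


# Stand-in for one optimizer step on the adapters.
A_new = A - 0.01 * rng.normal(size=A.shape)
B_new = B - 0.01 * rng.normal(size=B.shape)

W_new = delta_lora_step(W, A, B, A_new, B_new, lam)
```

In this exact form W needs no gradient or optimizer state of its own, since its update is derived entirely from the adapters. The approximation used in this notebook trades that property for a simpler implementation.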

In [7]:
delta_lora = DeltaLoRAFineTuner()
delta_lora.run(save_model=True)

Epoch 1/3: 100%|██████████| 1563/1563 [05:05<00:00,  5.12it/s]


Epoch 1 - Training Loss: 0.3268
Epoch 1 - Validation Loss: 0.2114


Epoch 2/3: 100%|██████████| 1563/1563 [05:05<00:00,  5.12it/s]


Epoch 2 - Training Loss: 0.2174
Epoch 2 - Validation Loss: 0.2245


Epoch 3/3: 100%|██████████| 1563/1563 [05:05<00:00,  5.12it/s]


Epoch 3 - Training Loss: 0.1876
Epoch 3 - Validation Loss: 0.1865
Model saved to models/delta_lora

Evaluating the model...


Evaluating: 100%|██████████| 782/782 [01:50<00:00,  7.07it/s]


Evaluation Results
Loss: 0.2194
Accuracy: 0.9231 (92.31%)
Precision: 0.9235
Recall: 0.9231
F1 Score: 0.9231






## AdaLoRA (Adaptive LoRA)

Dynamically adjusts the rank of LoRA adapters during training.

- Adaptive rank allocation based on importance
- Prunes less important adapters during training
- More efficient parameter usage than standard LoRA
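The rank-allocation idea can be sketched as follows. Each adapter is written in an SVD-like form P diag(lam) Q, and the least important entries of `lam` are zeroed until a global rank budget is met. This is a simplified illustration: it scores importance by the magnitude of each entry, whereas the actual method uses a sensitivity-based score, and all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_init, budget = 8, 4, 3      # toy sizes; budget = total ranks kept across modules

# Two adapted modules, each parameterised as P @ diag(lam) @ Q.
modules = {
    name: {
        "P": rng.normal(size=(d, r_init)),
        "lam": rng.normal(size=r_init),
        "Q": rng.normal(size=(r_init, d)),
    }
    for name in ("q_lin", "v_lin")
}


def prune_to_budget(modules, budget):
    """Zero all but the `budget` largest-magnitude entries of lam overall."""
    scores = [
        (abs(v), name, i)
        for name, m in modules.items()
        for i, v in enumerate(m["lam"])
    ]
    keep = {(name, i) for _, name, i in sorted(scores, reverse=True)[:budget]}
    for name, m in modules.items():
        mask = np.array([(name, i) in keep for i in range(len(m["lam"]))])
        m["lam"] = m["lam"] * mask
    # Effective rank per module after pruning.
    return {name: int(np.count_nonzero(m["lam"])) for name, m in modules.items()}


ranks = prune_to_budget(modules, budget)
print(ranks)
```

The budget is shared across modules, so a module whose entries score higher ends up with a larger effective rank than one whose entries score lower.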

In [8]:
ada_lora = AdaLoRAFineTuner()
ada_lora.run(save_model=True)



trainable params: 1,034,786 || all params: 67,989,820 || trainable%: 1.5220
AdaLoRA Fine-tuning

Loading IMDB dataset...

Starting AdaLoRA training...


Epoch 1/3: 100%|██████████| 1563/1563 [04:08<00:00,  6.30it/s]


Epoch 1 - Training Loss: 0.5214
Epoch 1 - Validation Loss: 0.2677


Epoch 2/3: 100%|██████████| 1563/1563 [04:08<00:00,  6.30it/s]


Epoch 2 - Training Loss: 0.2529
Epoch 2 - Validation Loss: 0.2335


Evaluating: 100%|██████████| 782/782 [01:55<00:00,  6.75it/s]


Evaluation Results
Loss: 0.2211
Accuracy: 0.9110 (91.10%)
Precision: 0.9110
Recall: 0.9110
F1 Score: 0.9110






## QLoRA (Quantized LoRA)

Combines 4-bit quantization with LoRA for memory-efficient fine-tuning of large models.

- Base model weights quantized to 4-bit (NF4)
- LoRA adapters remain in full precision
- Significantly reduces memory requirements

**Note:** Requires x86_64 Linux with CUDA (NVIDIA GPU)
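The 4-bit storage idea described above can be illustrated without a GPU. The sketch below uses blockwise absmax scaling with 16 uniformly spaced levels; real NF4 places its 16 levels at quantiles of a normal distribution, so this is a simplified stand-in, and the function names and block size are assumptions.

```python
import numpy as np


def quantize_4bit(w, block_size=64):
    """Blockwise absmax quantization to 16 uniform levels (not true NF4)."""
    blocks = w.reshape(-1, block_size)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0                     # avoid division by zero
    levels = np.linspace(-1.0, 1.0, 16)           # NF4 would use normal quantiles here
    normalised = blocks[..., None] / absmax[..., None]
    idx = np.abs(normalised - levels).argmin(axis=-1)
    return idx.astype(np.uint8), absmax, levels


def dequantize_4bit(idx, absmax, levels, shape):
    """Look up each 4-bit code and rescale by the block's absmax."""
    return (levels[idx] * absmax).reshape(shape)


rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(64, 64)).astype(np.float32)

idx, absmax, levels = quantize_4bit(W)
W_hat = dequantize_4bit(idx, absmax, levels, W.shape)
max_err = np.abs(W - W_hat).max()
print(f"max abs error: {max_err:.5f}")
```

Each weight is stored as a code in the range 0 to 15 plus one scale per block. During training the frozen base weights are dequantized on the fly, and the LoRA adapters, kept in higher precision, are applied on top.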

In [16]:
qlora = QLoRAFineTuner()
qlora.run(save_model=True)



trainable params: 296,450 || all params: 67,249,922 || trainable%: 0.4408
QLoRA Fine-tuning

Loading IMDB dataset...

Starting QLoRA training...


Epoch 1/3: 100%|██████████| 1563/1563 [04:18<00:00,  6.06it/s]


Epoch 1 - Training Loss: 0.3266
Epoch 1 - Validation Loss: 0.2134


Epoch 2/3: 100%|██████████| 1563/1563 [04:18<00:00,  6.05it/s]


Epoch 2 - Training Loss: 0.2159
Epoch 2 - Validation Loss: 0.2045


Epoch 3/3: 100%|██████████| 1563/1563 [04:18<00:00,  6.05it/s]


Epoch 3 - Training Loss: 0.1861
Epoch 3 - Validation Loss: 0.1871
Model saved to models/qlora

Evaluating after fine-tuning...


Evaluating: 100%|██████████| 782/782 [01:28<00:00,  8.84it/s]


Evaluation Results
Loss: 0.2036
Accuracy: 0.9256 (92.56%)
Precision: 0.9258
Recall: 0.9256
F1 Score: 0.9256


QLoRA fine-tuning completed!





## VeRA (Vector-based Random Matrix Adaptation)

Uses shared frozen random matrices (A and B) with trainable scaling vectors (b and d).

- Only two trainable vectors per module (b and d)
- A and B matrices are frozen shared random matrices
- Extremely parameter efficient
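A minimal NumPy sketch of the VeRA update, with the weight change written as diag(b) B diag(d) A. The toy dimensions and the initial values (`d` at 0.1 and `b` at zero, a common choice) are assumptions rather than the notebook's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 4                 # toy sizes

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # frozen random matrix, shared across layers
B = rng.normal(size=(d_out, r))          # frozen random matrix, shared across layers

d_vec = np.full(r, 0.1)                  # trainable, scales the rank dimension
b_vec = np.zeros(d_out)                  # trainable, zero init so the update starts at 0


def vera_forward(x, b_vec, d_vec):
    # delta_W = diag(b) @ B @ diag(d) @ A, applied without forming it.
    return W @ x + b_vec * (B @ (d_vec * (A @ x)))


x = rng.normal(size=d_in)
y0 = vera_forward(x, b_vec, d_vec)       # equals the base model output at step 0
trainable_per_module = d_vec.size + b_vec.size
```

Each adapted module contributes only `r + d_out` trainable values, compared with `r * (d_in + d_out)` for a LoRA adapter of the same rank.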

In [13]:
vera = VeRAFineTuner()
vera.run(save_model=True)



trainable params: 616,706 || all params: 67,571,716 || trainable%: 0.9127
VeRA Fine-tuning

Loading IMDB dataset...

Starting VeRA training...


Epoch 1/3: 100%|██████████| 1563/1563 [04:14<00:00,  6.13it/s]


Epoch 1 - Training Loss: 0.3515
Epoch 1 - Validation Loss: 0.3329


Epoch 3/3: 100%|██████████| 1563/1563 [04:15<00:00,  6.11it/s]


Epoch 3 - Training Loss: 0.2315
Epoch 3 - Validation Loss: 0.2231
Model saved to models/vera

Evaluating the model...


Evaluating: 100%|██████████| 782/782 [02:01<00:00,  6.42it/s]



Evaluation Results
Loss: 0.2224
Accuracy: 0.9102 (91.02%)
Precision: 0.9104
Recall: 0.9102
F1 Score: 0.9102



## Prompt Tuning

Adds learnable prompt tokens to input sequences that guide model behavior without modifying base weights.

- Base model weights remain completely frozen
- Learnable prompt tokens prepended to inputs
- Highly parameter-efficient (only prompt embeddings are trainable)
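Prepending learnable prompt tokens amounts to concatenating a small trainable matrix in front of the token embeddings and extending the attention mask to match. The sketch below is illustrative only; the choice of 20 virtual tokens and the helper name `prepend_prompt` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden, n_virtual = 2, 5, 768, 20   # n_virtual is an assumed setting

prompt = rng.normal(scale=0.02, size=(n_virtual, hidden))   # the only trainable tensor
token_embeds = rng.normal(size=(batch, seq_len, hidden))    # from the frozen embedding layer
attention_mask = np.ones((batch, seq_len), dtype=np.int64)


def prepend_prompt(token_embeds, attention_mask, prompt):
    """Place the same learned prompt in front of every sequence in the batch."""
    b = token_embeds.shape[0]
    tiled = np.broadcast_to(prompt, (b,) + prompt.shape)
    embeds = np.concatenate([tiled, token_embeds], axis=1)
    prompt_mask = np.ones((b, prompt.shape[0]), dtype=attention_mask.dtype)
    mask = np.concatenate([prompt_mask, attention_mask], axis=1)
    return embeds, mask


embeds, mask = prepend_prompt(token_embeds, attention_mask, prompt)
print(embeds.shape, mask.shape, prompt.size)
```

With 20 virtual tokens and a hidden size of 768, the prompt holds 15,360 parameters, a tiny fraction of the base model.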

In [18]:
prompt_tuning = PromptTuningFineTuner()
prompt_tuning.run(save_model=True)



trainable params: 607,490 || all params: 67,562,500 || trainable%: 0.8992
Prompt Tuning Fine-tuning

Loading IMDB dataset...

Starting Prompt Tuning training...


Epoch 1/3: 100%|██████████| 1563/1563 [03:27<00:00,  7.52it/s]


Epoch 1 - Training Loss: 0.5345
Epoch 1 - Validation Loss: 0.4441


Epoch 2/3: 100%|██████████| 1563/1563 [03:28<00:00,  7.51it/s]


Epoch 2 - Training Loss: 0.4078
Epoch 2 - Validation Loss: 0.3985


Epoch 3/3: 100%|██████████| 1563/1563 [03:28<00:00,  7.51it/s]


Epoch 3 - Training Loss: 0.3760
Epoch 3 - Validation Loss: 0.3791
Model saved to models/prompt_tuning

Evaluating the model...


Evaluating: 100%|██████████| 782/782 [01:38<00:00,  7.95it/s]


Evaluation Results
Loss: 0.3425
Accuracy: 0.8522 (85.22%)
Precision: 0.8523
Recall: 0.8522
F1 Score: 0.8522






## P-Tuning

Uses a prompt encoder (MLP/LSTM) to generate the prompt embeddings instead of learning them directly.

- Base model weights remain frozen
- Prompt encoder (MLP/LSTM) generates prompt representations
- More complex than plain prompt tuning, with more trainable parameters; in this notebook it reaches 89.96% accuracy versus 85.22% for prompt tuning
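The role of the prompt encoder can be sketched as a small MLP that maps raw virtual-token embeddings to the prompt that is actually fed to the model. The two-layer MLP, the toy sizes, and the name `encode_prompt` are assumptions for illustration; the encoder used by the notebook may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_virtual, hidden, enc_hidden = 20, 16, 32   # toy sizes, all assumed

# Trainable pieces: raw virtual-token embeddings plus a small MLP.
raw = rng.normal(size=(n_virtual, hidden))
W1, b1 = rng.normal(size=(hidden, enc_hidden)) * 0.1, np.zeros(enc_hidden)
W2, b2 = rng.normal(size=(enc_hidden, hidden)) * 0.1, np.zeros(hidden)


def encode_prompt(raw):
    """Reparameterise the prompt: MLP(raw) is what gets prepended to the input."""
    h = np.maximum(raw @ W1 + b1, 0.0)   # ReLU
    return h @ W2 + b2


prompt = encode_prompt(raw)
n_trainable = raw.size + W1.size + b1.size + W2.size + b2.size
print(prompt.shape, n_trainable)
```

The encoder's weights are trained along with the raw embeddings, which is why P-Tuning reports more trainable parameters than prompt tuning (2,379,266 versus 607,490 in this notebook).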

In [12]:
p_tuning = PTuningFineTuner()
p_tuning.run(save_model=True)



trainable params: 2,379,266 || all params: 69,334,276 || trainable%: 3.4316
P-Tuning Fine-tuning

Loading IMDB dataset...

Starting P-Tuning training...


Epoch 1/3: 100%|██████████| 1563/1563 [03:28<00:00,  7.50it/s]


Epoch 1 - Training Loss: 0.3692
Epoch 1 - Validation Loss: 0.2619


Epoch 2/3: 100%|██████████| 1563/1563 [03:28<00:00,  7.50it/s]


Epoch 2 - Training Loss: 0.2818
Epoch 2 - Validation Loss: 0.2596


Epoch 3/3: 100%|██████████| 1563/1563 [03:28<00:00,  7.51it/s]


Epoch 3 - Training Loss: 0.2606
Epoch 3 - Validation Loss: 0.2447
Model saved to models/p_tuning

Evaluating the model...


Evaluating: 100%|██████████| 782/782 [01:38<00:00,  7.97it/s]


Evaluation Results
Loss: 0.2440
Accuracy: 0.8996 (89.96%)
Precision: 0.8998
Recall: 0.8996
F1 Score: 0.8995




