# Application of Parameter-Efficient Fine-Tuning using LoRA

## Objective
* Load a pre-trained model (Hugging Face's DistilBERT fine-tuned on SST-2) and evaluate its performance on the IMDb review sentiment dataset
* Perform Parameter-Efficient Fine-Tuning (PEFT) with LoRA on the pre-trained model to improve its effectiveness on IMDb review sentiment analysis
* Perform inference using the fine-tuned model and compare its performance against the original pre-trained model

### Loading packages

In [21]:
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)
import numpy as np
from evaluate import load as load_metric
from transformers.utils import logging
import torch
import os
from peft import LoraConfig, get_peft_model, PeftModel

### Evaluating environment compute

In [2]:
if torch.cuda.is_available():
    gpu_idx = 0
    total = torch.cuda.get_device_properties(gpu_idx).total_memory / (1024**3)
    print(f"GPU: {torch.cuda.get_device_name(gpu_idx)}")
    print(f"Total VRAM: {total:.2f} GB")
else:
    print("No GPU detected.")

print(f'Workers: {min(4, (os.cpu_count() or 1) // 2)}') # Cap at 4 data-loading workers; os.cpu_count() can return None, hence the fallback

GPU: Tesla T4
Total VRAM: 14.57 GB
Workers: 4


### Loading train/test splits of dataset and shuffling

In [3]:
# Initialize dataset splits
splits = ['train', 'test']

# Load train/test splits
ds = {split: split_ds for split, split_ds in zip(splits, load_dataset('imdb', split=splits))}

# Shuffle data (in case examples are ordered by label, which would skew the subsets selected later)
for split in splits:
    ds[split] = ds[split].shuffle()
    
ds

{'train': Dataset({
     features: ['text', 'label'],
     num_rows: 25000
 }),
 'test': Dataset({
     features: ['text', 'label'],
     num_rows: 25000
 })}

### Pre-Processing data

In [4]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')

# Function to tokenize a provided list of natural language data
#    Truncating data per model restrictions (Distilbert accepts 512 tokens max)
#    Will be using data collator, thus no padding parameter added
def preprocess_function(batch):
    return tokenizer(batch['text'], truncation=True)

# Build datasets of token IDs (the raw text column is dropped)
tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(
        preprocess_function,
        batched=True,
        remove_columns=['text']
    )
    
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]
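
Because the tokenizer call above does no padding, `DataCollatorWithPadding` pads each batch to the length of its own longest sequence when the batch is assembled. The block below is a minimal pure-Python sketch of that idea, not the library's implementation. The helper `pad_batch` and the toy token IDs are hypothetical, and the padding ID of 0 is an assumption that matches the `padding_idx=0` shown in the model printout later in this notebook.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence in a batch to the length of the longest one.

    Hypothetical helper for illustration; pad_id=0 is an assumption.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two toy sequences of different lengths (made-up token IDs)
toy_batch = [[101, 2023, 102], [101, 2023, 3185, 2001, 102]]
padded = pad_batch(toy_batch)
print(padded["input_ids"])
print(padded["attention_mask"])
```

Padding per batch rather than to the 512-token maximum means short reviews do not pay for the longest review in the dataset.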

### Loading HuggingFace sentiment model

In [5]:
model = \
AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

### Evaluating HuggingFace sentiment model as-is

In [6]:
# Loading accuracy and F1 metrics
accuracy = load_metric('accuracy')
f1 = load_metric('f1')

# Function that takes predictions and calculates accuracy/F1
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=1)
    return {
        'accuracy': accuracy.compute(predictions=preds, references=labels)['accuracy'],
        'f1': f1.compute(predictions=preds, references=labels, average='weighted')['f1']
    }

# Define arguments for model evaluation 
args = TrainingArguments(
    output_dir='./imdb_distilbert_sst2_baseline',
    per_device_eval_batch_size=32, # based on environment compute (a T4 with ~14.5 GB VRAM can handle 32)
    dataloader_num_workers=4, # matches the worker count computed above
    fp16=True, # environment has GPU, thus can use half-precision float for faster compute
)

# Define Trainer for model evaluation
trainer = Trainer(
    model=model,
    args=args,
    eval_dataset=tokenized_ds['test'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

# Silence warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"
logging.set_verbosity_error()

# Run evaluation
baseline_metrics = trainer.evaluate()
print(baseline_metrics)

{'eval_loss': 0.41226479411125183, 'eval_accuracy': 0.89076, 'eval_f1': 0.8906695686894387, 'eval_runtime': 144.4961, 'eval_samples_per_second': 173.015, 'eval_steps_per_second': 5.412}


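The model did reasonably well out of the box, given that it was fine-tuned on SST-2 rather than on IMDb.

* **89% Accuracy** => the model predicted the correct label for 89% of the test reviews.

* **89% F1** => the model balances precision (of all reviews predicted positive, how many were actually positive) and recall (of all reviews that were actually positive, how many were predicted positive). Because `average='weighted'` is used, this is the per-class F1 averaged with weights equal to each class's share of the test set.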
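
To make the two metrics concrete, here is a small hand-rolled sketch of accuracy and support-weighted F1 on a toy set of eight labels. It is meant only to illustrate the definitions; the function names and toy data are hypothetical, and the notebook itself relies on the `evaluate` metrics loaded above.

```python
def accuracy_score(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def f1_for_class(preds, labels, cls):
    """F1 for one class, treating that class as the positive one."""
    tp = sum(p == cls and y == cls for p, y in zip(preds, labels))
    fp = sum(p == cls and y != cls for p, y in zip(preds, labels))
    fn = sum(p != cls and y == cls for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(preds, labels):
    """Per-class F1 averaged with weights equal to each class's support."""
    classes = sorted(set(labels))
    total = sum(f1_for_class(preds, labels, c) * labels.count(c) for c in classes)
    return total / len(labels)


# Toy data: three positive and five negative reviews (made up)
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_preds = [1, 1, 0, 0, 0, 0, 0, 1]

toy_accuracy = accuracy_score(toy_preds, toy_labels)
toy_f1 = weighted_f1(toy_preds, toy_labels)
print(toy_accuracy, toy_f1)
```

On a balanced test set with similar error rates in both classes, accuracy and weighted F1 end up close to each other, which is consistent with the near-identical 0.8908 and 0.8907 reported above.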

## Performing Parameter-Efficient Fine-Tuning

The cells below create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

### Initialize LoRA

In [11]:
lora_config = LoraConfig(
    r=8, # Rank of the low-rank update matrices: controls how much capacity LoRA has to learn new information - 8 is a common default (good balance between parameter count and expressiveness)
    lora_alpha=16, # Scaling factor: the LoRA update is multiplied by lora_alpha / r (here 2) - 16 is a common choice (keeps updates from being too weak - underfitting - or too strong - destabilizing / catastrophic forgetting)
    target_modules=['q_lin', 'v_lin'], # Attaches LoRA adapters only to the query and value projection layers of each attention block
    lora_dropout=0.05, # Dropout on the LoRA branch, as regularization against overfitting - kept at 5% given the limited training data
    bias='none', # Bias terms stay frozen - bias updates add little benefit for large pre-trained models, and freezing them keeps the model stable and memory-efficient
    task_type='SEQ_CLS' # Sequence classification task
)
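
The roles of `r` and `lora_alpha` can be illustrated with a tiny pure-Python sketch of the LoRA update, where the effective weight is the frozen weight plus `(lora_alpha / r) * B @ A`. This is a toy illustration under stated assumptions, not PEFT's implementation: the matrices are 3x3 with rank 1, the helper names are hypothetical, and `alpha=2` with `r=1` is chosen only to reproduce the same scale of 2 as the configuration above.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, r, alpha):
    """Return W + (alpha / r) * (B @ A); W itself is never modified."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy frozen weight (identity) and rank-1 adapter matrices
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, -0.5, 1.0]]              # r x d_in
B_zero = [[0.0], [0.0], [0.0]]      # d_out x r, zero as at the start of training
B_trained = [[1.0], [0.0], [2.0]]   # made-up values standing in for a trained B

unchanged = lora_effective_weight(W, A, B_zero, r=1, alpha=2)
updated = lora_effective_weight(W, A, B_trained, r=1, alpha=2)
print(unchanged)
print(updated)
```

With `B` at zero the adapted layer behaves exactly like the frozen layer, so fine-tuning starts from the baseline model's behavior and only drifts as `A` and `B` are trained.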

### Wrapping model with LoRA adapters

In [13]:
peft_model = get_peft_model(model, lora_config)

peft_model.print_trainable_parameters()

trainable params: 147,456 || all params: 67,694,596 || trainable%: 0.21782536378531603
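
The trainable parameter count can be checked by hand. The sketch below assumes the dimensions visible in the model printout later in this notebook (hidden size 768, 6 transformer blocks) and counts one `A` matrix and one `B` matrix per adapted projection; the total parameter count is copied from the output above rather than derived.

```python
hidden_size = 768       # in/out features of q_lin and v_lin
num_layers = 6          # DistilBERT transformer blocks
rank = 8                # r in LoraConfig
targets_per_layer = 2   # q_lin and v_lin

# Each adapter holds A (rank x hidden) and B (hidden x rank)
params_per_adapter = rank * hidden_size + hidden_size * rank
trainable = num_layers * targets_per_layer * params_per_adapter

all_params = 67_694_596  # copied from print_trainable_parameters() above
trainable_pct = 100 * trainable / all_params
print(f"{trainable:,} trainable ({trainable_pct:.4f}% of all parameters)")
```

Doubling `r` would double the adapter size, which is why the rank is the main knob for trading capacity against parameter count.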


### Train the fine-tuned model

In [16]:
training_args = TrainingArguments(
    output_dir='./imdb_distilbert_sst2_lora',
    learning_rate=2e-4, # Standard given we are only training a tiny fraction of parameters, thus can afford faster learning rate without destabilizing the base model
    per_device_train_batch_size=32, # Environment compute should be able to handle it
    per_device_eval_batch_size=64, # Environment compute should be able to handle it
    num_train_epochs=3, # To have an idea if the model is still learning and requires more epochs
    weight_decay=0.01, # Regularization by adding penalty to large weights - 0.01 is standard to avoid overfitting/destabilization 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    fp16=True, # environment has GPU, thus can use half-precision float for faster compute
    report_to='none'
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_ds['train'].select(range(5000)),
    eval_dataset=tokenized_ds['test'].select(range(2000)),
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

trainer.train()

{'eval_loss': 0.24531924724578857, 'eval_accuracy': 0.9035, 'eval_f1': 0.9033622970979527, 'eval_runtime': 12.8087, 'eval_samples_per_second': 156.144, 'eval_steps_per_second': 2.498, 'epoch': 1.0}
{'eval_loss': 0.23148700594902039, 'eval_accuracy': 0.91, 'eval_f1': 0.9099355724238746, 'eval_runtime': 12.8345, 'eval_samples_per_second': 155.83, 'eval_steps_per_second': 2.493, 'epoch': 2.0}
{'eval_loss': 0.22654685378074646, 'eval_accuracy': 0.9145, 'eval_f1': 0.9145189333517494, 'eval_runtime': 12.8719, 'eval_samples_per_second': 155.377, 'eval_steps_per_second': 2.486, 'epoch': 3.0}
{'train_runtime': 280.1194, 'train_samples_per_second': 53.549, 'train_steps_per_second': 1.681, 'train_loss': 0.2867354212799396, 'epoch': 3.0}


TrainOutput(global_step=471, training_loss=0.2867354212799396, metrics={'train_runtime': 280.1194, 'train_samples_per_second': 53.549, 'train_steps_per_second': 1.681, 'train_loss': 0.2867354212799396, 'epoch': 3.0})

### Model is still learning after 3 epochs, let's run 2 more epochs

In [17]:
training_args = TrainingArguments(
    output_dir='./imdb_distilbert_sst2_lora',
    learning_rate=2e-4, 
    per_device_train_batch_size=32, 
    per_device_eval_batch_size=64, 
    num_train_epochs=5, # Adjusting to 5 epochs so we can run 2 more 
    weight_decay=0.01, 
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    fp16=True,
    report_to='none'
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_ds['train'].select(range(5000)),
    eval_dataset=tokenized_ds['test'].select(range(2000)),
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

trainer.train(resume_from_checkpoint=True) # Resume from the latest checkpoint in output_dir (saved at the end of epoch 3)

{'loss': 0.2531, 'learning_rate': 7.261146496815286e-05, 'epoch': 3.18}
{'eval_loss': 0.22198882699012756, 'eval_accuracy': 0.9165, 'eval_f1': 0.9165098013745926, 'eval_runtime': 12.7815, 'eval_samples_per_second': 156.476, 'eval_steps_per_second': 2.504, 'epoch': 4.0}
{'eval_loss': 0.22097976505756378, 'eval_accuracy': 0.9185, 'eval_f1': 0.9185141338993617, 'eval_runtime': 12.8654, 'eval_samples_per_second': 155.455, 'eval_steps_per_second': 2.487, 'epoch': 5.0}
{'train_runtime': 186.99, 'train_samples_per_second': 133.697, 'train_steps_per_second': 4.198, 'train_loss': 0.0990384751824057, 'epoch': 5.0}


TrainOutput(global_step=785, training_loss=0.0990384751824057, metrics={'train_runtime': 186.99, 'train_samples_per_second': 133.697, 'train_steps_per_second': 4.198, 'train_loss': 0.0990384751824057, 'epoch': 5.0})

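This looks good. The model is still improving, but with diminishing returns: evaluation accuracy rose from 0.9145 after epoch 3 to 0.9165 and 0.9185 after epochs 4 and 5, while evaluation loss fell only from 0.2265 to 0.2210. This is a reasonable point to stop.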
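
The step counts and the logged learning rate in the outputs above can be reproduced with a little arithmetic. The sketch assumes a single GPU, no gradient accumulation, and a linear decay schedule with no warmup; the schedule is an assumption, since the notebook does not set one explicitly, and `linear_lr` is a hypothetical helper.

```python
import math

train_samples = 5000   # size of the training subset
batch_size = 32        # per_device_train_batch_size
base_lr = 2e-4         # learning_rate

steps_per_epoch = math.ceil(train_samples / batch_size)
steps_3_epochs = 3 * steps_per_epoch   # compare with global_step=471
steps_5_epochs = 5 * steps_per_epoch   # compare with global_step=785


def linear_lr(step, total_steps, peak_lr):
    """Linear decay from peak_lr to zero with no warmup (assumed schedule)."""
    return peak_lr * (total_steps - step) / total_steps


# The only training log line above appears at epoch 3.18, i.e. step 500
epoch_at_500 = 500 / steps_per_epoch
lr_at_500 = linear_lr(500, steps_5_epochs, base_lr)
print(steps_per_epoch, steps_3_epochs, steps_5_epochs)
print(round(epoch_at_500, 2), lr_at_500)
```

Under these assumptions, extending the run to 5 epochs stretches the decay over 785 steps, so the resumed run continues at a higher learning rate than the first run had reached by the end of epoch 3.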

### Saving model parameters

In [23]:
peft_model.save_pretrained('./tmp/imdb_distilbert_sst2_lora')
tokenizer.save_pretrained('./tmp/imdb_distilbert_sst2_lora_tokenizer')

('./tmp/imdb_distilbert_sst2_lora_tokenizer/tokenizer_config.json',
 './tmp/imdb_distilbert_sst2_lora_tokenizer/special_tokens_map.json',
 './tmp/imdb_distilbert_sst2_lora_tokenizer/vocab.txt',
 './tmp/imdb_distilbert_sst2_lora_tokenizer/added_tokens.json',
 './tmp/imdb_distilbert_sst2_lora_tokenizer/tokenizer.json')

## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights and evaluate the performance of the trained PEFT model, then compare the results with those from before fine-tuning.

### Reload pre-trained model

In [19]:
base_model = \
AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

### Load fine-tuned model and tokenizer

In [24]:
peft_model = PeftModel.from_pretrained(base_model, './tmp/imdb_distilbert_sst2_lora') # Load base model and trained adapters
peft_model.eval()

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(
                  in_features=768, out_features=768, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=768, out_features=8, bias=Fal

### Evaluate fine-tuned model performance

In [25]:
# Define arguments for model evaluation 
args = TrainingArguments(
    output_dir='./imdb_distilbert_sst2_lora',
    per_device_eval_batch_size=32, # based on environment compute (a T4 with ~14.5 GB VRAM can handle 32)
    dataloader_num_workers=4, # matches the worker count computed above
    fp16=True, # environment has GPU, thus can use half-precision float for faster compute
)

# Define Trainer for model evaluation
trainer = Trainer(
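    model=peft_model, # Evaluate the reloaded base model with the saved LoRA adapters, not the in-memory training model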
    args=args,
    eval_dataset=tokenized_ds['test'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

# Silence warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"
logging.set_verbosity_error()

# Run evaluation
peft_metrics = trainer.evaluate()
print(peft_metrics)

{'eval_loss': 0.23352672159671783, 'eval_accuracy': 0.90968, 'eval_f1': 0.9096768344982126, 'eval_runtime': 158.5949, 'eval_samples_per_second': 157.634, 'eval_steps_per_second': 4.931}


### Compare the two model results

In [26]:
print("\nBaseline (pre-FT):", {k: baseline_metrics[k] for k in ["eval_accuracy","eval_f1","eval_loss"]})
print("PEFT (post-FT):   ", {k: peft_metrics[k] for k in ["eval_accuracy","eval_f1","eval_loss"]})


Baseline (pre-FT): {'eval_accuracy': 0.89076, 'eval_f1': 0.8906695686894387, 'eval_loss': 0.41226479411125183}
PEFT (post-FT):    {'eval_accuracy': 0.90968, 'eval_f1': 0.9096768344982126, 'eval_loss': 0.23352672159671783}


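This was a win! Accuracy on the full 25,000-review test set rose from 89.1% to 91.0%, a gain of almost 2 percentage points, and the error rate fell from about 11% to 9%. That is notable because marginal gains cost more when the baseline is already at 89%. Evaluation loss also dropped, from 0.41 to 0.23, and all of this came from training only about 0.2% of the parameters on 5,000 reviews.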
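
The size of the improvement can be worked out directly from the two accuracy figures printed above. The sketch below is plain arithmetic on those reported values; the variable names are illustrative.

```python
baseline_acc = 0.89076   # eval_accuracy before fine-tuning
peft_acc = 0.90968       # eval_accuracy after LoRA fine-tuning
test_size = 25000        # number of reviews in the test split

# Absolute gain in percentage points
abs_gain_pp = (peft_acc - baseline_acc) * 100

# Relative reduction in the error rate
baseline_err = 1 - baseline_acc
peft_err = 1 - peft_acc
rel_error_reduction = (baseline_err - peft_err) / baseline_err

# Additional reviews classified correctly
extra_correct = round((peft_acc - baseline_acc) * test_size)

print(f"Absolute gain: {abs_gain_pp:.2f} percentage points")
print(f"Relative error reduction: {rel_error_reduction:.1%}")
print(f"Additional correct reviews: {extra_correct}")
```

Framing the result as a relative error reduction shows that roughly one in six of the baseline's mistakes was corrected, which is easier to appreciate than the raw accuracy difference.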