# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA (low-rank adaptation). LoRA adds small low-rank matrices to selected layers of the model and trains only those, which sharply reduces the number of parameters that need to be updated.
* Model: GPT-2, a transformer-based model that is relatively lightweight compared to other large language models and can be used for sequence classification.
* Evaluation approach: Hugging Face Trainer API with a `compute_metrics` function (accuracy-based) to assess performance before and after fine-tuning.
* Fine-tuning dataset: IMDB dataset for sentiment classification, a popular dataset of movie reviews labeled as positive or negative.
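
To make the parameter saving concrete, the sketch below counts the weights LoRA would train for GPT-2's fused attention projection (`c_attn`, 768 to 2304) at rank 4. The helper name `lora_param_count` is hypothetical, and the sketch assumes LoRA is attached only to `c_attn` in each of the 12 blocks, which is an assumption about the target modules rather than something stated above.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Number of weights in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r

# GPT-2 small: hidden size 768, fused q/k/v projection width 3 * 768 = 2304.
d_in, d_out, r, n_layers = 768, 3 * 768, 4, 12

full = d_in * d_out                       # weights in one frozen c_attn matrix
lora = lora_param_count(d_in, d_out, r)   # weights LoRA adds for that matrix

print(f"full c_attn weights per layer: {full:,}")
print(f"LoRA weights per layer (r={r}): {lora:,}")
print(f"LoRA weights over {n_layers} layers: {lora * n_layers:,}")
print(f"fraction of c_attn weights trained: {lora / full:.4%}")
```

Under these assumptions the adapter trains well under 1% of the weights in each projection it is attached to.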

<a name='hugging-face-peft-library'>Hugging Face PEFT Library</a>:

<ul>
  <li><a href="https://huggingface.co/docs/peft/index" style="text-decoration:none">What is PEFT?</a>: PEFT (Parameter-Efficient Fine-Tuning) is a family of methods for fine-tuning large language models efficiently. Traditional fine-tuning updates and stores all of the model's parameters, which can be extremely resource-intensive, especially with very large models. PEFT reduces the memory and computation required by training only a small number of parameters instead of the entire model.</li>
  <li><a href="https://huggingface.co/docs/peft/main/en/conceptual_guides/lora" style="text-decoration:none">Using LoRA as a PEFT Technique</a>: LoRA is a specific PEFT technique that achieves efficient fine-tuning by training small low-rank matrices while leaving the original model weights frozen.</li>
  <li><a href="https://huggingface.co/docs/bitsandbytes/main/en/index" style="text-decoration:none">What is QLoRA?</a>: QLoRA (Quantized LoRA) combines quantization with LoRA to further reduce the memory footprint. Quantization converts the model's weights from higher-precision floating-point numbers (like FP16 or FP32) to lower-precision numbers (like 4-bit or 8-bit values). This shrinks the model in memory and can speed up computation, while LoRA provides a way to fine-tune a small set of added parameters (the low-rank matrices).</li>
  <li><a style="text-decoration:none">Why use QLoRA with bitsandbytes?</a>: Using QLoRA with bitsandbytes instead of plain LoRA combines quantization with low-rank adaptation. The quantized base model needs far less memory, so large language models can be fine-tuned and deployed on limited hardware while aiming to keep accuracy close to that of the unquantized model.</li>
</ul>
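
The quantization step described above can be illustrated with a simplified absmax int8 scheme. This is a toy sketch in plain Python, not the 4-bit format that QLoRA uses and not the bitsandbytes implementation; the function names and sample weights are hypothetical.

```python
def quantize_absmax(weights, bits=8):
    """Map floats to signed integers by scaling with the largest absolute value."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero for all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the stored scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.0, -0.07]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)

# The round trip is lossy: each value is off by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Only the integers and one scale need to be stored, which is where the memory saving comes from; the price is the small rounding error measured by `max_error`.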


## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
!pip install transformers datasets peft evaluate scikit-learn bitsandbytes accelerate

Defaulting to user installation because normal site-packages is not writeable


In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
import torch
model_name='gpt2'
dataset=load_dataset('imdb')
#load a smaller portion of the dataset by using the shuffle() and select() methods to take a random sample
sample_size=5000
train_dataset = dataset['train'].shuffle(seed=12).select(range(sample_size))
test_dataset = dataset['test'].shuffle(seed=12).select(range(sample_size))
tokenizer=AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,num_labels=2)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [3]:
model

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)

In [4]:
print(f'Training dataset size: {len(train_dataset)}')
print(f'Test dataset size: {len(test_dataset)}')
train_dataset[:1]

Training dataset size: 5000
Test dataset size: 5000


{'text': ['Buffs of the adult western that flourished in the 1950s try and trace its origins to the film that kicked off the syndrome. Of course, we can go back to Howard Hawks\'s Red River (1948) or further still to John Ford\'s My Darling Clementine (1946), but if we want to stick with this single decade, then it has to be one of a couple of films made in that era\'s initial year. One is "The Gunfighter," an exquisitely grim tale of a famed gunslinger (Ringo) facing his last shootout. Another from that same year is "Winchester \'73," and it\'s worth noting that Millard Mitchell appears in both as grim, mustached, highly realistic range riders. In The Gunfighter, he\'s the town marshal expected to arrest Ringo but once rode with him in an outlaw gang. In Winchester, he\'s the sidekick to Jimmy Stewart, a kind of Horatio to Stewart\'s Hamlet in this epic/tragic tale. The plot is simple enough: Stewart\'s lonesome cowpoke wins a remarkable Winchester in a shooting match, beating the mea

In [5]:
#check for data imbalance
from collections import Counter
Counter(train_dataset['label']), Counter(test_dataset['label'])

(Counter({1: 2467, 0: 2533}), Counter({1: 2467, 0: 2533}))

<a>The data is approximately balanced: each 5,000-review sample contains 2,467 positive and 2,533 negative reviews.</a>

In [6]:
#GPT-2 has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
# tokenizer.add_special_tokens({'pad_token':'[PAD]'})

def tokenize_data(sample):
    #truncate or pad every review to exactly 128 tokens
    return tokenizer(sample['text'], truncation=True, padding='max_length', max_length=128)

#ensure the model configuration recognizes the padding token
model.config.pad_token_id = tokenizer.pad_token_id
train_dataset = train_dataset.map(tokenize_data, batched=True)
test_dataset = test_dataset.map(tokenize_data, batched=True)

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]
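
What `truncation=True, padding='max_length'` does to each review can be sketched without the tokenizer. The helper below is hypothetical and operates on made-up token IDs; it only mirrors the shape logic (truncation and padding on the right, plus an attention mask), assuming the tokenizer pads on the right. 50256 is GPT-2's end-of-sequence ID, which the cell above reuses as the pad token.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]                      # truncation: drop tokens past max_length
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad     # 1 = real token, 0 = padding
    return ids + [pad_id] * n_pad, attention_mask

PAD_ID = 50256  # GPT-2's <|endoftext|> ID, reused as the pad token

short_ids, short_mask = pad_or_truncate([10, 11, 12], max_length=5, pad_id=PAD_ID)
long_ids, long_mask = pad_or_truncate(list(range(8)), max_length=5, pad_id=PAD_ID)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

With `max_length=128`, any review longer than 128 tokens loses its tail, so the classifier only sees the opening of long reviews.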

In [7]:
#rename the column from 'label' to 'labels', the argument name the model expects when used with the HF Trainer API
train_dataset = train_dataset.rename_column('label','labels')
test_dataset = test_dataset.rename_column('label','labels')

In [8]:
from evaluate import load
accuracy_metric = load('accuracy')

def compute_metrics(pred):
    #pred is generated by the Trainer during evaluation.
    #pred.label_ids: the true labels of the dataset.
    #pred.predictions: the model's raw outputs (logits) for each sample in the dataset.
    labels = pred.label_ids  #extract labels
    #convert the logits into class predictions by taking the index of the largest logit
    preds = pred.predictions.argmax(-1)
    #return a dictionary with the accuracy score
    return accuracy_metric.compute(predictions=preds, references=labels)
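
A minimal sketch of what `compute_metrics` calculates, using plain Python lists in place of the NumPy arrays the Trainer supplies. The logits and labels are made up for illustration, and the helper names are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy_from_logits(logits, labels):
    """Fraction of samples whose highest-scoring class matches the true label."""
    preds = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Four made-up samples, two classes (0 = negative, 1 = positive).
logits = [[2.1, -0.3], [0.2, 1.7], [-1.0, 0.4], [0.9, 0.8]]
labels = [0, 1, 0, 0]

print({"accuracy": accuracy_from_logits(logits, labels)})
```

No softmax is needed before the argmax: softmax preserves the ordering of the logits, so the largest logit is also the most probable class.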

In [9]:
training_args=TrainingArguments(output_dir='outputs',evaluation_strategy='epoch',
                                logging_steps=10,report_to='all', per_device_eval_batch_size=1)
#creating the trainer
trainer = Trainer(
    model = model,
    args = training_args,
    train_dataset = train_dataset,
    eval_dataset = test_dataset, #used to generate pred: predictions and label_ids
    compute_metrics = compute_metrics
)
#initial evaluation
initial_results = trainer.evaluate()  #calls the model on eval_dataset
print('Initial evaluation results: ', initial_results)

Initial evaluation results:  {'eval_loss': 0.8709287047386169, 'eval_accuracy': 0.4792, 'eval_runtime': 76.8252, 'eval_samples_per_second': 65.083, 'eval_steps_per_second': 65.083}


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [10]:
from peft import LoraConfig, get_peft_model
peft_config = LoraConfig(
    lora_alpha = 20,  #scaling factor: increasing alpha boosts the effect of the new parameters, while lowering it reduces their impact
    lora_dropout=0.1,  #helps prevent overfitting by randomly dropping some connections during training
    r = 4,  #rank of the update matrices: lower ranks add fewer new parameters, which can speed up training
    task_type='SEQ_CLS', #ensure the fine-tuning aligns with sequence classification needs
    bias='lora_only' #only the biases associated with LoRA layers are trainable
)
#optional (disabled): set up the BitsAndBytesConfig for 8-bit quantization
# quantization_config = BitsAndBytesConfig(load_in_8bit=True)
# model = AutoModelForSequenceClassification.from_pretrained(model_name,num_labels=2, 
#                                                           device_map='auto',
#                                                           quantization_config=quantization_config)
# model.config.pad_token_id = tokenizer.pad_token_id
peft_model = get_peft_model(model, peft_config)
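
How `r` and `lora_alpha` enter the computation can be sketched with tiny matrices. This follows the standard LoRA formulation, in which the adapter output is scaled by `lora_alpha / r` (20 / 4 = 5 for the config above). The dimensions, values, and helper names are made up, and the code is an illustration rather than PEFT's implementation.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); W stays frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

r, alpha = 1, 5            # toy rank; alpha chosen so alpha / r = 5, as in the config above
W = [[1.0, 0.0, 2.0],      # frozen 2 x 3 base weight
     [0.0, 1.0, 0.0]]
A = [[0.1, 0.2, 0.3]]      # r x 3 factor
B_zero = [[0.0], [0.0]]    # 2 x r factor; LoRA starts B at zero
B_trained = [[1.0], [-1.0]]
x = [1.0, 1.0, 1.0]

print(lora_forward(W, A, B_zero, x, alpha, r))     # equals the base output W x
print(lora_forward(W, A, B_trained, x, alpha, r))  # base output shifted by the scaled update
```

Because `B` starts at zero, a freshly wrapped model produces the same outputs as the base model; training then moves `A` and `B` away from that starting point. On the real model, `peft_model.print_trainable_parameters()` reports how many parameters are trainable.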



In [11]:
#train the LoRA model
trainer = Trainer(
    model = peft_model,
    args = training_args,
    train_dataset = train_dataset,
    eval_dataset = test_dataset,
    compute_metrics = compute_metrics
)
#starts the fine-tuning process, updating only the LoRA parameters (and the few extra weights left
#trainable, such as the LoRA-layer biases enabled by bias='lora_only') while keeping the pre-trained base weights frozen
#during training, the model learns the task-specific patterns in the training data
trainer.train()
#save the fine-tuned model
peft_model.save_pretrained('fine_tuned_model')

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.5954 | 0.641472 | 0.6346 |
| 2 | 0.2933 | 0.388185 | 0.8362 |
| 3 | 0.4437 | 0.380486 | 0.838 |


Checkpoint destination directory outputs/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory outputs/checkpoint-1000 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory outputs/checkpoint-1500 already exists and is non-empty.Saving will proceed but saved results may be invalid.


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

<a name='notes-about-peft-model'>Why use a PEFT-specific model class?</a>
* `AutoPeftModelForSequenceClassification` is specifically designed to load models fine-tuned with PEFT methods such as LoRA and QLoRA. This is necessary because these methods save only the adapter weights, not the full model weights.
* Using this PEFT-specific class ensures the base model is loaded first and the adapter weights are then applied on top of it (rather than expecting full model weights), which lets you perform efficient inference with your lightweight, fine-tuned model.
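
The adapter-only idea can be sketched with toy dictionaries standing in for state dicts. All key names and values below are hypothetical and chosen for illustration; real checkpoints hold tensors and use PEFT's own key layout.

```python
# Frozen base weights: shipped with the pre-trained checkpoint, never saved again.
base_weights = {"attn.weight": [1.0, 2.0], "mlp.weight": [3.0, 4.0]}

# What fine-tuning produced: a full model state containing base and adapter entries.
trained_state = {**base_weights, "attn.lora_A": [0.1], "attn.lora_B": [0.2]}

def extract_adapter(state, base):
    """Keep only entries that are not part of the frozen base model."""
    return {k: v for k, v in state.items() if k not in base}

def load_with_adapter(base, adapter):
    """Rebuild the full state by combining base weights with adapter weights."""
    return {**base, **adapter}

adapter_checkpoint = extract_adapter(trained_state, base_weights)  # what gets written to disk
restored = load_with_adapter(base_weights, adapter_checkpoint)     # what loading reconstructs

print(sorted(adapter_checkpoint))
print(restored == trained_state)
```

The adapter checkpoint alone is not a usable model, which is why loading has to start from the base checkpoint and then add the adapter on top.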

In [12]:
#load fine-tuned model
from peft import AutoPeftModelForSequenceClassification
#loading a saved PEFT model
peft_model = AutoPeftModelForSequenceClassification.from_pretrained('fine_tuned_model')
#update the trainer with the fine-tuned model
trainer = Trainer(
    model = peft_model,
    args = training_args,
    eval_dataset = test_dataset,
    compute_metrics = compute_metrics,
)

#final evaluation
tuned_results = trainer.evaluate()
print('Fine-tuned model evaluation results: ', tuned_results)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Fine-tuned model evaluation results:  {'eval_loss': 0.4047755002975464, 'eval_accuracy': 0.8348, 'eval_runtime': 88.5558, 'eval_samples_per_second': 56.462, 'eval_steps_per_second': 56.462}


In [13]:
print('Initial model accuracy: ', initial_results['eval_accuracy'])
print('Fine-tuned model accuracy: ', tuned_results['eval_accuracy'])

Initial model accuracy:  0.4792
Fine-tuned model accuracy:  0.8348
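
The comparison can be quantified with a little arithmetic on the two accuracies printed above. This is a small sketch with the values copied in by hand; the variable names are illustrative.

```python
initial_acc = 0.4792   # accuracy before fine-tuning, copied from the output above
tuned_acc = 0.8348     # accuracy after LoRA fine-tuning, copied from the output above

absolute_gain = tuned_acc - initial_acc               # difference in accuracy
error_reduction = absolute_gain / (1 - initial_acc)   # share of the initial errors removed

print(f"absolute gain: {absolute_gain * 100:.2f} percentage points")
print(f"relative error reduction: {error_reduction:.1%}")
```

The initial accuracy is close to the 50% expected from guessing on a balanced two-class dataset, so almost all of the final accuracy can be attributed to fine-tuning.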
