# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA (low-rank adaptation). LoRA adds small low-rank matrices to selected layers of the model and trains only those, which greatly reduces the number of parameters that need to be updated.
* Model: GPT-2 (`gpt2`), a transformer-based model that is relatively lightweight compared to other large language models and can be used for sequence classification.
* Evaluation approach: Hugging Face Trainer API with a `compute_metrics` function (accuracy-based) to assess performance before and after fine-tuning.
* Fine-tuning dataset: IMDB dataset for sentiment classification, a popular dataset of movie reviews labeled as positive or negative.
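
To make the parameter saving concrete, here is a minimal arithmetic sketch. It assumes a single GPT-2 attention projection of shape 768 x 2304 (hidden size 768, projecting to query, key and value at once) and rank `r = 8`; the helper name `lora_param_count` is only illustrative and not part of any library.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two LoRA factors: A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


# Assumed shape of one GPT-2 attention projection and the rank used later on
d_in, d_out, r = 768, 2304, 8

full = d_in * d_out                      # weights updated by full fine-tuning
lora = lora_param_count(d_in, d_out, r)  # weights updated by LoRA

print(f"full: {full:,}  lora: {lora:,}  ratio: {100 * lora / full:.2f}%")
```

Under these assumptions LoRA trains roughly 1.4% of the weights of that one matrix, and every other pre-trained weight stays frozen.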

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
!pip install transformers datasets peft evaluate scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Collecting scikit-learn
  Downloading scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.3 MB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn, evaluate
Successfully installed evaluate-0.4.3 joblib-1.4.2 scikit-learn-1.5.2 threadp

In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

model_name = 'gpt2'

# IMDB provides 25,000 training and 25,000 test reviews
dataset = load_dataset('imdb')
train_dataset = dataset['train']
test_dataset = dataset['test']

# load the GPT-2 tokenizer and GPT-2 with a 2-class classification head
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

Downloading readme:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

Downloading data: 100%|██████████| 21.0M/21.0M [00:00<00:00, 23.9MB/s]
Downloading data: 100%|██████████| 20.5M/20.5M [00:00<00:00, 27.4MB/s]
Downloading data: 100%|██████████| 42.0M/42.0M [00:01<00:00, 30.1MB/s]


Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [3]:
model

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)

In [4]:
train_dataset[:1]

{'text': ['I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far b

In [5]:
# GPT-2 has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
# tokenizer.add_special_tokens({'pad_token': '[PAD]'})
# ensure the model configuration recognizes the padding token
model.config.pad_token_id = tokenizer.pad_token_id

def tokenize_data(sample):
    # truncate or pad every review to exactly 128 tokens
    return tokenizer(sample['text'], truncation=True, padding='max_length', max_length=128)

train_dataset = train_dataset.map(tokenize_data, batched=True)
test_dataset = test_dataset.map(tokenize_data, batched=True)

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]
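
The effect of `truncation=True, padding='max_length'` can be sketched without the tokenizer. The function below is a hypothetical stand-in, not the Hugging Face implementation: it assumes right-side padding and uses 50256 (GPT-2's end-of-sequence id) as the pad id, mirroring the `pad_token = eos_token` choice above.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Cut a sequence to max_length, then right-pad it up to max_length."""
    ids = list(token_ids[:max_length])
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding the model should ignore
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


PAD_ID = 50256  # assumed pad id: GPT-2's end-of-sequence token

short = pad_or_truncate([10, 11, 12], max_length=5, pad_id=PAD_ID)
long = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_length=5, pad_id=PAD_ID)
print(short)
print(long)
```

Every example ends up with the same length, which is what lets the Trainer stack reviews into fixed-size batches. The trade-off is that anything beyond the first 128 tokens of a review is discarded.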

In [6]:
# rename the 'label' column to 'labels', the argument name the model's forward() uses, to keep the datasets compatible with the HF Trainer API
train_dataset = train_dataset.rename_column('label','labels')
test_dataset = test_dataset.rename_column('label','labels')

In [7]:
from evaluate import load
accuracy_metric = load('accuracy')

def compute_metrics(pred):
    # pred is generated by the Trainer during evaluation.
    # pred.label_ids: the true labels of the dataset.
    # pred.predictions: the model's raw outputs (logits) for each sample in the dataset.
    labels = pred.label_ids  # extract labels
    # convert the logits into class predictions by taking the index of the largest logit
    preds = pred.predictions.argmax(-1)
    # return a dictionary with the accuracy score
    return accuracy_metric.compute(predictions=preds, references=labels)

Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]
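
As a sanity check on what `compute_metrics` returns, the same argmax-then-compare logic can be written in plain Python. This is a sketch with made-up logits, and `accuracy_from_logits` is a hypothetical helper, not part of `evaluate`.

```python
def accuracy_from_logits(logits, labels):
    """Pick the largest logit per row and compare it with the true label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


# Four made-up examples: columns are (negative, positive) logits
logits = [[2.1, -0.3], [0.2, 1.5], [-1.0, 0.4], [0.9, 0.1]]
labels = [0, 1, 0, 0]

result = accuracy_from_logits(logits, labels)
print(result)
```

Three of the four rows have their largest logit at the labelled class, so the accuracy is 0.75.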

In [9]:
training_args = TrainingArguments(output_dir='outputs', evaluation_strategy='epoch', per_device_eval_batch_size=1)
# creating the trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,  # evaluate() produces predictions and label_ids from this split
    compute_metrics=compute_metrics
)
# initial evaluation
initial_results = trainer.evaluate()  # runs the model on eval_dataset
print('Initial evaluation results: ', initial_results)

Initial evaluation results:  {'eval_loss': 5.047942161560059, 'eval_accuracy': 0.50092, 'eval_runtime': 378.8349, 'eval_samples_per_second': 65.992, 'eval_steps_per_second': 65.992}
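
One way to read these numbers, offered as an interpretation rather than something the notebook measured directly: for two classes, a model that always outputs 50/50 would have a cross-entropy of ln 2, and `exp(-loss)` gives the geometric-mean probability assigned to the true label. The fine-tuned loss value used below is the one reported later in this notebook.

```python
import math

# Cross-entropy of a model that always predicts 50/50 on two classes
uniform_loss = -math.log(0.5)

# Losses reported in this notebook (before and after fine-tuning)
initial_loss = 5.047942161560059
tuned_loss = 0.3343519866466522

# exp(-mean cross-entropy) = geometric mean of the probability given to the true label
initial_p = math.exp(-initial_loss)
tuned_p = math.exp(-tuned_loss)

print(f"uniform-guess loss: {uniform_loss:.3f}")
print(f"geometric-mean p(true label) before: {initial_p:.4f}, after: {tuned_p:.4f}")
```

An accuracy of about 0.50 together with a loss far above ln 2 suggests the randomly initialized `score` head makes confident predictions that are wrong about half the time, which is consistent with the warning that the head still needs training.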


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [11]:
from peft import LoraConfig, get_peft_model
peft_config = LoraConfig(
    lora_alpha=16,  # scaling factor: increasing alpha boosts the effect of the new parameters, lowering it reduces their impact
    lora_dropout=0.1,  # helps prevent overfitting by randomly dropping some connections during training
    r=8,  # rank of the LoRA matrices: lower ranks reduce the number of new parameters added to the model
    task_type='SEQ_CLS'  # ensures the fine-tuning setup matches sequence classification
)
peft_model = get_peft_model(model, peft_config)
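
The roles of `r` and `lora_alpha` can be illustrated with a tiny hand-written forward pass. This is a sketch of the LoRA formulation `h = W x + (alpha / r) * B (A x)` on toy matrices in plain Python, not the PEFT implementation; all names and values are illustrative, and the toy uses `alpha=2, r=1` so the scale matches the notebook's 16 / 8 = 2.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen pre-trained weight (2 x 3)
A = [[1.0, 1.0, 1.0]]                    # trainable factor, rank 1 (1 x 3)
x = [1.0, 2.0, 3.0]

# With B at zero the adapted layer reproduces the base model exactly
out_start = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2, r=1)

# After B has moved away from zero, the update is scaled by alpha / r
out_trained = lora_forward(W, A, [[0.5], [0.0]], x, alpha=2, r=1)

print(out_start, out_trained)
```

Starting with `B` at zero means training begins from the unmodified pre-trained behaviour, and only `A` and `B` need gradients.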



In [13]:
# train the LoRA model
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)
# start the fine-tuning process, updating only the parameters introduced by LoRA
# (plus the classification head) while the pre-trained GPT-2 weights stay frozen;
# during training, the model learns the task-specific patterns in the training data
trainer.train()
# save the fine-tuned adapter weights
peft_model.save_pretrained('fine_tuned_model')

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.3718        | 0.348218        | 0.85696  |
| 2     | 0.3681        | 0.334946        | 0.86644  |
| 3     | 0.3466        | 0.33293         | 0.86772  |


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [14]:
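# load the fine-tuned model from the saved adapter directory
# (with peft installed, transformers loads the gpt2 base model and then applies the saved LoRA adapter)
peft_model = AutoModelForSequenceClassification.from_pretrained('fine_tuned_model')
# create a new trainer around the reloaded model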
trainer = Trainer(
    model = peft_model,
    args = training_args,
    eval_dataset = test_dataset,
    compute_metrics = compute_metrics,
)

#final evaluation
tuned_results = trainer.evaluate()
print('Fine-tuned model evaluation results: ', tuned_results)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Fine-tuned model evaluation results:  {'eval_loss': 0.3343519866466522, 'eval_accuracy': 0.8662, 'eval_runtime': 448.5046, 'eval_samples_per_second': 55.741, 'eval_steps_per_second': 55.741}


In [16]:
print('Initial model accuracy: ', initial_results['eval_accuracy'])
print('Fine-tuned model accuracy: ', tuned_results['eval_accuracy'])

Initial model accuracy:  0.50092
Fine-tuned model accuracy:  0.8662
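
To put the comparison into numbers, the gain can be computed directly from the two accuracies printed above. This is a small arithmetic sketch; the variable names are illustrative.

```python
# Accuracies reported above, before and after LoRA fine-tuning
initial_acc = 0.50092
tuned_acc = 0.8662

# Absolute gain in accuracy
abs_gain = tuned_acc - initial_acc

# Share of the initial errors removed by fine-tuning
error_reduction = abs_gain / (1 - initial_acc)

print(f"absolute gain: {abs_gain:.4f}")
print(f"relative error reduction: {error_reduction:.1%}")
```

Fine-tuning lifts accuracy by about 36.5 percentage points over the chance-level baseline and removes roughly 73% of the baseline's errors.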
