# Lightweight Fine-Tuning Project

The choices made for this project are:

* PEFT technique: **LoRA**
* Model: **GPT-2**
* Evaluation approach: **Accuracy on the test split, computed with the `evaluate` method of the Hugging Face `Trainer`**
* Fine-tuning dataset: The **emotion** dataset from **DAIR-AI**

## Loading and Evaluating a Foundation Model

The cells below load the pre-trained GPT-2 model from Hugging Face, along with its tokenizer and the dataset, and evaluate the model's performance before fine-tuning.

In [1]:
from transformers import AutoModelForSequenceClassification
from datasets import load_dataset
from transformers import AutoTokenizer, Trainer, TrainingArguments
import numpy as np

Load and tokenize ***emotion*** dataset

In [2]:
dataset = load_dataset("emotion")
splits = ["train", "validation", "test"]

# The tokenizer takes no label arguments; the label mappings are configured on the model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    
tokenized_dataset = {}
for split in splits:
    # Note: padding="max_length" pads every example to the tokenizer's maximum length.
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], padding="max_length", truncation=True),
        batched=True,
    )


Load ***gpt2*** model

In [3]:
model = AutoModelForSequenceClassification.from_pretrained("gpt2", 
                                                           num_labels=6,
                                                           id2label={0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"},
                                                           label2id={"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5})

# set the pad token of the model's configuration
model.config.pad_token_id = model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
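
The pad token matters here because the GPT-2 sequence classification head classifies from the last non-padding token of each row, so the model needs to know which id counts as padding. The following is a simplified NumPy sketch of that lookup, not the library's implementation; the helper name `last_non_pad_index` and the token ids in `batch` are hypothetical and chosen only for illustration.

```python
import numpy as np

# 50256 is GPT-2's EOS token id, reused as the pad id in this notebook.
PAD_ID = 50256

def last_non_pad_index(input_ids: np.ndarray, pad_id: int) -> np.ndarray:
    """Return, for each row, the index of the last token that is not padding."""
    not_pad = input_ids != pad_id
    seq_len = input_ids.shape[1]
    # Reverse each row, find the first non-pad position from the right,
    # then convert it back to a left-to-right index.
    first_from_right = np.argmax(not_pad[:, ::-1], axis=1)
    return seq_len - 1 - first_from_right

# Two right-padded rows with arbitrary (hypothetical) token ids.
batch = np.array([
    [15496, 995, PAD_ID, PAD_ID],
    [40, 1842, 428, 2646],
])
print(last_non_pad_index(batch, PAD_ID))  # [1 3]
```

In this sketch the first row is classified from position 1 and the second from position 3, so the padded positions do not influence the prediction.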


Define the ***Hugging Face trainer*** and evaluate the default model on the ***test*** dataset

In [4]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./results_without_finetuning",
        evaluation_strategy="epoch",  # Evaluate at the end of each epoch
        per_device_eval_batch_size=16,
    ),
    eval_dataset=tokenized_dataset['test'],
    compute_metrics=compute_metrics,
)

eval_results = trainer.evaluate()

# Print evaluation results
print(f"Evaluation results: {eval_results}")

Evaluation results: {'eval_loss': 2.8372974395751953, 'eval_accuracy': 0.0925, 'eval_runtime': 166.9142, 'eval_samples_per_second': 11.982, 'eval_steps_per_second': 0.749}
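
To make the `eval_accuracy` figure concrete, here is a small self-contained check of the same `compute_metrics` logic on toy data. The logits and labels below are made up for illustration; only the function body mirrors the cell above.

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair; pick the highest-scoring class per row.
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Hypothetical logits for four examples over the six emotion classes.
toy_logits = np.array([
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],  # predicted class 0
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0],  # predicted class 1
    [0.0, 0.0, 0.0, 1.5, 0.0, 0.0],  # predicted class 3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.9],  # predicted class 5
])
toy_labels = np.array([0, 1, 2, 5])  # the third example is misclassified

result = compute_metrics((toy_logits, toy_labels))
print(result["accuracy"])  # 0.75
```

The `Trainer` prefixes the returned key with `eval_`, which is why the result above is reported as `eval_accuracy`.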


## Performing Parameter-Efficient Fine-Tuning

The cells below create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [5]:
from peft import LoraConfig, get_peft_model

Define the ***LoRA*** configuration and create the ***PEFT*** model

In [6]:
config = LoraConfig(
    r=16,  # Rank of the low-rank adaptation
    lora_alpha=32,  # Scaling factor for the LoRA updates
    lora_dropout=0.1,  # Dropout rate for LoRA layers
    bias="none",  # Specify how to handle biases in the model
    task_type="SEQ_CLS",  # Task type for sequence classification
)

lora_model = get_peft_model(model, config)
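
As a rough illustration of what `r` and `lora_alpha` control, the sketch below applies a LoRA-style low-rank update to a single frozen weight matrix in NumPy. It is a minimal sketch of the general technique, not the PEFT implementation: dropout is omitted, and the layer shape (768 inputs, 2304 outputs) is an assumption chosen to resemble a GPT-2 attention projection.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 2304  # assumed layer shape, for illustration only
r, lora_alpha = 16, 32   # same values as the LoraConfig above

W = rng.normal(size=(d_out, d_in))           # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable low-rank factor
B = np.zeros((d_out, r))                     # trainable, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Frozen projection plus the scaled low-rank update (alpha / r) * B @ A."""
    scaling = lora_alpha / r
    return x @ W.T + scaling * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
# With B initialized to zero, the adapted layer starts out identical to the frozen one.
assert np.allclose(lora_forward(x), x @ W.T)

full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)  # 1769472 49152
```

Under these assumptions the adapter trains about 2.8% as many parameters as the full matrix for this one layer, which is the source of LoRA's efficiency.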



Train the ***PEFT*** model

In [9]:
trainer_lora = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./results_with_finetuning",
        learning_rate=2e-5,
        evaluation_strategy="epoch",  # Evaluate at the end of each epoch
        save_strategy="epoch",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['validation'],
#     compute_metrics = compute_metrics
)

trainer_lora.train()

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.1758 | 1.113386 |


TrainOutput(global_step=4000, training_loss=1.4518275909423828, metrics={'train_runtime': 3445.3641, 'train_samples_per_second': 4.644, 'train_steps_per_second': 1.161, 'total_flos': 8420233052160000.0, 'train_loss': 1.4518275909423828, 'epoch': 1.0})
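
The `global_step` and throughput values in this output can be sanity-checked with a little arithmetic. The sketch below assumes a training split of 16,000 examples, a single device, and no gradient accumulation; those assumptions are not stated in the notebook but are consistent with the logged numbers.

```python
import math

num_train_examples = 16_000      # assumed size of the train split
per_device_train_batch_size = 4  # from TrainingArguments above
num_train_epochs = 1             # from TrainingArguments above
num_devices = 1                  # assumption: single device, no gradient accumulation

steps_per_epoch = math.ceil(
    num_train_examples / (per_device_train_batch_size * num_devices)
)
total_steps = steps_per_epoch * num_train_epochs
print(total_steps)  # 4000, matching global_step

train_runtime = 3445.3641  # seconds, from the TrainOutput above
samples_per_second = num_train_examples * num_train_epochs / train_runtime
print(round(samples_per_second, 3))  # about 4.644, matching train_samples_per_second
```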

In [10]:
lora_model.save_pretrained("gpt2-lora")

trainer_lora.evaluate()

{'eval_loss': 1.1133863925933838,
 'eval_runtime': 174.7337,
 'eval_samples_per_second': 11.446,
 'eval_steps_per_second': 2.861,
 'epoch': 1.0}

## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights, evaluate the trained PEFT model on the test split, and compare the results with those obtained before fine-tuning.

In [12]:
from peft import AutoPeftModelForSequenceClassification

lora_model_eval = AutoPeftModelForSequenceClassification.from_pretrained("gpt2-lora", num_labels=6)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [14]:
# set the pad token of the model's configuration
lora_model_eval.config.pad_token_id = lora_model_eval.config.eos_token_id

trainer = Trainer(
    model=lora_model_eval,
    args=TrainingArguments(
        output_dir="./results_with_finetuning1",
        evaluation_strategy="epoch",  # Evaluate at the end of each epoch
        per_device_eval_batch_size=16,
    ),
    eval_dataset=tokenized_dataset['test'],
    compute_metrics=compute_metrics,
)

eval_results = trainer.evaluate()

# Print evaluation results
print(f"Evaluation results: {eval_results}")

Evaluation results: {'eval_loss': 1.1009926795959473, 'eval_accuracy': 0.596, 'eval_runtime': 172.4179, 'eval_samples_per_second': 11.6, 'eval_steps_per_second': 0.725}


Accuracy on the test dataset ***before*** fine-tuning: ***0.0925*** \
Accuracy on the test dataset ***after*** fine-tuning: ***0.596***

After only one epoch of LoRA fine-tuning, test accuracy rose from ***0.0925*** to ***0.596***, an absolute gain of about 50 percentage points, while the evaluation loss fell from 2.837 to 1.101.
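
For a more tangible reading of the two accuracy figures, they can be converted into counts of correctly classified test examples. This is a small sketch that assumes a test split of 2,000 examples, which is inferred from the logged evaluation throughput and runtime rather than stated explicitly.

```python
num_test_examples = 2000  # assumption inferred from eval_samples_per_second * eval_runtime
acc_before, acc_after = 0.0925, 0.596  # reported test accuracies

correct_before = round(acc_before * num_test_examples)
correct_after = round(acc_after * num_test_examples)
print(correct_before, correct_after)  # 185 1192

# Absolute improvement in accuracy.
gain = round(acc_after - acc_before, 4)
print(gain)  # 0.5035
```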