##### Vanessa Trujillo 
##### Submission date 02/18/24
##### Udacity Generative AI Nanodegree project: Apply Lightweight Fine-Tuning to a Foundation Model

# Lightweight Fine-Tuning Project

* PEFT technique: The approach labeled PEFT (Parameter-Efficient Fine-Tuning) here is a two-stage schedule. The base model is first trained for two epochs with the DistilBERT encoder frozen, so only the classification head is updated. All model parameters are then unfrozen and the model is fine-tuned for four more epochs at a lower learning rate (2e-5 instead of 2e-3), which lets it adapt more closely to the task. Because the second stage updates every weight, it is closer to full fine-tuning than to adapter-based methods such as LoRA.
* Model: The base model used is distilbert-base-uncased for sequence classification. The same base model is utilized for both the initial training and the PEFT process.
* Evaluation approach: Evaluation uses the Trainer class from the Hugging Face transformers library. The evaluation strategy is set to "epoch", so the model is evaluated on the test subset after each training epoch. Each evaluation reports loss and accuracy, along with runtime, samples per second, steps per second, and the epoch number.
* Fine-tuning dataset: The fine-tuning dataset is the rotten_tomatoes dataset, using its train and test splits. To keep the example fast, each split is shuffled (seed 42) and reduced to its first 500 samples. The text is pre-processed with the distilbert-base-uncased tokenizer.
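The freeze-then-unfreeze schedule described in the first bullet can be illustrated on a toy model. This is a minimal sketch, not the notebook's code: `TinyClassifier` and `count_trainable` are hypothetical names, and two small linear layers stand in for the DistilBERT encoder and its classification head.

```python
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for an encoder plus a classification head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 4)     # 8*4 weights + 4 biases = 36 parameters
        self.classifier = nn.Linear(4, 2)  # 4*2 weights + 2 biases = 10 parameters

    def forward(self, x):
        return self.classifier(self.encoder(x))


def count_trainable(model):
    """Count the parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = TinyClassifier()

# Stage 1: freeze the encoder so only the head is trained.
for param in model.encoder.parameters():
    param.requires_grad = False
stage1_trainable = count_trainable(model)

# Stage 2: unfreeze everything so all weights are trained.
for param in model.parameters():
    param.requires_grad = True
stage2_trainable = count_trainable(model)

print(stage1_trainable, stage2_trainable)  # 10 46
```

Running the same `count_trainable` helper on the real model before and after unfreezing would show how many weights each stage actually updates.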

## Loading and Evaluating a Foundation Model

In [1]:
import torch
from transformers import AutoTokenizer, DataCollatorWithPadding, TrainingArguments, Trainer, AutoModelForSequenceClassification
from datasets import load_dataset
import pandas as pd
import numpy as np

# Load the train and test splits of the rotten_tomatoes dataset (attributed to the Udacity course code and Hugging Face code snippets).
splits = ["train", "test"]
ds = {split: ds for split, ds in zip(splits, load_dataset("rotten_tomatoes", split=splits))}

# Thin out the dataset to make it run faster for this example
for split in splits:
    ds[split] = ds[split].shuffle(seed=42).select(range(500))

# Pre-process dataset
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

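def preprocess_function(examples):
    # padding="max_length" pads every review to the tokenizer's maximum length;
    # truncation=True cuts off anything longer than that.
    return tokenizer(examples["text"], padding="max_length", truncation=True)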

tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(preprocess_function, batched=True)

# Load and set up the base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Freeze the DistilBERT encoder (base_model.base_model); the newly initialized
# classification head stays trainable
for param in base_model.base_model.parameters():
    param.requires_grad = False


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
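The warning above means the classification head starts from random weights, so its logits carry no signal until it is trained. As a minimal sketch of how logits are turned into the labels defined in `id2label`, the example below uses made-up logits and a hand-written `softmax` helper; neither comes from the notebook.

```python
import numpy as np

# Same index-to-label mapping that is passed to from_pretrained above.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Illustrative logits for two reviews (invented values, one row per review).
logits = np.array([[-1.2, 2.3], [0.9, -0.4]])

probs = softmax(logits)
labels = [id2label[int(i)] for i in probs.argmax(axis=-1)]

print(labels)  # ['POSITIVE', 'NEGATIVE']
```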


In [3]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}
    
    
# Set up the Trainer for the base model (attributed to the Udacity course code).
trainer_base = Trainer(
    model=base_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_base",
        learning_rate=2e-3,
        per_device_train_batch_size=6,
        per_device_eval_batch_size=6,
        num_train_epochs=2,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# Train the base model (encoder frozen, so only the classification head is updated)
trainer_base.train()

# Evaluate the base model
base_model_evaluation = trainer_base.evaluate()



| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.573705 | 0.708 |
| 2 | No log | 0.54821 | 0.73 |
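The Accuracy column comes from the `compute_metrics` function defined above. As a small worked check, the same function can be applied by hand to a few invented logits and labels (these values are illustrative only, not taken from the dataset):

```python
import numpy as np


def compute_metrics(eval_pred):
    """Same logic as the notebook: argmax over the class axis, then mean agreement."""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# Four made-up examples: one row of logits per example, columns = [NEGATIVE, POSITIVE].
logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.8]])
labels = np.array([0, 1, 0, 0])

result = compute_metrics((logits, labels))
print(result)  # predicted classes are [0, 1, 1, 0], so 3 of 4 match
```

The third example is predicted POSITIVE but labeled NEGATIVE, so accuracy is 0.75.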


## Performing Parameter-Efficient Fine-Tuning

In [4]:
# Performing Parameter-Efficient Fine-Tuning (PEFT)

# Unfreeze all model parameters (encoder and classification head) for the second stage.
for param in base_model.parameters():
    param.requires_grad = True

# Training the PEFT model
trainer_peft = Trainer(
    model=base_model,  # Use the base model with unfrozen parameters
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_peft", # Output directory for saving the PEFT model
        learning_rate=2e-5,
        per_device_train_batch_size=12,
        per_device_eval_batch_size=12,
        num_train_epochs=4,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer_peft.train()


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.505375 | 0.776 |
| 2 | No log | 0.810996 | 0.77 |
| 3 | No log | 0.897101 | 0.778 |
| 4 | No log | 0.97327 | 0.776 |
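Validation loss is lowest after epoch 1 and rises afterwards, while accuracy stays roughly flat. With `load_best_model_at_end=True` and no `metric_for_best_model` set, the Trainer selects the checkpoint with the lowest evaluation loss. The sketch below reproduces that selection from the table values; the dictionary names are made up for illustration.

```python
# Per-epoch values copied from the table above.
eval_loss_by_epoch = {1: 0.505375, 2: 0.810996, 3: 0.897101, 4: 0.97327}
eval_acc_by_epoch = {1: 0.776, 2: 0.77, 3: 0.778, 4: 0.776}

# Default selection criterion: lowest validation loss.
best_epoch_by_loss = min(eval_loss_by_epoch, key=eval_loss_by_epoch.get)

# For comparison: the epoch that would win if accuracy were the criterion.
best_epoch_by_accuracy = max(eval_acc_by_epoch, key=eval_acc_by_epoch.get)

print(best_epoch_by_loss, best_epoch_by_accuracy)  # 1 3
```

The two criteria disagree here, but the accuracy gap between epochs 1 and 3 (0.776 vs. 0.778) is a single example out of 500.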


TrainOutput(global_step=168, training_loss=0.19035461970738002, metrics={'train_runtime': 788.9237, 'train_samples_per_second': 2.535, 'train_steps_per_second': 0.213, 'total_flos': 264934797312000.0, 'train_loss': 0.19035461970738002, 'epoch': 4.0})
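The `global_step=168` value can be checked with simple arithmetic. This sketch assumes a single device and no gradient accumulation; `total_steps` is a hypothetical helper, not a Trainer function.

```python
import math


def total_steps(num_samples, batch_size, epochs):
    """Optimizer steps = batches per epoch (last partial batch included) * epochs."""
    return math.ceil(num_samples / batch_size) * epochs


# Second stage: 500 samples, batch size 12, 4 epochs.
peft_steps = total_steps(500, 12, 4)

# First stage: 500 samples, batch size 6, 2 epochs.
base_steps = total_steps(500, 6, 2)

print(peft_steps, base_steps)  # 168 168
```

Both stages therefore run 168 steps. That is most likely why the Training Loss column shows "No log": the default `logging_steps` in `TrainingArguments` is 500, which is never reached in a 168-step run.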

## Performing Inference with a PEFT Model

In [5]:
# Evaluate the PEFT model
peft_model_evaluation = trainer_peft.evaluate()

# Compare the results of the base model and PEFT model
print("Base Model Evaluation:")
print(base_model_evaluation)

print("\nPEFT Model Evaluation:")
print(peft_model_evaluation)


Base Model Evaluation:
{'eval_loss': 0.548210084438324, 'eval_accuracy': 0.73, 'eval_runtime': 45.873, 'eval_samples_per_second': 10.9, 'eval_steps_per_second': 1.831, 'epoch': 2.0}

PEFT Model Evaluation:
{'eval_loss': 0.5053754448890686, 'eval_accuracy': 0.776, 'eval_runtime': 42.8268, 'eval_samples_per_second': 11.675, 'eval_steps_per_second': 0.981, 'epoch': 4.0}
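The throughput figures in the two dictionaries follow directly from the runtime and the evaluation batch size. A minimal sketch, assuming a single device; `eval_throughput` is a hypothetical helper:

```python
import math


def eval_throughput(num_samples, batch_size, runtime_s):
    """Return (samples per second, steps per second) for one evaluation pass."""
    steps = math.ceil(num_samples / batch_size)
    return num_samples / runtime_s, steps / runtime_s


# Base model: evaluation batch size 6, runtime 45.873 s.
base_sps, base_steps_ps = eval_throughput(500, 6, 45.873)

# PEFT model: evaluation batch size 12, runtime 42.8268 s.
peft_sps, peft_steps_ps = eval_throughput(500, 12, 42.8268)

print(round(base_sps, 3), round(base_steps_ps, 3))  # 10.9 1.831
print(round(peft_sps, 3), round(peft_steps_ps, 3))  # 11.675 0.981
```

The lower steps-per-second value for the PEFT run reflects the doubled batch size (42 steps instead of 84), not slower inference.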


## Results

#### Comparing the results of the base model and the PEFT model

| Metric | Base model | PEFT model |
|---|---|---|
| eval_loss | 0.548210084438324 | 0.5053754448890686 |
| eval_accuracy | 0.73 | 0.776 |
| eval_runtime | 45.873 | 42.8268 |
| eval_samples_per_second | 10.9 | 11.675 |
| eval_steps_per_second | 1.831 | 0.981 |
| epoch | 2.0 | 4.0 |

#### Conclusion

The PEFT model outperformed the base model on the 500-sample test subset: evaluation loss fell from 0.5482 to 0.5054 and accuracy rose from 0.73 to 0.776. The reported PEFT figures match epoch 1 of the second stage. Validation loss climbed in epochs 2 to 4 (0.811, 0.897, 0.973) while accuracy stayed near 0.77, which suggests overfitting, and `load_best_model_at_end=True` restored the checkpoint with the lowest validation loss. Evaluation runtime was slightly shorter (42.8 s vs. 45.9 s) and samples per second slightly higher (11.675 vs. 10.9). Steps per second was lower (0.981 vs. 1.831), but only because the evaluation batch size doubled from 6 to 12, so each step covers twice as many samples. Overall, unfreezing the model after training the classification head improved both loss and accuracy for sentiment analysis on the rotten_tomatoes dataset, although the second stage updates every weight rather than a small subset of them.
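Because only 500 test samples are used, it helps to translate the accuracy gap into example counts and a rough uncertainty. The sketch below uses the binomial standard error as an approximation; it treats the two evaluations as independent, which is a simplifying assumption since both models are scored on the same 500 reviews.

```python
import math

N_TEST = 500  # size of the test subset used in the notebook


def correct_count(accuracy, n):
    """Number of correctly classified examples implied by an accuracy value."""
    return round(accuracy * n)


def standard_error(accuracy, n):
    """Binomial standard error of an accuracy estimate from n samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


base_correct = correct_count(0.73, N_TEST)
peft_correct = correct_count(0.776, N_TEST)
base_se = standard_error(0.73, N_TEST)
peft_se = standard_error(0.776, N_TEST)

print(base_correct, peft_correct, peft_correct - base_correct)  # 365 388 23
print(round(base_se, 4), round(peft_se, 4))
```

The gap of 23 examples is a little over two standard errors of either estimate, so evaluating on the full test split would give a firmer comparison.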
