# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: I use LoRA because it trains far fewer parameters than full fine-tuning while maintaining comparable performance.
* Model: I use distilbert-base-uncased (the checkpoint loaded in the cells below). BERT-style models are popular for text classification, and the model is large enough that full fine-tuning would be comparatively expensive, which makes a PEFT technique worthwhile.
* Evaluation approach: I measure accuracy, the fraction of correctly classified messages, on the held-out evaluation split.
* Fine-tuning dataset: I use the sms_spam dataset. It suits a classification task well because SMS messages are short compared with the texts in other datasets such as reviews.
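
To illustrate the LoRA choice above, here is a minimal plain-Python sketch of a single LoRA-adapted linear layer. It is not the `peft` implementation. The tiny dimensions, the helper names `matvec` and `lora_forward`, and the zero initialization of `B` are assumptions made for illustration. The frozen weight `W` is left untouched and only the two small matrices `A` and `B` would be trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, r):
    """Compute W x + (alpha / r) * B (A x) for one input vector."""
    base = matvec(W, x)               # frozen pre-trained projection
    update = matvec(B, matvec(A, x))  # low-rank trainable correction
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, alpha = 8, 8, 2, 4  # toy sizes, chosen only for the example

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]                                 # d_out x r, zeros
x = [random.gauss(0, 1) for _ in range(d_in)]

full_params = d_in * d_out        # weights updated by full fine-tuning
lora_params = r * (d_in + d_out)  # weights updated by LoRA

print(f"full: {full_params}, LoRA: {lora_params}")
# With B initialized to zeros, the adapted layer starts out identical to the frozen one.
print(lora_forward(x, W, A, B, alpha, r) == matvec(W, x))
```

The saving grows with layer size: for a 768 x 768 projection and `r = 16`, the same formula gives 24,576 trainable weights instead of 589,824.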

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
!pip install datasets
!pip install transformers
!pip install numpy
!pip install peft
!pip install torch
!pip install scikit-learn

Collecting datasets
  Downloading datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-3.0.1-py3-none-any.whl (471 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (39.9 MB)

In [2]:

from datasets import load_dataset
from transformers import AutoTokenizer,AutoModelForSequenceClassification,Trainer, TrainingArguments
import numpy as np
from peft import LoraConfig,get_peft_model,AutoPeftModelForSequenceClassification,PeftModel
import torch
from sklearn.metrics import accuracy_score

In [3]:
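# A fixed seed makes the split reproducible; without it, every call draws a different shuffle.
dataset = load_dataset('sms_spam', split="train[:50%]").train_test_split(test_size=0.2, shuffle=True, seed=42)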

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/4.98k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/359k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/5574 [00:00<?, ? examples/s]

In [4]:
dataset = dataset.rename_column("label", "labels")

In [28]:
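def tokenize_data(dataset, tokenizer):
    """Tokenize the 'sms' column of every split and return a dict of tokenized splits."""
    def tokenize_batch(batch):
        # Pad and truncate to the model's maximum length so every example has the same shape.
        return tokenizer(batch['sms'], truncation=True, padding='max_length')
    return {split: dataset[split].map(tokenize_batch, batched=True) for split in dataset}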


In [11]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

tokenized_dataset = tokenize_data(dataset, tokenizer)
# sms_spam uses label 0 for ham (legitimate messages) and 1 for spam.
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2,
    id2label={0: "ham", 1: "spam"},
    label2id={"ham": 0, "spam": 1})
# Freeze the DistilBERT encoder so that only the classification head has trainable weights.
for param in model.base_model.parameters():
    param.requires_grad = False



Map:   0%|          | 0/558 [00:00<?, ? examples/s]

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).astype(np.float32).mean().item()
    return {'eval_accuracy': accuracy}
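
As a quick sanity check of the metric, the sketch below mirrors the same argmax-then-compare logic in plain Python on a few made-up logits. The helper names and the toy values are hypothetical and exist only to show what the metric computes.

```python
def argmax(row):
    """Return the index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Four toy examples with two classes each (values invented for illustration).
toy_logits = [[2.0, -1.0], [0.1, 0.3], [1.5, 1.4], [-0.2, 0.9]]
toy_labels = [0, 1, 1, 1]

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)  # the third example is misclassified, so 3 of 4 are correct
```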

In [13]:
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    eval_steps=500,
    save_strategy="epoch",
    logging_dir="./logs",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    metric_for_best_model="eval_accuracy",
    num_train_epochs=3,
    learning_rate=0.002,
    weight_decay=0.01,
    load_best_model_at_end=True
)



In [14]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
    compute_metrics=compute_metrics,
)

eval_results = trainer.evaluate()
print(f"Pre-fine-tuning evaluation results: {eval_results}")

Pre-fine-tuning evaluation results: {'eval_accuracy': 0.13799282908439636, 'eval_loss': 0.8668806552886963, 'eval_model_preparation_time': 0.0015, 'eval_runtime': 10.3306, 'eval_samples_per_second': 54.014, 'eval_steps_per_second': 3.388}


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [15]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

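lora_config = LoraConfig(
    r=16,                                # rank of the low-rank update matrices
    lora_alpha=32,                       # scaling factor; the update is scaled by lora_alpha / r
    lora_dropout=0.1,                    # dropout applied on the LoRA path
    target_modules=['q_lin', 'out_lin'], # DistilBERT attention projections to adapt
    task_type='SEQ_CLS',
)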
peft_model = get_peft_model(model, lora_config)

peft_model.to(device)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): lora.Linear(
                  (base_layer): Linear(in_features=768, out_features=768, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=768

In [16]:
peft_model.print_trainable_parameters()

trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075
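
The trainable count reported above can be reproduced with some arithmetic. The sketch below assumes that `q_lin` and `out_lin` each match one 768 x 768 attention projection in each of the 6 transformer blocks, and that the classification head (`pre_classifier` and `classifier`) stays trainable alongside the LoRA matrices. Under those assumptions the numbers agree with the printed output.

```python
hidden = 768           # DistilBERT hidden size (from the model printout)
num_layers = 6         # transformer blocks
targets_per_layer = 2  # q_lin and out_lin (assumed one match each per block)
r = 16                 # LoRA rank from lora_config
num_labels = 2

# Each adapted layer gets A (r x 768) and B (768 x r).
lora_per_module = r * (hidden + hidden)
lora_total = num_layers * targets_per_layer * lora_per_module

# Classification head: weights plus biases.
pre_classifier = hidden * hidden + hidden
classifier = hidden * num_labels + num_labels

trainable = lora_total + pre_classifier + classifier
all_params = 67_842_052  # value reported by print_trainable_parameters()

print(f"LoRA matrices: {lora_total:,}")
print(f"trainable:     {trainable:,}")
print(f"trainable%:    {100 * trainable / all_params:.4f}")
```

Only about a third of the trainable weights belong to the LoRA matrices; the rest is the classification head.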


In [17]:
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    eval_steps=500,
    save_strategy="epoch",
    logging_dir="./logs",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    metric_for_best_model="eval_accuracy",
    num_train_epochs=3,
    learning_rate=0.002,
    weight_decay=0.01,
    load_best_model_at_end=True
)




In [18]:
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
    compute_metrics=compute_metrics,
)

In [19]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.088972,0.976703
2,No log,0.090833,0.982079
3,No log,0.138513,0.982079


TrainOutput(global_step=420, training_loss=0.041058258783249625, metrics={'train_runtime': 268.718, 'train_samples_per_second': 24.885, 'train_steps_per_second': 1.563, 'total_flos': 904031523164160.0, 'train_loss': 0.041058258783249625, 'epoch': 3.0})
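
The `global_step=420` value and the `No log` entries in the training-loss column can both be explained with a short calculation. The sketch assumes a single device, no gradient accumulation, a training split of 2,229 rows (the size the notebook reports for this split), and the `Trainer` default of logging the training loss every 500 steps.

```python
import math

train_rows = 2229            # size of the training split
batch_size = 16              # per_device_train_batch_size
epochs = 3                   # num_train_epochs
default_logging_steps = 500  # assumed Trainer default, not set in this notebook

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)
# Training finishes before the first logging step, so no training loss is recorded.
print(total_steps < default_logging_steps)
```

Passing a smaller `logging_steps` value to `TrainingArguments` would make the training loss appear in the table.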

In [20]:
peft_model.save_pretrained("bert-lora")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [29]:
peft_model = AutoPeftModelForSequenceClassification.from_pretrained("bert-lora")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
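# A fixed seed makes the split reproducible; a fresh unseeded shuffle could move rows seen during training into the test split.
dataset = load_dataset('sms_spam', split="train[:50%]").train_test_split(test_size=0.2, shuffle=True, seed=42)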

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [30]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).astype(np.float32).mean().item()
    return {'eval_accuracy': accuracy}
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    eval_steps=500,
    save_strategy="epoch",
    logging_dir="./logs",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    metric_for_best_model="eval_accuracy",
    num_train_epochs=3,
    learning_rate=0.002,
    weight_decay=0.01,
    load_best_model_at_end=True
)



In [31]:
tokenized_dataset = tokenize_data(dataset,tokenizer)
print (tokenized_dataset)

Map:   0%|          | 0/2229 [00:00<?, ? examples/s]

Map:   0%|          | 0/558 [00:00<?, ? examples/s]

{'train': Dataset({
    features: ['sms', 'label', 'input_ids', 'attention_mask'],
    num_rows: 2229
}), 'test': Dataset({
    features: ['sms', 'label', 'input_ids', 'attention_mask'],
    num_rows: 558
})}


In [32]:
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_dataset['test'],
    eval_dataset=tokenized_dataset['test'],
    compute_metrics=compute_metrics,
)

eval_results = trainer.evaluate()
print(f"Post-fine-tuning evaluation results: {eval_results}")

Post-fine-tuning evaluation results: {'eval_accuracy': 0.9964157938957214, 'eval_loss': 0.016037125140428543, 'eval_model_preparation_time': 0.0049, 'eval_runtime': 10.4001, 'eval_samples_per_second': 53.654, 'eval_steps_per_second': 3.365}


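The accuracy after fine-tuning comes out to **99.64%**, whereas the accuracy before fine-tuning was **13.80%**, a gain of roughly 86 percentage points. The low baseline is expected because the classification head was randomly initialized before training. The validation accuracy logged during training peaked at 98.21%, which is the more conservative figure if the reloaded split does not exactly match the one used for training.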
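
The two reported accuracies can be turned back into counts of correctly classified test messages. The sketch below uses only numbers printed earlier in the notebook (558 test rows and the two `eval_accuracy` values); the variable names are illustrative.

```python
test_rows = 558                    # size of the test split
acc_before = 0.13799282908439636   # eval_accuracy before fine-tuning
acc_after = 0.9964157938957214     # eval_accuracy after fine-tuning

# Accuracies are fractions of the test split, so multiplying recovers whole counts.
correct_before = round(acc_before * test_rows)
correct_after = round(acc_after * test_rows)
gain_points = 100 * (acc_after - acc_before)

print(f"correct before: {correct_before} / {test_rows}")
print(f"correct after:  {correct_after} / {test_rows}")
print(f"gain: {gain_points:.2f} percentage points")
```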