# Lightweight Fine-Tuning Project

The choices made for this project are:

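* PEFT technique: LoRA (Low-Rank Adaptation)
* Model: GPT-2 (`gpt2`, the 124M-parameter checkpoint)
* Evaluation approach: Hugging Face `Trainer`, reporting accuracy and weighted precision, recall and F1
* Fine-tuning dataset: `yelp_review_full` (five-class star-rating classification)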

## Loading and Evaluating a Foundation Model

The cells below load the pre-trained GPT-2 model, its tokenizer and the Yelp dataset. As a baseline, every GPT-2 parameter is then fine-tuned (full fine-tuning) on a random 10% subset of the training split and evaluated on a random 10% subset of the test split.

In [2]:
!pip install peft pandas numpy scikit-learn tqdm

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/


In [9]:
import pandas as pd
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer, EvalPrediction, DataCollatorWithPadding
from datasets import load_dataset, Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification
import torch
import tqdm

In [10]:
print(torch.cuda.is_available())

True


In [11]:
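from torch.utils.data import Dataset as TorchDataset

# Map-style PyTorch dataset that tokenizes on the fly. It must subclass the
# PyTorch Dataset, not `datasets.Dataset` (imported above), which wraps an Arrow table.
class YelpDataset(TorchDataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.texts[idx],
            padding='max_length',
            truncation=True,
            max_length=self.max_length,
            return_tensors='pt'
        )
        # return_tensors='pt' adds a batch dimension of 1; drop it so a DataLoader
        # can stack items into (batch, max_length) tensors
        item = {key: value.squeeze(0) for key, value in encoding.items()}
        item['labels'] = torch.tensor(self.labels[idx], dtype=torch.long)
        return item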

In [12]:
dataset = load_dataset("yelp_review_full")

In [13]:
tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token
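GPT-2 has no padding token, so the EOS token is reused for padding. This matters for classification: `GPT2ForSequenceClassification` classifies from the hidden state of the last non-padding token in each row, which it locates through `config.pad_token_id` (and it refuses batches larger than one when that id is unset, which is why the model cells below set it). A minimal numpy sketch of that lookup, assuming right padding (the GPT-2 tokenizer default) and no real EOS token inside the text; `last_token_index` is a hypothetical helper and the token ids are arbitrary:

```python
import numpy as np

def last_token_index(input_ids: np.ndarray, pad_token_id: int) -> np.ndarray:
    """Index of the last non-pad token in each row, assuming right padding."""
    is_pad = input_ids == pad_token_id
    # argmax returns the position of the first pad token; rows without padding
    # give 0, so 0 - 1 = -1 wraps to the final position via the modulo below
    first_pad = is_pad.argmax(axis=-1)
    return (first_pad - 1) % input_ids.shape[-1]

EOS = 50256  # GPT-2's <|endoftext|> id, reused as the pad token
batch = np.array([
    [464, 2057, 373, 1049, EOS, EOS],   # 4 real tokens followed by padding
    [40, 1842, 340, 11, 922, 13],       # no padding at all
])
print(last_token_index(batch, EOS))  # [3 5]
```

If a review genuinely contained `<|endoftext|>`, this rule would stop at that token, which is one caveat of sharing the EOS and pad ids.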



In [26]:
def preprocess_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=256)

tokenized_ds = {}
splits = ["train", "test"]      

for split in splits:
    tokenized_ds[split] = dataset[split].map(preprocess_function, batched=True)

Map:   0%|          | 0/650000 [00:00<?, ? examples/s]

Map:   0%|          | 0/50000 [00:00<?, ? examples/s]
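Because `preprocess_function` already pads every review to 256 tokens, the `DataCollatorWithPadding` passed to the `Trainer` later has nothing left to pad. Dropping `padding="max_length"` from the map step would let the collator pad each batch only to its longest member. A pure-Python sketch of that per-batch padding, assuming right padding with the EOS id; `pad_batch` is a hypothetical illustration, not the collator's actual code:

```python
from typing import Dict, List

def pad_batch(features: List[Dict[str, List[int]]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad input_ids and build attention_mask up to the longest sequence in the batch."""
    longest = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        n_pad = longest - len(f["input_ids"])
        input_ids.append(f["input_ids"] + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore
        attention_mask.append([1] * len(f["input_ids"]) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([{"input_ids": [10, 11, 12]}, {"input_ids": [20]}], pad_id=50256)
print(batch["input_ids"])       # [[10, 11, 12], [20, 50256, 50256]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```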

In [27]:
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=len(set(tokenized_ds['train']['label'])))
model.config.pad_token_id = tokenizer.pad_token_id

# Full fine-tuning baseline: every parameter is trainable (already the default; set explicitly for clarity).
for param in model.parameters():
    param.requires_grad = True
print(model)

loading configuration file config.json from cache at C:\Users\yuzha/.cache\huggingface\hub\models--gpt2\snapshots\607a30d783dfa663caf39e06633721c8d4cfcd7e\config.json
Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_d

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=5, bias=False)
)


In [16]:
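import random

def get_random_subset(dataset, fraction=0.1, seed=None):
    """Return a random `fraction` of `dataset`.

    Each unseeded call draws a new subset, so separate training runs see different
    examples; pass a `seed` to make the draw repeatable.
    """
    rng = random.Random(seed)
    total_size = len(dataset)
    subset_size = int(total_size * fraction)
    indices = rng.sample(range(total_size), subset_size)
    return dataset.select(indices)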

In [6]:
from typing import Dict
def compute_metrics(p: EvalPrediction) -> Dict[str, float]:
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    accuracy = accuracy_score(p.label_ids, preds)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall
    }
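With `average='weighted'`, each class's recall is weighted by its share of the true labels, and that weighted sum reduces to (correct predictions) / (total), i.e. accuracy. This is why the Accuracy and Recall columns are identical in every evaluation log below. A numpy check on synthetic labels; `weighted_recall` is a hypothetical helper written without scikit-learn:

```python
import numpy as np

def weighted_recall(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Support-weighted mean of per-class recall, with weights from y_true counts."""
    total = 0.0
    for c in np.unique(y_true):
        in_class = y_true == c
        recall_c = np.mean(y_pred[in_class] == c)          # TP_c / n_c
        total += in_class.sum() / len(y_true) * recall_c    # weight n_c / N
    return total

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=1000)   # synthetic 5-class labels
y_pred = rng.integers(0, 5, size=1000)   # synthetic predictions
accuracy = np.mean(y_true == y_pred)
print(np.isclose(weighted_recall(y_true, y_pred), accuracy))  # True
```

Weighted precision and F1 do not collapse this way, so those columns carry information that accuracy does not.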

In [28]:
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    logging_dir='./logs',
    save_strategy="epoch",
    evaluation_strategy="epoch",  # renamed to `eval_strategy` in newer transformers releases

    num_train_epochs=3,
    weight_decay=0.01,
    load_best_model_at_end=True,
    logging_steps=10,
    warmup_ratio=0.1,
    fp16=True,
    dataloader_num_workers=16,

)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=get_random_subset(tokenized_ds['train']),
    eval_dataset=get_random_subset(tokenized_ds['test']),
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
Using cuda_amp half precision backend


In [29]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 65000
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 12189
  Number of trainable parameters = 124443648


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.7693,0.848414,0.6328,0.629972,0.630031,0.6328
2,0.7036,0.805206,0.6466,0.646214,0.648136,0.6466
3,0.5874,0.82729,0.652,0.651874,0.652019,0.652


The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 16
Saving model checkpoint to ./results\checkpoint-4063
Configuration saved in ./results\checkpoint-4063\config.json
Model weights saved in ./results\checkpoint-4063\pytorch_model.bin
tokenizer config file saved in ./results\checkpoint-4063\tokenizer_config.json
Special tokens file saved in ./results\checkpoint-4063\special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 16
Sav

TrainOutput(global_step=12189, training_loss=0.8468306853466268, metrics={'train_runtime': 2082.4964, 'train_samples_per_second': 93.638, 'train_steps_per_second': 5.853, 'total_flos': 2.54771232768e+16, 'train_loss': 0.8468306853466268, 'epoch': 3.0})
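The step counts in these logs follow from the subset and batch sizes: 10% of the 650,000 training reviews is 65,000, and with one device and no gradient accumulation the `Trainer` takes ceil(65,000 / batch size) steps per epoch. That is also why the first checkpoint is `checkpoint-4063`. A quick check; `total_steps` is a hypothetical helper:

```python
import math

def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for single-device training without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # the last partial batch still counts
    return steps_per_epoch * epochs

full_ft = total_steps(65_000, 16, 3)   # full fine-tuning run above
lora = total_steps(65_000, 32, 10)     # LoRA run below
print(full_ft, lora)                   # 12189 20320
# warmup_ratio=0.1 -> roughly the first 10% of steps use linear learning-rate warmup
print(math.ceil(0.1 * full_ft))        # 1219
```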

In [30]:
# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 16


Evaluation Results: {'eval_loss': 0.8052063584327698, 'eval_accuracy': 0.6466, 'eval_f1': 0.6462138905013258, 'eval_precision': 0.6481364077177657, 'eval_recall': 0.6466, 'eval_runtime': 75.8732, 'eval_samples_per_second': 65.899, 'eval_steps_per_second': 4.125, 'epoch': 3.0}
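The final evaluation reports 64.66% accuracy rather than the 65.2% logged at epoch 3 because `load_best_model_at_end=True` reloads the checkpoint with the best `metric_for_best_model`, which defaults to the evaluation loss when unset, and epoch 2 has the lowest validation loss. Re-applying that rule to the logged values:

```python
# Validation loss and accuracy per epoch, copied from the full fine-tuning log above
val_loss = {1: 0.848414, 2: 0.805206, 3: 0.82729}
val_acc = {1: 0.6328, 2: 0.6466, 3: 0.652}

# Lower loss is better, so the reloaded checkpoint is the epoch with minimum loss
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_acc[best_epoch])  # 2 0.6466
```

Passing `metric_for_best_model="accuracy"` to `TrainingArguments` would instead keep the epoch-3 checkpoint.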


## Performing Parameter-Efficient Fine-Tuning

The cells below wrap a fresh copy of GPT-2 with LoRA adapters, train them with the same `Trainer` setup on a new random 10% subset, and save the adapter weights.
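LoRA keeps a pre-trained weight W frozen and learns a low-rank update, so a wrapped layer computes W x + (lora_alpha / r) B A x, where A is r × d_in and B is d_out × r. B starts at zero, so the wrapped model initially behaves exactly like GPT-2. A numpy sketch of one such layer using the shapes this notebook ends up with (`c_attn`: 768 → 2304, PEFT's default r = 8, and the `lora_alpha=6` set in the next cell); dropout is omitted, the weights are random stand-ins, and `lora_forward` is a hypothetical name:

```python
import numpy as np

d_in, d_out, r, lora_alpha = 768, 2304, 8, 6   # c_attn shapes, default r=8
scaling = lora_alpha / r                        # 0.75 for this notebook's config

rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out)) * 0.02       # frozen base weight (Conv1D layout: x @ W)
A = rng.normal(size=(r, d_in)) * 0.01           # trainable, random init
B = np.zeros((d_out, r))                        # trainable, zero init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Base projection plus the scaled low-rank update (dropout omitted)."""
    return x @ W + scaling * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))                  # 4 token vectors
print(np.allclose(lora_forward(x), x @ W))      # True: B = 0, so no change at init
print(A.size + B.size)                          # 24576 trainable values per layer
```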

In [23]:
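peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification; the "score" head stays trainable
    inference_mode=False,
    lora_alpha=6,        # update scaled by lora_alpha / r = 6 / 8 = 0.75 (r defaults to 8)
    lora_dropout=0.1,
    # target_modules is unset, so PEFT uses its GPT-2 default (the attention projection c_attn)
)

# Load a fresh copy of the pre-trained GPT-2; the model above has already been fully fine-tuned
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=len(set(tokenized_ds['train']['label'])))
model.config.pad_token_id = tokenizer.pad_token_id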

lora_model = PeftModelForSequenceClassification(model, peft_config)

lora_model.print_trainable_parameters()


trainable params: 298,752 || all params: 124,742,400 || trainable%: 0.2395
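The 298,752 trainable parameters can be reconstructed from the printed model: LoRA wraps only `c_attn` in each of the 12 blocks (as the `lora.Linear` modules in the inference cell show), and the 768 × 5 classification head `score` (no bias) is also trained. The "all params" total additionally assumes, as the numbers suggest, that PEFT keeps a trainable copy of `score` next to the frozen original:

```python
n_layer, d_model, d_qkv, r, num_labels = 12, 768, 3 * 768, 8, 5

lora_per_layer = r * d_model + d_qkv * r       # lora_A (8 x 768) + lora_B (2304 x 8)
lora_total = n_layer * lora_per_layer          # 294,912
score_head = d_model * num_labels              # 3,840
trainable = lora_total + score_head
print(f"{trainable:,}")                        # 298,752

base_params = 124_443_648                      # GPT2ForSequenceClassification, from the full fine-tuning log
all_params = base_params + lora_total + score_head  # frozen base + adapters + trainable score copy
print(f"{all_params:,}", f"{100 * trainable / all_params:.4f}%")  # 124,742,400 0.2395%
```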




In [31]:
training_args = TrainingArguments(
    output_dir="./results/lora_models",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    logging_dir='./logs',
    save_strategy="epoch",
    evaluation_strategy="epoch",

    num_train_epochs=10,
    weight_decay=0.01,
    load_best_model_at_end=True,
    logging_steps=10,
    warmup_ratio=0.1,
    fp16=True,
    dataloader_num_workers=16,

)
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=get_random_subset(tokenized_ds['train']),
    eval_dataset=get_random_subset(tokenized_ds['test']),
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
Using cuda_amp half precision backend


In [32]:
trainer.train()

The following columns in the training set don't have a corresponding argument in `PeftModelForSequenceClassification.forward` and have been ignored: text. If text are not expected by `PeftModelForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 65000
  Num Epochs = 10
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 20320
  Number of trainable parameters = 298752


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.8828,0.898609,0.6052,0.605636,0.606753,0.6052
2,0.8855,0.89069,0.6026,0.60028,0.599953,0.6026
3,0.8669,0.887805,0.6024,0.600149,0.604298,0.6024
4,0.907,0.885193,0.6102,0.610357,0.612216,0.6102
5,0.9685,0.881323,0.6078,0.604793,0.605248,0.6078
6,0.9293,0.87799,0.61,0.608483,0.611392,0.61
7,0.9298,0.877892,0.6114,0.610275,0.612028,0.6114
8,0.8641,0.876288,0.6122,0.6104,0.611524,0.6122
9,0.8573,0.873643,0.6128,0.610637,0.610313,0.6128
10,0.851,0.873322,0.6132,0.612271,0.612691,0.6132


The following columns in the evaluation set don't have a corresponding argument in `PeftModelForSequenceClassification.forward` and have been ignored: text. If text are not expected by `PeftModelForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 32
Saving model checkpoint to ./results/lora_models\checkpoint-2032
Trainer.model is not a `PreTrainedModel`, only saving its state dict.
tokenizer config file saved in ./results/lora_models\checkpoint-2032\tokenizer_config.json
Special tokens file saved in ./results/lora_models\checkpoint-2032\special_tokens_map.json
The following columns in the evaluation set don't have a corresponding argument in `PeftModelForSequenceClassification.forward` and have been ignored: text. If text are not expected by `PeftModelForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 32
Savin

TrainOutput(global_step=20320, training_loss=0.893572951184483, metrics={'train_runtime': 5826.1637, 'train_samples_per_second': 111.566, 'train_steps_per_second': 3.488, 'total_flos': 8.52220182528e+16, 'train_loss': 0.893572951184483, 'epoch': 10.0})

In [33]:
# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

The following columns in the evaluation set don't have a corresponding argument in `PeftModelForSequenceClassification.forward` and have been ignored: text. If text are not expected by `PeftModelForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 5000
  Batch size = 32


Evaluation Results: {'eval_loss': 0.8733215928077698, 'eval_accuracy': 0.6132, 'eval_f1': 0.6122709780198415, 'eval_precision': 0.6126905632497591, 'eval_recall': 0.6132, 'eval_runtime': 79.8442, 'eval_samples_per_second': 62.622, 'eval_steps_per_second': 1.966, 'epoch': 10.0}


In [34]:
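# Saves only the adapter config and adapter weights (including the trained score head), not the full GPT-2 weights
lora_model.save_pretrained('model/lora_model')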

## Performing Inference with a PEFT Model

The cells below reload the saved LoRA adapter, evaluate it on the full 50,000-review test split, and run it on individual reviews. The results are compared with the full fine-tuning baseline at the end.

In [36]:
dataset['test']

Dataset({
    features: ['label', 'text'],
    num_rows: 50000
})

In [42]:
inference_model = AutoPeftModelForSequenceClassification.from_pretrained('model/lora_model', num_labels=len(set(tokenized_ds['train']['label'])))
inference_model.config.pad_token_id = inference_model.config.eos_token_id

trainer = Trainer(
    model=inference_model,
    args=training_args,
    eval_dataset=tokenized_ds['test'],  # full 50,000-review test split
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)


In [43]:
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)


The following columns in the evaluation set don't have a corresponding argument in `PeftModelForSequenceClassification.forward` and have been ignored: text. If text are not expected by `PeftModelForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 50000
  Batch size = 32


Evaluation Results: {'eval_loss': 0.8746678233146667, 'eval_accuracy': 0.61432, 'eval_f1': 0.6130612331565982, 'eval_precision': 0.6123497366027331, 'eval_recall': 0.61432, 'eval_runtime': 198.6707, 'eval_samples_per_second': 251.673, 'eval_steps_per_second': 7.867}


In [72]:
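# Display names for the five classes. In yelp_review_full, label 0 is a 1-star review
# and label 4 is a 5-star review; these strings are this notebook's own naming.
id2comment_dict = {
    0: "0-1",
    1: "1-2",
    2: "2-3",
    3: "3-4",
    4: "4-5",
}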

    

In [78]:
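def predict(sentence: str) -> str:
    """Return the display label for a single review."""
    device = inference_model.device
    # Truncate to GPT-2's 1,024-position limit so very long reviews do not fail
    inputs = tokenizer(sentence, truncation=True, max_length=tokenizer.model_max_length,
                       return_tensors='pt').to(device)

    with torch.no_grad():  # inference only, no gradients needed
        logits = inference_model(**inputs).logits
    # softmax is monotonic, so the argmax of the logits is the most probable class
    predicted_id = logits.argmax(dim=-1).item()
    return id2comment_dict[predicted_id]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inference_model.to(device)
inference_model.eval()  # disable dropout for inference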
    

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D()
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict(

In [74]:
sentence = "It's the worst meal I have ever had"
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'It's the worst meal I have ever had'
Predicted label: 0-1


In [75]:
sentence = "good service and nice view"
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'good service and nice view'
Predicted label: 3-4


In [106]:
test_index = range(10, 20)
for i in test_index:
    selected_sentence = dataset['test'][i]['text']
    selected_label = dataset['test'][i]['label']
    # Truncate to GPT-2's 1,024-position limit so very long reviews do not fail
    selected_tokenized_sentence = tokenizer(
        selected_sentence, truncation=True, max_length=tokenizer.model_max_length, return_tensors='pt'
    ).to(inference_model.device)

    with torch.no_grad():
        logits = inference_model(**selected_tokenized_sentence).logits
    predict_label = torch.argmax(logits, dim=1).item()

    print(f"Yelp Comments: {selected_sentence}\n")
    print(f"Score: {selected_label}\n")
    print(f"Prediction: {predict_label}\n\n")

Yelp Comments: Think Chuck E. Cheese for adults.  Skee Ball, video games pool tables.  Clean environment.  Good fun.\n\nUnfortunately, I went for a bite to eat and it was impossible to find anything good and healthy on the menu.  I ended up settling for spinach dip.  Sadly, they topped the dip off with horrible orange shredded cheese that appeared to have been popped in the microwave for a few seconds.  Blahhhh.  Trying to get something healthy, I ordered the apple pecan salad.  I swear the dressing came right out of the grocery store bottle.  I could barely eat the salad.  Too sweet.\n\nMy mom ordered a steak roll.....holy friedness!  The steak was more like hamburger fried with cheese and then stuffed into breading that was fried AGAIN!  Yowzer!  Artery clogger for sure.  \n\nI like the atmosphere.  I like the bar area.  Perhaps next time we'll just stop by for drinks instead.

Score: 1

Prediction: 1


Yelp Comments: We came here on a Saturday night and luckily it wasn't as packed a

## Comparing the Two Models

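| Metric                          | GPT-2 (full fine-tuning)   | GPT-2 + LoRA (PEFT)      |
|---------------------------------|----------------------------|--------------------------|
| **Trainable parameters**        | 124,443,648                | 298,752                  |
| **Training epochs**             | 3                          | 10                       |
| **Eval examples**               | 5,000 (random 10% of test) | 50,000 (full test split) |
| **Eval batch size**             | 16                         | 32                       |
| **Eval loss**                   | 0.8052                     | 0.8747                   |
| **Eval accuracy**               | 64.66%                     | 61.43%                   |
| **Eval F1 (weighted)**          | 64.62%                     | 61.31%                   |
| **Eval precision (weighted)**   | 64.81%                     | 61.23%                   |
| **Eval recall (weighted)**      | 64.66%                     | 61.43%                   |
| **Eval runtime (s)**            | 75.8732                    | 198.6707                 |
| **Eval samples per second**     | 65.899                     | 251.673                  |
| **Eval steps per second**       | 4.125                      | 7.867                    |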
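The accuracy gap between the two columns is about 3.2 points. A rough check of whether it exceeds evaluation-set sampling noise, treating each accuracy as a binomial proportion (normal approximation) and the two evaluation sets as independent. That independence is only approximate, because the 5,000-review subset is drawn from the same test split; `accuracy_se` is a hypothetical helper:

```python
import math

def accuracy_se(p: float, n: int) -> float:
    """Standard error of an accuracy p measured on n independent examples."""
    return math.sqrt(p * (1 - p) / n)

full_ft = (0.6466, 5_000)    # full fine-tuning: random 5,000-review test subset
lora = (0.61432, 50_000)     # LoRA: full 50,000-review test split

gap = full_ft[0] - lora[0]
se_gap = math.sqrt(accuracy_se(*full_ft) ** 2 + accuracy_se(*lora) ** 2)
print(f"gap={gap:.4f}  se={se_gap:.4f}  z={gap / se_gap:.1f}")  # gap=0.0323  se=0.0071  z=4.5
```

A z-score well above 2 suggests the gap is not explained by which reviews happened to be sampled for evaluation, though it says nothing about run-to-run training variance.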

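## Observations
1. Performance: the fully fine-tuned GPT-2 beats the LoRA model on every quality metric (about 3.2 points of accuracy and 3.3 points of weighted F1), while LoRA trains only 298,752 parameters (0.24% of the model) instead of all 124.4M. Each run also drew its own random training and evaluation subsets.
2. Speed: the runtime rows are not a like-for-like comparison. The fully fine-tuned model was evaluated on 5,000 reviews with batch size 16, whereas the LoRA model was evaluated on all 50,000 test reviews with batch size 32, so its longer total runtime reflects ten times as many examples. Its higher samples-per-second figure most likely comes from the larger batch size rather than from LoRA itself, since an unmerged LoRA layer adds a small amount of computation.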
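An unmerged LoRA layer does slightly more work than the base layer at inference because it adds the low-rank product. PEFT's `merge_and_unload()` folds the update into the base weight so inference costs the same as plain GPT-2. A numpy sketch of why merging is exact, reusing the `c_attn` shapes (r = 8, lora_alpha = 6) with random stand-ins for the trained weights:

```python
import numpy as np

d_in, d_out, r, lora_alpha = 768, 2304, 8, 6
scaling = lora_alpha / r

rng = np.random.default_rng(1)
W = rng.normal(size=(d_in, d_out)) * 0.02   # frozen base weight, x @ W layout
A = rng.normal(size=(r, d_in)) * 0.01       # stand-ins for trained adapter weights
B = rng.normal(size=(d_out, r)) * 0.01

x = rng.normal(size=(4, d_in))
unmerged = x @ W + scaling * (x @ A.T) @ B.T  # two extra small matmuls per call
W_merged = W + scaling * (A.T @ B.T)          # fold the update in once
merged = x @ W_merged                         # same cost as the base layer

print(np.allclose(unmerged, merged))          # True
```

The trade-off is that a merged model no longer keeps the adapter separate, so swapping or removing adapters requires the unmerged form.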