# Lightweight Fine-Tuning Project


This project applies LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning (PEFT) method, to the GPT-2 foundation model. Evaluation uses the Hugging Face `Trainer` class. The dataset is *rotten tomatoes*, a movie-review dataset for text classification provided by Hugging Face. It contains 5,331 positive and 5,331 negative processed sentences (881 kB in size). To reduce the computational cost further, only a subset of 500 training and 500 test examples is used.

[See Hugging Face documentation for LoRA](https://huggingface.co/docs/peft/package_reference/lora)

[See Hugging Face documentation for GPT-2](https://huggingface.co/openai-community/gpt2)

[See documentation for dataset](https://huggingface.co/datasets/rotten_tomatoes)


## Loading and Evaluating a Foundation Model

The cells below load the pre-trained Hugging Face model and evaluate its performance before fine-tuning. These steps include loading an appropriate tokenizer and the dataset.

In [19]:
# Install the datasets package
! pip install -q "datasets==2.15.0"

In [20]:
# Import load_dataset from the datasets package

from datasets import load_dataset

# Load the train and test splits of the rotten_tomatoes dataset
splits = ["train", "test"]
ds = {split: data for split, data in zip(splits, load_dataset("rotten_tomatoes", split=splits))}

# Thin out the dataset to make it run faster for this example
for split in splits:
    ds[split] = ds[split].shuffle(seed=42).select(range(500))

# Show the dataset
ds

{'train': Dataset({
     features: ['text', 'label'],
     num_rows: 500
 }),
 'test': Dataset({
     features: ['text', 'label'],
     num_rows: 500
 })}

In [21]:
# Preprocess datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 has no padding token, so reuse the end-of-sequence token as `pad_token`
tokenizer.pad_token = tokenizer.eos_token

def preprocess_function(examples):
    """Preprocess the rotten_tomatoes dataset by returning tokenized examples."""
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(preprocess_function, batched=True)

# Show the first example of the tokenized training set
print(tokenized_ds["train"][0]["input_ids"])
print(ds["train"][0])

[13, 764, 764, 5341, 588, 8276, 4328, 3711, 4738, 7188, 286, 257, 442, 2442, 3881, 8027, 656, 644, 318, 4306, 257, 537, 14234, 12, 81, 34897, 475, 2116, 12, 34009, 13997, 32251, 764, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256
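The long run of `50256` values in the output above is padding: GPT-2 has no dedicated padding token, so the end-of-sequence id is reused, and `padding="max_length"` fills every example up to a fixed length. The following is a minimal pure-Python sketch of that idea, not the tokenizer's actual implementation. `pad_to_length` is a hypothetical helper, and the target length of 12 is chosen only to keep the example short.

```python
EOS_ID = 50256  # GPT-2 end-of-sequence id, reused here as the padding id


def pad_to_length(input_ids, length, pad_id=EOS_ID):
    """Hypothetical helper: truncate or right-pad ids and build an attention mask."""
    ids = list(input_ids[:length])               # mimics truncation=True
    n_real = len(ids)
    ids += [pad_id] * (length - n_real)          # right padding with the EOS id
    mask = [1] * n_real + [0] * (length - n_real)
    return {"input_ids": ids, "attention_mask": mask}


# First few ids of the tokenized training example shown above
example = [13, 764, 764, 5341, 588, 8276]
encoded = pad_to_length(example, length=12)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

The attention mask marks which positions are real tokens. For sequence classification, GPT-2 also relies on `config.pad_token_id` to locate the last non-padding token of each sequence, which is why the notebook sets that value on the model.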

In [22]:
# Loading the model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Freeze the GPT-2 backbone (model.base_model is the transformer);
# the newly initialized `score` head is not part of it and stays trainable
for param in model.base_model.parameters():
    param.requires_grad = False

# GPT-2 defines no padding token id, so reuse the end-of-sequence id
model.config.pad_token_id = model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [23]:
print(model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)


In [24]:
# Evaluate the sequence (text) classification model with the Hugging Face Trainer
# Only trainer.evaluate() is called on this Trainer, so no weights are updated

import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/seq_classification",
        learning_rate=1e-4,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,  
        per_device_eval_batch_size=4,   
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)
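To make the metric concrete, the sketch below runs the same `compute_metrics` function on a handful of made-up logits. The `toy_logits` and `toy_labels` arrays are illustrative values, not outputs of the model.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Turn (logits, labels) into an accuracy score, as in the cell above."""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)   # pick the highest-scoring class
    return {"accuracy": (predictions == labels).mean()}


# Made-up logits for four examples; columns are [NEGATIVE, POSITIVE]
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
toy_labels = np.array([0, 1, 0, 0])

result = compute_metrics((toy_logits, toy_labels))
print(result)
```

The predicted classes are `[0, 1, 1, 0]`, so three of the four toy examples match their labels.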


In [25]:
# Start evaluation
trainer.evaluate()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


{'eval_loss': 0.7881485819816589,
 'eval_accuracy': 0.508,
 'eval_runtime': 46.8818,
 'eval_samples_per_second': 10.665,
 'eval_steps_per_second': 2.666}
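An accuracy of 0.508 on a two-class task is close to chance, which is plausible because the `score` head is freshly initialized. As a rough reference point, a classifier that assigns probability 0.5 to both classes for every example has a cross-entropy of ln 2. The arithmetic can be checked with the short sketch below; it assumes the 500-example subset is roughly balanced, which follows from the dataset description but was not verified on the subset itself.

```python
import math

# Cross-entropy of a classifier that always predicts p = 0.5 for both classes
uniform_loss = -math.log(0.5)

reported_eval_loss = 0.7881485819816589   # from trainer.evaluate() above
reported_accuracy = 0.508
n_eval = 500

loss_gap = reported_eval_loss - uniform_loss
n_correct = round(reported_accuracy * n_eval)

print(f"ln 2 baseline loss: {uniform_loss:.4f}")
print(f"reported loss minus baseline: {loss_gap:.4f}")
print(f"correct predictions out of {n_eval}: {n_correct}")
```

The reported loss is slightly above the ln 2 baseline, so the untrained head is not just uninformative but mildly overconfident in some wrong predictions.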

In [26]:
# View the results
import pandas as pd

df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Add the model predictions to the dataframe
predictions = trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

df.head(5)

| | text | label | predicted_label |
|---|---|---|---|
| 0 | unpretentious , charming , quirky , original | 1 | 0 |
| 1 | a film really has to be exceptional to justify... | 0 | 0 |
| 2 | working from a surprisingly sensitive script c... | 1 | 1 |
| 3 | it may not be particularly innovative , but th... | 1 | 0 |
| 4 | such a premise is ripe for all manner of lunac... | 0 | 1 |


## Performing Parameter-Efficient Fine-Tuning

The cells below create a PEFT model from the previous model, run a training loop, and save the PEFT model weights.

In [27]:
# Import statements
from peft import LoraConfig, get_peft_model, TaskType

In [28]:
# Create a PEFT config with appropriate hyperparameters for your chosen model.
config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=32,
    target_modules=["c_attn", "c_proj"],
    lora_dropout=0.1,
    fan_in_fan_out=True,
)
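The NumPy sketch below illustrates what these hyperparameters mean for a single projection. It is a simplified illustration rather than PEFT's implementation: dropout and bias are left out, and the random weights are stand-ins. With `r=8` and `lora_alpha=32`, the low-rank update is scaled by `lora_alpha / r = 4`. GPT-2's `Conv1D` layers store their weight as `(in_features, out_features)`, which is the layout `fan_in_fan_out=True` accounts for.

```python
import numpy as np

rng = np.random.default_rng(0)

in_features, out_features = 768, 2304   # shape of GPT-2's c_attn projection
r, lora_alpha = 8, 32                   # values from the LoraConfig above
scaling = lora_alpha / r                # 4.0

# Frozen weight in Conv1D layout: (in_features, out_features)
W = rng.normal(size=(in_features, out_features)) * 0.02
# Low-rank factors. LoRA initializes B to zero so training starts from the
# unchanged model; random values are used here so the check is not trivial.
A = rng.normal(size=(r, in_features)) * 0.02
B = rng.normal(size=(out_features, r)) * 0.02

x = rng.normal(size=(4, in_features))   # toy batch of 4 hidden states

# Adapter path: frozen projection plus the scaled low-rank branch
adapter_out = x @ W + scaling * (x @ A.T) @ B.T

# Merged path: fold the update into the weight; the transpose converts the
# (out, in) product B @ A into the (in, out) layout used by Conv1D
W_merged = W + scaling * (B @ A).T
merged_out = x @ W_merged

print(np.allclose(adapter_out, merged_out))
```

Both paths give the same output, which is why a trained adapter can either be kept separate or merged into the base weights.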

In [29]:
# Using the PEFT config and foundation model, create a PEFT model.
lora_model = get_peft_model(model, config)

In [30]:
# Necessary to set the model's pad_token_id = eos_token_id
lora_model.config.pad_token_id = lora_model.config.eos_token_id

In [31]:
print(lora_model)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_

In [32]:
# Rename the 'label' column to 'labels', the keyword argument the model's forward pass uses for targets
train_lora = tokenized_ds['train'].rename_column('label', 'labels')
test_lora = tokenized_ds['test'].rename_column('label', 'labels')

In [33]:
# Define the lora_training_args
lora_training_args = TrainingArguments(
    output_dir="./data/seq_classification",
    learning_rate=1e-4,
    # Reduce the batch size if you don't have enough memory
    per_device_train_batch_size=4,  
    per_device_eval_batch_size=4,  
    num_train_epochs=10,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

In [34]:
# Using the PEFT model and dataset, run a training loop
lora_trainer = Trainer(
    model=lora_model,
    args=lora_training_args,
    train_dataset=train_lora,
    eval_dataset=test_lora,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

In [35]:
# Start the training
lora_trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.70304 | 0.55 |
| 2 | No log | 0.598228 | 0.71 |
| 3 | No log | 0.505585 | 0.782 |
| 4 | 0.617300 | 0.640692 | 0.768 |
| 5 | 0.617300 | 0.604739 | 0.822 |
| 6 | 0.617300 | 0.649412 | 0.814 |
| 7 | 0.617300 | 0.796417 | 0.804 |
| 8 | 0.414000 | 0.740578 | 0.81 |
| 9 | 0.414000 | 0.765757 | 0.816 |
| 10 | 0.414000 | 0.795242 | 0.812 |


TrainOutput(global_step=1250, training_loss=0.4725821044921875, metrics={'train_runtime': 1637.8428, 'train_samples_per_second': 3.053, 'train_steps_per_second': 0.763, 'total_flos': 2637928857600000.0, 'train_loss': 0.4725821044921875, 'epoch': 10.0})
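The `global_step=1250` value and the "No log" entries for the first three epochs both follow from simple step arithmetic. The sketch below assumes a single device and that `logging_steps` was left at the `TrainingArguments` default of 500, since the notebook does not set it.

```python
import math

n_train = 500          # training examples after subsampling
batch_size = 4         # per_device_train_batch_size (single device assumed)
epochs = 10            # num_train_epochs
logging_steps = 500    # TrainingArguments default, assumed unchanged

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)

# Epoch in which each training-loss log entry is first available
first_logged_epochs = [
    math.ceil(step / steps_per_epoch)
    for step in range(logging_steps, total_steps + 1, logging_steps)
]
print(first_logged_epochs)
```

Under these assumptions the training loss is first logged at step 500, which falls at the end of epoch 4, and again at step 1000 (epoch 8). That matches the epochs at which the training-loss column changes in the log above.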

In [36]:
# Print trainable parameters
lora_model.print_trainable_parameters()

trainable params: 814,080 || all params: 125,253,888 || trainable%: 0.6499438963523432
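The reported count can be reproduced by hand. The sketch below assumes that `c_proj` in `target_modules` matches both the attention output projection and the MLP output projection in each of the 12 blocks, and that each adapted layer adds `r * (in_features + out_features)` parameters. The layer shapes are those of GPT-2 with hidden size 768.

```python
hidden = 768
n_layers = 12
r = 8

# (in_features, out_features) of each module assumed to match target_modules
matched = {
    "attn.c_attn": (hidden, 3 * hidden),
    "attn.c_proj": (hidden, hidden),
    "mlp.c_proj": (4 * hidden, hidden),
}

# Each adapted layer adds A (r x in) plus B (out x r)
per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in matched.values())
lora_params = n_layers * per_layer
score_head = hidden * 2                 # Linear(768 -> 2) without bias

reported_trainable = 814_080
reported_total = 125_253_888
remainder = reported_trainable - lora_params
trainable_pct = 100 * reported_trainable / reported_total

print(lora_params)
print(remainder, 2 * score_head)
print(trainable_pct)
```

The adapters account for 811,008 parameters. The remaining 3,072 equals two copies of the 768 x 2 `score` head, which is consistent with PEFT keeping the classification head trainable and counting both the original and its saved copy. That reading is an inference from the numbers, not something the output states.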


In [37]:
# Saving the trained model
lora_model.save_pretrained("gpt-lora")

## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights, evaluate the trained PEFT model, and compare its performance with the results obtained before fine-tuning.

In [38]:
# Using the appropriate PEFT model class, load your trained model
from peft import AutoPeftModelForSequenceClassification

model = AutoPeftModelForSequenceClassification.from_pretrained(
    "gpt-lora",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Freeze all parameters; this model is only evaluated, not trained further
for param in model.base_model.parameters():
    param.requires_grad = False

# GPT-2 defines no padding token id, so reuse the end-of-sequence id
model.config.pad_token_id = model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [39]:
print(model)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_

In [40]:
# Repeat the previous evaluation process, this time using the PEFT model
# Only trainer.evaluate() is called on this Trainer, so no weights are updated
# Compare the results to the results from the original foundation model
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/seq_classification",
        learning_rate=1e-4,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)


In [41]:
# Start evaluation
trainer.evaluate()

{'eval_loss': 0.5055848956108093,
 'eval_accuracy': 0.782,
 'eval_runtime': 49.7813,
 'eval_samples_per_second': 10.044,
 'eval_steps_per_second': 2.511}

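Evaluation accuracy increased from 50.8% to 78.2% after fine-tuning with LoRA. Both the loss and the accuracy of the reloaded model match epoch 3 of the training log, the epoch with the lowest validation loss.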
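The `eval_loss` of 0.5056 reported above equals the epoch-3 validation loss in the training log, although accuracy was higher in later epochs. A likely explanation is that `load_best_model_at_end=True` was used without `metric_for_best_model`, in which case the Trainer selects the checkpoint by evaluation loss. The sketch below repeats that selection on the logged values.

```python
import numpy as np

# Validation loss and accuracy per epoch, copied from the training log
val_loss = np.array([0.70304, 0.598228, 0.505585, 0.640692, 0.604739,
                     0.649412, 0.796417, 0.740578, 0.765757, 0.795242])
accuracy = np.array([0.55, 0.71, 0.782, 0.768, 0.822,
                     0.814, 0.804, 0.81, 0.816, 0.812])

# Epochs are 1-indexed in the log
best_by_loss = int(np.argmin(val_loss)) + 1
best_by_acc = int(np.argmax(accuracy)) + 1

print(best_by_loss, accuracy[best_by_loss - 1])
print(best_by_acc, accuracy[best_by_acc - 1])
```

Selecting by loss picks epoch 3 (78.2% accuracy), while selecting by accuracy would pick epoch 5 (82.2%). Passing `metric_for_best_model="accuracy"` to `TrainingArguments` should switch the criterion, though that variant was not run in this notebook.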

In [42]:
# View the results
import pandas as pd

df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Add the model predictions to the dataframe
predictions = trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

df.head(5)

| | text | label | predicted_label |
|---|---|---|---|
| 0 | unpretentious , charming , quirky , original | 1 | 1 |
| 1 | a film really has to be exceptional to justify... | 0 | 0 |
| 2 | working from a surprisingly sensitive script c... | 1 | 1 |
| 3 | it may not be particularly innovative , but th... | 1 | 1 |
| 4 | such a premise is ripe for all manner of lunac... | 0 | 0 |
