<a href="https://colab.research.google.com/github/yilenpan/quantized_lora/blob/main/Udacity_PEFT_training_GPT2_for_sequence_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install the necessary packages

In [None]:
!pip install transformers "datasets==2.15.0" torch peft numpy bitsandbytes accelerate

Load the dair-ai/emotion dataset

In [None]:
from datasets import load_dataset


# The dair-ai/emotion dataset has three splits
splits = ["train", "test", "validation"]

data_splits = load_dataset("dair-ai/emotion")


Tokenize and format the data. The model only accepts labels, input_ids, and attention_mask, so drop the text column and rename label to labels. The inputs must also be flat PyTorch tensors.

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def preprocess_function(x):
    return tokenizer(x['text'], truncation=True, padding=True)


def format_dataset(ds):
    ds = ds.remove_columns('text')
    ds = ds.rename_column('label', 'labels')
    ds.set_format('torch', columns=['labels', 'input_ids', 'attention_mask'])
    return ds

test_split = data_splits["test"].map(preprocess_function)
train_split = data_splits["train"].map(preprocess_function)
val_split = data_splits["validation"].map(preprocess_function)

test_split = format_dataset(test_split)
train_split = format_dataset(train_split)
val_split = format_dataset(val_split)


print(test_split)
print(val_split)

print(train_split)
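
The tokenizer setup above reuses GPT-2's end-of-text token as the pad token, and the Trainer later pads each batch to a common length. As a rough sketch of what that padding step produces, the helper below (pad_batch is a hypothetical name, not part of transformers) right-pads lists of token ids and builds the matching attention masks. The token ids are illustrative only.

```python
# GPT-2's end-of-text token id, reused as the pad token (as in the cell above).
PAD_ID = 50256

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token id lists to the longest sequence and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical token ids, for illustration only.
batch = pad_batch([[320, 4203, 2138], [320, 4203]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The attention mask is what lets the model tell real tokens from padding, which matters here because the pad token and the end-of-text token share one id.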

Use bitsandbytes to quantize the model. This stores the weights as 4-bit numbers instead of 32-bit floats. Then load the model with the quantization config, which runs it at lower precision.

Because this is a classification task, use the AutoModelForSequenceClassification class, which adds a classification layer with 6 outputs, one per emotion label. Finally, freeze the model. We only want to train the LoRA adapter.

In [59]:

from peft import prepare_model_for_kbit_training
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
import torch


config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)


label2id={'sadness': 0, 'joy': 1, 'love': 2, 'anger': 3, 'fear': 4, 'surprise': 5}
id2label={0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}
model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=6,
    label2id=label2id,
    id2label=id2label,
    pad_token_id=tokenizer.eos_token_id,
    quantization_config=config,
)

model = prepare_model_for_kbit_training(model)


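def get_label(output):
    # Use the argument, not the global `outputs` variable
    logits = output.logits
    # Cast to float32 so softmax runs in full precision
    logits = logits.float()
    probabilities = torch.softmax(logits, dim=-1)
    return torch.argmax(probabilities, dim=-1).item()

# Test the model before any training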
test_row = test_split[0]
outputs = model(test_row['input_ids'].view(1, -1))
predicted_label = get_label(outputs)
print(f"(predicted label: {predicted_label}, actual label: {test_row['labels']})")


# Freeze all layers
for param in model.parameters():
    param.requires_grad = False


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


(predicted label: 5, actual label: 0)
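
To make the idea of 4-bit weights more concrete, here is a minimal sketch of quantization in NumPy. It is a simplification and not what bitsandbytes does internally: it uses evenly spaced integer levels with a single scale per block, whereas the nf4 type configured above uses a non-uniform set of levels. The function names and the random weights are made up for illustration.

```python
import numpy as np

def quantize_absmax_4bit(weights):
    """Map float32 weights to integer codes in [-7, 7] using one scale per block."""
    scale = np.abs(weights).max() / 7.0
    # Codes fit in 4 bits; int8 is only the container type used in this sketch.
    codes = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float32 weights from the 4-bit codes."""
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=64).astype(np.float32)  # made-up weight block

codes, scale = quantize_absmax_4bit(weights)
restored = dequantize(codes, scale)

print("distinct codes used:", np.unique(codes).size)
print("max abs error:", float(np.abs(weights - restored).max()))
```

Each weight is stored as one of a handful of codes plus a shared scale, so the restored values are close to, but not exactly, the originals. That rounding error is the price of the smaller memory footprint.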


Create a LoRA config and attach it to the layers in the attention mechanism. LoRA adds a pair of small low-rank matrices alongside the weights of each targeted layer. Their product is added to the frozen weights, and only these matrices are trained.

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

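# Named lora_config so it does not overwrite the BitsAndBytesConfig stored in `config`
lora_config = LoraConfig(
    r=16, # Rank of the low-rank update matrices
    lora_alpha=8, # The update is scaled by lora_alpha / r
    bias="lora_only",
    target_modules=['c_attn', 'c_proj'], # Assign correct layers (see model); in GPT-2, c_proj also matches the MLP output projection
    lora_dropout=0.01,
    task_type=TaskType.SEQ_CLS # Sequence classification task
)

lora_model = get_peft_model(model, lora_config)

# print_trainable_parameters() prints its summary itself and returns None
lora_model.print_trainable_parameters()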
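
The following NumPy sketch shows the arithmetic behind a LoRA layer: the pretrained weight stays frozen and a scaled low-rank product is added to its output. It reuses r=16 and lora_alpha=8 from the config above. The square 768 x 768 matrix is a stand-in chosen for illustration (768 is GPT-2 small's hidden size), and all values are random rather than real model weights.

```python
import numpy as np

d_in, d_out = 768, 768   # illustrative layer shape, not a specific GPT-2 layer
r, lora_alpha = 16, 8    # same rank and alpha as the LoRA config above

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight (stand-in values)
A = rng.normal(size=(r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection, starts at zero

def lora_forward(x):
    """Frozen path plus the scaled low-rank update: W x + (alpha / r) * B A x."""
    return W @ x + (lora_alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialised to zero, the adapter does not change the output yet.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(f"full matrix: {full_params} params, LoRA adapter: {lora_params} params")
```

Training only A and B means updating a small fraction of the parameters a full fine-tune of the same layer would touch.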

Create the Trainer

In [36]:
from transformers import TrainingArguments, Trainer, DataCollatorWithPadding
import numpy as np


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

training_args = TrainingArguments(
    output_dir="./data/gpt2-lora",
    overwrite_output_dir=True,
    evaluation_strategy="epoch",
    num_train_epochs=15,
    per_device_train_batch_size=50, # Increase if memory allows
    per_device_eval_batch_size=50, # Increase if memory allows
    learning_rate=2e-5,
    save_strategy="epoch",
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_split,
    eval_dataset=test_split,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

print(test_split)
print(test_split[0])

Dataset({
    features: ['labels', 'input_ids', 'attention_mask'],
    num_rows: 2000
})
{'labels': tensor(0), 'input_ids': tensor([  320,  4203,  2138, 36371,   523,   545,   407,   845, 14742,   826,
          783]), 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])}
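
The compute_metrics function defined above can be checked in isolation on a toy batch before handing it to the Trainer. The logits and labels below are made up for illustration and are not real model output.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Same logic as the cell above: argmax over class logits, then mean accuracy."""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Made-up logits for 4 examples and 6 classes.
logits = np.array([
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],   # argmax 0
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0],   # argmax 1
    [0.0, 0.0, 0.0, 1.5, 0.0, 0.0],   # argmax 3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.9],   # argmax 5
])
labels = np.array([0, 1, 2, 5])       # third example is deliberately wrong

result = compute_metrics((logits, labels))
print(result)
```

Three of the four predictions match their labels, so the reported accuracy is 0.75.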


Before training, see how well the model does. It does poorly, at about 11% accuracy, below the roughly 17% expected from uniform random guessing over 6 classes.

In [33]:
trainer.evaluate()

{'eval_loss': 6.337622165679932,
 'eval_accuracy': 0.114,
 'eval_runtime': 6.7019,
 'eval_samples_per_second': 298.421,
 'eval_steps_per_second': 2.387}

In [38]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.617741,0.341
2,2.548000,1.507627,0.4185
3,2.548000,1.302602,0.5215
4,1.449800,1.11849,0.584
5,1.163700,0.984033,0.6465
6,1.163700,0.869175,0.7035
7,0.988700,0.738187,0.75
8,0.822200,0.658167,0.781
9,0.822200,0.615828,0.7895
10,0.733600,0.577483,0.795


TrainOutput(global_step=4800, training_loss=1.044102252324422, metrics={'train_runtime': 2053.3229, 'train_samples_per_second': 116.884, 'train_steps_per_second': 2.338, 'total_flos': 6576232119091200.0, 'train_loss': 1.044102252324422, 'epoch': 15.0})
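
The global_step value in the output above can be reproduced with a little arithmetic. This sketch assumes the train split has 16,000 rows and that training ran on a single device. Neither is printed in this notebook, but both are consistent with the reported global_step and throughput. It also assumes the Trainer's default logging interval of 500 steps, since the training arguments do not set one.

```python
import math

train_rows = 16_000        # assumed size of the dair-ai/emotion train split
batch_size = 50            # per_device_train_batch_size, single device assumed
epochs = 15                # num_train_epochs

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)

# Assumed library default, because the cell above does not set logging_steps.
logging_steps = 500
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)
print(first_logged_epoch)
```

Under these assumptions the first training loss is recorded during the second epoch, which would explain the "No log" entry for epoch 1 in the table.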

In [52]:
lora_model.save_pretrained("mygpt2lora")

In [47]:
# Swap the original model back into the trainer and evaluate its performance
trainer.model = model
trainer.evaluate()

{'eval_loss': 6.868651390075684,
 'eval_accuracy': 0.1125,
 'eval_runtime': 6.3293,
 'eval_samples_per_second': 315.991,
 'eval_steps_per_second': 6.32,
 'epoch': 15.0}

Evaluate the reloaded LoRA model: about 82% accuracy.

In [54]:
from peft import AutoPeftModelForSequenceClassification

reloaded_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "mygpt2lora",
    num_labels=6,
    label2id=label2id,
    id2label=id2label,
    pad_token_id=tokenizer.eos_token_id,
    quantization_config=config
)

trainer.model = reloaded_model
trainer.evaluate()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'eval_loss': 0.52783203125,
 'eval_accuracy': 0.816,
 'eval_runtime': 10.3457,
 'eval_samples_per_second': 193.316,
 'eval_steps_per_second': 3.866,
 'epoch': 15.0}

In [58]:
outputs = reloaded_model(test_row['input_ids'].view(1, -1))
predicted_label = get_label(outputs)
print(f"(predicted label: {predicted_label}, actual label: {test_row['labels']})")

(predicted label: 0, actual label: 0)
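
To read a prediction as an emotion name rather than an integer, the predicted index can be looked up in id2label. The sketch below uses NumPy and made-up logits. label_name is a hypothetical helper, not part of the notebook or of any library.

```python
import numpy as np

id2label = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}

def label_name(logits):
    """Return the class name for the highest logit."""
    logits = np.asarray(logits, dtype=np.float64)
    # Softmax is monotonic, so argmax over logits equals argmax over probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    assert int(np.argmax(probs)) == int(np.argmax(logits))
    return id2label[int(np.argmax(logits))]

# Made-up logits for one example; not real model output.
example_logits = [3.2, 0.4, -1.0, 0.1, 0.3, -2.5]
print(label_name(example_logits))
```

Because softmax preserves the ordering of the logits, the softmax step is only needed when the probabilities themselves are of interest.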
