# Lightweight Fine-Tuning Project

Choices made for this project:

* PEFT technique: LoRA (plus a QLoRA variant on a 4-bit quantized model)
* Model: BERT (`bert-base-uncased`)
* Evaluation approach: custom `compute_metrics` function that calculates accuracy
* Fine-tuning dataset: TimKoornstra/financial-tweets-sentiment

## Preprocess data

In [29]:
import torch
import pandas as pd
import numpy as np
from transformers import (
    AutoTokenizer, 
    AutoModelForSequenceClassification, 
    TrainingArguments, 
    Trainer,
    DataCollatorWithPadding
)
from datasets import load_dataset
from tqdm import tqdm

In [30]:
# Parameters
model_name = 'bert-base-uncased'
dataset_name = 'TimKoornstra/financial-tweets-sentiment'

In [31]:
# Load dataset
# The dataset comprises tweets related to financial markets, stocks, and economic discussions. 
# Each tweet is labeled with a sentiment value, where '1' denotes a positive sentiment, 
# '2' signifies a negative sentiment, and '0' indicates a neutral sentiment.
dataset = load_dataset(dataset_name)
dataset

DatasetDict({
    train: Dataset({
        features: ['tweet', 'sentiment', 'url'],
        num_rows: 38091
    })
})

In [32]:
# Visualize dataset
pd.DataFrame(dataset['train'][:3])

Unnamed: 0,tweet,sentiment,url
0,$BYND - JPMorgan reels in expectations on Beyo...,2,https://huggingface.co/datasets/zeroshot/twitt...
1,$CCL $RCL - Nomura points to bookings weakness...,2,https://huggingface.co/datasets/zeroshot/twitt...
2,"$CX - Cemex cut at Credit Suisse, J.P. Morgan ...",2,https://huggingface.co/datasets/zeroshot/twitt...


In [33]:
pd.DataFrame(dataset['train']['sentiment']).value_counts()

1    17368
0    12181
2     8542
Name: count, dtype: int64
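
The classes are imbalanced, so a useful reference point for the accuracy figures later in the notebook is the score of a classifier that always predicts the most frequent class. A minimal sketch, with the counts copied by hand from the output above (the variable names are illustrative):

```python
# Class counts copied from the value_counts() output above.
counts = {1: 17368, 0: 12181, 2: 8542}  # 1 = positive, 0 = neutral, 2 = negative

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a classifier that always predicts the most frequent class.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

print(total, majority_label, round(majority_baseline, 3))
```

Any model worth keeping should beat this majority-class share on the held-out data.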

In [34]:
# Clean the data slightly
dataset = dataset.rename_column("sentiment", "label")
dataset = dataset.remove_columns("url")
dataset

DatasetDict({
    train: Dataset({
        features: ['tweet', 'label'],
        num_rows: 38091
    })
})

In [35]:
# Split the dataset into train, val, and test sets
dataset_split1 = dataset['train'].train_test_split(test_size=0.2, seed=42, stratify_by_column='label')
dataset_split2 = dataset_split1['train'].train_test_split(test_size=0.2, seed=42, stratify_by_column='label')

dataset_train = dataset_split2['train']
dataset_val = dataset_split2['test']
dataset_test = dataset_split1['test']  # held-out set, used only for the final model comparison

assert len(dataset['train']) == len(dataset_train) + len(dataset_val) + len(dataset_test)
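
Applying a 20% split twice gives roughly 64% train, 16% validation and 20% test. The arithmetic can be sketched as below, assuming the test side of each split is rounded up and the remainder goes to the train side (this rounding rule is an assumption; it does reproduce the 24377 training rows and 6095 validation rows reported in the tokenization output further down):

```python
import math

n_total = 38091       # rows in the original dataset
test_fraction = 0.2   # same fraction used for both splits

def split_sizes(n, fraction):
    """Return (train_size, test_size), assuming the test side is rounded up."""
    n_test = math.ceil(n * fraction)
    return n - n_test, n_test

# First split: hold out the test set.
n_trainval, n_test = split_sizes(n_total, test_fraction)
# Second split: carve the validation set out of the remainder.
n_train, n_val = split_sizes(n_trainval, test_fraction)

print(n_train, n_val, n_test)
```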

In [36]:
# Tokenize the train and validation datasets.
# Padding is left to DataCollatorWithPadding, which pads each batch dynamically
# during training, so no padding or tensor conversion is needed at this stage.
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    return tokenizer(batch['tweet'], truncation=True)

dataset_train = dataset_train.map(tokenize, batched=True)
dataset_val = dataset_val.map(tokenize, batched=True)
dataset_train

Map: 100%|██████████| 6095/6095 [00:02<00:00, 2155.31 examples/s]


Dataset({
    features: ['tweet', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 24377
})

## Loading and Evaluating a Foundation Model (fine-tune classification head)

In [37]:
# Define compute_metrics function
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    logits, labels = np.array(logits), np.array(labels)
    predictions = np.argmax(logits, axis=1)
    return {"accuracy": (predictions == labels).mean()}
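
To see what this function returns, it can be called directly on a small hand-made batch. The logits and labels below are made up for illustration; three of the four predictions match their labels, so the accuracy is 0.75.

```python
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    logits, labels = np.array(logits), np.array(labels)
    predictions = np.argmax(logits, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Four made-up examples with one score per class (0, 1, 2).
toy_logits = [
    [2.0, 0.1, -1.0],  # highest score at index 0
    [0.3, 1.5, 0.2],   # highest score at index 1
    [0.0, 0.2, 3.0],   # highest score at index 2
    [1.0, 0.9, 0.8],   # highest score at index 0
]
toy_labels = [0, 1, 2, 1]  # the last example is misclassified

result = compute_metrics((toy_logits, toy_labels))
print(result)
```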

In [10]:
# Load pre-trained BERT model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

model.safetensors: 100%|██████████| 440M/440M [00:02<00:00, 212MB/s]  
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
# Freeze base model parameters
for param in model.base_model.parameters():
    param.requires_grad = False
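
With the base model frozen, only the newly initialized classification head is left to train. A rough count, assuming the head is a single linear layer on top of a 768-dimensional pooled output (768 is the hidden size of `bert-base-uncased`; the variable names are illustrative):

```python
hidden_size = 768  # hidden size of bert-base-uncased
num_labels = 3     # neutral, positive, negative

# A linear layer has one weight per (input, output) pair plus one bias per output.
classifier_weight = hidden_size * num_labels
classifier_bias = num_labels
trainable_head_params = classifier_weight + classifier_bias

print(trainable_head_params)
```

That is a very small number of trainable weights compared with the roughly 110 million parameters in the full model, which helps explain why this baseline improves slowly.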

In [12]:
# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-4,
    evaluation_strategy="epoch",
    save_strategy="epoch", 
    num_train_epochs=3, 
    weight_decay=0.01,
    load_best_model_at_end=True
)

In [13]:
# Define trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics
)
# Train the model
trainer.train()

You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,0.9949,0.98822,0.537326
2,0.9811,0.958472,0.553404
3,0.9715,0.955107,0.55685


TrainOutput(global_step=4572, training_loss=0.9905648452611941, metrics={'train_runtime': 2729.0845, 'train_samples_per_second': 26.797, 'train_steps_per_second': 1.675, 'total_flos': 1.891171216690472e+16, 'train_loss': 0.9905648452611941, 'epoch': 3.0})

In [14]:
# trainer.evaluate()

## Performing Parameter-Efficient Fine-Tuning with LoRA

In [38]:
from peft import get_peft_model, LoraConfig

In [11]:
# Create a PEFT Config
peft_config = LoraConfig(task_type="SEQ_CLS", inference_mode=False, r=8, lora_alpha=16, lora_dropout=0.1)

In [12]:
# Convert a Transformers Model into a PEFT Model
lora_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
lora_model = get_peft_model(lora_model, peft_config)
lora_model.print_trainable_parameters()

model.safetensors: 100%|██████████| 440M/440M [00:02<00:00, 211MB/s]  
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 299,526 || all params: 109,781,766 || trainable%: 0.27283765866910903
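
The trainable parameter count can be reconstructed by hand. The sketch below is one reading that matches the printed number exactly, under two assumptions that are not stated in the notebook: LoRA adapters are attached to the query and value projections of all 12 layers, and the classification head is counted twice (the original head plus the trainable copy that PEFT keeps).

```python
hidden_size = 768
num_layers = 12
r = 8              # LoRA rank from the LoraConfig above
num_labels = 3

# Assumption: two adapted matrices per layer (query and value projections).
adapted_matrices = 2 * num_layers
# Each adapter adds A (r x hidden) and B (hidden x r).
lora_params = adapted_matrices * (r * hidden_size + hidden_size * r)

# Linear classification head: weights plus biases.
head_params = hidden_size * num_labels + num_labels
# Assumption: the head appears twice in the count.
trainable = lora_params + 2 * head_params

all_params = 109_781_766  # copied from the output above
trainable_percent = 100 * trainable / all_params

print(trainable, trainable_percent)
```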


In [13]:
# Define training arguments
lora_training_args = TrainingArguments(
    output_dir="./lora_results",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-4,
    evaluation_strategy="epoch",
    save_strategy="epoch", 
    num_train_epochs=1, 
    weight_decay=0.01,
    load_best_model_at_end=True
)

In [14]:
# Define trainer
lora_trainer = Trainer(
    model=lora_model,
    args=lora_training_args,
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics
)
# Train the model
lora_trainer.train()

You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,0.674,0.641427,0.730763


TrainOutput(global_step=3048, training_loss=0.7594995986758255, metrics={'train_runtime': 1671.4271, 'train_samples_per_second': 14.585, 'train_steps_per_second': 1.824, 'total_flos': 5943863519458596.0, 'train_loss': 0.7594995986758255, 'epoch': 1.0})

## ⚠️ The LoRA adapter weights are saved in the "./lora_results" folder, as specified by `output_dir` in TrainingArguments

In [15]:
# lora_trainer.save_model("./bert_lora")

## Performing Parameter-Efficient Fine-Tuning with QLoRA

In [39]:
from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

In [17]:
# Quantize a model
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

qlora_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3, quantization_config=config)
qlora_model = prepare_model_for_kbit_training(qlora_model)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [18]:
# Create a PEFT Config
peft_config = LoraConfig(task_type="SEQ_CLS", inference_mode=False, r=8, lora_alpha=16, lora_dropout=0.1)

In [19]:
# Convert a Transformers Model into a PEFT Model
qlora_model = get_peft_model(qlora_model, peft_config)
qlora_model.print_trainable_parameters()

trainable params: 299,526 || all params: 109,781,766 || trainable%: 0.27283765866910903


In [20]:
# Define training arguments
# thanks to quantization, batch size can be increased without memory error
qlora_training_args = TrainingArguments(
    output_dir="./qlora_results",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-4,
    evaluation_strategy="epoch",
    save_strategy="epoch", 
    num_train_epochs=1, 
    weight_decay=0.01,
    load_best_model_at_end=True
)

In [21]:
# Define trainer
qlora_trainer = Trainer(
    model=qlora_model,
    args=qlora_training_args,
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics
)
# Train the model
qlora_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.716,0.684999,0.705824


TrainOutput(global_step=1524, training_loss=0.8052296037749043, metrics={'train_runtime': 2545.5606, 'train_samples_per_second': 9.576, 'train_steps_per_second': 0.599, 'total_flos': 3182409719779932.0, 'train_loss': 0.8052296037749043, 'epoch': 1.0})
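
The `global_step` and `train_samples_per_second` values in the three `TrainOutput` lines follow from the training set size, the batch size and the runtime. A small check, assuming a single device and no gradient accumulation (batch sizes, epochs and runtimes are copied from the cells above):

```python
import math

n_train = 24377  # rows in the tokenized training set

runs = {
    "classification head": {"batch_size": 16, "epochs": 3, "runtime_s": 2729.0845},
    "LoRA": {"batch_size": 8, "epochs": 1, "runtime_s": 1671.4271},
    "QLoRA": {"batch_size": 16, "epochs": 1, "runtime_s": 2545.5606},
}

for name, run in runs.items():
    # The last, partially filled batch still counts as one optimizer step.
    steps_per_epoch = math.ceil(n_train / run["batch_size"])
    run["global_step"] = steps_per_epoch * run["epochs"]
    run["samples_per_second"] = n_train * run["epochs"] / run["runtime_s"]
    print(f"{name}: {run['global_step']} steps, {run['samples_per_second']:.3f} samples/s")
```

In these runs the QLoRA model processed fewer samples per second than the LoRA model, even with the larger batch size, so the memory savings did not translate into faster training here.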

In [26]:
# Visualize predictions and labels
df = pd.DataFrame(dataset_val)
predictions = qlora_trainer.predict(dataset_val)
df["predicted_label"] = np.argmax(predictions[0], axis=1)

In [33]:
pd.set_option('display.max_colwidth', None)
df[['tweet', 'label', 'predicted_label']].sample(5)

Unnamed: 0,tweet,label,predicted_label
941,$EEENF on the go 🚀🤙🏼 https://t.co/zl4na3aiqE,1,1
2838,Some banking tightening up. Gas &amp; oil extended or very busy but a few possible set ups. Eyeing a few semis including $NVDA $QCOM $ADI for possible short.,2,1
5818,Wants more\n,1,1
2606,$FB (110.20) is starting to show some relative strength and signs of potential B/O on the daily.,1,1
5364,Canadian National Laying Off Workers,2,0


## Performing Inference and Comparing Model Performance

In [40]:
from peft import PeftModel, AutoPeftModelForSequenceClassification

In [41]:
# Baseline: BERT with a frozen base and only the classification head fine-tuned
model = AutoModelForSequenceClassification.from_pretrained("./results/checkpoint-4572") 

In [42]:
# LoRA model: load with the AutoPeftModelForSequenceClassification class
lora_model = AutoPeftModelForSequenceClassification.from_pretrained("./lora_results/checkpoint-3048", num_labels=3)
# lora_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
# lora_model = PeftModel.from_pretrained(lora_model, "./lora_results/checkpoint-3048")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [43]:
# QLoRA model: load with the AutoPeftModelForSequenceClassification class
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

qlora_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "./qlora_results/checkpoint-1524", 
    num_labels=3,
    quantization_config=config)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [44]:
# Perform inference on the first 500 examples of the held-out test set
encoding = tokenizer(dataset_test['tweet'][:500], padding=True, truncation=True, return_tensors="pt")
labels = dataset_test['label'][:500]

model.to('cuda')
encoding.to('cuda')
lora_model.to('cuda')
qlora_model.to('cuda')
with torch.no_grad():
    output = model(**encoding)
    lora_output = lora_model(**encoding)
    qlora_output = qlora_model(**encoding)
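
The cell above sends 500 examples through each model in one forward pass. To score the whole held-out set without running out of GPU memory, one option is to loop over it in mini-batches. The sketch below is framework-agnostic: `predict_fn` is a hypothetical callable that maps a list of texts to a list of predicted label ids (in this notebook it would wrap the tokenizer, a model forward pass and an argmax), and `dummy_predict` is a stand-in used only to show the call.

```python
def batched_accuracy(texts, labels, predict_fn, batch_size=64):
    """Accuracy over all examples, computed one mini-batch at a time."""
    correct = 0
    for start in range(0, len(texts), batch_size):
        batch_texts = texts[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        batch_preds = predict_fn(batch_texts)
        correct += sum(int(p == y) for p, y in zip(batch_preds, batch_labels))
    return correct / len(texts)

# Stand-in predictor for illustration only: "up" means positive, otherwise negative.
def dummy_predict(batch_texts):
    return [1 if "up" in text else 2 for text in batch_texts]

demo_texts = ["stock up", "stock down", "going up", "flat"]
demo_labels = [1, 2, 1, 0]
demo_accuracy = batched_accuracy(demo_texts, demo_labels, dummy_predict, batch_size=3)
print(demo_accuracy)
```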

In [45]:
print(f"""
Performance of BERT model with tuning of classification head: {compute_metrics((output.logits.cpu(), labels))}
Performance of BERT model with PEFT (LORA): {compute_metrics((lora_output.logits.cpu(), labels))}
Performance of BERT model with PEFT (QLORA): {compute_metrics((qlora_output.logits.cpu(), labels))}
""")


Performance of BERT model with tuning of classification head: {'accuracy': 0.552}
Performance of BERT model with PEFT (LORA): {'accuracy': 0.75}
Performance of BERT model with PEFT (QLORA): {'accuracy': 0.734}



* Fine-tuning with LoRA gives a significant performance boost even after just 1 epoch: 0.75 accuracy on the 500 test examples, versus 0.552 for the model with only the classification head tuned for 3 epochs. (I would like to train longer, but I keep getting server errors in the Udacity workspace.)
* QLoRA (LoRA on a 4-bit quantized model) reaches slightly lower accuracy (0.734), as expected, but it allows me to increase the per-device batch size to 16 or even 32.
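
Since the comparison uses only 500 test examples, it helps to attach a rough uncertainty to each accuracy. The sketch below uses the normal approximation to the binomial standard error, sqrt(p(1 - p) / n), and treats the three scores as independent, which is a simplification because all models were scored on the same examples.

```python
import math

n = 500  # test examples used in the comparison
accuracies = {"classification head": 0.552, "LoRA": 0.75, "QLoRA": 0.734}

def standard_error(p, n):
    """Normal-approximation standard error of an accuracy p measured on n examples."""
    return math.sqrt(p * (1 - p) / n)

for name, acc in accuracies.items():
    half_width = 1.96 * standard_error(acc, n)  # approximate 95% interval
    print(f"{name}: {acc:.3f} +/- {half_width:.3f}")
```

Each interval is roughly 0.04 wide on either side. The gap between the head-only model and the two PEFT models is far larger than that, while the 0.016 gap between LoRA and QLoRA is smaller, so the ordering of those two may not hold up on a larger test sample.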