# Lightweight Fine-Tuning for Text Classification
This notebook demonstrates **parameter-efficient fine-tuning** using the Hugging Face `peft` library. We'll fine-tune a pre-trained transformer model on a financial sentiment classification task with minimal resources, leveraging techniques like **LoRA** (Low-Rank Adaptation).

The overall pipeline includes:
- Loading a text classification dataset
- Tokenizing and preparing the data
- Fine-tuning a model using PEFT
- Comparing performance before and after tuning

The data and tech stack used in this fine-tuning project:
* PEFT technique: LoRA
* Model: gpt2
* Evaluation approach: Trainer
* Fine-tuning dataset: zeroshot/twitter-financial-news-sentiment (https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment)

## Step 1: Install Required Packages
First, we ensure that required dependencies such as `datasets`, `transformers`, and `scikit-learn` are available.

In [2]:
# Install the required version of datasets in case you have an older version
# You will need to choose "Kernel > Restart Kernel" from the menu after executing this cell
# ! pip install -q "datasets==2.15.0"
! pip install scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.4.1.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.1 MB)
Collecting joblib>=1.2.0
  Downloading joblib-1.3.2-py3-none-any.whl (302 kB)
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.3.0-py3-none-any.whl (17 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.3.2 scikit-learn-1.4.1.post1 threadpoolctl-3.3.0
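
After restarting the kernel, it can help to confirm which package versions are actually importable. The snippet below is a minimal sketch using only the standard library; the package list is an assumption based on the libraries this notebook mentions, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed package list: the libraries referenced in this notebook
report = installed_versions(["datasets", "transformers", "peft", "scikit-learn"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```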


## Step 2: Import Libraries
We import all necessary libraries for model loading, data preprocessing, training, and evaluation.

In [1]:
import pandas as pd
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification, \
        TrainingArguments, Trainer, EvalPrediction, DataCollatorWithPadding



## Step 3: Load and Explore the Dataset
We'll use the `zeroshot/twitter-financial-news-sentiment` dataset available from Hugging Face. After loading it, we perform a train-test split and inspect the dataset structure.

In [2]:
# Load the dataset

from datasets import load_dataset

dataset = load_dataset('zeroshot/twitter-financial-news-sentiment')

# Perform the train-test split on the 'train' dataset with shuffling
split_result = dataset['train'].train_test_split(test_size=0.3, shuffle=True, seed=88)

# Update the dataset dictionary directly with the new splits
dataset.update({
    'train': split_result['train'],
    'test': split_result['test']
})

# Showing first example for train set
dataset['train'][0]

Downloading readme: 100%|██████████| 1.39k/1.39k [00:00<00:00, 1.38MB/s]
Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]
Downloading data:   0%|          | 0.00/859k [00:00<?, ?B/s]
Downloading data: 100%|██████████| 859k/859k [00:00<00:00, 6.82MB/s]
Downloading data files:  50%|█████     | 1/2 [00:00<00:00,  6.97it/s]
Downloading data: 100%|██████████| 217k/217k [00:00<00:00, 2.98MB/s]
Downloading data files: 100%|██████████| 2/2 [00:00<00:00,  8.52it/s]
Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 1159.29it/s]
Generating train split: 9543 examples [00:00, 246225.94 examples/s]
Generating validation split: 2388 examples [00:00, 237576.74 examples/s]


{'text': 'Fantasia : FURTHER INFORMATION IN RELATION TO THE CO-OPERATION AGREEMENT WITH SHENGYUAN  #Stock #MarketScreener… https://t.co/NkBhcbaRvs',
 'label': 2}

In [3]:
# The dataset structure
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 6680
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2388
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2863
    })
})
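
The split sizes above can be checked by hand. The sketch below assumes the test share is rounded up to a whole number of rows, which is consistent with the 6680 / 2863 split shown; `split_sizes` and `label_shares` are hypothetical helpers, and the label list is made-up toy data rather than the real dataset.

```python
import math
from collections import Counter


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test


def label_shares(labels):
    """Return the fraction of examples per label, ordered by label id."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


# 9543 rows in the original 'train' split, 30% held out for testing
n_train, n_test = split_sizes(9543, 0.3)
print(n_train, n_test)

# Toy labels (hypothetical), only to show the shape of the result
toy_labels = [2, 2, 1, 0, 2, 1, 2, 2]
print(label_shares(toy_labels))
```

On the real data, `label_shares(dataset['train']['label'])` would show how balanced the three classes are, which is useful context for reading the accuracy numbers later.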

## Step 4: Tokenization
Here we initialize the GPT-2 tokenizer and tokenize the text so the model receives its input as token IDs with an attention mask. GPT-2 has no padding token by default, so we reuse its end-of-sequence token for padding.

The dataset contains training, test, and validation splits, and each one is tokenized in turn.

In [3]:
# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

# Defining a function to tokenize a batch of texts
def tokenize_batch(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

# Tokenize all the examples in each split of the dataset
tokenized_dataset = {}
splits = ['train', 'test', 'validation']
for split in splits:
    # Check if the split exists in the dataset to avoid KeyError
    if split in dataset:
        tokenized_dataset[split] = dataset[split].map(tokenize_batch, batched=True)

# Show the structure of the tokenized dataset
print(tokenized_dataset)

tokenizer_config.json: 100%|██████████| 26.0/26.0 [00:00<00:00, 118kB/s]
config.json: 100%|██████████| 665/665 [00:00<00:00, 3.01MB/s]
vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 11.9MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 6.62MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 25.8MB/s]
Map: 100%|██████████| 6680/6680 [00:04<00:00, 1519.69 examples/s]
Map: 100%|██████████| 2863/2863 [00:01<00:00, 1564.21 examples/s]
Map: 100%|██████████| 2388/2388 [00:01<00:00, 1485.65 examples/s]

{'train': Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 6680
}), 'test': Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 2863
}), 'validation': Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 2388
})}
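
The tokenization cell pads with `padding="max_length"`, which for GPT-2 means padding every tweet to the tokenizer's maximum length of 1024 tokens. The sketch below contrasts that with padding to the longest sequence in a batch. It is a conceptual illustration in plain Python, not the library's implementation; `pad_batch` is a hypothetical helper and the token IDs are made up.

```python
def pad_batch(sequences, pad_id, length=None):
    """Pad (or truncate) token-id lists to a common length.

    If length is None, pad to the longest sequence in the batch.
    """
    target = length if length is not None else max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = seq[:target]
        n_pad = target - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


PAD_ID = 50256  # GPT-2's end-of-text token id, reused here as the pad token
batch = [[11, 12, 13], [21, 22, 23, 24, 25], [31]]  # hypothetical token ids

dynamic = pad_batch(batch, PAD_ID)             # pad to the longest in the batch
fixed = pad_batch(batch, PAD_ID, length=1024)  # pad to a fixed maximum length

print(len(dynamic["input_ids"][0]), len(fixed["input_ids"][0]))
```

Since tweets are short, padding per batch would likely shrink the tensors considerably. One possible variation is to tokenize with `truncation=True` only and leave the padding to `DataCollatorWithPadding`, which the `Trainer` cells below already pass in.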





## Step 5: Load and Prepare the Base Model
We load a pre-trained GPT-2 model for sequence classification and adapt it to our three-class task. To establish a baseline, we **freeze the base model's parameters** so that only the newly initialized classification head (`score`) is trained.

In [None]:
# Load the pretrained model with specific configuration options
model = AutoModelForSequenceClassification.from_pretrained(
    'gpt2',
    num_labels=3,
    id2label={0: 'Negative', 1: 'Positive', 2: 'Indifferent'},
    label2id={'Negative': 0, 'Positive': 1, 'Indifferent': 2}
)

# Update the model's tokenizer pad token id in its configuration
model.config.pad_token_id = tokenizer.eos_token_id

# Freeze all the parameters of the base model
for param in model.base_model.parameters():
    param.requires_grad = False

# Evaluation metrics
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

## Step 6: Training the Base Model (Frozen)
We define training arguments and use Hugging Face’s `Trainer` to fine-tune the model. Evaluation and saving are done after each epoch to monitor improvements.

In [None]:
# Set the training arguments
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/train",
        # Set the learning rate
        learning_rate = 2e-5,
        # Set the per device train batch size and eval batch size
        per_device_train_batch_size = 16,
        per_device_eval_batch_size = 16,
        # Evaluate and save the model after each epoch
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=2,
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()


model.safetensors: 100%|██████████| 548M/548M [00:02<00:00, 209MB/s]  
Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.932271,0.648241
2,1.747100,0.887464,0.652848


TrainOutput(global_step=836, training_loss=1.4194727628424977, metrics={'train_runtime': 1690.9183, 'train_samples_per_second': 7.901, 'train_steps_per_second': 0.494, 'total_flos': 6981912216207360.0, 'train_loss': 1.4194727628424977, 'epoch': 2.0})
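
The `global_step=836` in the output follows from the dataset size and batch size. The arithmetic below is a small sketch that assumes a single device and no gradient accumulation; `total_steps` is a hypothetical helper.

```python
import math


def total_steps(n_examples, batch_size, epochs):
    """Optimizer steps for a run, assuming one device and no gradient accumulation."""
    return math.ceil(n_examples / batch_size) * epochs


steps_per_epoch = math.ceil(6680 / 16)     # 6680 training rows, batch size 16
baseline_steps = total_steps(6680, 16, 2)  # two epochs for the frozen baseline
lora_steps = total_steps(6680, 4, 1)       # settings used for the LoRA run later on

print(steps_per_epoch, baseline_steps, lora_steps)
```

With 418 steps per epoch and the `Trainer` default of logging the training loss every 500 steps, no training loss has been logged by the end of epoch 1, which is consistent with the `No log` entry in the table above.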

## Step 7: Evaluation on the Test Set
After training, `trainer.evaluate()` reports the metrics on the validation set. We then save the baseline model, predict on the held-out test set, and compute the **accuracy** by comparing predictions with the ground-truth labels.

In [6]:
trainer.evaluate()

{'eval_loss': 0.8874643445014954,
 'eval_accuracy': 0.6528475711892797,
 'eval_runtime': 211.8086,
 'eval_samples_per_second': 11.274,
 'eval_steps_per_second': 0.708,
 'epoch': 2.0}

In [7]:
# Save the baseline model (frozen GPT-2 base with the trained classification head)
model.save_pretrained('gpt2-model')

In [8]:
# Predict on the test set to evaluate accuracy
predicted = trainer.predict(tokenized_dataset['test'])

In [9]:
df = pd.DataFrame(
    {
        "predictions": predicted.predictions.argmax(axis=1),
        "actual": predicted.label_ids,
    }
)
df

accuracy = (df['predictions'] == df['actual']).mean()
print(f'Accuracy: {accuracy*100:.2f}%')


Accuracy: 64.62%
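
A single accuracy figure hides which classes the model confuses. The sketch below builds a confusion matrix and per-class recall in plain Python. The helper names are hypothetical and the labels are toy data, not results from this notebook.

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts examples with true class a that were predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy data (hypothetical): 0 = Negative, 1 = Positive, 2 = Indifferent
actual = [0, 0, 1, 1, 2, 2, 2, 2]
predicted = [2, 0, 1, 2, 2, 2, 2, 1]

matrix = confusion_matrix(actual, predicted, n_classes=3)
for row in matrix:
    print(row)
print(per_class_recall(matrix))
```

With the notebook's data frame, the inputs would be `df['actual']` and `df['predictions']`.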


## Step 8: Introduce Parameter-Efficient Fine-Tuning (PEFT) with LoRA
Now, we enhance our approach using **LoRA**, a PEFT technique that adds a small number of trainable parameters to selected layers while the original weights stay frozen. This allows effective fine-tuning with far fewer trainable parameters than full fine-tuning.
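
To make the idea concrete: LoRA keeps a pre-trained weight matrix `W` frozen and learns a low-rank update `B @ A`, scaled by `lora_alpha / r`. The sketch below shows the forward computation on tiny hand-made matrices in plain Python. It is an illustration only, not the `peft` implementation, and all values are made up. In practice `r` is much smaller than the layer dimensions (768 for GPT-2).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + alpha * u / r for b, u in zip(base, update)]


W = [[1, 0], [0, 1]]             # frozen pre-trained weight (2 x 2)
A = [[1, 0], [0, 1], [1, 1]]     # down-projection (r x in_features), r = 3
B_zero = [[0, 0, 0], [0, 0, 0]]  # B starts at zero, so training begins at the base output
B = [[3, 0, 0], [0, 3, 3]]       # a hypothetical B after some training
x = [2, 3]

print(lora_forward(W, A, B_zero, x, alpha=1, r=3))  # same as the frozen layer's output
print(lora_forward(W, A, B, x, alpha=1, r=3))
```

The `alpha=1, r=3` values mirror the `lora_alpha` and `r` settings in the configuration cell below, giving a scaling factor of 1/3.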

In [4]:
from peft import (
    AutoPeftModelForSequenceClassification,
    LoraConfig,
    TaskType,
    get_peft_config,
    get_peft_model,
)

In [5]:
# Evaluation metrics
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

In [6]:
# PEFT model configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=3,
    lora_alpha=1,
    lora_dropout=0.2,
    bias = 'none',
    target_modules=['c_attn', 'c_proj']
)

# Load the baseline model saved in Step 7 (GPT-2 with the trained classification head)
model = AutoModelForSequenceClassification.from_pretrained('gpt2-model')

# Create the lora model
lora_model = get_peft_model(model, peft_config)
lora_model.print_trainable_parameters()



trainable params: 308,736 || all params: 124,748,544 || trainable%: 0.2474866560366428
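
Most of the reported trainable-parameter count can be reproduced by hand. Each adapted layer adds `r * (in_features + out_features)` parameters. The sketch below assumes that `target_modules=['c_attn', 'c_proj']` matches three layers per GPT-2 block: the attention `c_attn` and `c_proj`, and the MLP `c_proj`. `lora_params` is a hypothetical helper.

```python
def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


HIDDEN, R, N_BLOCKS = 768, 3, 12

per_block = (
    lora_params(HIDDEN, 3 * HIDDEN, R)    # attn.c_attn: 768 -> 2304
    + lora_params(HIDDEN, HIDDEN, R)      # attn.c_proj: 768 -> 768
    + lora_params(4 * HIDDEN, HIDDEN, R)  # mlp.c_proj: 3072 -> 768 (assumed to match 'c_proj')
)
adapter_total = per_block * N_BLOCKS
remainder = 308_736 - adapter_total  # reported trainable params minus the adapters

print(per_block, adapter_total, remainder)
```

The adapters account for 304,128 of the 308,736 trainable parameters. The remaining 4,608 equals two copies of the 768 × 3 classification head; that reading is an assumption about how `peft` wraps the `score` layer for `SEQ_CLS` tasks and has not been verified here.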


In [12]:
# Set the trainer parameters
training_arg = TrainingArguments(
        output_dir="./data/train-lora",
        # Set the learning rate
        learning_rate = 2e-3,
        # Set the per device train batch size and eval batch size
        per_device_train_batch_size = 4,
        per_device_eval_batch_size = 4,
        # Evaluate and save the model after each epoch
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=1,
        weight_decay=0.01,
        load_best_model_at_end=True,
    )
    
trainer = Trainer(
    model=lora_model,
    args=training_arg,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

## Step 9: Fine-Tune with LoRA
We train the LoRA-adapted model. Training now updates the small LoRA modules and the classification head, while the GPT-2 base weights remain frozen. This keeps the number of trainable parameters small, which suits low-resource environments.

In [13]:
# Train the PEFT model
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6017,0.471124,0.846734


TrainOutput(global_step=1670, training_loss=0.7094938906366953, metrics={'train_runtime': 1756.4349, 'train_samples_per_second': 3.803, 'train_steps_per_second': 0.951, 'total_flos': 3503532665733120.0, 'train_loss': 0.7094938906366953, 'epoch': 1.0})

In [14]:
trainer.evaluate()

{'eval_loss': 0.47112417221069336,
 'eval_accuracy': 0.8467336683417085,
 'eval_runtime': 233.9509,
 'eval_samples_per_second': 10.207,
 'eval_steps_per_second': 2.552,
 'epoch': 1.0}

In [15]:
# Save the LoRA model
lora_model.save_pretrained("lora-model")

In [None]:
# Evaluate accuracy on the test set using the LoRA fine-tuned model
predicted = trainer.predict(tokenized_dataset['test'])

df = pd.DataFrame(
    {
        "predictions": predicted.predictions.argmax(axis=1),
        "actual": predicted.label_ids,
    }
)
df

# Calculate accuracy
accuracy = (df['predictions'] == df['actual']).mean()
print(f'Accuracy: {accuracy*100:.2f}%')

The LoRA fine-tuned model reaches 83.37% accuracy on the test set, compared with 64.62% for the frozen-base baseline.

## Step 10: Performing Inference with a PEFT Model

Now we load the saved PEFT model weights and run inference on a few test samples to check the predictions of the trained PEFT model.

In [20]:
# load in the model
from peft import AutoPeftModelForSequenceClassification
inference_model = AutoPeftModelForSequenceClassification.from_pretrained("lora-model")
inference_model

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.2, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=3, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=3, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_

In [59]:
# Display the first n tweets in the test set with their ground-truth sentiments
n = 5
tweets_list = (tokenized_dataset["test"]['text'][:n], tokenized_dataset["test"]['label'][:n])
for i in range(n):
    print(f"{i+1}. {tweets_list[0][i]}")
    print(f"Sentiment: {inference_model.config.id2label[tweets_list[1][i]]}")
    print()

1. It's Official: Nio Brings Former Auto Analyst Wei Feng On As CFO
Sentiment: Indifferent

2. Lumber Liquidators +1.3% after guidance update
Sentiment: Positive

3. Suzuki considers China supply options, third-quarter profit falls 11%
Sentiment: Negative

4. Teva, Bausch Could Be Next to File for Bankruptcy
Sentiment: Negative

5. $NLOK - Taking A Look At The Special Situation Opportunity In NortonLifeLock. https://t.co/wrZvz2jZFb #markets #economy #finance
Sentiment: Indifferent



In [53]:
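import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inference_model.eval()  # make sure dropout (including LoRA dropout) is disabled

# Inference function
def get_sentiment(tweet):
    inputs = tokenizer(tweet, return_tensors="pt")
    # Gradients are not needed for inference
    with torch.no_grad():
        logits = inference_model(**inputs).logits
    predicted_class_id = logits.argmax(dim=-1).item()

    return inference_model.config.id2label[predicted_class_id]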

In [60]:
# Use the fine-tuned model to perform inference
for i in range(5):
    print(f"{i+1}. {tweets_list[0][i]}")
    print(f"Sentiment: {get_sentiment(tweets_list[0][i])}")
    print()

1. It's Official: Nio Brings Former Auto Analyst Wei Feng On As CFO
Sentiment: Indifferent

2. Lumber Liquidators +1.3% after guidance update
Sentiment: Positive

3. Suzuki considers China supply options, third-quarter profit falls 11%
Sentiment: Negative

4. Teva, Bausch Could Be Next to File for Bankruptcy
Sentiment: Indifferent

5. $NLOK - Taking A Look At The Special Situation Opportunity In NortonLifeLock. https://t.co/wrZvz2jZFb #markets #economy #finance
Sentiment: Indifferent



The fine-tuned model made correct predictions for 4 out of the 5 samples.
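
When a prediction is wrong, as with sample 4, it can be useful to see how confident the model was. The sketch below converts raw logits into probabilities with a softmax and reports the top label with its probability. It is written in plain Python with made-up logits; `label_with_confidence` is a hypothetical helper, and the label mapping is copied from the model configuration in Step 5.

```python
import math

ID2LABEL = {0: 'Negative', 1: 'Positive', 2: 'Indifferent'}


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


toy_logits = [0.2, -1.0, 0.5]  # hypothetical logits for a single tweet
label, confidence = label_with_confidence(toy_logits)
print(f"{label} ({confidence:.2f})")
```

With the notebook's model, the logits for one tweet could be passed in as `logits[0].tolist()` from inside `get_sentiment`.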

## ✅ Conclusion & Key Takeaways
In this project, we demonstrated how to apply **lightweight fine-tuning** using Hugging Face and the `peft` library. By utilizing **LoRA (Low-Rank Adaptation)**, we trained only a small fraction of the model's parameters and still clearly improved on the frozen-base baseline.

**Key Highlights:**
- Used GPT-2 for a sequence classification task (financial sentiment analysis)
- Trained a baseline by freezing the GPT-2 base model and training only the classification head (64.62% test accuracy)
- Applied LoRA for parameter-efficient fine-tuning, with about 0.25% of the parameters trainable (308,736 of 124,748,544)
- Reached 83.37% test accuracy with the LoRA model after a single epoch

This approach is particularly useful when working in resource-constrained environments or deploying models in production with limited infrastructure.

🚀 *Next steps could involve experimenting with other PEFT methods, testing on different datasets, or deploying the model in an API-powered inference system.*