# Lightweight Fine-Tuning Project

Fine-tuning foundation models with minimal resources is crucial for adapting these models to specific tasks, enabling customization without significant computational overhead. One of the most efficient ways to achieve this is through lightweight fine-tuning techniques, which ensure flexibility and lower resource requirements.

## Dataset Overview

The **Hate Speech Twitter** dataset ([view here](https://huggingface.co/datasets/thefrankhsu/hate_speech_twitter/viewer)) is curated to examine and mitigate hate speech on social media platforms, particularly Twitter. It is divided into two parts: training and testing datasets. Both sets contain tweets labeled and categorized into nine distinct categories related to various forms of hate speech.

### Key Features of the Dataset

This dataset includes three primary features: 

- **Tweets**: The actual content of the tweets.
- **Labels**: Binary values indicating whether the tweet contains hate speech (1) or not (0).
- **Categories**: Nine categories for hate speech classification: behavior, class, disability, ethnicity, gender, physical appearance, race, religion, and sexual orientation.

### Dataset Breakdown

- **Training Set**: 
  - Total tweets: 5679
  - Hate Speech: 1516 
  - Non-Hate Speech: 4163 
  - Note: Hate speech is not evenly distributed across categories.

- **Testing Set**: 
  - Total tweets: 1000
  - Hate Speech: 500
  - Non-Hate Speech: 500
  - Note: Hate speech categories are more evenly distributed.

### Applications

This dataset can be applied to several key tasks, such as:

- **Hate Speech Detection**: Developing and training models to identify hate speech on social media.
- **Prevalence Analysis**: Studying patterns and trends of hate speech across various categories.
- **Categorization Challenges**: Investigating the complexities of accurately classifying hate speech in online platforms.

### Model Training Details

- **PEFT Technique**: [LoRA (Low-Rank Adaptation)](https://github.com/huggingface/peft)
- **Base Model**: GPT-2
- **Evaluation Approach**: Transformer trainer
- **Fine-Tuning Dataset**: [Hate Speech Twitter](https://huggingface.co/datasets/thefrankhsu/hate_speech_twitter/viewer)

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

### Libraries for Deep Learning and Model Training:

1. **`import torch`**
   - **Purpose**: `torch` is the core library for PyTorch, which is an open-source deep learning framework. It provides tools for tensor operations, building neural networks, and managing training loops.
   - **Why Needed**: It is essential for tensor operations, moving models to devices (CPU/GPU), and performing various machine learning operations like backpropagation, gradient calculation, etc.

2. **`from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, EvalPrediction, DataCollatorWithPadding`**
   - **Purpose**: This set of imports comes from the **Hugging Face Transformers library**, which provides pre-trained models and tools for NLP (Natural Language Processing).
     - **`AutoTokenizer`**: Automatically loads a tokenizer for a pre-trained model. Tokenizers convert raw text into token IDs that a model can understand.
     - **`AutoModelForSequenceClassification`**: Automatically loads a pre-trained model for sequence classification tasks, such as sentiment analysis, text classification, etc.
     - **`TrainingArguments`**: Defines the configuration and hyperparameters for model training, such as learning rate, batch size, and number of epochs.
     - **`Trainer`**: A class that simplifies the process of training and evaluating models. It handles training loops, gradient computation, logging, and evaluation.
     - **`EvalPrediction`**: Used to define the format for predictions and labels during evaluation (often used in the `Trainer`'s evaluation step).
     - **`DataCollatorWithPadding`**: A collator that dynamically pads input sequences to the same length during batching. This ensures that the model receives input data of consistent dimensions.
   - **Why Needed**: These components are crucial for working with pre-trained models, tokenizing text, setting training configurations, and performing the training and evaluation steps. They streamline many common tasks in NLP tasks.

3. **`from datasets import Dataset`**
   - **Purpose**: `Dataset` is a class from the Hugging Face **Datasets library**, which is used to handle and manipulate datasets in a format compatible with the Hugging Face ecosystem.
   - **Why Needed**: This is needed for creating and managing datasets for training and evaluation. It supports efficient loading and processing of large datasets.

### Libraries for Machine Learning and Evaluation:

4. **`from sklearn.model_selection import train_test_split`**
   - **Purpose**: `train_test_split` is a function from **Scikit-learn**, a machine learning library, that splits data into training and testing subsets.
   - **Why Needed**: This is crucial for dividing your dataset into training and validation sets, allowing you to evaluate the model's performance on unseen data during training.

5. **`from sklearn.metrics import accuracy_score, precision_recall_fscore_support`**
   - **Purpose**: These are functions from **Scikit-learn** used to evaluate the model's performance.
     - **`accuracy_score`**: Computes the accuracy of the model, which is the proportion of correctly classified instances.
     - **`precision_recall_fscore_support`**: Computes additional classification metrics such as precision, recall, and F1 score, which are useful for understanding model performance beyond simple accuracy.
   - **Why Needed**: These are essential for assessing the performance of the model on the validation dataset, especially when you want to evaluate not just accuracy but also precision and recall (which are more informative in cases of class imbalance).

### Libraries for Parameter-Efficient Fine-Tuning (PEFT):

6. **`from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification`**
   - **Purpose**: These imports come from a **custom PEFT (Parameter-Efficient Fine-Tuning)** module, which allows fine-tuning large pre-trained models more efficiently by only updating a small number of parameters.
     - **`LoraConfig`**: Contains the configuration settings for the **LoRA (Low-Rank Adaptation)** technique, which is a PEFT method. It helps reduce the number of parameters that need to be updated during fine-tuning.
     - **`PeftModelForSequenceClassification`**: Wraps a pre-trained model and applies PEFT (specifically LoRA) to allow fine-tuning with fewer parameters.
     - **`TaskType`**: Specifies the task type (e.g., sequence classification, token classification, etc.) for the PEFT model.
     - **`AutoPeftModelForSequenceClassification`**: Automatically loads a pre-trained model and wraps it for PEFT, making it easier to work with models fine-tuned using LoRA for sequence classification tasks.
   - **Why Needed**: PEFT is used here to fine-tune models with fewer parameters, making it possible to fine-tune large models without the need for massive computational resources. These imports are used for loading, configuring, and fine-tuning the model efficiently.

### Summary:
- **PyTorch** (`torch`) is used for model operations and training loops.
- **Hugging Face Transformers** handles the model loading, tokenization, and training infrastructure for NLP tasks.
- **Scikit-learn** is used for dataset splitting and evaluating model performance.
- **PEFT** (Parameter-Efficient Fine-Tuning) tools are used to fine-tune the model in a resource-efficient manner.

Together, these libraries provide all the necessary tools to manage datasets, train models, evaluate performance, and fine-tune large language models efficiently.

In [2]:
import numpy as np
import pandas as pd
import random
import torch
import tqdm

from datasets import Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    EvalPrediction,
    DataCollatorWithPadding,
)
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification


In [3]:
ds_train = pd.read_csv('./dataset/training_set.csv', usecols=['tweet', 'label']).dropna()
ds_test  = pd.read_csv('./dataset/testing_set.csv', usecols=['tweet', 'label']).dropna()

In [4]:
ds_train.head()

Unnamed: 0,tweet,label
0,krazy i dont always get drunk and pass out but...,0
1,white kids favorite activities calling people ...,1
2,maam did you clear that tweet with the caref...,0
3,wth is that playing missy i mean seriously rt...,0
4,he promised to stand with the muzzies so,0


In [5]:
ds_test.head()

Unnamed: 0,tweet,label
0,"sad to hear the announcers say that ""it may ha...",0
1,Spazzies aren't welcome around here,1
2,gay people need to die,1
3,the big screen is being fitted right now #eu...,0
4,Why is it that African people smell weird? Do ...,1


In [6]:
# Split the dataset into training and validation sets
train_df, val_df = train_test_split(ds_train, test_size=0.2)

In [7]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4542 entries, 1810 to 4396
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   tweet   4542 non-null   object
 1   label   4542 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 106.5+ KB


In [8]:
val_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1136 entries, 4868 to 4901
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   tweet   1136 non-null   object
 1   label   1136 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 26.6+ KB


In [9]:
hate2label = {1:"hate speeach", 0:"Neutral"}

In [10]:
# Convert the dataframes into Hugging Face datasets
train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)

### Initializing the Tokenizer
```python
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
```
- **Tokenizer Creation**: Utilizes the `gpt2` tokenizer from the Hugging Face Transformers library.  
- **Setting the Pad Token**: As GPT-2 doesn't use a padding token by default, the `pad_token` is set to the end-of-sequence (`eos`) token to maintain consistency in padding.

### Defining the `tokenize_and_encode` Function
```python
def tokenize_and_encode(examples):
    tokenized_inputs = tokenizer(
        examples['tweet'],
        padding="max_length",
        truncation=True,
        max_length=512
    )
    tokenized_inputs['labels'] = examples['label']
    return tokenized_inputs
```
- **Function Purpose**: Processes a batch of examples and returns token IDs that are compatible with the model for training.  
- **Parameters**:
  - `examples['tweet']`: The tweet text to be tokenized.
  - `padding="max_length"`: Ensures that all sequences are padded to a uniform maximum length.
  - `truncation=True`: Ensures sequences longer than the `max_length` are truncated to avoid overflow.
  - `max_length=512`: Specifies the maximum length for each tokenized sequence.
- **Assigning Labels**: The labels from `examples['label']` are added to the `tokenized_inputs` dictionary under the key `'labels'`, making both inputs and labels accessible for the model in one dataset.

### Applying the Function to the Datasets
```python
train_dataset = train_dataset.map(tokenize_and_encode, batched=True)
val_dataset   = val_dataset.map(tokenize_and_encode, batched=True)
```
- **Mapping Over Datasets**: The `.map()` function applies the `tokenize_and_encode` method to each batch of examples in both the training and validation datasets.
- **Why `batched=True`?**: Using `batched=True` in Hugging Face Datasets allows for processing multiple examples at once, improving efficiency over processing them individually.

---

**Summary**:  
- **Purpose**: Prepares textual data and labels for a model that requires GPT-2 tokenized input.  
- **Key Steps**: Initialize the tokenizer, define a function to tokenize and encode the text while attaching labels, and then apply this function to the training and validation datasets.  
- **Outcome**: The `train_dataset` and `val_dataset` are now formatted appropriately for model training.

In [11]:
# Define the tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tokenize and convert
def tokenize_and_encode(examples):
    tokenized_inputs = tokenizer(examples['tweet'], padding="max_length", truncation=True, max_length=512)
    tokenized_inputs['labels'] = examples['label']
    return tokenized_inputs

train_dataset = train_dataset.map(tokenize_and_encode, batched=True)
val_dataset   = val_dataset.map(tokenize_and_encode, batched=True)

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Map:   0%|          | 0/4542 [00:00<?, ? examples/s]

Map:   0%|          | 0/1136 [00:00<?, ? examples/s]

In [12]:
random_idx = random.randint(0, len(train_dataset) - 1)
print(train_dataset[random_idx]["tweet"])

a message for all of humanity  charlie chaplin  automatic


### Clearing CUDA Cache
```python
torch.cuda.empty_cache()
```
- **Purpose**: Clears unused memory from the GPU cache, freeing up resources.  
- **Why It Matters**: Helps mitigate memory fragmentation issues and reduces the risk of running into out-of-memory errors during model training.

### Initializing the Model
```python
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id
```
- **Using GPT-2 for Classification**: Loads GPT-2 as the base model and configures it for sequence classification with two output labels.  
- **Pad Token ID**: Sets the model’s `config.pad_token_id` to match the tokenizer's padding token to ensure consistent padding behavior across model and tokenizer.

### Defining the Data Collator
```python
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```
- **Automatic Padding**: Ensures sequences in each batch are padded to the same length, regardless of their original length.  
- **Tokenizer Integration**: Leverages the GPT-2 tokenizer to handle padding correctly.

### Compute Metrics Function
```python
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {
        "accuracy": accuracy_score(p.label_ids, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall
    }
```
- **Purpose**: Computes key evaluation metrics like accuracy, precision, recall, and F1 score from the model’s raw predictions.  
- **Key Steps**:  
  - `np.argmax(...)` converts logits into predicted class indices.  
  - `precision_recall_fscore_support(...)` calculates the precision, recall, and F1 scores.  
  - `accuracy_score(...)` calculates the overall accuracy of the predictions.

### Training Arguments
```python
training_args = TrainingArguments(
    output_dir="./results_normal_model",
    evaluation_strategy="epoch",
    learning_rate=2.5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs_normal_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=150,
    warmup_ratio=0.1,
    eval_accumulation_steps=50
)
```
- **Output Directory**: Stores model checkpoints and logs at `./results_normal_model`.  
- **Evaluation Strategy**: Evaluates the model at the end of each epoch (`evaluation_strategy="epoch"`).  
- **Learning Rate**: Sets a learning rate of `2.5e-5`, commonly used for fine-tuning large models.  
- **Batch Sizes**: Specifies batch sizes of 8 for both training and evaluation.  
- **Epochs**: Trains the model for 1 epoch. More epochs can be added if necessary.  
- **Weight Decay**: Adds regularization to prevent overfitting.  
- **Logging & Saving**: Configures logging frequency (`logging_steps=150`) and saves the model at the end of each epoch.  
- **Load Best Model**: Ensures the best model (based on validation performance) is selected at the end of training.  
- **Warmup Ratio**: Allocates 10% of the total training steps for learning rate warmup (`warmup_ratio=0.1`).  
- **Evaluation Accumulation**: Sets `eval_accumulation_steps=50` to accumulate predictions in small batches for memory efficiency.

### Initializing the Trainer
```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
```
- **Components**:  
  - **Model**: The GPT-2 model configured for classification.  
  - **Training Arguments**: The configuration set defined earlier.  
  - **Datasets**: Uses both the training and validation datasets.  
  - **Metrics**: Includes the `compute_metrics` function to evaluate performance.  
  - **Tokenizer & Data Collator**: Handles tokenization and padding for the data.

### Training
```python
trainer.train()
```
- **Start Training**: Begins the fine-tuning process, logging training loss, evaluating after each epoch, and saving model checkpoints.

### Evaluation
```python
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)
```
- **Final Evaluation**: Computes the evaluation metrics for the validation dataset using the model trained with the best performance.  
- **Results**: Prints the metrics, including accuracy, precision, recall, and F1 score.

---

**Summary**  
This code shows how to configure and fine-tune a GPT-2 model for binary classification using Hugging Face’s Trainer. The steps include:
- **Model Setup** for classification with GPT-2.  
- **Data Preparation** with a padding collator.  
- **Metric Calculation** for evaluation (accuracy, precision, recall, F1).  
- **Training Strategy** with logging, evaluation, and checkpointing.  
- **Evaluation** to assess model performance on validation data.

In [13]:
torch.cuda.empty_cache()
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


# Compute metrics function
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results_normal_model",  # Directory to save model checkpoints and logs
    evaluation_strategy="epoch",  # Evaluate the model at the end of each epoch
    learning_rate=2.5e-5,  # Learning rate for the optimizer
    per_device_train_batch_size=8,  # Batch size for training on each device (GPU/CPU)
    per_device_eval_batch_size=8,  # Batch size for evaluation on each device
    num_train_epochs=1,  # Number of training epochs
    weight_decay=0.01,  # Weight decay (regularization) to prevent overfitting
    logging_dir='./logs_normal_model',  # Directory to save logs for tracking
    save_strategy="epoch",  # Save model at the end of each epoch
    load_best_model_at_end=True,  # Load the best model based on evaluation metrics after training
    logging_steps=150,  # Number of steps between each logging event
    warmup_ratio=0.1,  # Fraction of total steps for learning rate warmup (gradual increase)
    eval_accumulation_steps=50  # Number of steps to accumulate predictions for evaluation (to save memory)
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start training
trainer.train()

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.2901,0.269737,0.923415,0.923783,0.924291,0.923415


Checkpoint destination directory ./results_normal_model/checkpoint-568 already exists and is non-empty.Saving will proceed but saved results may be invalid.


Evaluation Results: {'eval_loss': 0.26973676681518555, 'eval_accuracy': 0.9234154929577465, 'eval_f1': 0.923782679107111, 'eval_precision': 0.9242905024393862, 'eval_recall': 0.9234154929577465, 'eval_runtime': 43.4052, 'eval_samples_per_second': 26.172, 'eval_steps_per_second': 3.271, 'epoch': 1.0}


## Performing Parameter-Efficient Fine-Tuning

### Defining the LoRA Configuration
```python
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=4,
    lora_alpha=16,
    lora_dropout=0.1
)
```
- **PEFT (Parameter-Efficient Fine-Tuning)**: A method designed to fine-tune large language models using fewer trainable parameters, making the process more resource-efficient.  
- **LoRA (Low-Rank Adaptation)**: A specific PEFT technique that introduces low-rank adaptation matrices into certain layers of a pre-trained model, reducing the number of parameters that need to be updated.  
- **Key Configuration Parameters**:  
  - **`task_type=TaskType.SEQ_CLS`**: Specifies that this task is for sequence classification.  
  - **`inference_mode=False`**: Indicates the model is in training mode, allowing updates to the LoRA parameters.  
  - **`r=4`**: Sets the rank of the LoRA adaptation matrices, with a smaller rank reducing the number of parameters.  
  - **`lora_alpha=16`**: Scaling factor for LoRA, controlling the magnitude of updates.  
  - **`lora_dropout=0.1`**: Applies dropout to the LoRA layers to reduce overfitting.

### Loading the Pre-trained GPT-2 Model
```python
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = model.config.eos_token_id
```
- **GPT-2 for Classification**: Loads GPT-2 as a base model and adapts it for sequence classification with `num_labels=2`.  
- **Padding Token**: Since GPT-2 doesn’t have a padding token, we set the `pad_token_id` to the end-of-sequence (`eos_token_id`) token.

### Wrapping the Model with LoRA (PEFT)
```python
peft_model = PeftModelForSequenceClassification(model, peft_config)
```
- **PeftModelForSequenceClassification**: Wraps the GPT-2 classification model with additional LoRA layers, which are the only parameters being fine-tuned.  
- **Goal**: This approach ensures that the majority of the model's parameters remain frozen, optimizing memory and computational efficiency during fine-tuning.

### Checking Trainable Parameters
```python
peft_model.print_trainable_parameters()
```
- **Function**: Prints the number of trainable parameters in the model, confirming that only the LoRA parameters and possibly the classification head are being updated.  
- **Purpose**: Verifies that the majority of the model’s parameters are frozen, which reduces the computational and memory overhead of fine-tuning.

---

**Summary**  
This code demonstrates how to apply **LoRA** (Low-Rank Adaptation), a parameter-efficient fine-tuning approach, to a **GPT-2** model for sequence classification. By defining a **LoRA configuration** and wrapping the model with **`PeftModelForSequenceClassification`**, fine-tuning is performed on a smaller subset of parameters, making the process faster and more resource-efficient.

In [14]:
# PEFT model configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # Specifies the task type (sequence classification)
    inference_mode=False,  # Set to False for training mode (enables parameter updates)
    r=4,  # Rank of the low-rank adaptation matrices (controls how many parameters are adapted)
    lora_alpha=16,  # Scaling factor for the LoRA updates (higher values mean larger updates)
    lora_dropout=0.1  # Dropout rate applied to LoRA layers to reduce overfitting
)

# Load the pre-trained GPT-2 model
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)  

# Loads GPT-2 and adapts it for sequence classification with two output labels (binary classification task)
model.config.pad_token_id = model.config.eos_token_id  # Sets pad_token_id to eos_token_id as GPT-2 does not have a dedicated padding token

# Wrap the model with PEFT (Parameter Efficient Fine-Tuning) using LoRA
peft_model = PeftModelForSequenceClassification(model, peft_config)  # Applies the LoRA configuration to the GPT-2 model

# Print the number of trainable parameters in the PEFT model
peft_model.print_trainable_parameters()  # Displays how many parameters are trainable, confirming the effectiveness of PEFT (only LoRA parameters are updated)


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 150,528 || all params: 124,590,336 || trainable%: 0.1208183594592762


In [15]:
# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results_perf_model",  # Directory where model checkpoints and logs will be saved
    evaluation_strategy="epoch",  # Evaluate the model at the end of each epoch
    learning_rate=2.5e-5,  # Set the learning rate for the optimizer
    per_device_train_batch_size=8,  # Batch size used for training on each device (GPU/CPU)
    per_device_eval_batch_size=8,  # Batch size used for evaluation on each device
    num_train_epochs=1,  # Number of training epochs (how many times the model will see the full dataset)
    weight_decay=0.01,  # Weight decay applied to the optimizer to prevent overfitting
    logging_dir='./logs_perf_model',  # Directory where logs (training progress, metrics) will be saved
    save_strategy="epoch",  # Save model at the end of each epoch
    load_best_model_at_end=True,  # Load the best model based on evaluation metrics after training
    logging_steps=150,  # Number of steps between logging events (e.g., showing training progress)
    warmup_ratio=0.1,  # Fraction of total training steps used for learning rate warmup
    eval_accumulation_steps=50  # Number of steps to accumulate predictions before running evaluation (to save memory)
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start training
trainer.train()

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.5917,0.556955,0.742958,0.644517,0.717799,0.742958


Checkpoint destination directory ./results_perf_model/checkpoint-568 already exists and is non-empty.Saving will proceed but saved results may be invalid.


Evaluation Results: {'eval_loss': 0.5569546222686768, 'eval_accuracy': 0.7429577464788732, 'eval_f1': 0.6445174350830246, 'eval_precision': 0.7177987712370479, 'eval_recall': 0.7429577464788732, 'eval_runtime': 44.3861, 'eval_samples_per_second': 25.594, 'eval_steps_per_second': 3.199, 'epoch': 1.0}


In [16]:
peft_model.save_pretrained('model/peft_model_tweets')

## Performing Inference with a PEFT Model

This code snippet demonstrates how to load a fine-tuned model, set up a **Trainer** for evaluation, and then evaluate its performance. Let's break it down step by step:

### Loading the Fine-Tuned Model
```python
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/peft_model",
    num_labels=2
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id
```
- **`AutoPeftModelForSequenceClassification.from_pretrained("model/peft_model", num_labels=2)`**:  
  - This line loads a pre-trained model from the directory `"model/peft_model"`. The model is configured for sequence classification with `num_labels=2`, indicating it is a binary classification task (e.g., hate speech vs. non-hate speech).
  - **PEFT**: The model is based on **Parameter-Efficient Fine-Tuning (PEFT)**, which allows fine-tuning large models with fewer trainable parameters.
  
- **`inference_model.config.pad_token_id = inference_model.config.eos_token_id`**:  
  - Since the model might not have a dedicated padding token, the padding token (`pad_token_id`) is set to the **end-of-sequence (`eos_token_id`)** token, ensuring consistency in padding for sequences.

### Setting Up the Trainer
```python
trainer = Trainer(
    model=inference_model,
    args=training_args,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
```
- **`Trainer`**: The Hugging Face `Trainer` class is used to simplify training and evaluation of models. In this case, it’s being set up for evaluation.
  
- **Model**:  
  - The **inference model** (`inference_model`), which is the fine-tuned PEFT model loaded earlier, is provided to the `Trainer`.
  
- **Training Arguments** (`training_args`):  
  - The `training_args` object contains configuration for training and evaluation (e.g., batch size, learning rate, evaluation strategy, etc.). These are set up separately and passed into the `Trainer`.
  
- **Evaluation Dataset** (`eval_dataset=val_dataset`):  
  - The `Trainer` will evaluate the model on the **validation dataset** (`val_dataset`).
  
- **Metrics** (`compute_metrics=compute_metrics`):  
  - The `compute_metrics` function, which is defined elsewhere, computes evaluation metrics (e.g., accuracy, precision, recall) based on the model's predictions and ground truth labels.
  
- **Tokenizer** (`tokenizer=tokenizer`):  
  - The tokenizer is provided to handle tokenization of input text for evaluation, ensuring the model gets the correctly preprocessed input.
  
- **Data Collator** (`data_collator=data_collator`):  
  - The data collator ensures that sequences are padded correctly, allowing the model to handle inputs of varying lengths in the same batch.

### Evaluating the Model
```python
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)
```
- **`trainer.evaluate()`**:  
  - This line triggers the evaluation of the model on the validation dataset. It computes the evaluation metrics defined in `compute_metrics` (e.g., accuracy, F1 score) based on the predictions made by the model on the validation data.
  
- **`evaluation_results`**:  
  - The evaluation results are returned as a dictionary containing the computed metrics (e.g., accuracy, precision, recall, etc.).
  
- **`print("Evaluation Results:", evaluation_results)`**:  
  - Finally, the evaluation results are printed to the console so you can see how well the model performs on the validation set.

---

### Summary
In summary, this code loads a fine-tuned **PEFT model** for sequence classification, sets up the **Trainer** for evaluation, and then evaluates the model on a validation dataset. The results are printed, allowing you to assess the model's performance.

In [17]:
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/peft_model",
    num_labels=2
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id

trainer = Trainer(
    model=inference_model,
    args=training_args,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Evaluate the model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Evaluation Results: {'eval_loss': 0.6069970726966858, 'eval_accuracy': 0.6566901408450704, 'eval_f1': 0.630114732355021, 'eval_precision': 0.6124590942547943, 'eval_recall': 0.6566901408450704, 'eval_runtime': 45.8782, 'eval_samples_per_second': 24.761, 'eval_steps_per_second': 3.095}


In [18]:
def predict(prompt: str) -> str:
    """
    Function created to predict/classify each sentence
    
    Args:
        sentence: Sentence to be classified.
        
    Return:
        predicted_class_id: class ID predicted by the model
        predicted_label: actual label predicted
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    inference_model.to(device)

    # Prepare the input text
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Get predictions
    with torch.no_grad():
        outputs = inference_model(**inputs)
        logits = outputs.logits

    probabilities = torch.nn.functional.softmax(logits, dim=1)
    predicted_class_id = probabilities.argmax().item()
    predicted_label = hate2label[predicted_class_id]

    return predicted_class_id, predicted_label

In [19]:
prompt = "I love you"
predicted_class, predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted Class: {predicted_class} - Predicted Label: {predicted_label}")

Prompt: 'I love you'
Predicted Class: 0 - Predicted Label: Neutral


In [20]:
prompt = "maam did you clear that tweet with the careful they may brand you race traitor for the nerve of thinking"
predicted_class, predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted Class: {predicted_class} - Predicted Label: {predicted_label}")

Prompt: 'maam did you clear that tweet with the careful they may brand you race traitor for the nerve of thinking'
Predicted Class: 0 - Predicted Label: Neutral


In [21]:
prompt = "cant take u coons nowhere amp i mean nowhere"
predicted_class, predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted Class: {predicted_class} - Predicted Label: {predicted_label}")

Prompt: 'cant take u coons nowhere amp i mean nowhere'
Predicted Class: 0 - Predicted Label: Neutral


In [22]:
prompt = "0"
predicted_class, predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted Class: {predicted_class} - Predicted Label: {predicted_label}")

Prompt: '0'
Predicted Class: 0 - Predicted Label: Neutral


In [23]:
for _id in [np.random.randint(low=0,high=val_df.shape[0]) for i in range(10)]:
    prompt = val_df.iloc[_id]
    
    print(f"Prompt: {prompt['tweet'][:150]}")
    
    actual_label_id = prompt['label']
    actual_label = hate2label[actual_label_id]
    print(f'Label:  {actual_label}')
    
    inputs = tokenizer(prompt['tweet'], return_tensors="pt").to(inference_model.device)
    
    with torch.no_grad():
        logits = inference_model(**inputs).logits
        
    predictions = torch.argmax(logits, dim=1).item()
    predicted_label = hate2label[predictions]
    print(f'Prediction: {predicted_label}\n')

Prompt: grand opening   top class  oriental massage  perfect
Label:  Neutral
Prediction: Neutral

Prompt: we dem birds
Label:  Neutral
Prediction: Neutral

Prompt: ill break a bitch down like the pussy she is
Label:  hate speeach
Prediction: Neutral

Prompt: parodies of jihadi songs  more effective than asinine muslims should condemn  humorisaweapon
Label:  Neutral
Prediction: hate speeach

Prompt: i agree my fellow cracker
Label:  hate speeach
Prediction: hate speeach

Prompt: nah electric fairy tale seems cool but id rather buy some drugs with that money
Label:  Neutral
Prediction: Neutral

Prompt: poor people are great.... at being everywhere! It's like the plague!
Label:  hate speeach
Prediction: hate speeach

Prompt: lets cripple the vanity industry how actually start liking who we are and how we look
Label:  Neutral
Prediction: Neutral

Prompt: gamboa hoes will be hoes
Label:  hate speeach
Prediction: Neutral

Prompt: im sure  would add this helpful parenthetical explanation if t