# Lightweight Fine-Tuning Project

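This project applies parameter-efficient fine-tuning to a sentiment classifier. The choices are:

* PEFT technique: LoRA (rank 16) on the attention query and value projections, trained with an 8-bit AdamW optimizer
* Model: `bert-base-uncased` with a two-label sequence classification head
* Evaluation approach: accuracy on a 2,000-example sample of the IMDB test split, measured before and after fine-tuning
* Fine-tuning dataset: a 10,000-example sample of the IMDB training split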

In [1]:
# Install the libraries used below (peft is required for the LoRA cells)
!pip install evaluate transformers datasets torch peft bitsandbytes accelerate

Defaulting to user installation because normal site-packages is not writeable
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.3


In [11]:
!pip install scikit-learn

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.4.2 scikit-learn-1.6.1 threadpoolctl-3.5.0


## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [13]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
from datasets import load_dataset
import evaluate
import numpy as np

In [3]:
# Load pretrained model and tokenizer
model_name = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [22]:
dataset = load_dataset("imdb")
small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10000))
small_test_dataset = dataset["test"].shuffle(seed=42).select(range(2000))

In [23]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

In [24]:
# Tokenization function
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Apply tokenization
tokenized_train = small_train_dataset.map(preprocess_function, batched=True)
tokenized_test = small_test_dataset.map(preprocess_function, batched=True)

In [25]:
tokenized_train.set_format("torch", columns=["input_ids", "attention_mask", "label"])
tokenized_test.set_format("torch", columns=["input_ids", "attention_mask", "label"])
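
The tokenizer call above fixes every review to 128 token IDs: longer inputs are cut off, shorter ones are filled with padding, and the attention mask records which positions hold real tokens. The sketch below mimics that behaviour on a plain list of IDs. `pad_or_truncate` is a hypothetical helper written for illustration, not part of `transformers`, and the toy IDs are made up. The real tokenizer also keeps the closing `[SEP]` token when it truncates, which this simplified version does not attempt.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                   # truncation: drop anything past max_length
    n_pad = max_length - len(kept)                  # padding needed to reach max_length
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask

# Toy IDs chosen for illustration only
short_ids, short_mask = pad_or_truncate([101, 2023, 3185, 102], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 11)), max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Padding every example to a fixed length keeps batches rectangular at the cost of some wasted computation on short reviews.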


In [26]:
# Define evaluation metric
metric = evaluate.load("accuracy")

In [27]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
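
`compute_metrics` turns each row of logits into a class prediction with an argmax and then reports the fraction of predictions that match the labels. A minimal pure-Python sketch of the same calculation follows. The logits and labels are made-up toy values, and `accuracy_from_logits` is a hypothetical name used only here.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    # argmax per row: index of the largest logit
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

toy_logits = [[0.2, 1.3], [2.0, -1.0], [0.1, 0.4], [-0.5, 0.3]]
toy_labels = [1, 0, 0, 1]
toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)
```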


In [28]:
training_args = TrainingArguments(
    output_dir="./results",
    per_device_eval_batch_size=8,
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics,
)

pretrained_eval_results = trainer.evaluate()
print("Pretrained Model Evaluation Results:", pretrained_eval_results)

Pretrained Model Evaluation Results: {'eval_loss': 0.6863271594047546, 'eval_accuracy': 0.551, 'eval_runtime': 6.6746, 'eval_samples_per_second': 299.642, 'eval_steps_per_second': 37.455}
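
These baseline numbers are close to what guessing would give on a two-class problem, which is expected because the classification head was newly initialized (see the warning printed when the model was loaded). As a rough check, a classifier that assigns probability 0.5 to each class has a cross-entropy loss of ln 2, and the reported loss sits just below that value:

```python
import math

chance_loss = -math.log(0.5)        # cross-entropy when each of 2 classes gets probability 0.5
reported_loss = 0.6863271594047546  # eval_loss from the baseline run above

print(f"chance-level loss: {chance_loss:.4f}")
print(f"reported loss:     {reported_loss:.4f}")
print(f"difference:        {chance_loss - reported_loss:.4f}")
```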


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [29]:
# Required imports for PEFT
from peft import LoraConfig, get_peft_model
from bitsandbytes.optim import AdamW8bit

# Create PEFT model with LoRA
lora_config = LoraConfig(
    r=16,  # Rank of the low-rank update
    lora_alpha=32,  # Scaling factor
    target_modules=["query", "value"],  # Target attention layers in BERT
    lora_dropout=0.1,
    bias="none",
    task_type="SEQ_CLS",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

trainable params: 589,824 || all params: 110,075,140 || trainable%: 0.5358376105631117
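
The trainable-parameter count can be reproduced by hand. Each adapted linear layer gains two low-rank matrices, A (r × d_in) and B (d_out × r). The sketch below assumes the standard `bert-base-uncased` dimensions (12 encoder layers, hidden size 768), which are not stated in this notebook, and the usual LoRA update scaling of `lora_alpha / r`.

```python
hidden_size = 768      # assumed: bert-base-uncased hidden size
num_layers = 12        # assumed: bert-base-uncased encoder layers
rank = 16              # r in LoraConfig
lora_alpha = 32        # lora_alpha in LoraConfig
modules_per_layer = 2  # "query" and "value"

params_per_module = rank * hidden_size + hidden_size * rank  # A plus B
lora_params = num_layers * modules_per_layer * params_per_module
all_params = 110_075_140  # total reported by print_trainable_parameters()
trainable_pct = 100 * lora_params / all_params

print(f"LoRA parameters: {lora_params:,}")
print(f"trainable share: {trainable_pct:.4f}%")
print(f"update scaling (lora_alpha / r): {lora_alpha / rank}")
```

The computed count matches the 589,824 trainable parameters printed above.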


In [32]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./peft_results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,  # At least 1 epoch
    weight_decay=0.01,
    logging_dir="./peft_logs",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

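# Set up the trainer with an 8-bit AdamW optimizer from bitsandbytes.
# Note: this is LoRA with an 8-bit optimizer, not QLoRA; QLoRA would also
# load the base model weights in 4-bit precision.
trainable_params = [p for p in peft_model.parameters() if p.requires_grad]
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics,
    optimizers=(AdamW8bit(trainable_params, lr=2e-5), None),  # None: Trainer builds its default LR scheduler
)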
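
With these settings the length of the run can be estimated up front. The sketch assumes a single device and no gradient accumulation (neither is configured in the arguments above), so each optimizer step consumes one batch of 8 examples.

```python
import math

train_examples = 10_000  # size of small_train_dataset
batch_size = 8           # per_device_train_batch_size
epochs = 5               # num_train_epochs

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```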

In [33]:
trainer.train()

# Save the PEFT model weights
output_dir = "./peft_model_weights"
peft_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)  # Optional: save tokenizer
print(f"PEFT model saved to {output_dir}")

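| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.5625 | 0.433655 | 0.812 |
| 2 | 0.4002 | 0.38452 | 0.837 |
| 3 | 0.3916 | 0.36996 | 0.844 |
| 4 | 0.3769 | 0.362441 | 0.85 |
| 5 | 0.3597 | 0.363849 | 0.8475 |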


PEFT model saved to ./peft_model_weights
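
Because `load_best_model_at_end=True` is set without `metric_for_best_model`, the Trainer is expected to keep the checkpoint with the lowest validation loss (its documented default). Applying that rule by hand to the per-epoch results above:

```python
# Per-epoch results copied from the training output above
history = [
    {"epoch": 1, "val_loss": 0.433655, "accuracy": 0.812},
    {"epoch": 2, "val_loss": 0.38452, "accuracy": 0.837},
    {"epoch": 3, "val_loss": 0.36996, "accuracy": 0.844},
    {"epoch": 4, "val_loss": 0.362441, "accuracy": 0.85},
    {"epoch": 5, "val_loss": 0.363849, "accuracy": 0.8475},
]

# Lowest validation loss wins
best = min(history, key=lambda row: row["val_loss"])
print(best)
```

Epoch 4 has the lowest validation loss. Validation loss rises slightly in epoch 5 while training loss keeps falling, which suggests additional epochs would bring little benefit at this learning rate.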


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

#### Load the Fine-Tuned LoRA Model for Inference

In [34]:
# Required imports for loading PEFT model
from peft import PeftModel

# Load the saved PEFT model
loaded_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
loaded_peft_model = PeftModel.from_pretrained(loaded_model, output_dir)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [35]:
# Set up trainer for evaluation
trainer = Trainer(
    model=loaded_peft_model,
    args=training_args,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics,
)

In [36]:
# Evaluate fine-tuned model
finetuned_results = trainer.evaluate()
print("Fine-Tuned Model Evaluation Results:", finetuned_results)

# Compare results
print("\nComparison of Pretrained vs Fine-Tuned Model:")
print(f"Pretrained Accuracy: {pretrained_eval_results['eval_accuracy']:.4f}")
print(f"Fine-Tuned Accuracy: {finetuned_results['eval_accuracy']:.4f}")

Fine-Tuned Model Evaluation Results: {'eval_loss': 0.3624410331249237, 'eval_accuracy': 0.85, 'eval_runtime': 6.3049, 'eval_samples_per_second': 317.213, 'eval_steps_per_second': 39.652}

Comparison of Pretrained vs Fine-Tuned Model:
Pretrained Accuracy: 0.5510
Fine-Tuned Accuracy: 0.8500
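
To put the comparison in perspective, the gain can be expressed in accuracy points together with a rough uncertainty estimate. The sketch below uses the baseline accuracy reported by the first `trainer.evaluate()` call (0.551) and a normal-approximation standard error for a proportion over the 2,000 test examples. Treat the interval as a back-of-the-envelope figure rather than a formal test.

```python
import math

n_test = 2000          # size of small_test_dataset
baseline_acc = 0.551   # eval_accuracy before fine-tuning
finetuned_acc = 0.85   # eval_accuracy after fine-tuning

gain_points = 100 * (finetuned_acc - baseline_acc)
std_err = math.sqrt(finetuned_acc * (1 - finetuned_acc) / n_test)
ci_low = finetuned_acc - 1.96 * std_err
ci_high = finetuned_acc + 1.96 * std_err

print(f"gain: {gain_points:.1f} accuracy points")
print(f"fine-tuned accuracy, approx. 95% interval: [{ci_low:.3f}, {ci_high:.3f}]")
```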


In [42]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# get_peft_model() injects the LoRA layers into `model` in place, so `model` is no
# longer the untouched checkpoint. Reload a fresh copy to use as the baseline.
baseline_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
baseline_model.to(device)
loaded_peft_model.to(device)

def predict_sentence(sentence, model, tokenizer, device):
    """Classify one review and return "Positive" or "Negative"."""
    # Tokenize with the same max_length used during fine-tuning
    inputs = tokenizer(sentence, truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move to device

    # Perform inference without tracking gradients
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
        prediction = torch.argmax(logits, dim=-1).item()

    # Map prediction to label (IMDB: 0 = negative, 1 = positive)
    return "Positive" if prediction == 1 else "Negative"

# Test sentence (you can change this)
test_sentence = dataset['test'][988]['text']

# Test with the pretrained baseline (its classification head is randomly initialized)
pretrained_prediction = predict_sentence(test_sentence, baseline_model, tokenizer, device)
print(f"\nPretrained Model Prediction for '{test_sentence}': {pretrained_prediction}")

# Test with fine-tuned model
finetuned_prediction = predict_sentence(test_sentence, loaded_peft_model, tokenizer, device)
print(f"Fine-Tuned Model Prediction for '{test_sentence}': {finetuned_prediction}")

Using device: cuda

Pretrained Model Prediction for 'First there was Tsui Hark's Zu Warriors (2001), which is visually ground-breaking, but much lacking in the acting and writing departments, now this movie, which is visually almost as good as Zu (though no longer ground-breaking), but is even worse in the acting and writing departments. It's really sad that there seems to be an almost complete lack of acting and writing talents in the HK movie industry. I guess you need to understand Cantonese to understand how bad and vulgar the dialogs in the movie really are. It's like some delinquent kids talking in the street, it's that bad. To make it worse, the actors and actresses themselves look like delinquent kids, and can't act even if their life depend on it. I understand that this movie is supposed to be a comedy aimed at the younger generation in HK, but has HK youths really become so brain-dead that they can't appreciate anything but such juvenile and vulgar acting/writing? If that's t