# **Udacity Project #1: Apply Lightweight Fine-Tuning to a Foundation Model**

### By: Vijay Nadkarni
### Email: vjnadkarni@gmail.com
### Course: Generative AI

This project implements parameter-efficient fine-tuning (PEFT) using low-rank adaptation (LoRA). The model used is "google/flan-t5-base" and the dataset is "SAMSum". The fine-tuning uses a subset of the SAMSum dataset (every 12th example of each split).


* **PEFT technique:** This project performs parameter-efficient fine-tuning (PEFT) using low-rank adaptation (LoRA). The weights of the base model are frozen through the `peft` library, so they receive no further training, and only the weights of the small LoRA adapter are trained. The unmodified model therefore provides a baseline response, which the PEFT-tuned adapter is expected to improve upon. The extent of improvement is measured quantitatively with Rouge metrics in a direct before-and-after comparison at the end of the notebook.
* **Model:** Since this is a text-to-text task, the model is loaded with Hugging Face's `AutoModelForSeq2SeqLM` class. The specific checkpoint is 'google/flan-t5-base', which is widely used for text-to-text applications. Flan-T5 is an encoder-decoder model that works well for text-to-text (language-to-language) tasks such as dialogue, text summarization and translation. In this project it is used to summarize conversations between people, and the quality of those summaries is evaluated.
* **Evaluation approach:** Rouge metrics are used to compare the output of the original model quantitatively with that of the PEFT/LoRA model, using the human-written summaries in the dataset as references. If the Rouge scores of the PEFT/LoRA model are clearly higher than those of the original, non-fine-tuned model, the fine-tuning is considered successful.
* **Fine-tuning dataset:** The SAMSum dataset, created by Samsung, is used. Its dialogue-to-summary pairs map naturally onto a sequence-to-sequence model such as flan-t5. It contains around 16,000 messenger-style conversations between two or more people in their daily lives. The conversations are a mix of formal, semi-formal and informal, and may contain slang, misspellings, grammatical errors, emoticons and profanity, which makes them a realistic sample of everyday conversation.
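The evaluation approach relies on Rouge, which scores the n-gram overlap between a generated summary and a reference summary. As a minimal sketch of the simplest variant, ROUGE-1 (unigram overlap) F1 can be computed as below. It assumes lowercase whitespace tokenization and no stemming; the `rouge_score` package used through `evaluate` later in the notebook tokenizes differently and is run there with a stemmer, so its numbers will not match this sketch exactly.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between prediction and reference.

    Assumption: lowercase + whitespace tokenization, no stemming.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each shared word counts at most min(count_in_pred, count_in_ref) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # share of predicted words found in the reference
    recall = overlap / len(ref_tokens)      # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


# Illustrative strings, loosely based on one of the SAMSum summaries shown later
reference = "Mark lied to Anne about his age"
prediction = "Mark lied to Anne"
print(round(rouge1_f1(prediction, reference), 3))
```

Here every predicted word appears in the reference (precision 1.0) but only 4 of the 7 reference words are recovered, so F1 is 8/11, about 0.727. A short summary can therefore have perfect precision and still score modestly.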

## **Log into Hugging Face with Access Token**

In [None]:
from huggingface_hub import notebook_login
notebook_login()


## **Install and/or upgrade pip, Hugging Face and Rouge libraries as needed**

In [None]:
!pip install -U pip
!pip install -U transformers -q
!pip install -U datasets -q
!pip install -U accelerate -q
!pip install -U bitsandbytes -q
!pip install -U evaluate -q
!pip install -U loralib -q
!pip install -U peft -q
!pip install -U py7zr -q
!pip install -U rouge_score -q


In [None]:
# Check for type of GPU
!nvidia-smi -L

GPU 0: NVIDIA L4 (UUID: GPU-92e75326-0265-2379-3927-5594f318fb59)


## **Import the required libraries from Hugging Face, etc.**

In [None]:
import transformers, datasets, accelerate, evaluate, loralib, peft, py7zr
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TrainingArguments, Trainer, GenerationConfig
import bitsandbytes as bnb   # have model run in 8 bits
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, TaskType
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import random
import time

## **Load the 'google/flan-t5-base' model and create tokenizer**
Flan-T5 is an encoder-decoder model that works well for text-to-text (language-to-language) tasks such as dialogue, text summarization and translation. In this project it is used to summarize conversations between people, and the quality of those summaries is evaluated.

In [None]:
model_name='google/flan-t5-base'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


## **Load the 'SAMSum' dataset (by Samsung)**
This dataset contains around 16,000 messenger-like conversations with summaries, similar to conversations between people in daily life. The conversations are a mix of formal, semi-formal and informal, and may contain slang, misspellings, grammatical errors, emoticons and profanity, which makes them a good sample of everyday conversation. A useful feature of the SAMSum dataset is that every conversation comes with a human-written summary. These summaries serve as 'labels' (ground truth) for fine-tuning and provide a way to validate the results.

In [None]:
dataset_name = "samsum"
dataset = load_dataset(dataset_name)
dataset

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 818
    })
})
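Each record has an `id`, a `dialogue` and a human-written `summary`. The dialogue, wrapped in an instruction prompt, becomes the model input and the summary becomes the target. The sketch below shows that mapping for a single made-up record; the record contents and the helper name `to_training_pair` are hypothetical, while the prompt strings are the ones used in `tokenize_function` further down.

```python
# Hypothetical record with the same fields as a SAMSum example (id, dialogue, summary)
record = {
    "id": "example-0",
    "dialogue": "Anna: Are we still on for lunch?\nTom: Yes, 12:30 at the usual place.",
    "summary": "Anna and Tom will have lunch at 12:30.",
}

# Prompt pieces matching tokenize_function in this notebook
START_PROMPT = "Summarize the following conversation.\n\n"
END_PROMPT = "\n\nSummary: "


def to_training_pair(example: dict) -> tuple[str, str]:
    """Return (model input text, target text) for one SAMSum-style record."""
    return START_PROMPT + example["dialogue"] + END_PROMPT, example["summary"]


source, target = to_training_pair(record)
print(source)
print("->", target)
```

The `id` field is not used for training, which is why it is dropped after tokenization.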

## **Perform a sanity-check of the model to make sure it works**

In [None]:
num = 77  # random index

dialog = dataset['test'][num]['dialogue']
summary = dataset['test'][num]['summary']

prompt = f"""
Summarize the following conversation.

{dialog}

Summary:
"""

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
    )[0],
    skip_special_tokens=True
)

dashed_line = '---------------------------------------------'
print(dashed_line)
print(f'Input Query to LLM:\n{prompt}')
print(dashed_line)
print(f'Human Annotated Summary:\n{summary}\n')
print(dashed_line)
print(f'Model Output - Non-PEFT:\n{output}')

---------------------------------------------
Input Query to LLM:

Summarize the following conversation.

Mary: Did you tell your sister I am doing online job?
Mark: yes !
Mary: why
Mark: because she keep saying your good for nothing?
Mary: dint I tell you I don’t care?
Mark: what happened?
Mary: see I don’t want to prove anything to anyone..
Mark: I know… but I was just feeling proud so it was kind of show off…
Mary: she is asking everyone… and trying to get to the people I am working for
Mark: really!! I am sorry for that…
Mary: don’t be! I understand your feelings…  but u know how she is…
Mark: I know!! :? 
Mary: don’t be sad now its ok.. she cant do much about it… chill its ok but just be careful
Mark: I will be ..
Mary: btw it feels good that she is so jealous :P
Mark: lol my aim was to make her feel jealous
Mary: but i dont like it that she tries to contact the people i am working for ... what does she want?
Mark: may be she wants to confirm if its true... becau

## **Perform preprocessing of the SAMSum dataset**

In [None]:
def tokenize_function(example):
    # Wrap each dialogue in the same instruction prompt that is used at inference time
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialog + end_prompt for dialog in example["dialogue"]]
    # Pad/truncate prompts and summaries to the tokenizer's model_max_length (512 for flan-t5-base)
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    return example

# The dataset has 3 splits (train, test, validation); map() applies tokenize_function to each of them
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'dialogue', 'summary'])
# Keep every 12th example of each split to shorten training
tokenized_datasets = tokenized_datasets.filter(lambda example, num: num % 12 == 0, with_indices=True)


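In `tokenize_function`, the labels are padded to 512 tokens with the pad token (id 0 for T5 tokenizers) and are not masked, so the padding positions also count toward the training loss. A common convention in Hugging Face seq2seq training, not applied in this notebook's run, is to replace pad ids in the labels with -100, the default `ignore_index` of PyTorch's cross-entropy loss, which the T5 model uses. The pure-Python sketch below shows only that masking step on plain lists; `mask_label_padding` is a hypothetical helper name, and inside `tokenize_function` the same effect could be obtained on the label tensor with `labels[labels == tokenizer.pad_token_id] = -100`.

```python
def mask_label_padding(label_batch, pad_token_id=0, ignore_index=-100):
    """Replace padding ids in each label sequence with ignore_index.

    Assumptions: pad_token_id=0 (the pad id of T5 tokenizers) and
    ignore_index=-100 (the default ignore_index of torch.nn.CrossEntropyLoss).
    """
    return [
        [ignore_index if token_id == pad_token_id else token_id for token_id in labels]
        for labels in label_batch
    ]


# Two toy label sequences padded to length 6 (ids are made up; 1 plays the role of </s>)
labels = [[363, 19, 8, 1, 0, 0], [1877, 1, 0, 0, 0, 0]]
print(mask_label_padding(labels))
```

Only the real summary tokens and the end-of-sequence token keep their ids, so the loss would be averaged over those positions alone.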
## **Print the shapes of pre-processed dataset**

In [None]:
print(f"Shapes of the datasets:")
print(f"Training: {tokenized_datasets['train'].shape}")
print(f"Validation: {tokenized_datasets['validation'].shape}")
print(f"Test: {tokenized_datasets['test'].shape}")

print(tokenized_datasets)

Shapes of the datasets:
Training: (1228, 2)
Validation: (69, 2)
Test: (69, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 1228
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 69
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 69
    })
})
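The split sizes above follow directly from the filter `num % 12 == 0`: a split with n rows keeps indices 0, 12, 24 and so on, which is ceil(n / 12) rows. A quick check against the original split sizes:

```python
import math


def rows_kept(n_rows: int, step: int = 12) -> int:
    """Number of indices i in range(n_rows) with i % step == 0."""
    return math.ceil(n_rows / step)


original_sizes = {"train": 14732, "test": 819, "validation": 818}
for split, n in original_sizes.items():
    print(split, n, "->", rows_kept(n))
```

This gives 1228, 69 and 69 rows, matching the shapes printed above.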


## **Perform parameter-efficient fine tuning (PEFT) on the model**
### We set up the PEFT/LoRA model for fine-tuning by adding small, trainable low-rank adapter matrices to selected layers. With PEFT/LoRA, the weights of the original LLM are frozen and only the adapter is trained. In the LoRA configuration below, the rank hyper-parameter `r` sets the inner dimension of the adapter matrices, and therefore how many parameters are trained.
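To make the role of `r` concrete: LoRA keeps a frozen weight matrix W (d_out × d_in) and learns two small matrices, B (d_out × r) and A (r × d_in); the adapted layer computes W x + (lora_alpha / r) B A x. The NumPy sketch below illustrates that forward pass with toy sizes. It is a simplified illustration (no dropout, made-up dimensions), not PEFT's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, lora_alpha = 8, 8, 2, 2   # toy sizes, not flan-t5-base's
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init
scaling = lora_alpha / r


def lora_linear(x):
    """Frozen path plus scaled low-rank update: W x + scaling * B (A x)."""
    return W @ x + scaling * (B @ (A @ x))


x = rng.normal(size=d_in)
# With B initialized to zero, the adapted layer starts out identical to the frozen one
print(np.allclose(lora_linear(x), W @ x))

# Stand-in for trained values: give B non-zero entries
B = rng.normal(size=(d_out, r))
# The low-rank update can be folded into a single matrix for inference
W_merged = W + scaling * (B @ A)
print(np.allclose(lora_linear(x), W_merged @ x))
```

Both checks print `True`. Per adapted layer, LoRA trains r × (d_in + d_out) values instead of d_in × d_out, which is where the parameter savings come from.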

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

# Set up the LoRA hyperparameters
lora_config = LoraConfig(
    r=32, # Rank of the low-rank update matrices
    lora_alpha=32, # Scaling factor: the low-rank update is multiplied by lora_alpha / r
    target_modules=["q", "v"], # Targeting the query and value projections in T5 attention
    lora_dropout=0.05, # Dropout applied to the LoRA path during training
    bias="none", # Bias terms are not trained
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5 task type
)

peft_model = get_peft_model(original_model,
                            lora_config)
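As a rough check on how small the adapter is, the number of trainable LoRA parameters can be worked out by hand. The sketch assumes the flan-t5-base configuration values (d_model = 768, 12 heads of size 64, 12 encoder and 12 decoder blocks) and that `target_modules=["q", "v"]` matches the query and value projections of every encoder self-attention, decoder self-attention and decoder cross-attention layer. In a live session, `peft_model.print_trainable_parameters()` reports the actual figure.

```python
d_model = 768           # flan-t5-base hidden size (assumed from the model config)
inner_dim = 12 * 64     # num_heads * d_kv = 768
r = 32                  # rank from lora_config above

num_encoder_blocks = 12
num_decoder_blocks = 12

# Encoder blocks have one attention layer (self-attention);
# decoder blocks have two (self-attention and cross-attention).
attention_layers = num_encoder_blocks * 1 + num_decoder_blocks * 2
adapted_modules = attention_layers * 2   # "q" and "v" in each attention layer

# Each adapted Linear(d_model -> inner_dim) gets A: r x d_model and B: inner_dim x r
params_per_module = r * d_model + inner_dim * r
lora_params = adapted_modules * params_per_module

print(adapted_modules)
print(lora_params)
print(lora_params * 2 / 1e6, "MB if stored in bfloat16 (2 bytes per value)")
```

This gives 72 adapted modules and 3,538,944 trainable parameters, roughly 1.4% of the base model's parameters. At 2 bytes per value that is about 7.08 MB, close to the 7.10M `adapter_model.safetensors` upload shown later (the file also has a small header), assuming the adapter weights were saved in bfloat16.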

## **Define the training arguments and create the Trainer instance**

In [None]:
output_dir = f'./peft_flan_t5_samsum-{str(int(time.time()))}'

peft_training_arguments = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True, # Retries with a smaller batch size after out-of-memory errors
    learning_rate=1e-3, # Higher than is typical for full fine-tuning
    weight_decay=0.01,
    num_train_epochs=10,
    logging_steps=50,
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_arguments,
    train_dataset=tokenized_datasets["train"],
)
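The number of optimizer steps depends on the batch size that `auto_find_batch_size` settles on, which the notebook does not print. Assuming a single GPU, no gradient accumulation and the default of keeping the last partial batch, steps = ceil(training rows / batch size) × epochs. The sketch below tabulates this for a few candidate batch sizes; 8 is the default `per_device_train_batch_size` that the search starts from.

```python
import math


def total_training_steps(num_rows: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for a single-device run without gradient accumulation."""
    steps_per_epoch = math.ceil(num_rows / batch_size)  # last partial batch is kept
    return steps_per_epoch * epochs


for bs in (4, 8, 16):
    print(bs, total_training_steps(1228, bs, 10))
```

With `logging_steps=50`, the training log gets one loss entry per 50 of these steps.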

## **Train the PEFT/LoRA model and measure the training time**

In [None]:
time_start = time.time()
peft_trainer.train()   # trains only the LoRA adapter weights of peft_model
time_end = time.time()

training_time = time_end - time_start

print(f'Time taken to train the model for 10 epochs using LoRA is: {training_time} seconds')

| Step | Training Loss |
|---|---|
| 50 | 9.7098 |
| 100 | 0.3665 |
| 150 | 0.2036 |
| 200 | 0.1677 |
| 250 | 0.1442 |
| 300 | 0.1377 |
| 350 | 0.1294 |
| 400 | 0.3449 |
| 450 | 0.1196 |
| 500 | 0.1215 |

Time taken to train the model for 10 epochs using LoRA is: 910.8612849712372 seconds


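### **Note:** The logged training loss dropped sharply, from 9.71 at step 50 to about 0.12 by step 500, apart from a brief spike to 0.34 at step 400. It appeared to level off at about 0.1 after around 500 steps and did not change much after that. Because the padded label positions are not masked out, they also contribute to this average, so the absolute value likely understates the loss on the summary tokens themselves.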

## **Save the trained LoRA adapter and tokenizer to the Colab filesystem**

In [None]:
peft_model_path="./peft_flan_t5_samsum"

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)



('./peft_flan_t5_samsum/tokenizer_config.json',
 './peft_flan_t5_samsum/special_tokens_map.json',
 './peft_flan_t5_samsum/spiece.model',
 './peft_flan_t5_samsum/added_tokens.json',
 './peft_flan_t5_samsum/tokenizer.json')

## **Prepare the PEFT model (adapter)**
Load a fresh copy of the flan-t5-base model and attach the saved adapter to it. The adapter is loaded with `is_trainable=False` because it is used only for inference here and is not trained any further.

In [None]:
from peft import PeftModel, PeftConfig

peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

peft_model = PeftModel.from_pretrained(peft_model_base,
                                       peft_model_path,
                                       torch_dtype=torch.bfloat16,
                                       is_trainable=False)   # load the adapter in inference mode (its weights stay frozen)

## **Push the trained PEFT adapter to Hugging Face Hub (vjnadkarni acct)**
Adapter name on Hugging Face Hub: vjnadkarni/peft_flan_t5_samsum

In [None]:
peft_model.push_to_hub("vjnadkarni/peft_flan_t5_samsum",
                         use_auth_token=True,
                         commit_message="Initial commit",
                         private=True,
                         )




CommitInfo(commit_url='https://huggingface.co/vjnadkarni/peft_flan_t5_samsum/commit/0766802f05995ce225e0d0b521553d6522b5a44e', commit_message='Initial commit', commit_description='', oid='0766802f05995ce225e0d0b521553d6522b5a44e', pr_url=None, pr_revision=None, pr_num=None)

## **Load the trained PEFT adapter from Hugging Face Hub ("vjnadkarni/peft_flan_t5_samsum") and attach it to the base "google/flan-t5-base" model**

In [None]:
from peft import PeftModel, PeftConfig

peft_model_id = "vjnadkarni/peft_flan_t5_samsum"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    load_in_8bit=True,
    return_dict=True,
    )

# Attach the PEFT/LoRA adapter to the base model (the adapter weights are not merged into it)
peft_model = PeftModel.from_pretrained(model, peft_model_id)


The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
`low_cpu_mem_usage` was None, now set to True since model is quantized.



## **Assign the original model to CPU**

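Uncomment the print statement if the structure of the original model is needed; the output is a few pages long. Note that `get_peft_model()` injected the LoRA layers into `original_model` in place, so after training this object also carries the trained adapter. A strictly non-fine-tuned baseline would require a freshly loaded copy of `google/flan-t5-base`, as was done for `peft_model_base` above.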



In [None]:
original_model_to_cpu = original_model.to('cpu')
# print(f"Structure of original model: {original_model_to_cpu}")  # Uncomment, if structure of model needs to be printed

## **Print a sample conversation and its summaries from the original and PEFT models**

In [None]:
num = 20  # random index
dialog = dataset['test'][num]['dialogue']
human_annotated_summary = dataset['test'][num]['summary']
dashed_line = '---------------------------------------------'

prompt = f"""
Summarize the following conversation.

{dialog}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

print(f'Directive to LLM: \n{prompt}')
print(dashed_line)
print(f'Human Annotated Summary (dataset label):\n{human_annotated_summary}')
print(dashed_line)
print(f'Original Model Summary:\n{original_model_text_output}')
print(dashed_line)
print(f'PEFT/LoRA Model Summary:\n{peft_model_text_output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (603 > 512). Running this sequence through the model will result in indexing errors


Directive to LLM: 

Summarize the following conversation.

Deirdre: Hi Beth, how are you love?
Beth: Hi Auntie Deirdre, I'm been meaning to message you, had a favour to ask.
Deirdre: Wondered if you had any thought about your Mum's 40th, we've got to do something special!
Beth: How about a girls weekend, just mum, me, you and the girls, Kira will have to come back from Uni, of course.
Deirdre: Sounds fab! Get your thinking cap on, it's only in 6 weeks! Bet she's dreading it, I remember doing that!
Beth: Oh yeah, we had a surprise party for you, you nearly had a heart attack! 
Deirdre: Well, it was a lovely surprise! Gosh, thats nearly 4 years ago now, time flies! What was the favour, darling?
Beth: Oh, it was just that I fancied trying a bit of work experience in the salon, auntie.
Deirdre: Well, I am looking for Saturday girls, are you sure about it? you could do well in the exams and go on to college or 6th form.
Beth: I know, but it's not for me, auntie, I am doing all foun

## **Subjectively (qualitatively) compare the human-annotated (dataset), original model and PEFT model summaries**


In [None]:
num_start = 20  # start index; the loop below summarizes test examples 20-29
num_samples = 10
dialogs = dataset['test'][num_start:num_start + num_samples]['dialogue']
human_annotated_summaries = dataset['test'][num_start:num_start + num_samples]['summary']

original_model_summaries = []
peft_model_summaries = []

for idx, dialog in enumerate(dialogs):
    prompt = f"""
Summarize the following conversation.

{dialog}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    peft_model_text_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_annotated_summaries, original_model_summaries, peft_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['Human_annotated_summaries', 'Original_model_summaries', 'PEFT_model_summaries'])
df

| | Human_annotated_summaries | Original_model_summaries | PEFT_model_summaries |
|---|---|---|---|
| 0 | Beth wants to organize a girls weekend to cele... | Deirdre and Beth are going to have a girls wee... | Beth wants to try a bit of work experience in ... |
| 1 | Gloria has an exam soon. It lasts 4 hours. Emm... | Gloria and Emma are going to a university exam... | Gloria recommends a website for the exam. |
| 2 | Adam and Karen are worried that May suffers fr... | May is depressed and has trouble sleeping. Kar... | May is depressed and she doesn't want to go ou... |
| 3 | Mark lied to Anne about his age. Mark is 40. | Mark lied to Anne. Irene saw Mark's passport. | Mark told Anne that he's 30 and he's 40. He to... |
| 4 | Next week is Wharton's birthday. Augustine, Da... | Wharton's birthday is next week. They will buy... | Wharton's birthday is next week. They need to ... |
| 5 | Kelly is scared of sculpture garden figures in... | Kelly doesn't want to go to a sculpture garden... | Ollie is going to a sculpture garden in Finnla... |
| 6 | Selah called a person that did not pick up. | Selah can't see the phone number.[/re«mailâ.än... | Myah can't see the phone number of the person.... |
| 7 | Bella and Eric dismissed a request of a client... | Bella is angry at her boss because he was not ... | Bella's boss appreciated the decision to dismi... |
| 8 | Emma is about to take a nap in the back of the... | Emma is at the bus. Ben will wake her up aroun... | Emma is on the rare bus to NY. She will be the... |
| 9 | Jesse, Melvin, Lee and Maxine are going to tak... | Jesse is thinking of doing something for the l... | Jesse is thinking about doing something for th... |
9,"Jesse, Melvin, Lee and Maxine are going to tak...",Jesse is thinking of doing something for the l...,Jesse is thinking about doing something for th...


### **Assessment:** The PEFT model summaries appear to be generally better than the Original model summaries in the table above. The Human Annotated summaries are usually the best of the lot, which is to be expected because they are labels written by people who understood the conversation. In some cases (e.g. row 6) the Original model summaries devolve into gibberish near the end of the sentence, and occasionally they are grammatically or syntactically incorrect. The PEFT model summaries tend to be closer in content and wording to the Human Annotated summaries than the Original model summaries are, which suggests they should also score higher on overlap metrics such as Rouge.

## **Compare the Original model and PEFT/LoRA model quantitatively by computing their Rouge metrics**

---



In [None]:
rouge = evaluate.load('rouge')

original_model_metrics = rouge.compute(
    predictions=original_model_summaries, # Summaries generated using the base model
    references=human_annotated_summaries[0:len(original_model_summaries)], # Reference summaries by humans
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_metrics = rouge.compute(
    predictions=peft_model_summaries, # Summaries generated using the fine-tuned model
    references=human_annotated_summaries[0:len(peft_model_summaries)], # Reference summaries by humans
    use_aggregator=True,
    use_stemmer=True,
)

print(f"Comparison of Rouge metrics of Original model and PEFT/LoRA model\n")
print('Original non-finetuned model metrics:')
print(f"{original_model_metrics}\n{dashed_line}")
print('PEFT/LoRA model metrics:')
print(f"{peft_model_metrics}\n{dashed_line}")

rouge1_improvement = ((peft_model_metrics['rouge1']-original_model_metrics['rouge1'])/original_model_metrics['rouge1']) * 100
rouge2_improvement = ((peft_model_metrics['rouge2']-original_model_metrics['rouge2'])/original_model_metrics['rouge2']) * 100
rougeL_improvement = ((peft_model_metrics['rougeL']-original_model_metrics['rougeL'])/original_model_metrics['rougeL']) * 100
rougeLsum_improvement = ((peft_model_metrics['rougeLsum']-original_model_metrics['rougeLsum'])/original_model_metrics['rougeLsum']) * 100
print(f"\nPEFT/LoRA Rouge score improvements relative to Original non-finetuned model")
print(f"    rouge1: {rouge1_improvement:.1f}%")
print(f"    rouge2: {rouge2_improvement:.1f}%")
print(f"    rougeL: {rougeL_improvement:.1f}%")
print(f"    rougeLsum: {rougeLsum_improvement:.1f}%")

Comparison of Rouge metrics of Original model and PEFT/LoRA model

Original non-finetuned model metrics:
{'rouge1': 0.32701318152895137, 'rouge2': 0.1102543854421201, 'rougeL': 0.2637243740527735, 'rougeLsum': 0.2633401776356087}
---------------------------------------------
PEFT/LoRA model metrics:
{'rouge1': 0.37927085578805847, 'rouge2': 0.16918833085511772, 'rougeL': 0.2945923370749601, 'rougeLsum': 0.29713064364088004}
---------------------------------------------

PEFT/LoRA Rouge score improvements relative to Original non-finetuned model
    rouge1: 16.0%
    rouge2: 53.5%
    rougeL: 11.7%
    rougeLsum: 12.8%


### **Assessment:** The Rouge scores for the PEFT/LoRA model are higher than those of the Original model on all 4 metrics, with relative improvements ranging from 11.7% (rougeL) to 53.5% (rouge2). On this 10-conversation sample, this indicates that the fine-tuning was successful and produced results superior to the Original model.
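The percentages above are relative improvements, (PEFT - Original) / Original. Because rouge2 starts from a much lower baseline, a similar absolute gain shows up as a much larger relative one. The sketch below recomputes both views from the scores printed above.

```python
# Scores copied from the Rouge output above
original = {"rouge1": 0.32701318152895137, "rouge2": 0.1102543854421201,
            "rougeL": 0.2637243740527735, "rougeLsum": 0.2633401776356087}
peft = {"rouge1": 0.37927085578805847, "rouge2": 0.16918833085511772,
        "rougeL": 0.2945923370749601, "rougeLsum": 0.29713064364088004}

for name in original:
    absolute = peft[name] - original[name]        # gain on the 0-1 Rouge scale
    relative = absolute / original[name] * 100    # gain as % of the baseline score
    print(f"{name}: +{absolute * 100:.1f} points, +{relative:.1f}%")
```

On this sample every metric rises by roughly 3 to 6 Rouge points (on a 0-100 scale); the 53.5% figure for rouge2 mainly reflects its lower starting value.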

# Conclusion

### In this project, the PEFT/LoRA model produced successful results, measured quantitatively with the Rouge library. All 4 Rouge metrics (rouge1, rouge2, rougeL and rougeLsum) improved for the PEFT/LoRA model, by 16.0%, 53.5%, 11.7% and 12.8% respectively, on the 10 test conversations that were evaluated.

### In addition, when evaluated subjectively on the same 10 consecutive test conversations, the PEFT/LoRA model appeared to produce better summaries of the conversations than the original model.

### Learnings from the project were several:
- Understanding of models that are amenable to PEFT & LoRA improvements
- How to use Hugging Face Hub, navigate its vast collection of models, tokenizers, datasets, upload/download one's own models to/from it and understand the meanings of various configuration parameters
- Insights into the selection of suitable models and datasets with which to perform the PEFT training
- Comparing the performance of non-finetuned and PEFT-tuned models, quantitatively as well as qualitatively
- Learning how to combine original models with PEFT-trained adapters to create a model that provides superior results, while leaving the original model's weights unchanged.