**Author: *Abdul Hanan Bin Saeed***

**Action Learning Team: 7**

**Specialization: *DSA***

# FINETUNING GPT-2 (PEFT)

**Objective**

The objective of this notebook is to fine-tune GPT-2 using parameter-efficient fine-tuning (PEFT). Specifically, we fine-tune the model to improve its performance on text summarization when prompted in a zero-shot setting.

**Dataset Info**

**DialogSum dataset**

For this purpose we utilize the DialogSum dataset.

How to access the Dataset: https://huggingface.co/datasets/neil-code/dialogsum-test

* The dataset consists of 4 data fields: id, dialogue, summary, and topic.
* The dataset has 3 splits: train, validation, and test. The full DialogSum dataset has 12,460, 500, and 1,500 records respectively; the `neil-code/dialogsum-test` subset loaded in this notebook has 1,999, 499, and 499.
* The language of each record in this dataset is English.

Further information about the dataset is provided on the Hugging Face website.

**Importing/Installing the relevant Libraries**

In [1]:
!pip install -q -U bitsandbytes transformers peft accelerate datasets scipy einops evaluate trl rouge_score

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.4.1 requires cubinlinker, which is not installed.
cudf 24.4.1 requires cupy-cuda11x>=12.0.0, which is not installed.
cudf 24.4.1 requires ptxcompiler, which is not installed.
cuml 24.4.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 24.4.1 requires cupy-cuda11x>=12.0.0, which is not installed.
apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.8 which is incompatible.
apache-beam 2.46.0 requires numpy<1.25.0,>=1.14.3, but you have numpy 1.26.4 which is incompatible.
apache-beam 2.46.0 requires pyarrow<10.0.0,>=3.0.0, but you have pyarrow 16.1.0 which is incompatible.
beatrix-jupyterlab 2023.128.151533 requires jupyterlab~=3.6.0, but you have jupyterlab 4.2.1 which is incompatible.
cudf 24.4.1 requires cuda-python<12.0a0,>=11.7.1, but you have cuda-python 12.5.0 

In [2]:
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login

interpreter_login()

2024-06-27 12:09:41.246553: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-27 12:09:41.246679: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-27 12:09:41.365891: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered



    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .


Enter your token (input will not be visible):  ·····································
Add token as git credential? (Y/n)  n


Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


**Disabling weights and biases**

In [3]:
import os
os.environ['WANDB_DISABLED']="true"

**Loading the Dataset**

We use the DialogSum dataset from Hugging Face.

In [4]:
dataset_name = "neil-code/dialogsum-test"
df = load_dataset(dataset_name)

Downloading readme:   0%|          | 0.00/4.56k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.81M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/441k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/447k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1999 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/499 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/499 [00:00<?, ? examples/s]

**Having a better understanding of the structure of the dataset:**

In [5]:
df['train'][0]

{'id': 'train_0',
 'dialogue': "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor.",
 'summary': "Mr. Smith'

From the output above, we can see that each record consists of a dialogue between speakers, a summary of the conversation, and a topic that captures its central idea.

**Specifying the Quantization to be performed**

In [6]:
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=False,
    )
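As a rough illustration of why 4-bit loading is attractive, the sketch below estimates weight storage for a 124M-parameter model at several bit widths. It is a back-of-the-envelope calculation, not a measurement: the helper name `weight_memory_mb` is hypothetical, and the estimate ignores quantization constants, activations, and the fact that only the linear layers are quantized.

```python
def weight_memory_mb(n_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bits_per_param / 8 / 1e6


# Assumption: parameter count rounded to 124M, as quoted for GPT-2 small
N_PARAMS = 124_000_000

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_mb(N_PARAMS, bits):.0f} MB")
```

For a model this small the saving is modest in absolute terms; the same configuration matters far more for larger models.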

**Loading the GPT-2 Model**

In this section, we load the pre-trained GPT-2 Model from Hugging Face.

**About the Model:** *GPT-2 was developed by OpenAI and is based on the Transformer architecture introduced in the paper “Attention Is All You Need”. The model can be used for various applications, with text generation being the most suitable. It uses self-attention to capture relationships and dependencies between tokens in the input text. The number of parameters varies across GPT-2 variants; this study uses the smallest version, with 124M parameters. The model was trained on WebText in a self-supervised manner. It can exhibit biases in its behaviour, and the quality of its output can be limited.* 

In [7]:
model_name='openai-community/gpt2'
device_map = {"": 0}
original_model = AutoModelForCausalLM.from_pretrained(model_name, 
                                                      device_map=device_map,
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)



config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

**Setting up the Tokenizer**

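We configure the tokenizer with left-padding. For a decoder-only model such as GPT-2, left-padding keeps the real tokens at the end of each padded sequence, so that generated tokens directly follow the prompt.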

In [8]:
tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]
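The effect of `padding_side` can be seen on a toy batch. This is a minimal pure-Python sketch, not the tokenizer's own implementation: the helper `pad` and the token ids in `batch` are made up for illustration, while 50256 is GPT-2's end-of-text id, which the notebook reuses as the pad token.

```python
def pad(ids, length, pad_id, side="left"):
    """Pad a list of token ids to `length` on the given side."""
    padding = [pad_id] * (length - len(ids))
    return padding + ids if side == "left" else ids + padding


PAD = 50256  # GPT-2 end-of-text id, reused as pad token
batch = [[11, 12, 13], [21, 22]]  # hypothetical token ids
width = max(len(ids) for ids in batch)

left = [pad(ids, width, PAD, "left") for ids in batch]
right = [pad(ids, width, PAD, "right") for ids in batch]

print("left :", left)
print("right:", right)
```

With left-padding, the last position of every row holds a real token, which is the position a causal language model continues from during generation.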

**Evaluating the base model**

In [9]:
%%time
from transformers import set_seed, pipeline
seed = 42
set_seed(seed)

gen = pipeline("text-generation", model=original_model, tokenizer=tokenizer)

index = 10

prompt = df['test'][index]['dialogue']
summary = df['test'][index]['summary']

formatted_prompt = f"Instruct: Summarize the following conversation.\n{prompt}\nOutput:\n"
res = gen(formatted_prompt, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
#print(res[0])
output = res[0]['generated_text'].split('Output:\n')[1].strip() if 'Output:\n' in res[0]['generated_text'] else "No output generated."

# Print the results
dash_line = '-' * 100
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

----------------------------------------------------------------------------------------------------
INPUT PROMPT:
Instruct: Summarize the following conversation.
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Output:

----------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# attends Brian's birthday 

**Dataset Pre-processing Stage**

To make the dataset suitable for training, we define preprocessing functions that format each sample into a prompt and tokenize it. The formatted prompt is assembled from INTRO_BLURB, INSTRUCTION_KEY, the dialogue, RESPONSE_KEY followed by the summary, and END_KEY, where INSTRUCTION_KEY is "### Instruct: Summarize the below conversation." A second function reads the maximum sequence length from the model configuration and falls back to a default of 1024 if none is found. The samples are then tokenized in batches and truncated to that length, and samples that reach the maximum length are filtered out. Finally, the dataset is shuffled according to the provided seed. As a result, every sample is formatted, tokenized and pre-processed for the model being fine-tuned.

In [10]:
def create_prompt_formats(sample):
    """
    Format the fields of the sample ('dialogue', 'summary') into prompt parts,
    then concatenate them using two newline characters.
    :param sample: Sample dictionary
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"
    
    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"
    
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt

    return sample
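To see what this template produces, here is a self-contained sketch that applies the same layout to a made-up sample. The dialogue and summary are invented for illustration, and `build_prompt` is a hypothetical stand-alone counterpart of the function above, not part of the notebook.

```python
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
RESPONSE_KEY = "### Output:"
END_KEY = "### End"


def build_prompt(dialogue: str, summary: str) -> str:
    """Join the prompt parts with blank lines, skipping an empty dialogue."""
    parts = [
        f"\n{INTRO_BLURB}",
        INSTRUCTION_KEY,
        dialogue or None,
        f"{RESPONSE_KEY}\n{summary}",
        END_KEY,
    ]
    return "\n\n".join(part for part in parts if part)


# Hypothetical sample, not taken from DialogSum
text = build_prompt(
    "#Person1#: Are you free on Friday?\n#Person2#: Yes, let's meet at noon.",
    "#Person1# and #Person2# agree to meet on Friday at noon.",
)
print(text)
```

The training text therefore ends with the reference summary followed by `### End`, which is the marker the fine-tuned model is expected to learn to emit after a summary.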

In [11]:
from functools import partial

def get_max_length(model):
    conf = model.config
    max_length = None
    # Different architectures expose the context length under different names
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max length: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )

def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed, dataset):
    """Format and tokenize the dataset so it is ready for training
    :param tokenizer (AutoTokenizer): Model tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    :param seed: Seed used to shuffle the dataset
    :param dataset: Dataset split to preprocess
    """
    
    print("Preprocessing dataset...")
    # Add a "text" column holding the formatted prompt for each sample
    dataset = dataset.map(create_prompt_formats)
    
    
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )


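    # Drop samples that reached max_length, i.e. those that were truncated
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)

    # Shuffle reproducibly with the provided seed
    dataset = dataset.shuffle(seed=seed)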

    return dataset

In [12]:
max_length = get_max_length(original_model)
print(max_length)

train_dataset = preprocess_dataset(tokenizer, max_length,seed, df['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, df['validation'])

Found max length: 1024
1024
Preprocessing dataset...


Map:   0%|          | 0/1999 [00:00<?, ? examples/s]

Map:   0%|          | 0/1999 [00:00<?, ? examples/s]

Filter:   0%|          | 0/1999 [00:00<?, ? examples/s]

Preprocessing dataset...


Map:   0%|          | 0/499 [00:00<?, ? examples/s]

Map:   0%|          | 0/499 [00:00<?, ? examples/s]

Filter:   0%|          | 0/499 [00:00<?, ? examples/s]
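The interaction between truncation and the subsequent length filter can be shown on toy data. This is a pure-Python sketch under stated assumptions: `truncate` and `keep` are hypothetical helpers that mimic `truncation=True` and the `len(...) < max_length` filter, and `MAX_LENGTH = 8` is a toy value standing in for 1024.

```python
def truncate(ids, max_length):
    """Mimic tokenizer truncation: keep at most max_length token ids."""
    return ids[:max_length]


def keep(ids, max_length):
    """Mimic the notebook's filter: keep only strictly shorter samples."""
    return len(ids) < max_length


MAX_LENGTH = 8  # toy value; the notebook uses 1024
samples = {
    "short": list(range(5)),   # below the limit
    "exact": list(range(8)),   # exactly at the limit
    "long": list(range(20)),   # truncated down to the limit
}

kept = {
    name: ids
    for name, ids in samples.items()
    if keep(truncate(ids, MAX_LENGTH), MAX_LENGTH)
}
print(sorted(kept))
```

Because truncated samples end up with exactly `max_length` tokens, the strict comparison removes them, so the model is never trained on a prompt whose summary was cut off.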

**Initializing the model for QLoRA**

In [13]:
from peft import prepare_model_for_kbit_training

In [14]:
original_model = prepare_model_for_kbit_training(original_model)

**Getting a better picture of GPT-2's architecture**

In [15]:
print(original_model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Linear4bit(in_features=768, out_features=2304, bias=True)
          (c_proj): Linear4bit(in_features=768, out_features=768, bias=True)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Linear4bit(in_features=768, out_features=3072, bias=True)
          (c_proj): Linear4bit(in_features=3072, out_features=768, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affin

**Defining the LoRA config for Fine-tuning GPT-2**

In [16]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'c_attn',
        'c_proj',
        'c_fc'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

# 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
original_model.gradient_checkpointing_enable()

peft_model = get_peft_model(original_model, config)

**Checking the Trainable parameters**

In [17]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 4718592
all model parameters: 86691072
percentage of trainable model parameters: 5.44%
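The reported counts can be reproduced by hand from the layer shapes printed in the architecture above. The sketch below is an illustration rather than how PEFT counts parameters internally. It assumes that each LoRA adapter adds two matrices of shape `r x in_features` and `out_features x r`, and that 4-bit weights are stored packed two per byte, so that `numel()` reports half of their logical count; the latter is an assumption, chosen because it reproduces the reported total.

```python
# (in_features, out_features) of the Linear4bit layers in one GPT2Block
LINEAR_LAYERS = {
    "attn.c_attn": (768, 2304),
    "attn.c_proj": (768, 768),
    "mlp.c_fc": (768, 3072),
    "mlp.c_proj": (3072, 768),
}
N_BLOCKS, RANK = 12, 32
VOCAB, N_POSITIONS, HIDDEN = 50257, 1024, 768

# LoRA: one (r x in) and one (out x r) matrix per targeted layer
lora_params = N_BLOCKS * sum(RANK * (i + o) for i, o in LINEAR_LAYERS.values())

linear_weights = N_BLOCKS * sum(i * o for i, o in LINEAR_LAYERS.values())
linear_biases = N_BLOCKS * sum(o for _, o in LINEAR_LAYERS.values())
layer_norms = (2 * N_BLOCKS + 1) * 2 * HIDDEN  # weight and bias per LayerNorm
embeddings = (VOCAB + N_POSITIONS) * HIDDEN    # wte and wpe

# Assumption: packed 4-bit storage halves the element count seen by numel()
packed_weights = linear_weights // 2

total = embeddings + layer_norms + linear_biases + packed_weights + lora_params
print(f"trainable (LoRA) parameters: {lora_params}")
print(f"all parameters as counted:   {total}")
print(f"trainable share:             {100 * lora_params / total:.2f}%")
```

Under these assumptions the "all model parameters" figure is lower than GPT-2's nominal 124M because the quantized weights are counted in packed form, which also inflates the trainable percentage relative to the unquantized model.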


**Training Process**

To finetune GPT-2 we make use of PEFT with LoRA, with details defined as follows:

* LoRA Parameters for GPT-2
The task type is CAUSAL_LM. For this study we set both the rank ‘r’ and ‘lora_alpha’ to 32. The target modules for low-rank adaptation are 'c_attn', 'c_proj' and 'c_fc': the fused query/key/value projection of the attention block, the output projections of the attention and MLP blocks, and the first fully connected layer of the MLP. Together these cover every linear layer inside each transformer block, which is where the model learns to process sequential data for language-modelling tasks such as summarization and generation. We set ‘bias’ = ‘none’, so that no bias terms are trained, which keeps the number of trainable parameters down. Furthermore, to reduce overfitting, we set lora_dropout=0.05.
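
For reference, the low-rank update that these settings parameterize can be written as follows. This is the general LoRA formulation rather than anything specific to this notebook, and the symbols $W_0$ (frozen pretrained weight), $A$, $B$ (trainable low-rank factors) and $x$ (layer input) are notation introduced here.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d_{\text{out}} \times r}, \quad A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

With $r = 32$ and lora_alpha $\alpha = 32$, the scaling factor $\alpha / r$ equals 1, so the adapter output is added to the frozen layer's output unscaled.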

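In the training configuration, we set up gradient accumulation and gradient checkpointing. With a per-device batch size of 2 and 4 accumulation steps, each optimizer update on a single GPU sees an effective batch of 8 samples, while checkpointing trades extra computation for lower memory use.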


In [18]:
output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'
import transformers

peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="none",
    overwrite_output_dir=True,
    group_by_length=True,
)

peft_model.config.use_cache = False

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
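The batch arithmetic implied by these arguments can be sketched as below. This is an approximation under stated assumptions: a single GPU, 1,999 training samples before filtering (the filtered count may be lower), and a simplified step count. The helper names are hypothetical, and the exact accounting is internal to `Trainer`.

```python
import math


def effective_batch_size(per_device, accumulation_steps, n_devices=1):
    """Samples contributing to one optimizer update."""
    return per_device * accumulation_steps * n_devices


def updates_per_epoch(n_samples, per_device, accumulation_steps, n_devices=1):
    """Approximate optimizer updates per epoch (simplified accounting)."""
    batches = math.ceil(n_samples / (per_device * n_devices))
    return max(batches // accumulation_steps, 1)


batch = effective_batch_size(per_device=2, accumulation_steps=4)
updates = updates_per_epoch(n_samples=1999, per_device=2, accumulation_steps=4)

print(f"effective batch size: {batch}")
print(f"approx. updates per epoch: {updates}")
print(f"approx. evaluations per epoch (eval_steps=25): {updates // 25}")
```

Under these assumptions, logging, evaluation and checkpointing every 25 steps each happen about ten times per epoch.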



In [None]:
peft_trainer.train()

**Model Evaluation**

* Loading the pre-trained model and setting up the tokenizer

In [19]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "openai-community/gpt2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)



In [20]:
eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

* Loading the saved finetuned model

In [21]:
from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "/kaggle/working/peft-dialogue-summary-training-1719345153/checkpoint-725",torch_dtype=torch.float16,is_trainable=False)

* Inference with the PEFT model

In [22]:
%%time
from transformers import set_seed
seed = 42
set_seed(seed)

index = 10

gen = pipeline("text-generation", model=ft_model, tokenizer=tokenizer)

dialogue = df['test'][index]['dialogue']
summary = df['test'][index]['summary']

prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"

peft_model_res = gen(prompt,max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
# Check the marker in this cell's own generation (not the earlier baseline `res`)
peft_model_output = peft_model_res[0]['generated_text'].split('Output:\n')[1].strip() if 'Output:\n' in peft_model_res[0]['generated_text'] else "No output generated."
#print(peft_model_output)
prefix, success, result = peft_model_output.partition('###')

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')



The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MusicgenForCausalL

---------------------------------------------------------------------------------------------------
INPUT PROMPT:
Instruct: Summarize the following conversation.
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Output:

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# attends Brian's birthday pa
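
The post-processing used in the cell above (take the text after `Output:\n`, then cut at the first `###`) can be factored into one helper. This is a sketch, assuming the same markers as the notebook; `extract_summary` is a hypothetical name, the sample string is made up, and unlike the cell above the helper also strips whitespace left before the `###` marker.

```python
def extract_summary(generated_text, marker="Output:\n", stop="###"):
    """Return the completion after `marker`, cut at the first `stop`."""
    if marker not in generated_text:
        return "No output generated."
    completion = generated_text.split(marker)[1].strip()
    summary, _, _ = completion.partition(stop)
    return summary.strip()


# Hypothetical generation, shaped like the pipeline's 'generated_text'
raw = (
    "Instruct: Summarize the following conversation.\n"
    "#Person1#: Hi.\n#Person2#: Hello.\n"
    "Output:\n"
    "Two people greet each other.\n\n"
    "### End the conversation.\n\n### End the conversation."
)
print(extract_summary(raw))
```

Cutting at the first `###` matters because the model may keep emitting end markers until `max_new_tokens` is exhausted.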

**ROUGE Metric Evaluation**

In this step we compare the performance of the fine-tuned model with that of the original base model using the ROUGE metric. Inference is run on the 'test' split of the same dataset used for fine-tuning (here, its first 10 dialogues).
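
As a reminder of what ROUGE-1 measures, the sketch below computes a unigram-overlap F1 score in plain Python. It is a simplified illustration, not the implementation behind `evaluate.load('rouge')`: it splits on whitespace, lowercases, and does no stemming, so its values will generally differ from the library's. The function name and the example sentences are made up.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a prediction and a reference."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat is on the mat", "the cat sat on the mat")
print(f"ROUGE-1 F1: {score:.4f}")
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L is based on the longest common subsequence instead of n-gram counts.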

In [23]:
original_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

In [24]:
import pandas as pd

dialogues = df['test'][0:10]['dialogue']
human_baseline_summaries = df['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
peft_model_summaries = []

original_model_gen = pipeline("text-generation", model=original_model, tokenizer=tokenizer)
peft_model_gen = pipeline("text-generation", model=ft_model, tokenizer=tokenizer)


for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
    
    original_model_res = original_model_gen(prompt, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
    # Check the marker in each model's own generation (not the earlier baseline `res`)
    original_model_text_output = original_model_res[0]['generated_text'].split('Output:\n')[1].strip() if 'Output:\n' in original_model_res[0]['generated_text'] else "No output generated."
    
    peft_model_res = peft_model_gen(prompt,max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
    peft_model_output = peft_model_res[0]['generated_text'].split('Output:\n')[1].strip() if 'Output:\n' in peft_model_res[0]['generated_text'] else "No output generated."
    print(peft_model_output)
    # Keep only the text before the first '###' end marker
    peft_model_text_output, success, result = peft_model_output.partition('###')

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
 
df_metrics = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df_metrics

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MusicgenForCausalL

#Person1# asks #Person2# to take a dictation for #Person2#. #Person2# tells #Person1# #Person1# will have to take a dictation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the
Mr. Dawson wants to take a dictation from Ms. Dawson and sends the memo to all employees.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the conversation.

### End the c

| | human_baseline_summaries | original_model_summaries | peft_model_summaries |
|---|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | #Person1#: *PERSON1*\n#Person2#: *PERSON2*\n#P... | #Person1# asks #Person2# to take a dictation f... |
| 1 | In order to prevent employees from wasting tim... | As always, the program is available on the Int... | Mr. Dawson wants to take a dictation from Ms. ... |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | #Person1#: I will take a dictation for you.\n#... | #Person1# asks Ms. Dawson to take a dictation ... |
| 3 | #Person2# arrives late because of traffic jam.... | #Person1#: You're going to quit driving. | #Person1# tells #Person2# that #Person2# will ... |
| 4 | #Person2# decides to follow #Person1#'s sugges... | #Person1#: It's not my place to tell you what ... | #Person2# thinks #Person1# should try to find ... |
| 5 | #Person2# complains to #Person1# about the tra... | #Person1#: What is your favorite part of the c... | #Person1# tells #Person1# #Person2#'s car is c... |
| 6 | #Person1# tells Kate that Masha and Hero get d... | #Person2#: I don't think they will. | Kate tells #Person2# Masha and Hero are gettin... |
| 7 | #Person1# tells Kate that Masha and Hero are g... | #Person1#: That's it.\n#Person2#: You are right. | Kate is worried about the kids. She is afraid ... |
| 8 | #Person1# and Kate talk about the divorce betw... | The final chapter of this book was written in ... | Kate tells #Person1# about the divorce and the... |
| 9 | #Person1# and Brian are at the birthday party ... | #Person1#: I was just feeling very sleepy, I'm... | Brian and #Person2# have a good time and enjoy... |


* Statistics

In [25]:
import evaluate

rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

ORIGINAL MODEL:
{'rouge1': 0.15140897852634483, 'rouge2': 0.016760347470618293, 'rougeL': 0.12892774470287083, 'rougeLsum': 0.14493243920342874}
PEFT MODEL:
{'rouge1': 0.352963550227572, 'rouge2': 0.08746638678239227, 'rougeL': 0.26950756720566904, 'rougeLsum': 0.2715624076597606}
Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL
rouge1: 20.16%
rouge2: 7.07%
rougeL: 14.06%
rougeLsum: 12.66%
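
The figures above are absolute differences between ROUGE scores, expressed in percentage points. The sketch below recomputes them from the printed scores and also shows the relative change for comparison. The relative figure is an addition for illustration and is not reported by the notebook; the variable names are hypothetical.

```python
# Scores copied from the output above
original = {
    "rouge1": 0.15140897852634483,
    "rouge2": 0.016760347470618293,
    "rougeL": 0.12892774470287083,
    "rougeLsum": 0.14493243920342874,
}
peft = {
    "rouge1": 0.352963550227572,
    "rouge2": 0.08746638678239227,
    "rougeL": 0.26950756720566904,
    "rougeLsum": 0.2715624076597606,
}

# Absolute gain in percentage points, and gain relative to the base score
absolute_points = {k: (peft[k] - original[k]) * 100 for k in original}
relative_percent = {k: (peft[k] - original[k]) / original[k] * 100 for k in original}

for key in original:
    print(f"{key}: +{absolute_points[key]:.2f} points "
          f"({relative_percent[key]:.0f}% relative to the base model)")
```

Note that these scores come from only 10 test dialogues generated with sampling enabled, so they should be read as indicative rather than precise.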
