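# Training a Generative AI Model with PEFT

This notebook uses the FLAN-T5 model (`google/flan-t5-base`, about 248M parameters) and the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) dataset (12.5k training, 1.5k test and 0.5k validation examples) to run both full fine-tuning and PEFT-based LoRA fine-tuning, with the goal of improving the model on the dialogue-summarization subtask.

The fully fine-tuned model is available on the [Hugging Face Hub](https://huggingface.co/linlinlin/full-fine-tuning).

The model trained with PEFT's LoRA technique is available on the [Hugging Face Hub](https://huggingface.co/linlinlin/peft-fine-tuning).

The final section compares the results of the baseline model, the fully fine-tuned model and the PEFT model.

Project resources: Colab notebook, V100 GPU, 16 GB of GPU memory.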

# 1 - Libraries and Data Preparation

Import the libraries and prepare the dataset.

In [None]:
%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    loralib \
    peft

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

device = "cuda"

The base model used here is FLAN-T5.

In [3]:
model_name = 'google/flan-t5-base'

# original_model is the baseline model
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
original_model = original_model.to(device)

tokenizer = AutoTokenizer.from_pretrained(model_name)


Define a helper that reports how many of the model's parameters are trainable.

In [4]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params} \nall model parameter: {all_model_params} \npercentage of trainable model parameters: {round(trainable_model_params/all_model_params * 100, 2)}%"

print(print_number_of_trainable_model_parameters(original_model))


trainable model parameters: 247577856 
all model parameter: 247577856 
percentage of trainable model parameters: 100.0%


In [5]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

dataset




DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
})

Mount Google Drive so the model can be saved locally or copied to Google Drive.

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Log in to Hugging Face so the model can be pushed to the Hugging Face Hub.

In [11]:
from huggingface_hub import notebook_login
notebook_login()


# 2 - Zero Shot

Before fine-tuning, run a zero-shot test to see how the baseline model performs.

In [7]:
index = 200

dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"""
    Summarize the following conversation.

    {dialogue}

    Summary:
    """

inputs = tokenizer(prompt, return_tensors='pt').to(device)
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
    )[0],
    skip_special_tokens=True
)

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')



---------------------------------------------------------------------------------------------------
INPUT PROMPT:

    Summarize the following conversation.

    #Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need a more powerful hard disc, more memory and a faster modem. Do you have a CD-ROM drive?
#Person2#: No.
#Person1#: Then you might want to add a CD-ROM drive too, because most new software programs are coming out on Cds.
#Person2#: That sounds great. Thanks.

    Summary:
    
---------------------------------------------------

# 3 - Full Fine-Tuning


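## 3.1 - Preprocess the Dataset

First preprocess the dataset: wrap each dialogue in an instruction prompt, then tokenize both the prompts and the reference summaries.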

In [8]:
def tokenize_function(examples):
    # Wrap each dialogue in the instruction prompt
    start_prompt = "Summarize the following conversation.\n\n"
    end_prompt = "\n\nSummary:"
    prompt = [start_prompt + dialogue + end_prompt for dialogue in examples['dialogue']]
    # map() stores its results as Arrow data, so there is no need to move tensors to the GPU here;
    # Trainer moves each batch to the device during training
    examples['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors='pt').input_ids
    examples['labels'] = tokenizer(examples['summary'], padding="max_length", truncation=True, return_tensors='pt').input_ids

    return examples

# Use map to process every row of every split in the dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary'])
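A plain-Python sketch of what `tokenize_function` does to a single example, so it runs without downloading a tokenizer. `build_prompt` and `mask_pad_labels` are hypothetical helpers and the label ids are made up. The masking step is not part of this notebook: it illustrates the common practice of replacing label padding with `-100`, the `ignore_index` of the cross-entropy loss T5 uses, whereas `tokenize_function` above keeps T5's pad id (0) in `labels`, so padded positions also contribute to the loss.

```python
from typing import List

START_PROMPT = "Summarize the following conversation.\n\n"
END_PROMPT = "\n\nSummary:"


def build_prompt(dialogue: str) -> str:
    """Wrap a raw dialogue in the same instruction template used by tokenize_function."""
    return START_PROMPT + dialogue + END_PROMPT


def mask_pad_labels(label_ids: List[int], pad_token_id: int = 0) -> List[int]:
    """Hypothetical helper: replace padding ids with -100 so the loss ignores them."""
    return [-100 if tok == pad_token_id else tok for tok in label_ids]


dialogue = "#Person1#: Hi.\n#Person2#: Hello."
print(build_prompt(dialogue))

# Made-up label ids padded to length 8 with T5's pad id (0); 1 is T5's </s> id.
padded_labels = [465, 19, 3, 1, 0, 0, 0, 0]
print(mask_pad_labels(padded_labels))
```

With the real tokenizer, the same list comprehension could be applied to each row of `examples['labels']` before returning.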




Inspect the processed dataset.

In [9]:
print(f"Shapes of the datasets:")
print(f"Training: {tokenized_datasets['train'].shape}")
print(f"Validation: {tokenized_datasets['validation'].shape}")
print(f"Test: {tokenized_datasets['test'].shape}")

print(tokenized_datasets)


Shapes of the datasets:
Training: (12460, 2)
Validation: (500, 2)
Test: (1500, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 12460
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 1500
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 500
    })
})


Keep only every 50th row of each split (a 50x reduction) to speed up the following steps.

In [10]:
tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 50 == 0, with_indices=True)
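Keeping only rows whose index is a multiple of 50 leaves ceil(n / 50) rows per split. A quick plain-Python check using the split sizes printed above (`rows_after_filter` is just an illustrative name):

```python
import math

split_sizes = {"train": 12460, "validation": 500, "test": 1500}


def rows_after_filter(num_rows: int, step: int = 50) -> int:
    """Number of indices i in range(num_rows) with i % step == 0."""
    return math.ceil(num_rows / step)


for split, n in split_sizes.items():
    # Cross-check the closed form against the literal filter condition.
    kept = sum(1 for i in range(n) if i % 50 == 0)
    assert kept == rows_after_filter(n)
    print(f"{split}: {n} -> {kept} rows")
```

So the fine-tuning runs below see 250 training examples and 10 validation examples.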




## 3.2 - Fine-Tuning Training

In [12]:
output_dir = './full-fine-tuning'

# Define the training arguments; adjust as needed
training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-5,
    num_train_epochs=5,
    weight_decay=0.01,
    logging_steps=1,
    max_steps=50,  # a positive max_steps overrides num_train_epochs
    push_to_hub=True  # lets the model be pushed to the Hugging Face Hub later
)

# Instantiate the Trainer
trainer = Trainer(
    model=original_model,  # the pre-trained model to fine-tune (updated in place)
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation']
)


/content/./full-fine-tuning is already a clone of https://huggingface.co/linlinlin/full-fine-tuning. Make sure you pull the latest changes with `repo.git_pull()`.


In [13]:
trainer.train()  # run full fine-tuning



| Step | Training Loss |
|------|---------------|
| 1 | 50.5 |
| 2 | 50.75 |
| 3 | 49.75 |
| 4 | 49.25 |
| 5 | 50.5 |
| 6 | 49.25 |
| 7 | 50.75 |
| 8 | 50.0 |
| 9 | 49.25 |
| 10 | 49.25 |


TrainOutput(global_step=50, training_loss=48.75, metrics={'train_runtime': 41.9892, 'train_samples_per_second': 9.526, 'train_steps_per_second': 1.191, 'total_flos': 269794396864512.0, 'train_loss': 48.75, 'epoch': 1.56})
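The reported `epoch` of 1.56 follows from `max_steps=50` taking precedence over `num_train_epochs=5`. A sketch of the arithmetic, assuming the default `per_device_train_batch_size` of 8 on a single GPU (not set explicitly above) and the 250 training rows left after filtering:

```python
import math

train_rows = 250   # rows left after the index % 50 filter
batch_size = 8     # assumed: TrainingArguments default per_device_train_batch_size, single GPU
max_steps = 50     # from training_args; overrides num_train_epochs

steps_per_epoch = math.ceil(train_rows / batch_size)   # optimizer steps per pass over the data
epochs_completed = max_steps / steps_per_epoch          # fraction of epochs actually run
samples_seen = max_steps * batch_size                   # examples processed during training

print(steps_per_epoch, epochs_completed, samples_seen)
```

400 samples over the 41.99 s `train_runtime` matches the reported `train_samples_per_second` of 9.526, which is consistent with the batch-size assumption.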

In [35]:
# trainer.push_to_hub()  # push the model to the Hugging Face Hub so it can be downloaded and reused later


To https://huggingface.co/linlinlin/full-fine-tuning
   0f91131..0206bb3  main -> main




'https://huggingface.co/linlinlin/full-fine-tuning/commit/0206bb3dbadf602bdafd8b734193b6c33917afb1'

In [14]:
trainer.save_model(output_dir=output_dir)  # save to the local folder (with push_to_hub=True this also pushes to the Hub)


To https://huggingface.co/linlinlin/full-fine-tuning
   5cb9537..0f91131  main -> main




## 3.3 - Model Evaluation

### 3.3.1 - Qualitative Evaluation

In [15]:
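# Load the model that was just fine-tuned and saved locally.
# Note: Trainer updated original_model in place, so from here on original_model
# also holds the fine-tuned weights rather than the untouched baseline.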
instruct_model = AutoModelForSeq2SeqLM.from_pretrained(output_dir)
instruct_model = instruct_model.to(device)

In [16]:
index = 200
dialogue = dataset['test'][index]['dialogue']
human_baseline_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(device)  # move input_ids to the same device as the model

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_test_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_test_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}\n')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_test_output}\n')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_test_output}\n')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
#Person1#: I'm thinking of upgrading your computer.

---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
#Person1#: I'm thinking of upgrading my computer.



### 3.3.2 - Quantitative Evaluation

Use ROUGE to evaluate the generated summaries against the human-written references.

In [17]:
rouge = evaluate.load('rouge')
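ROUGE-1 measures unigram overlap between a generated summary and the reference. The sketch below is a simplified, dependency-free per-example ROUGE-1 F1 (lowercasing, alphanumeric tokenization, no stemming). It is not the `rouge_score` implementation that `evaluate.load('rouge')` wraps, so its numbers can differ, for example when `use_stemmer=True`.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Lowercase and keep alphanumeric runs (a simplification of rouge_score's tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(prediction: str, reference: str) -> float:
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    # Clipped overlap: each unigram counts at most min(count in prediction, count in reference).
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("The cat sat.", "the cat sat on the mat"))
```

Here all 3 predicted words appear in the reference (precision 1.0) but only 3 of the 6 reference words are covered (recall 0.5), giving an F1 of about 0.67.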

In [18]:
# Only the first 10 test examples are used here; adjust as needed
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []

for _, dialogue in enumerate(dialogues):
    prompt = f"""
    Summarize the following conversation.

    {dialogue}

    Summary: """

    input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(device)

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    original_model_test_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)
    original_model_summaries.append(original_model_test_output)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    instruct_model_test_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)
    instruct_model_summaries.append(instruct_model_test_output)


Put the summaries side by side in a DataFrame.

In [19]:
zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries))
df = pd.DataFrame(zipped_summaries, columns=['human_baseline', 'original_model', 'instruct_model'])
df

| | human_baseline | original_model | instruct_model |
|---|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | #Person1#: Please type the memo into the email. | #Person1#: I need to take a dictation for you. |
| 1 | In order to prevent employees from wasting tim... | #Person1#: I am calling to ask you questions. | #Person1#: I need to take a dictation for you. |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | #Person1: #Person2: #Person2: #Person1: #Perso... | #Person1#: I need to take a dictation for you. |
| 3 | #Person2# arrives late because of traffic jam.... | The traffic is bad in the city. | The traffic jam at the Carrefour intersection ... |
| 4 | #Person2# decides to follow #Person1#'s sugges... | The person is a driver. | The traffic jam at the Carrefour intersection ... |
| 5 | #Person2# complains to #Person1# about the tra... | The public transport system is good for the en... | The traffic jam at the Carrefour intersection ... |
| 6 | #Person1# tells Kate that Masha and Hero get d... | #Person1: Masha and Hero are divorced. | Masha and Hero are getting divorced. |
| 7 | #Person1# tells Kate that Masha and Hero are g... | #Person1#: You are the best. #Person2: You are... | Masha and Hero are getting divorced. |
| 8 | #Person1# and Kate talk about the divorce betw... | Masha and Hero are getting divorced. | Masha and Hero are getting divorced. |
| 9 | #Person1# and Brian are at the birthday party ... | #Person1: Happy birthday, Brian. | #Person1#: Happy birthday, Brian. #Person2#: I... |


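Compute the ROUGE scores and the absolute change between the two models (the printed values are percentage points).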

In [20]:
original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer = True
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer = True
)

print("ORIGINAL MODEL RESULTS")
print(original_model_results)
print("INSTRUCT MODEL RESULTS")
print(instruct_model_results)


ORIGINAL MODEL RESULTS
{'rouge1': 0.19064239809399192, 'rouge2': 0.04172463768115942, 'rougeL': 0.17519423092462152, 'rougeLsum': 0.1783972710572288}
INSTRUCT MODEL RESULTS
{'rouge1': 0.23884559093833285, 'rouge2': 0.11535720375106562, 'rougeL': 0.21714203657752046, 'rougeLsum': 0.2175800707655546}


In [21]:
improvement = (np.array(list(instruct_model_results.values())) - np.array(list(original_model_results.values())))

for key, value in zip(instruct_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

rouge1: 4.82%
rouge2: 7.36%
rougeL: 4.19%
rougeLsum: 3.92%
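The `improvement` values above are absolute differences in ROUGE (percentage points), not relative gains. Using the ROUGE-1 scores printed above, a quick comparison of the two readings:

```python
original_rouge1 = 0.19064239809399192   # ORIGINAL MODEL rouge1 from the cell above
instruct_rouge1 = 0.23884559093833285   # INSTRUCT MODEL rouge1 from the cell above

absolute_gain = instruct_rouge1 - original_rouge1   # what the improvement cell prints (x100)
relative_gain = absolute_gain / original_rouge1      # gain relative to the baseline score

print(f"absolute: {absolute_gain * 100:.2f} percentage points")
print(f"relative: {relative_gain * 100:.1f}%")
```

The same 4.82-point ROUGE-1 gain corresponds to roughly a 25% relative improvement over the baseline score.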


# 4 - PEFT Modeling

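PEFT (Parameter-Efficient Fine-Tuning) is a family of techniques that includes LoRA (Low-Rank Adaptation). LoRA makes it possible to fine-tune a model with very little compute: the original model's parameters are frozen and small LoRA adapter layers are added next to selected weight matrices, so fine-tuning only updates the adapter parameters.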
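A minimal, dependency-free sketch of the LoRA idea for one linear layer, assuming the standard formulation h = W x + (alpha / r) * B A x, where W (d_out x d_in) is frozen and only A (r x d_in) and B (d_out x r) are trained. `lora_forward` is illustrative, not PEFT's API. Because B starts at zero (PEFT's default initialization), the adapted layer initially reproduces the frozen layer exactly.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); only A and B would receive gradients."""
    frozen = matvec(W, x)              # original (frozen) projection
    update = matvec(B, matvec(A, x))   # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [f + scale * u for f, u in zip(frozen, update)]


d_in, d_out, r, alpha = 4, 4, 2, 2
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero
x = [1.0, 2.0, 3.0, 4.0]

# At initialization the adapter contributes nothing.
print(lora_forward(W, A, B, x, alpha, r) == matvec(W, x))
```

With `r=32` and `lora_alpha=32` as in the config below, the scale factor alpha / r is 1.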

## 4.1 - Preparation

In [22]:
from peft import LoraConfig, get_peft_model, TaskType

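lora_config = LoraConfig(
    r=32,  # rank of the LoRA update matrices; determines the number of trainable parameters
    lora_alpha=32,  # scaling factor; the update is scaled by lora_alpha / r
    target_modules=["q", "v"],  # T5 attention query and value projections
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.SEQ_2_SEQ_LM  # FLAN-T5 is a sequence-to-sequence model
)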

Add the LoRA adapter layers to the LLM.

In [23]:
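# Note: the full fine-tuning run above updated original_model in place, and
# get_peft_model injects the LoRA layers into that same model object, so the
# adapters are trained on top of the fully fine-tuned weights.
peft_model = get_peft_model(original_model, lora_config)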
peft_model.to(device)
print_number_of_trainable_model_parameters(peft_model)

'trainable model parameters: 3538944 \nall model parameter: 251116800 \npercentage of trainable model parameters: 1.41%'
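The 3,538,944 trainable parameters can be reproduced by hand, assuming the published `google/flan-t5-base` configuration (d_model = 768, 12 heads x d_kv 64 = 768 inner dimension, 12 encoder and 12 decoder layers) and that `target_modules=["q", "v"]` matches the query and value projections of every encoder self-attention block and every decoder self- and cross-attention block. Each adapted projection gains r x (d_in + d_out) parameters.

```python
r = 32                  # LoraConfig(r=32)
d_model = 768           # assumed flan-t5-base hidden size (input to q and v)
inner_dim = 12 * 64     # assumed num_heads * d_kv (output of q and v)

encoder_attention_blocks = 12       # one self-attention per encoder layer
decoder_attention_blocks = 12 * 2   # self- and cross-attention per decoder layer
adapted_linears = (encoder_attention_blocks + decoder_attention_blocks) * 2  # q and v

lora_params_per_linear = r * (d_model + inner_dim)  # A: r x d_in, B: d_out x r
trainable = adapted_linears * lora_params_per_linear

base_params = 247_577_856   # all parameters printed earlier for original_model
total = base_params + trainable

print(trainable, total, f"{trainable / total * 100:.2f}%")
```

The result matches the printed output exactly, which supports the assumption about which modules the `"q"` and `"v"` names select.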

## 4.2 - PEFT Model Training

In [24]:
peft_output_dir = f'./peft-fine-tuning'

peft_training_args = TrainingArguments(
    output_dir=peft_output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,  # higher learning rate than full fine-tuning
    num_train_epochs=5,
    logging_steps=1,
    max_steps=50,
    push_to_hub=True
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation']
)

/content/./peft-fine-tuning is already a clone of https://huggingface.co/linlinlin/peft-fine-tuning. Make sure you pull the latest changes with `repo.git_pull()`.


In [None]:
peft_trainer.train()

In [36]:
peft_trainer.push_to_hub()  # push the model to the Hugging Face Hub so it can be downloaded and reused later


To https://huggingface.co/linlinlin/peft-fine-tuning
   274a49b..f0ecb65  main -> main




'https://huggingface.co/linlinlin/peft-fine-tuning/commit/f0ecb6566767e0eb8e821da34402a345976402cb'

In [26]:
# Save the LoRA adapter weights and the tokenizer to the local folder
peft_trainer.model.save_pretrained(peft_output_dir)
tokenizer.save_pretrained(peft_output_dir)

('./peft-fine-tuning/tokenizer_config.json',
 './peft-fine-tuning/special_tokens_map.json',
 './peft-fine-tuning/tokenizer.json')

## 4.3 - Model Evaluation

### 4.3.1 - Qualitative Evaluation

In [27]:
from peft import PeftModel, PeftConfig

# peft_model_base is a fresh copy of the google/flan-t5-base checkpoint that original_model was originally loaded from
peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16)
peft_model_base = peft_model_base.to(device)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

peft_model = PeftModel.from_pretrained(peft_model_base,
                                        peft_output_dir,  # the PEFT adapter that was just trained and saved locally
                                        torch_dtype=torch.bfloat16,
                                        is_trainable=False)

In [28]:
# is_trainable=False loads the adapter for inference only, so no parameters are trainable
print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 0 
all model parameter: 251116800 
percentage of trainable model parameters: 0.0%


In [29]:
index = 200
dialogue = dataset['test'][index]['dialogue']
human_baseline_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(device)

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_test_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_test_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
peft_model_test_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}\n')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_test_output}\n')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_test_output}\n')
print(dash_line)
print(f'PEFT MODEL:\n{peft_model_test_output}\n')



---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
You might want to add a CD-ROM drive to your computer.

---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
#Person1#: I'm thinking of upgrading my computer.

---------------------------------------------------------------------------------------------------
PEFT MODEL:
Upgrade your system.



### 4.3.2 - Quantitative Evaluation

In [30]:
# Only the first 10 test examples are used here; adjust as needed
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
peft_model_summaries = []

for _, dialogue in enumerate(dialogues):
    prompt = f"""
    Summarize the following conversation.

    {dialogue}

    Summary: """

    input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(device)

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    original_model_test_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)
    original_model_summaries.append(original_model_test_output)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    instruct_model_test_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)
    instruct_model_summaries.append(instruct_model_test_output)

    peft_model_outputs = peft_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    peft_model_test_output = tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True)
    peft_model_summaries.append(peft_model_test_output)

In [31]:
zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries, peft_model_summaries))
df = pd.DataFrame(zipped_summaries, columns=['human_baseline', 'original_model', 'instruct_model', 'peft_model'])
df


| | human_baseline | original_model | instruct_model | peft_model |
|---|---|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | This memo should go out as an intra-office mem... | #Person1#: I need to take a dictation for you. | This memo should go out as an intra-office mem... |
| 1 | In order to prevent employees from wasting tim... | Request #Person1# to be a #Person1# to be a #P... | #Person1#: I need to take a dictation for you. | This memo should go out as an intra-office mem... |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | Publications will be a memo issued by the depa... | #Person1#: I need to take a dictation for you. | This memo should go out as an intra-office mem... |
| 3 | #Person2# arrives late because of traffic jam.... | Taking the subway to work would be a better op... | The traffic jam at the Carrefour intersection ... | If you're going to quit driving to work, you'l... |
| 4 | #Person2# decides to follow #Person1#'s sugges... | The public transport system is pretty good. | The traffic jam at the Carrefour intersection ... | If you're going to quit driving to work, you'l... |
| 5 | #Person2# complains to #Person1# about the tra... | I'm a little worried about the congestion on t... | The traffic jam at the Carrefour intersection ... | If you're going to quit driving to work, you'l... |
| 6 | #Person1# tells Kate that Masha and Hero get d... | @Person1#: :... | Masha and Hero are getting divorced. | @Person1#: |
| 7 | #Person1# tells Kate that Masha and Hero are g... | The couple have a separation for 2 months and ... | Masha and Hero are getting divorced. | @Person1#: |
| 8 | #Person1# and Kate talk about the divorce betw... | The divorce is a surprise, but the kids are a ... | Masha and Hero are getting divorced. | @Person1#: |
| 9 | #Person1# and Brian are at the birthday party ... | Thank #Person1#: "I'm so happy to have a dance... | #Person1#: Happy birthday, Brian. #Person2#: I... | @Person1#: |


In [32]:
original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer = True
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer = True
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer = True
)

print("ORIGINAL MODEL RESULTS")
print(original_model_results)
print("INSTRUCT MODEL RESULTS")
print(instruct_model_results)
print("PEFT MODEL RESULTS")
print(peft_model_results)


ORIGINAL MODEL RESULTS
{'rouge1': 0.15315356590381868, 'rouge2': 0.03302490192734095, 'rougeL': 0.14831670969183608, 'rougeLsum': 0.14798341867098191}
INSTRUCT MODEL RESULTS
{'rouge1': 0.23884559093833285, 'rouge2': 0.11535720375106562, 'rougeL': 0.21714203657752046, 'rougeLsum': 0.2175800707655546}
PEFT MODEL RESULTS
{'rouge1': 0.19299535837479534, 'rouge2': 0.05097744360902255, 'rougeL': 0.17338166255546916, 'rougeLsum': 0.17176716118576585}


In [34]:
improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))

for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

rouge1: 3.98%
rouge2: 1.80%
rougeL: 2.51%
rougeLsum: 2.38%
