# Generative AI Use Case: Summarize Dialogue

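This project uses the FLAN-T5 model (`google/flan-t5-base`, about 251M parameters) with the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) dataset (about 12.5k training rows, 1.5k test rows, and 0.5k validation rows). It applies prompt engineering, i.e. adjusting the input text, to improve the model's performance on the dialogue summarization task.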

The prompt engineering in this project covers zero-shot, one-shot, and few-shot inference with instruction prompts.

The final section compares the outputs of the model without prompt engineering and with zero-shot, one-shot, two-shot, and five-shot prompts.

Resources: Colab notebook, T4 GPU, 15 GB GPU RAM.

# 1 - Libraries and Data Preparation

Import the libraries and load the dataset.

In [None]:
%pip install \
    transformers==4.27.2 \
    datasets==2.11.0

In [2]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

In [None]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

Check the format and size of the data. Each split has 4 columns: besides `dialogue` and `summary`, there are `id` and `topic`.

In [4]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
})


In [5]:
example_indices = [666, 999]

dash_line = '-' * 19  # separator line of 19 dashes

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('DIALOGUE:\n')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('HUMAN SUMMARY:\n')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

-------------------
Example 1
-------------------
DIALOGUE:

#Person1#: Hey, Tom, what to go for a run?
#Person2#: No thanks. I like to run in the morning. I ran a couple of miles when I woke up today.
#Person1#: I try to do that, but I can't get up early enough.
#Person2#: I couldn't either at first, but you get used to it.
#Person1#: It's so hot at lunchtime ; I'd rather run in the morning.
#Person2#: Well, why don't you come tomorrow? I'll stop by your house on my way out.
#Person1#: I could try, but I can't say for sure if I'll get up in time. What time do you want to go?
#Person2#: I'll give you a call around 6 o'clock and stop by around 6 thirty.
#Person1#: O. K. , maybe if I have someone to go with, I'll be able to get up in time for a jog.
#Person2#: Great, I'll see you then.
#Person1#: See you.
-------------------
HUMAN SUMMARY:

Tom invites #Person1# to run in the morning. #Person1# would try to get up and join him.
-------------------

-------------------
Example 2
---------

The previous cell printed a few samples from the `dialogue` and `summary` columns, which gives a rough idea of what the data looks like.

Next, load the FLAN-T5 model from Hugging Face.

In [None]:
device = 'cuda'

model_name='google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = model.to(device)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

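The `tokenizer` converts text into a sequence of integer token IDs that the model can consume, and `tokenizer.decode` maps IDs back to text. Different models use different tokenizers, so always load the `tokenizer` that matches the model.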
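To make the encode/decode round trip concrete, here is a toy word-level tokenizer. It is only an illustration under stated assumptions: FLAN-T5 actually uses a SentencePiece subword vocabulary, and the IDs below are invented. The one detail borrowed from T5 is that ID 1 is reserved for the end-of-sequence token `</s>`, which is why the encoded tensor in the next cell ends with `1`.

```python
# Toy word-level tokenizer, for illustration only.
# Assumption: real FLAN-T5 uses SentencePiece subwords; these IDs are made up.
EOS_ID = 1  # T5 reserves ID 1 for </s> (and 0 for <pad>)


class ToyTokenizer:
    def __init__(self, vocab):
        # Reserve 0 for padding and 1 for end-of-sequence, then number the words
        self.token_to_id = {"<pad>": 0, "</s>": EOS_ID}
        for word in vocab:
            self.token_to_id.setdefault(word, len(self.token_to_id))
        self.id_to_token = {i: t for t, i in self.token_to_id.items()}
        self.special_ids = {0, EOS_ID}

    def encode(self, text):
        # Split on whitespace and append the end-of-sequence marker.
        # Unknown words raise KeyError in this toy (no <unk> handling).
        return [self.token_to_id[w] for w in text.split()] + [EOS_ID]

    def decode(self, ids, skip_special_tokens=False):
        # Optionally drop <pad> and </s>, like skip_special_tokens=True above
        tokens = [self.id_to_token[i] for i in ids
                  if not (skip_special_tokens and i in self.special_ids)]
        return " ".join(tokens)


tok = ToyTokenizer(["What", "time", "is", "it,", "Tom?"])
ids = tok.encode("What time is it, Tom?")
print(ids)                                        # [2, 3, 4, 5, 6, 1]
print(tok.decode(ids, skip_special_tokens=True))  # What time is it, Tom?
```

Without `skip_special_tokens=True`, the decoded text would end with the literal `</s>` marker, which is why the notebook passes that flag to `tokenizer.decode`.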

In [8]:
sentence = "What time is it, Tom?"


# Tokenize the sentence into token IDs
sentence_encoded = tokenizer(sentence, return_tensors='pt').to(device)

# Decode the IDs back to text
sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"][0],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

ENCODED SENTENCE:
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1], device='cuda:0')

DECODED SENTENCE:
What time is it, Tom?


# 2 - Baseline

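No prompt engineering: the raw dialogue is fed to the model as-is.

Prompt engineering is the process of manually adjusting the prompt (the model input) for a given task to improve the model's output.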

In [9]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Tokenize the dialogue with FLAN-T5's tokenizer
    inputs = tokenizer(dialogue, return_tensors='pt').to(device)
    # Feed the inputs to FLAN-T5 and decode the generated tokens
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'DIALOGUE:\n{dialogue}')
    print(dash_line)
    print(f'HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'BASELINE MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')


-------------------
Example  1
-------------------
DIALOGUE:
#Person1#: Hey, Tom, what to go for a run?
#Person2#: No thanks. I like to run in the morning. I ran a couple of miles when I woke up today.
#Person1#: I try to do that, but I can't get up early enough.
#Person2#: I couldn't either at first, but you get used to it.
#Person1#: It's so hot at lunchtime ; I'd rather run in the morning.
#Person2#: Well, why don't you come tomorrow? I'll stop by your house on my way out.
#Person1#: I could try, but I can't say for sure if I'll get up in time. What time do you want to go?
#Person2#: I'll give you a call around 6 o'clock and stop by around 6 thirty.
#Person1#: O. K. , maybe if I have someone to go with, I'll be able to get up in time for a jog.
#Person2#: Great, I'll see you then.
#Person1#: See you.
-------------------
HUMAN SUMMARY:
Tom invites #Person1# to run in the morning. #Person1# would try to get up and join him.
-------------------
BASELINE MODEL GENERATION - WITHOUT PROMP

# 3 - Instruction Prompt




## 3.1 - Zero Shot Inference

Wrap the input in an instruction. The prompt contains zero solved examples.


In [10]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # The input is no longer just the dialogue: an instruction is added before and after it
    prompt = f"""
        Summarize the following conversation.

        {dialogue}

        Summary:
        """

    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

-------------------
Example  1
-------------------
INPUT PROMPT:

        Summarize the following conversation.

        #Person1#: Hey, Tom, what to go for a run?
#Person2#: No thanks. I like to run in the morning. I ran a couple of miles when I woke up today.
#Person1#: I try to do that, but I can't get up early enough.
#Person2#: I couldn't either at first, but you get used to it.
#Person1#: It's so hot at lunchtime ; I'd rather run in the morning.
#Person2#: Well, why don't you come tomorrow? I'll stop by your house on my way out.
#Person1#: I could try, but I can't say for sure if I'll get up in time. What time do you want to go?
#Person2#: I'll give you a call around 6 o'clock and stop by around 6 thirty.
#Person1#: O. K. , maybe if I have someone to go with, I'll be able to get up in time for a jog.
#Person2#: Great, I'll see you then.
#Person1#: See you.

        Summary:
        
-------------------
HUMAN SUMMARY:
Tom invites #Person1# to run in the morning. #Person1# would tr

## 3.2 - Zero Shot Inference with another template

More templates are available in the [pre-built FLAN-T5 prompts](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py). Here we try a different instruction.
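When experimenting with several templates, it can help to keep them in one place. The sketch below is a hypothetical registry (`TEMPLATES` and `build_prompt` are illustrative names, not part of the FLAN repository) holding the two instructions used in 3.1 and 3.2. It applies `textwrap.dedent` to each template *before* formatting: once the dialogue is inserted, its lines start at column 0, so dedenting afterwards would find no common indentation and remove nothing. The resulting prompts lack the leading spaces visible in the printed prompts, so the model's output may differ slightly.

```python
import textwrap

# Hypothetical registry of the two instruction templates used in this notebook.
# dedent() runs on the template itself, before the dialogue is inserted.
TEMPLATES = {
    "summarize": textwrap.dedent("""\
        Summarize the following conversation.

        {dialogue}

        Summary:
        """),
    "what_was_going_on": textwrap.dedent("""\
        Dialogue:

        {dialogue}

        What was going on?
        """),
}


def build_prompt(template_name, dialogue):
    """Fill the named template with a dialogue."""
    return TEMPLATES[template_name].format(dialogue=dialogue)


# Illustrative dialogue (not from DialogSum)
print(build_prompt("what_was_going_on", "#Person1#: Hi.\n#Person2#: Hello."))
```

Swapping the template then only means changing the name passed to `build_prompt`, e.g. `build_prompt("summarize", dataset['test'][index]['dialogue'])`.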

In [11]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    # Only the surrounding instruction differs from 3.1
    prompt = f"""
        Dialogue:

        {dialogue}

        What was going on?
        """

    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'HUMAN SUMMARY:\n{summary}\n')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

-------------------
Example  1
-------------------
INPUT PROMPT:

        Dialogue:

        #Person1#: Hey, Tom, what to go for a run?
#Person2#: No thanks. I like to run in the morning. I ran a couple of miles when I woke up today.
#Person1#: I try to do that, but I can't get up early enough.
#Person2#: I couldn't either at first, but you get used to it.
#Person1#: It's so hot at lunchtime ; I'd rather run in the morning.
#Person2#: Well, why don't you come tomorrow? I'll stop by your house on my way out.
#Person1#: I could try, but I can't say for sure if I'll get up in time. What time do you want to go?
#Person2#: I'll give you a call around 6 o'clock and stop by around 6 thirty.
#Person1#: O. K. , maybe if I have someone to go with, I'll be able to get up in time for a jog.
#Person2#: Great, I'll see you then.
#Person1#: See you.

        What was going on?
        
-------------------
HUMAN SUMMARY:
Tom invites #Person1# to run in the morning. #Person1# would try to get up and jo

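The result differs slightly from the first template: it is phrased in the third person, which reads more like a summary. Next, try one shot and few shot.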

# 4 - One Shot and Few Shot

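Include solved examples in the LLM's input: one example makes it one shot, several make it few shot. More examples are not always better, and the number of examples is limited by the model's input context window.

This technique is called "in-context learning"; see this [blog from HuggingFace](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api).

First, write a function that adds the examples to the prompt.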
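Because the context window caps how many examples fit, one option is to add examples only while the prompt stays under a token budget. The sketch below is a hypothetical helper, `select_examples`, that greedily keeps examples until the next one would overflow. It takes a `count_tokens` callable so it does not depend on any particular tokenizer; with the notebook's tokenizer this could be `lambda s: len(tokenizer(s)["input_ids"])`. Each such call appends `</s>`, so summing per-piece counts slightly overestimates the concatenated prompt, which errs on the safe side. The 512 default is an assumption matching the FLAN-T5 tokenizer's maximum reported in the warning in 4.2.1.

```python
def select_examples(examples, target, count_tokens, max_tokens=512):
    """Greedily keep in-context examples while the prompt fits in max_tokens.

    examples:     already formatted example strings, in order of preference
    target:       formatted block for the dialogue to summarize (always kept)
    count_tokens: callable returning the token count of a string
    Returns the chosen examples and the estimated total token count.
    """
    chosen = []
    used = count_tokens(target)
    for ex in examples:
        cost = count_tokens(ex)
        if used + cost > max_tokens:
            break  # stop at the first example that would overflow the window
        chosen.append(ex)
        used += cost
    return chosen, used


# Usage with a whitespace word counter standing in for a real tokenizer
word_count = lambda s: len(s.split())
chosen, used = select_examples(["a b c", "d e", "f g h i"], "x y", word_count, max_tokens=7)
print(chosen, used)  # ['a b c', 'd e'] 7
```

Stopping at the first overflow (rather than skipping to a shorter example) keeps the examples in their original order, which makes results easier to compare across runs.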

In [12]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    for index in example_indices_full:
        dialogue = dataset['test'][index]['dialogue']
        summary = dataset['test'][index]['summary']

        # Note: for FLAN-T5, end each example with '{summary}\n\n\n' as the stop sequence
        prompt += f"""
            Dialogue:

            {dialogue}

            What was going on?
            {summary}


            """

    dialogue = dataset['test'][example_index_to_summarize]['dialogue']

    prompt += f"""
        Dialogue:

        {dialogue}

        What was going on?
        """

    return prompt
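`make_prompt` reads from the global `dataset`, which makes it hard to test in isolation. A hypothetical pure variant, `make_prompt_from_pairs`, takes the (dialogue, summary) pairs directly and follows the same structure: each solved example ends with its summary plus blank lines, and the final dialogue ends with the open question. It also drops the indentation that the triple-quoted f-strings add, so its prompts are not byte-identical to `make_prompt`'s and the model's output may differ slightly.

```python
def make_prompt_from_pairs(shots, dialogue_to_summarize):
    """Hypothetical pure variant of make_prompt.

    shots: list of (dialogue, summary) pairs used as in-context examples
    dialogue_to_summarize: the dialogue the model should summarize
    """
    prompt = ""
    for dialogue, summary in shots:
        # Solved example: question followed by its answer and '\n\n\n' as separator
        prompt += f"\nDialogue:\n\n{dialogue}\n\nWhat was going on?\n{summary}\n\n\n"
    # Unsolved example: the question is left open for the model to answer
    prompt += f"\nDialogue:\n\n{dialogue_to_summarize}\n\nWhat was going on?\n"
    return prompt


# In the notebook, the pairs could come from the test split, e.g.:
# shots = [(dataset['test'][i]['dialogue'], dataset['test'][i]['summary']) for i in [40, 80]]
# prompt = make_prompt_from_pairs(shots, dataset['test'][200]['dialogue'])
print(make_prompt_from_pairs([("#Person1#: Hi.", "#Person1# says hi.")], "#Person2#: Bye."))
```

With an empty `shots` list the function produces a zero-shot prompt, so one helper covers every case from zero shot to ten shot.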

## 4.1 - One Shot

In [13]:
example_indices_full = [40]  # one example
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)


            Dialogue:

            #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

            What was going on?
            #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.


            
        Dialogue:

        #Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Pers

In [14]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(one_shot_prompt, return_tensors='pt').to(device)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

-------------------
HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

-------------------
MODEL GENERATION - ONE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to add a CD-ROM drive.


## 4.2 - Few Shots

### 4.2.1 - Two Shot

In [15]:
example_indices_full = [40, 80]  # two examples
example_index_to_summarize = 200

two_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(two_shot_prompt)


            Dialogue:

            #Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

            What was going on?
            #Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.


            
            Dialogue:

            #Person1#: May, do you mind helping me prepare for the picnic?
#Person2#: Sure. Have you checked the weather report?
#Person1#: Yes. It says it will be sunny all day. No sign of rain at all. This is your father's favorite sausage. Sandwiches for you and Daniel.
#Person2#: No, thanks Mom. I'd like some toast and chicken wings.
#Person1#: Okay. Please take some fruit salad and crackers for me.
#Person2#: Done. Oh, don't for

In [16]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(two_shot_prompt, return_tensors='pt').to(device)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - TWO SHOT:\n{output}')

Token indices sequence length is longer than the specified maximum sequence length for this model (631 > 512). Running this sequence through the model will result in indexing errors


-------------------
HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

-------------------
MODEL GENERATION - TWO SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.


### 4.2.2 - Three Shot

In [17]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

three_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

# print(three_shot_prompt)

In [18]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(three_shot_prompt, return_tensors='pt').to(device)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - THREE SHOT:\n{output}')

-------------------
HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

-------------------
MODEL GENERATION - THREE SHOT:
#Person1 wants to upgrade his system. #Person2 wants to add a painting program to his software. #Person1 wants to upgrade his hardware.


### 4.2.3 - Five Shot

In [19]:
example_indices_full = [x for x in range(40, 200, 32)]
example_index_to_summarize = 200

five_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

# print(five_shot_prompt)

In [20]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(five_shot_prompt, return_tensors='pt').to(device)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FIVE SHOT:\n{output}')

-------------------
HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

-------------------
MODEL GENERATION - FIVE SHOT:
#Person1 recommends upgrading their system and hardware.


### 4.2.4 - Ten Shot

In [21]:
example_indices_full = [x for x in range(40, 200, 16)]
example_index_to_summarize = 200

ten_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

# print(ten_shot_prompt)
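The five-shot and ten-shot cells pick their examples with `range(start, stop, step)`. A quick check shows which test indices they use and confirms that the dialogue being summarized (index 200) is never among them, since `range` excludes its stop value.

```python
# Indices of the in-context examples used in 4.2.3 (five shot) and 4.2.4 (ten shot)
five_shot_indices = list(range(40, 200, 32))
ten_shot_indices = list(range(40, 200, 16))

print(five_shot_indices)      # [40, 72, 104, 136, 168]
print(len(ten_shot_indices))  # 10
print(ten_shot_indices)       # [40, 56, 72, 88, 104, 120, 136, 152, 168, 184]

# The stop value 200 is excluded, so the target dialogue never leaks into the examples
assert 200 not in five_shot_indices and 200 not in ten_shot_indices
```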

In [22]:
summary = dataset['test'][example_index_to_summarize]['summary']

inputs = tokenizer(ten_shot_prompt, return_tensors='pt').to(device)
output = tokenizer.decode(
    model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - TEN SHOT:\n{output}')

-------------------
HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.

-------------------
MODEL GENERATION - TEN SHOT:
#Person1 recommends upgrading the system and hardware.


# 5 - Comparison

Compare the baseline, zero-shot, one-shot, two-shot, and five-shot outputs side by side.
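The comparison cell repeats the same tokenize, generate, and decode steps for every prompting strategy. A table-driven sketch keeps that logic in one place: `compare_strategies` and `print_comparison` are hypothetical helpers, and `generate_fn` is an assumed callable that wraps the notebook's tokenizer and model (one possible definition is shown in the comment).

```python
def compare_strategies(prompts, generate_fn):
    """Run every prompt through generate_fn and collect outputs by strategy name.

    prompts:     dict mapping a strategy name (e.g. "ZERO SHOT") to its prompt text
    generate_fn: callable str -> str. Assumed wrapper around the notebook's model, e.g.
        def generate_fn(prompt):
            inputs = tokenizer(prompt, return_tensors='pt').to(device)
            ids = model.generate(inputs["input_ids"], max_new_tokens=50)[0]
            return tokenizer.decode(ids, skip_special_tokens=True)
    """
    return {name: generate_fn(prompt) for name, prompt in prompts.items()}


def print_comparison(human_summary, outputs, dash_line="-" * 19):
    """Print the reference summary followed by each strategy's output."""
    print(dash_line)
    print(f"HUMAN SUMMARY:\n{human_summary}")
    for name, text in outputs.items():
        print(dash_line)
        print(f"MODEL GENERATION - {name}:\n{text}")


# Usage with a stub generator standing in for FLAN-T5
outputs = compare_strategies({"ZERO SHOT": "prompt a", "ONE SHOT": "prompt b"}, str.upper)
print_comparison("reference summary", outputs)
```

Adding a new strategy then means adding one dict entry, e.g. `"FIVE SHOT": make_prompt([x for x in range(40, 200, 32)], index)`, instead of copying another block.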

In [24]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']


    inputs = tokenizer(dialogue, return_tensors='pt').to(device)
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )
    print(dash_line)
    print(f'HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'BASELINE MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}')

    prompt = f"""
        Summarize the following conversation.

        {dialogue}

        Summary:
        """

    # Zero Shot
    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

    # One Shot
    one_shot_prompt = make_prompt([40], index)
    inputs_one_shot = tokenizer(one_shot_prompt, return_tensors='pt').to(device)
    output_one_shot = tokenizer.decode(
        model.generate(
            inputs_one_shot["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )
    print(dash_line)
    print(f'MODEL GENERATION - ONE SHOT:\n{output_one_shot}')

    # Two Shot
    two_shot_prompt = make_prompt([40, 80], index)
    inputs_two_shot = tokenizer(two_shot_prompt, return_tensors='pt').to(device)
    output_two_shot = tokenizer.decode(
        model.generate(
            inputs_two_shot["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )
    print(dash_line)
    print(f'MODEL GENERATION - TWO SHOT:\n{output_two_shot}')

    # Five Shot
    five_shot_prompt = make_prompt([x for x in range(40, 200, 32)], index)
    inputs_five_shot = tokenizer(five_shot_prompt, return_tensors='pt').to(device)
    output_five_shot = tokenizer.decode(
        model.generate(
            inputs_five_shot["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )
    print(dash_line)
    print(f'MODEL GENERATION - FIVE SHOT:\n{output_five_shot}\n\n')


-------------------
HUMAN SUMMARY:
Tom invites #Person1# to run in the morning. #Person1# would try to get up and join him.
-------------------
BASELINE MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1#: Hey, Tom. What to do for a run?
-------------------
MODEL GENERATION - ZERO SHOT:
Person1#: Hi, Tom. I'm looking for a jog.
-------------------
MODEL GENERATION - ONE SHOT:
Person1 wants to go for a run in the morning.
-------------------
MODEL GENERATION - TWO SHOT:
Person1 wants to go for a run in the morning.
-------------------
MODEL GENERATION - FIVE SHOT:
Person1 wants to go for a run in the morning.


-------------------
HUMAN SUMMARY:
#Person1# has difficulty getting access to the computers in the library to do #Person1#'s assignment.
-------------------
BASELINE MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: I'm frustrated. We're supposed to do our assignment on the computer, but I have difficulty getting access to the computers in the library.
-----------------

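Qualitatively, in these examples the few-shot outputs are better than zero shot, which in turn is better than the baseline.

More shots do not necessarily give better results: for Example 1 the one-, two-, and five-shot outputs are identical, and the two-shot prompt in 4.2.1 already exceeds the tokenizer's 512-token maximum (631 > 512).

As a next step, ROUGE could be used to measure these differences quantitatively.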
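As a starting point, here is a minimal sketch of ROUGE-1 F1, the unigram-overlap variant: precision is the fraction of generated words that appear in the reference, recall is the fraction of reference words that appear in the output (with counts clipped), and F1 is their harmonic mean. This is a simplified stand-in, assuming plain lowercase whitespace tokenization; established implementations such as the Hugging Face `evaluate` library's `rouge` metric apply their own tokenization and optional stemming, so their scores may differ (here, for instance, `morning.` and `morning` would not match).

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1 using lowercase whitespace tokenization."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(round(rouge1_f1("the cat sat", "the cat sat down"), 4))  # 0.8571
```

In the comparison loop, a score per strategy could be computed with e.g. `rouge1_f1(output_five_shot, summary)`; averaging over many test dialogues, rather than the two shown here, would give a more reliable comparison.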