# Generative AI Use Case: Summarize Dialogue

We will summarize dialogues using generative AI. We will explore how the input text affects the model's output, and use prompt engineering to direct the model towards the task we need.

By comparing zero shot, one shot, and few shot inference, we will take the first step towards prompt engineering and see how it can improve the generated output of Large Language Models.

## 1 - Set up Kernel and Required Dependencies

```
pip3 install --upgrade pip

pip3 install --disable-pip-version-check \
    torch==2.0.0 \
    torchdata==0.6.0

pip3 install \
    transformers==4.27.2 \
    datasets==2.11.0
```

Import the dataset loader and the classes for the Large Language Model (LLM), the tokenizer, and the generation configuration.

In [11]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig

## 2 - Import Dataset and Model

In this use case, we will be generating a summary of a dialogue with the pre-trained Large Language Model (LLM) [FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

Let's load some simple dialogues from the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled summaries and topics.

In [40]:
dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(dataset_name)

from enum import Enum
class Dataset_Splits(Enum):
    TRAIN = 'train'
    VALIDATION = 'validation'
    TEST = 'test'


class Dataset_Columns(Enum):
    DIALOGUE = 'dialogue'
    SUMMARY = 'summary'
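
The enums above keep split and column names in one place. As a minimal sketch of how they might be used, the hypothetical helper `get_example` below (not part of the original notebook) returns the dialogue and summary for one row. A plain dictionary stands in for the loaded dataset so the snippet runs offline; it assumes only that the dataset is indexed by split name and then by row, as in the cells that follow.

```python
from enum import Enum


class Dataset_Splits(Enum):
    TRAIN = 'train'
    VALIDATION = 'validation'
    TEST = 'test'


class Dataset_Columns(Enum):
    DIALOGUE = 'dialogue'
    SUMMARY = 'summary'


def get_example(dataset, index, split=Dataset_Splits.TEST):
    """Hypothetical helper: return (dialogue, summary) for one row of a split."""
    row = dataset[split.value][index]
    return (
        row[Dataset_Columns.DIALOGUE.value],
        row[Dataset_Columns.SUMMARY.value],
    )


# Stand-in for the loaded dataset (one shortened row), so this runs offline.
toy_dataset = {
    'test': [
        {
            'dialogue': "#Person1#: What time is it, Tom?",
            'summary': "#Person1# is in a hurry to catch a train. "
                       "Tom tells #Person1# there is plenty of time.",
        }
    ]
}

dialogue, summary = get_example(toy_dataset, 0)
print(dialogue)
print(summary)
```

With the real dataset, the same call would be `get_example(dataset, 40)`.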

Print a few dialogues from the dataset with their baseline summaries.

In [41]:
# Indices of the two test-split examples used throughout
example_indices = [40, 200]
# Separator line of 99 dashes
dash_line = '-' * 99

for i, index in enumerate(example_indices):
    print(dash_line)
    print("Example ", i+1)
    print(dash_line)

    print("INPUT DIALOGUE:")
    print(dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.DIALOGUE.value])

    print(dash_line)
    print("BASELINE HUMAN SUMMARY:")
    print(dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.SUMMARY.value])
    print(dash_line)

    print()
    

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Exa

Load the FLAN-T5 model, creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method.

In [44]:
model_name = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

## 3 - Set up the Tokenizer

To perform encoding and decoding, we need to work with text in tokenized form.

Download the tokenizer for the FLAN-T5 model using the `AutoTokenizer.from_pretrained()` method. The `use_fast` parameter selects the fast tokenizer implementation.
The tokenizer parameters are described in the [documentation](https://huggingface.co/docs/transformers/v4.28.1/en/model_doc/auto#transformers.AutoTokenizer).

In [15]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)



Test the tokenizer by encoding and decoding a simple sentence:

In [39]:
sentence = "What time is it, Tom?"

sentence_encoded = tokenizer.encode(sentence)
sentence_decoded = tokenizer.decode(
    sentence_encoded,
    skip_special_tokens=True
)

print("ENCODED SENTENCE:")
print(sentence_encoded)
print("\nDECODED SENTENCE:")
print(sentence_decoded)


ENCODED SENTENCE:
[363, 97, 19, 34, 6, 3059, 58, 1]

DECODED SENTENCE:
What time is it, Tom?
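
The trailing `1` in the encoded sentence is the end-of-sequence token that T5-style tokenizers append, which is why decoding uses `skip_special_tokens=True`. The toy sketch below illustrates that round trip. It is not the real SentencePiece tokenizer: the vocabulary, the ids other than `1`, and the functions `toy_encode` and `toy_decode` are all made up for illustration.

```python
import re

# Toy vocabulary. All ids are invented, except that id 1 plays the role of
# the end-of-sequence token, as in the real output above.
EOS_TOKEN, EOS_ID = "</s>", 1
TOY_VOCAB = {
    EOS_TOKEN: EOS_ID,
    "What": 10, "time": 11, "is": 12, "it": 13, ",": 14, "Tom": 15, "?": 16,
}
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}


def toy_encode(text):
    """Split into words and punctuation, map to ids, append the EOS id."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [TOY_VOCAB[token] for token in tokens] + [EOS_ID]


def toy_decode(ids, skip_special_tokens=False):
    """Map ids back to text, optionally dropping the EOS token."""
    tokens = [ID_TO_TOKEN[token_id] for token_id in ids]
    if skip_special_tokens:
        tokens = [token for token in tokens if token != EOS_TOKEN]
    # Join with spaces, then re-attach punctuation to the preceding word.
    return re.sub(r"\s+([,?])", r"\1", " ".join(tokens))


ids = toy_encode("What time is it, Tom?")
print(ids)
print(toy_decode(ids))
print(toy_decode(ids, skip_special_tokens=True))
```

Without `skip_special_tokens=True`, the decoded text keeps the `</s>` marker at the end.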


## 4 - Summarize Dialogues without Prompt Engineering

Now it's time to explore how well the base LLM summarizes a dialogue without **any** prompt engineering. Prompt engineering is the practice of changing the prompt (input) to improve the model's response for a given task.

In [67]:
for i, index in enumerate(example_indices):
    dialogue = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.DIALOGUE.value]
    summary = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.SUMMARY.value]

    # return_tensors (optional) can be set to 'tf' or 'pt' to return
    # a TensorFlow tf.constant or a PyTorch torch.Tensor, respectively,
    # instead of a list of Python integers.
    # generate() needs a PyTorch tensor here, not a list of Python integers.
    inputs = tokenizer.encode(dialogue, return_tensors='pt')
    model_completion = model.generate(
        inputs,
        max_new_tokens = 50
    )
    output = tokenizer.decode(model_completion[0], skip_special_tokens=True)

    print(dash_line)
    print("Example ", i+1)
    print(dash_line)
    print(f"INPUT PROMPT:\n{dialogue}")
    print(dash_line)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dash_line)
    print(f"MODEL GENERATION - WITHOUT ANY PROMPT ENGINEERING:\n{output}\n")


    

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT ANY PROMPT ENGINEERING:
Person1: It's ten to nine.

---------------------------

You can see that the model's guesses make some sense, but it does not seem to know which task it is supposed to accomplish. Prompt engineering can help here.
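
The loop in this section recurs in the sections that follow, with only the prompt changing. As a rough sketch, it could be factored into one function that takes the prompt builder and the model call as arguments. The name `run_prompt_experiment` and its parameters are hypothetical, and a stub replaces the model so the snippet runs without downloading anything.

```python
def run_prompt_experiment(examples, build_prompt, generate, label):
    """Print prompt, human summary, and generation for each example.

    examples:     list of (dialogue, summary) pairs
    build_prompt: callable mapping a dialogue to the prompt text
    generate:     callable mapping a prompt to generated text
                  (hypothetical hook; wrap the real model call in it)
    label:        heading printed above each generation
    """
    dash_line = '-' * 99
    outputs = []
    for i, (dialogue, summary) in enumerate(examples):
        prompt = build_prompt(dialogue)
        output = generate(prompt)
        outputs.append(output)

        print(dash_line)
        print(f"Example {i + 1}")
        print(dash_line)
        print(f"INPUT PROMPT:\n{prompt}")
        print(dash_line)
        print(f"BASELINE HUMAN SUMMARY:\n{summary}")
        print(dash_line)
        print(f"{label}:\n{output}\n")
    return outputs


# Offline demonstration: one shortened example and a stub instead of the model.
examples = [(
    "#Person1#: What time is it, Tom?",
    "#Person1# is in a hurry to catch a train. "
    "Tom tells #Person1# there is plenty of time.",
)]


def stub_generate(prompt):
    """Stand-in for the model; always returns the same text."""
    return "stub generation"


outputs = run_prompt_experiment(
    examples,
    build_prompt=lambda dialogue: dialogue,  # no prompt engineering
    generate=stub_generate,
    label="MODEL GENERATION - WITHOUT ANY PROMPT ENGINEERING",
)
```

With the model and tokenizer loaded, `generate` could wrap the same three calls used in the loop above: `tokenizer.encode(prompt, return_tensors='pt')`, `model.generate(inputs, max_new_tokens=50)`, and `tokenizer.decode(model_completion[0], skip_special_tokens=True)`.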

## 5 - Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation. [This blog](https://www.amazon.science/blog/emnlp-prompt-engineering-is-the-new-feature-engineering) from Amazon Science is a quick introduction to prompt engineering.

### 5.1 - Zero Shot Inference with an Instruction Prompt

To instruct the model to perform a task (here, summarizing a dialogue), we can take the dialogue and convert it into an instruction prompt.
This is often called zero shot inference. [This blog](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/) from AWS gives a quick description of what zero shot learning is and why it is an important concept for LLMs.

Wrap the dialogue in a descriptive instruction and see how the generated text changes:

In [68]:
for i, index in enumerate(example_indices):
    dialogue = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.DIALOGUE.value]
    summary = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.SUMMARY.value]

    prompt = f"""
Summarize the following conversation:
{dialogue}

Summary:
    """

    inputs = tokenizer.encode(prompt, return_tensors='pt')
    model_completion = model.generate(
        inputs,
        max_new_tokens = 50
    )
    output = tokenizer.decode(model_completion[0], skip_special_tokens=True)

    print(dash_line)
    print("Example ", i+1)
    print(dash_line)
    print(f"INPUT PROMPT:\n{prompt}")
    print(dash_line)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dash_line)
    print(f"MODEL GENERATION - ZERO SHOT WITH INSTRUCTION PROMPT:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT WITH INSTRUCTION PROMPT

This is much better! But the model still does not pick up on the nuance of the conversations.

#### Exercise:

- Experiment with the `prompt` text and see how the inferences change. Do they change if you end the prompt with an empty string instead of `Summary:`?
- Try rephrasing the beginning of the `prompt` text, `Summarize the following conversation:`, and see how it influences the generated output.
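
To make both experiments easy to repeat, the instruction and the ending of the prompt can be turned into parameters. The helper `build_prompt` below is a hypothetical sketch, not part of the original notebook. It does not reproduce the exact leading and trailing whitespace of the f-string prompts used here, which may itself affect what the model generates.

```python
def build_prompt(dialogue,
                 instruction="Summarize the following conversation:",
                 suffix="Summary:"):
    """Hypothetical helper: wrap a dialogue in an instruction prompt.

    Pass suffix="" to end the prompt right after the dialogue.
    """
    parts = [instruction, dialogue]
    if suffix:
        # A blank line separates the dialogue from the closing cue.
        parts.extend(["", suffix])
    return "\n".join(parts) + "\n"


dialogue = "#Person1#: What time is it, Tom?"

# Default prompt, ending with "Summary:".
print(build_prompt(dialogue))

# First exercise: drop the closing cue.
print(build_prompt(dialogue, suffix=""))

# Second exercise: rephrase the instruction.
print(build_prompt(
    dialogue,
    instruction="Summarize the following conversation to capture "
                "the essence of what is going on:",
))
```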


#### Exercise Solution:

In [64]:
for i, index in enumerate(example_indices):
    dialogue = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.DIALOGUE.value]
    summary = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.SUMMARY.value]

    prompt = f"""
Summarize the following conversation:
{dialogue}
    """

    inputs = tokenizer.encode(prompt, return_tensors='pt')
    model_completion = model.generate(
        inputs,
        max_new_tokens = 50
    )
    output = tokenizer.decode(model_completion[0], skip_special_tokens=True)

    print(dash_line)
    print("Example ", i+1)
    print(dash_line)
    print(f"INPUT PROMPT:\n{prompt}")
    print(dash_line)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dash_line)
    print(f"MODEL GENERATION - ZERO SHOT WITH INSTRUCTION PROMPT:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT WITH INSTRUCTION PROMPT:
The trai

Omitting `Summary:` from the prompt didn't affect the first example, but the second example is worse than it was before.

In [65]:
for i, index in enumerate(example_indices):
    dialogue = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.DIALOGUE.value]
    summary = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.SUMMARY.value]

    prompt = f"""
Summarize the following conversation to capture the essence of what is going on:
{dialogue}

Summary:
    """

    inputs = tokenizer.encode(prompt, return_tensors='pt')
    model_completion = model.generate(
        inputs,
        max_new_tokens = 50
    )
    output = tokenizer.decode(model_completion[0], skip_special_tokens=True)

    print(dash_line)
    print("Example ", i+1)
    print(dash_line)
    print(f"INPUT PROMPT:\n{prompt}")
    print(dash_line)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dash_line)
    print(f"MODEL GENERATION - ZERO SHOT WITH INSTRUCTION PROMPT:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation to capture the essence of what is going on:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GEN

In [66]:
for i, index in enumerate(example_indices):
    dialogue = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.DIALOGUE.value]
    summary = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.SUMMARY.value]

    prompt = f"""
Tell me the topic of the conversation. Pick up to 3 topics:
{dialogue}

Topics(s):
    """

    inputs = tokenizer.encode(prompt, return_tensors='pt')
    model_completion = model.generate(
        inputs,
        max_new_tokens = 50
    )
    output = tokenizer.decode(model_completion[0], skip_special_tokens=True)

    print(dash_line)
    print("Example ", i+1)
    print(dash_line)
    print(f"INPUT PROMPT:\n{prompt}")
    print(dash_line)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dash_line)
    print(f"MODEL GENERATION - ZERO SHOT WITH INSTRUCTION PROMPT:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Tell me the topic of the conversation. Pick up to 3 topics:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Topics(s):
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT

### 5.2 - Zero Shot Inference with Prompt Templates from FLAN-T5

FLAN-T5 has many prompt templates that are published for certain tasks [here](https://github.com/google-research/FLAN/tree/main/flan/v2). In the following code, you will use one of the [pre-built FLAN-T5 prompts](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py):

In [69]:
for i, index in enumerate(example_indices):
    dialogue = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.DIALOGUE.value]
    summary = dataset[Dataset_Splits.TEST.value][index][Dataset_Columns.SUMMARY.value]

    prompt = f"""
Dialogue:
{dialogue}

What was going on?
    """

    inputs = tokenizer.encode(prompt, return_tensors='pt')
    model_completion = model.generate(
        inputs,
        max_new_tokens = 50
    )
    output = tokenizer.decode(model_completion[0], skip_special_tokens=True)

    print(dash_line)
    print("Example ", i+1)
    print(dash_line)
    print(f"INPUT PROMPT:\n{prompt}")
    print(dash_line)
    print(f"BASELINE HUMAN SUMMARY:\n{summary}")
    print(dash_line)
    print(f"MODEL GENERATION - ZERO SHOT WITH PROMPT TEMPLATE:\n{output}\n")

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Dialogue:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

What was going on?
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT WITH PROMPT TEMPLATE:
Tom is late for the

This prompt from FLAN-T5 did help a bit, but the model still struggles to pick up on the nuance of the conversation. This is what we will try to solve with few shot inference.
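
For side-by-side comparison, the prompts tried in this section can be collected as named templates with a `{dialogue}` placeholder. This is only a sketch: the dictionary `PROMPT_TEMPLATES`, its keys, and the `render` function are hypothetical names, and the whitespace is simplified relative to the f-strings above.

```python
# Prompt wordings taken from the cells in this section; the keys are made up.
PROMPT_TEMPLATES = {
    "instruction": (
        "Summarize the following conversation:\n{dialogue}\n\nSummary:\n"
    ),
    "essence": (
        "Summarize the following conversation to capture the essence "
        "of what is going on:\n{dialogue}\n\nSummary:\n"
    ),
    "flan_dialogue": (
        "Dialogue:\n{dialogue}\n\nWhat was going on?\n"
    ),
}


def render(template_name, dialogue):
    """Fill the named template with a dialogue."""
    return PROMPT_TEMPLATES[template_name].format(dialogue=dialogue)


dialogue = "#Person1#: What time is it, Tom?"
for name in PROMPT_TEMPLATES:
    print(f"[{name}]")
    print(render(name, dialogue))
```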