# Generative AI Use Case: Summarize Dialogue

The goal is to perform a dialogue summarization task using generative AI. We will examine how the input text affects the model's output and use prompt engineering to steer the model toward the task, comparing zero shot, one shot, and few shot inference.

### Table of contents

* 1 - Set up Kernel and Required Dependencies
* 2 - Summarize Dialogue without Prompt Engineering
* 3 - Summarize Dialogue with an Instruction Prompt
    * 3.1 - Zero Shot Inference with an Instruction Prompt
    * 3.2 - Zero Shot Inference with the Prompt Template from FLAN-T5
* 4 - Summarize Dialogue with One Shot and Few Shot Inference
    * 4.1 - One Shot Inference
    * 4.2 - Few Shot Inference
* 5 - Generative Configuration Parameters for Inference

### 1 - Set up Kernel and Required Dependencies

In [5]:
# %pip install --upgrade pip
# %pip install --disable-pip-version-check \
#     torch==1.13.1 \
#     torchdata==0.5.1 --quiet

# %pip install \
#     transformers==4.27.2 \
#     datasets==2.11.0 --quiet


Import the classes needed to load the dataset, the Large Language Model (LLM), the tokenizer, and the generation configuration.

In [6]:
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import AutoModelForSeq2SeqLM
from transformers import GenerationConfig

### 2 - Summarize Dialogue without Prompt Engineering

Let's generate a summary of a dialogue with the pre-trained LLM FLAN-T5 from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

Let's also load some simple dialogues from the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) Hugging Face dataset. This dataset contains dialogues from the [DailyDialog](https://arxiv.org/abs/1710.03957) dataset.

In [8]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

Downloading readme: 100%|██████████| 4.56k/4.56k [00:00<00:00, 1.91MB/s]
Downloading data: 100%|██████████| 11.3M/11.3M [00:03<00:00, 3.23MB/s]
Downloading data: 100%|██████████| 442k/442k [00:00<00:00, 1.25MB/s]
Downloading data: 100%|██████████| 1.35M/1.35M [00:00<00:00, 2.78MB/s]
Downloading data files: 100%|██████████| 3/3 [00:04<00:00,  1.48s/it]
Extracting data files: 100%|██████████| 3/3 [00:00<00:00, 1131.96it/s]
Generating train split: 12460 examples [00:00, 99903.71 examples/s]
Generating validation split: 500 examples [00:00, 85222.37 examples/s]
Generating test split: 1500 examples [00:00, 147030.99 examples/s]


Print a couple of dialogues together with their baseline (human-written) summaries:

In [9]:
example_indices = [40, 200]

dash_line = '-' * 99  # separator line, same length as before (99 dashes)

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example {}:'.format(i+1))
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(dataset['test'][index]['dialogue'])
    print(dash_line)
    print('SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example 1:
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
Example 2:
-------

Load the [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) by creating an instance of the `AutoModelForSeq2SeqLM` class with the `.from_pretrained()` method.

In [10]:
model_name='google/flan-t5-base'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

Downloading (…)lve/main/config.json: 100%|██████████| 1.40k/1.40k [00:00<00:00, 4.73MB/s]
Downloading model.safetensors: 100%|██████████| 990M/990M [00:30<00:00, 33.0MB/s] 
Downloading (…)neration_config.json: 100%|██████████| 147/147 [00:00<00:00, 320kB/s]


To perform encoding and decoding, we need to work with text in a tokenized form. **Tokenization** is the process of splitting text into smaller units that the LLM can process.

We'll download the tokenizer for the FLAN-T5 model using the `AutoTokenizer.from_pretrained()` method. The `use_fast` parameter enables the fast tokenizer implementation. There is no need to go into the details at this stage, but further documentation is available [here](https://huggingface.co/docs/transformers/v4.28.1/en/model_doc/auto#transformers.AutoTokenizer).

In [11]:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

Downloading (…)okenizer_config.json: 100%|██████████| 2.54k/2.54k [00:00<00:00, 7.55MB/s]
Downloading spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 1.40MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 2.42M/2.42M [00:00<00:00, 5.37MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 2.20k/2.20k [00:00<00:00, 6.81MB/s]


Test the tokenizer by encoding and decoding a simple sentence:

In [12]:
sentence = "What time is it, Tom?"

sentence_encoded = tokenizer(sentence, return_tensors="pt")

sentence_decoded = tokenizer.decode(
    sentence_encoded["input_ids"][0], skip_special_tokens=True)

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])
print('DECODED SENTENCE:')
print(sentence_decoded)

ENCODED SENTENCE:
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1])
DECODED SENTENCE:
What time is it, Tom?
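
To make the encode/decode round trip more concrete, here is a toy word-level tokenizer written in plain Python. It is only a sketch: FLAN-T5 uses a learned subword vocabulary, not this scheme, and `build_vocab`, `encode`, and `decode` are hypothetical helpers invented for illustration. The one detail it mirrors from the output above is the trailing id `1`, which is the end-of-sequence token that `skip_special_tokens=True` removes when decoding.

```python
import re

# Hypothetical toy vocabulary: ids 0 and 1 are reserved for special tokens.
PAD, EOS = "<pad>", "</s>"
TOKEN_PATTERN = r"\w+|[^\w\s]"  # words, or single punctuation marks


def build_vocab(texts):
    """Assign an integer id to every distinct word or punctuation mark."""
    vocab = {PAD: 0, EOS: 1}
    for text in texts:
        for token in re.findall(TOKEN_PATTERN, text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Split text into tokens, look up their ids, and append the EOS id."""
    return [vocab[t] for t in re.findall(TOKEN_PATTERN, text)] + [vocab[EOS]]


def decode(ids, vocab, skip_special_tokens=True):
    """Map ids back to tokens and join them into a string."""
    id_to_token = {i: t for t, i in vocab.items()}
    tokens = [id_to_token[i] for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in (PAD, EOS)]
    # Join with spaces, then drop the space in front of punctuation.
    return re.sub(r"\s+([^\w\s])", r"\1", " ".join(tokens))


sentence = "What time is it, Tom?"
vocab = build_vocab([sentence])
ids = encode(sentence, vocab)
print(ids)
print(decode(ids, vocab))
```

A word-level scheme like this cannot encode words it has never seen, which is one reason real tokenizers work with subword units instead.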


Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. **Prompt engineering** is the practice of changing the **prompt** (the model's input) to improve the response for a given task.

In [13]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    inputs = tokenizer(dialogue, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs['input_ids'],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{dialogue}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - WITHOUT PROMPT ENGINEERING:
Person1: It's ten to nine.

-------------------------------

The model's guesses make some sense, but it doesn't seem to be sure what task it is supposed to accomplish. It appears to simply make up the next sentence in the dialogue. Prompt engineering might help here.

### 3 - Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation. [This blog](https://www.amazon.science/blog/emnlp-prompt-engineering-is-the-new-feature-engineering) from Amazon Science explains the concept well.

#### 3.1 - Zero Shot Inference with an Instruction Prompt

To instruct the model to perform a task (here, summarizing a dialogue), we can take the dialogue and convert it into an instruction prompt. Because the prompt contains no worked examples of the task, this is often called **zero shot inference**. [This blog from AWS](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/) gives a quick description of what zero shot learning is and why it is an important concept for LLMs.

Wrap the dialogue in a descriptive instruction and see how the generated text will change:

In [15]:
for i, index in enumerate(example_indices):
    dialogue = dataset['test'][index]['dialogue']
    summary = dataset['test'][index]['summary']

    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

    # Input constructed prompt instead of the dialogue.
    inputs = tokenizer(prompt, return_tensors='pt')
    output = tokenizer.decode(
        model.generate(
            inputs['input_ids'],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASELINE HUMAN SUMMARY:\n{summary}')
    print(dash_line)
    print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: What time is it, Tom?
#Person2#: Just a minute. It's ten to nine by my watch.
#Person1#: Is it? I had no idea it was so late. I must be off now.
#Person2#: What's the hurry?
#Person1#: I must catch the nine-thirty train.
#Person2#: You've plenty of time yet. The railway station is very close. It won't take more than twenty minutes to get there.

Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# is in a hurry to catch a train. Tom tells #Person1# there is plenty of time.
---------------------------------------------------------------------------------------------------
MODEL GENERATION - ZERO SHOT:
The train is about to

This is much better, but the model still does not pick up on the nuance of the conversation.
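
Since the same instruction wrapper is reused whenever the prompt wording changes, it can help to factor it into a small function. The sketch below is one possible way to do that: `build_zero_shot_prompt` is a hypothetical helper, not part of `transformers`. It reproduces the prompt layout used above, except that it drops the trailing indentation after `Summary:`, on the assumption that those spaces were not intentional.

```python
def build_zero_shot_prompt(dialogue,
                           instruction="Summarize the following conversation."):
    """Wrap a dialogue in an instruction and end with a cue for the answer.

    Hypothetical helper: the layout follows the zero shot prompt above
    (instruction, blank line, dialogue, blank line, 'Summary:').
    """
    return f"\n{instruction}\n\n{dialogue.strip()}\n\nSummary:\n"


# A shortened dialogue, used only to show the resulting prompt text.
dialogue = (
    "#Person1#: What time is it, Tom?\n"
    "#Person2#: Just a minute. It's ten to nine by my watch."
)

prompt = build_zero_shot_prompt(dialogue)
print(prompt)
```

The returned string can be passed to the tokenizer in place of the inline f-string, and a different `instruction` argument makes it easy to compare alternative wordings of the same task.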