# Generative AI Prompt Engineering

In this lab we will use a famous Encoder-Decoder LLM: Flan-T5. You will first do simple tasks to get your hands dirty.

Then you will learn about few shot prompting, and see how at a certain point the LLM just cannot do the task.

You will finish by testing different generation configuration parameters.

## Install Required Dependencies

Now install the required packages to use Hugging Face transformers and datasets.

In [None]:
!pip install --upgrade pip
!pip install transformers==4.35.2 datasets==2.15.0  --quiet


Load the datasets, Large Language Model (LLM), tokenizer, and configurator. Do not worry if you do not yet understand all of these components; they will be described and discussed later in the notebook.

In [None]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig


## Doing Simple Tasks with Flan-T5

In this case we will do simple sentiment analysis so you get the gist of how to use these LLMs. You will use the pre-trained Large Language Model (LLM) FLAN-T5 from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index).

In [None]:
huggingface_dataset_name = "imdb"

dataset = load_dataset(huggingface_dataset_name)


In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Let's sample a review from the train split. For this exercise, any split would work equally well.

In [None]:
import numpy as np

def get_random_review_and_label():
  # Pick any index in the train split (the upper bound is exclusive)
  random_index = np.random.randint(0, len(dataset['train']))
  example = dataset['train'][random_index]
  return example['text'], example['label']

random_review, label = get_random_review_and_label()

# Separator line of 99 dashes
dash_line = '-' * 99

print(f'Review: \n\n{random_review}')
print(dash_line)
print(f'Label: {label}')

Review: 

This movie was very enjoyable, though you'll only like it if: - you hate going to the dentist but aren't afraid of a movie where one of them goes beserk - you love horror movies<br /><br />I particularly liked the fact that some care was given to explaining the brute actions of the main character. The fact that he's totally obsessed by cleanliness (especially in the mouth) and then catches his wives providing some oral pleasure to the mud-covered pool-man is a pretty believable reason to go overboard.<br /><br />Liked it. I give it an 8.
---------------------------------------------------------------------------------------------------
Label: 1


Let's now use the model! For that we need to use the Tokenizer to transform the text into the "model language" (more on this during the course). Also we need to download the model.

In [None]:
model_name='google/flan-t5-base'

from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig # Keep this import as it's used later


model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


In [None]:
sentence = random_review[:50]
print(f'Review trimmed: {sentence}')

sentence_encoded = tokenizer(sentence)

sentence_decoded = tokenizer.decode(
        sentence_encoded["input_ids"],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

Review trimmed: This movie was very enjoyable, though you'll only 
ENCODED SENTENCE:
[100, 1974, 47, 182, 9231, 6, 713, 25, 31, 195, 163, 3, 1]

DECODED SENTENCE:
This movie was very enjoyable, though you'll only 


Now let's call the model. It was loaded with `AutoModelForSeq2SeqLM`, which means it is an LLM for sequence-to-sequence tasks such as summarization or translation, so let's phrase our prompt as a text-to-text task.

In [None]:
review, label = get_random_review_and_label()

prompt = f"""
Analyze the sentiment of the following review:

{review}

Sentiment:

"""

input = tokenizer(prompt)

In [None]:
print(prompt)


Analyze the sentiment of the following review:

Throughout this film, you might think this film is just for kids. Well, it is mainly pointed towards them, but it's also well-rounded enough with the jokes pointed also at the adults in the audience. This time around, the Muppet gang try to get on Broadway, with the dire straits keeping them from getting it produced, leading them to splitting up. But Kermit won't stop, and his determination keeps things moving along until after getting the deal together he gets hit by a car and sent into amnesia! <br /><br />It's a send-up, in part, of those old starring vehicles from the 40s with musicals actually as the topic of a musical, only here there's the usual lot of zaniness and wonderful moments thrown into a pot of hysterically funny moments (Lou Zealand's boomerang fish; Gonzo's water-stunt display, the whisper campaign, among many others), but also with a lot of heart too. The Muppet writers aren't shy of the conventions, on the contrary, t

In [None]:
import torch
model.generate(torch.tensor([input['input_ids']]), max_new_tokens=50)

tensor([[   0, 1465,    1]])

In [None]:
import torch
tokenizer.decode(
        model.generate(torch.tensor([input['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

'positive'
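The raw tensor `[0, 1465, 1]` and the decoded string `'positive'` differ only by two special tokens. In T5-style models, ID `0` is the padding token (also used to start decoding) and ID `1` is the end-of-sequence token. The sketch below mimics what `skip_special_tokens=True` does. It uses a three-entry stand-in vocabulary rather than the real tokenizer, and `TOY_VOCAB` and `toy_decode` are hypothetical names chosen for illustration.

```python
# Toy stand-in vocabulary holding only the three IDs seen in the output above.
# A real tokenizer has tens of thousands of entries.
TOY_VOCAB = {0: "<pad>", 1: "</s>", 1465: "positive"}
SPECIAL_IDS = {0, 1}  # padding / decoder start, and end-of-sequence


def toy_decode(token_ids, skip_special_tokens=True):
    """Map IDs back to text, optionally dropping the special tokens."""
    pieces = [
        TOY_VOCAB[token_id]
        for token_id in token_ids
        if not (skip_special_tokens and token_id in SPECIAL_IDS)
    ]
    return " ".join(pieces)


generated_ids = [0, 1465, 1]
print(toy_decode(generated_ids))                             # positive
print(toy_decode(generated_ids, skip_special_tokens=False))  # <pad> positive </s>
```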

And what was the real sentiment? Remember that in this dataset `0` is negative and `1` is positive.

In [None]:
label

1
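Checking one review by eye does not scale, so here is a minimal sketch of how the model's text answer could be compared with the integer label over several reviews. The helper names and the example outputs are hypothetical and only illustrate the mechanics; they are not results from this notebook.

```python
# Label convention stated above: 0 is negative, 1 is positive.
LABEL_NAMES = {0: "negative", 1: "positive"}


def is_correct(model_output, label):
    """Return True if the model's text answer matches the dataset label."""
    return model_output.strip().lower() == LABEL_NAMES[label]


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    matches = [is_correct(p, l) for p, l in zip(predictions, labels)]
    return sum(matches) / len(matches)


# Hypothetical model outputs, for illustration only (not real results).
example_outputs = ["positive", "Negative ", "positive", "negative"]
example_labels = [1, 0, 0, 0]
print(accuracy(example_outputs, example_labels))  # 0.75
```

Normalizing case and whitespace before comparing matters because the model returns free text, not a class index.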

## Summarize Dialogue without Prompt Engineering

In this use case, you will be generating a summary of a dialogue with Flan-T5.

Let's load some simple dialogues from the DialogSum Hugging Face dataset. This dataset contains 10,000+ dialogues with the corresponding manually labeled summaries.

In [None]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)


In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Print a random dialogue with its baseline summary.

In [None]:
def get_random_dialogue_and_summary():
  # Pick any index in the train split (the upper bound is exclusive)
  random_index = np.random.randint(0, len(dataset['train']))
  example = dataset['train'][random_index]
  return example['dialogue'], example['summary']

random_dialogue, summary = get_random_dialogue_and_summary()

# Separator line of 99 dashes
dash_line = '-' * 99

print(f'Dialogue: \n\n{random_dialogue}')
print(dash_line)
print(f'Summary: {summary}')

Dialogue: 

#Person1#: What's your new girlfriend like?
#Person2#: Katherine? Well, she's good at languages.
#Person1#: Does she know how to speak Spanish?
#Person2#: She knows how to speak Spanish and Japanese.
#Person1#: Wow!!!
#Person2#: And she's good at sports, too. She knows how to play tennis and basketball.
#Person1#: That's terrific!
#Person2#: But there's one thing she's not good at.
#Person1#: What's that?
#Person2#: She's not good at remembering things. We have a date, and she's an hour late!!
---------------------------------------------------------------------------------------------------
Summary: #Person2# tells #Person1# that his girlfriend is good at languages and sports but bad at remembering things.


Now it's time to explore how well the base LLM summarizes a dialogue without any prompt engineering. **Prompt engineering** is the practice of changing the **prompt** (input) to improve the response for a given task.

In [None]:
import torch
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    # No instruction or template: the raw dialogue is the whole prompt
    prompt = dialogue
    inputs = tokenizer(prompt)
    output = tokenizer.decode(
        model.generate(torch.tensor([inputs['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Without prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Rose, Christmas is coming soon. What presents shall we buy for the children?
#Person2#: What about a bike for John? He's been asking for one for a long time.
#Person1#: But I don't think he's old enough to ride a bike to school. Let's buy him a football instead alright?
#Person2#: OK, what should we buy for Jane?
#Person1#: Well, she likes music very much. Shall we buy her a guitar?
#Person2#: I think an MP3 player will be better. It can help her learn Chinese.
#Person1#: Then let's buy one for her. Now what about little Jack?
#Person2#: Well, he's still a young baby. I think a toy car is best for him.
#Person1#: I couldn't agree more. When shall we go and buy the presents?
#Person2#: Well, tomorrow is Sunday. Let's go shopping tomorrow afternoon after we se

You can see that the model's guesses make some sense, but it doesn't seem to be sure what task it is supposed to accomplish. It seems to just make up the next sentence in the dialogue. Prompt engineering can help here.

## Summarize Dialogue with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation.

### Zero Shot Inference with an Instruction Prompt

In order to instruct the model to perform a task - summarize a dialogue - you can take the dialogue and convert it into an instruction prompt. This is often called **zero shot inference**.  
Wrap the dialogue in a descriptive instruction and see how the generated text will change:

In [None]:
import torch
for i in range(3):
    dialogue, summary = get_random_dialogue_and_summary()
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """
    inputs = tokenizer(prompt)
    output = tokenizer.decode(
        model.generate(torch.tensor([inputs['input_ids']]), max_new_tokens=50)[0],
        skip_special_tokens=True
    )

    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print(f'Dialogue:\n{dialogue}')
    print(dash_line)
    print(f'Summary:\n{summary}')
    print(dash_line)
    print(f'Model Summary - Zero shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
Dialogue:
#Person1#: Well, good morning, Tom. I haven't seen you for a long time.
#Person2#: I'd been feeling pretty well until just a few days ago.
#Person1#: What seems to be the trouble now?
#Person2#: I feel run down, tired. I've been having headaches almost every day. And I'm not getting as much sleep as usually do.
#Person1#: Have you been eating properly? Eating the right kind of food is important for your health, you know.
#Person2#: Well, I haven't been eating well, I guess. I usually only have enough time to grab a sandwich and a cup of coffee for lunch.
#Person1#: And what about dinner?
#Person2#: Sometimes I'm too tired to eat anything at all.
#Person1#: That's not good. You don't have a well-balanced diet. Have you been taking vitamin pills?
#Person2#: I don't like 

This is much better! But the model still does not pick up on the nuance of the conversations.
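One way to probe this further is to try a few different instruction wordings on the same dialogue and compare the outputs. The sketch below only builds the prompts; the template names and wordings are hypothetical examples, and there is no guarantee that any of them improves the summaries.

```python
# Hypothetical instruction templates; the wording is illustrative only.
TEMPLATES = {
    "summarize": "Summarize the following conversation.\n\n{dialogue}\n\nSummary:\n",
    "what_happened": "Dialogue:\n\n{dialogue}\n\nWhat was going on?\n",
}


def build_prompt(template_name, dialogue):
    """Fill the chosen template with a dialogue."""
    return TEMPLATES[template_name].format(dialogue=dialogue)


# Made-up two-line dialogue in the same #PersonN# style as the dataset.
sample_dialogue = (
    "#Person1#: Are you free tonight?\n"
    "#Person2#: Yes, let's get dinner."
)

for name in TEMPLATES:
    print(f"--- {name} ---")
    print(build_prompt(name, sample_dialogue))
```

Each prompt can then be tokenized and passed to `model.generate` exactly as in the cell above.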

## Summarize Dialogue with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with either one or more full examples of prompt-response pairs that match your task - before your actual prompt that you want completed. This is called "in-context learning" and puts your model into a state that understands your specific task.  

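### One Shot Inference

The helper below builds a prompt from `number_of_shots` solved examples followed by one unsolved dialogue, and returns the prompt together with the human-written summary of that last dialogue.
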
In [None]:
def make_prompt_and_return_real_summary(number_of_shots):
    prompt = ''
    for i in range(number_of_shots):
        dialogue, summary = get_random_dialogue_and_summary()

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

Summary:

{summary}


"""

    dialogue_to_analyze, real_summary = get_random_dialogue_and_summary()

    prompt += f"""
Dialogue:

{dialogue_to_analyze}

Summary:

"""

    return prompt, real_summary

Construct the prompt to perform one shot inference:

In [None]:
one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)

print(one_shot_prompt)


Dialogue:

#Person1#: You should save some money on parking here.
#Person2#: Yeah, I don't have to pay for a space on the street.
#Person1#: Really? How long did it take you to find a spot yesterday?
#Person2#: Well, last night it took me half an hour to find a spot when I came home from work.
#Person1#: You get home late, don't you?
#Person2#: Yeah, around seven. Most of the street parking is gone by then.
#Person1#: Ah, well. You can't have everything.
#Person2#: Yeah. I can live with it. It's great to be living alone.

Summary:

#Person1# tells #Person2# #Person1# saves money by parking on the street though it's hard to find a spot.



Dialogue:

#Person1#: How may I help you. sir?
#Person2#: I'm wondering if anyone has turned in a train ticket. I just lost my ticket for Beijing tonight.
#Person1#: Let me see. I'm sorry. Nothing's been turned in. Do you want to buy another one?
#Person2#: Yes. If I don't make it to Beijing tomorrow morning. my wife would kill me. How much does it c

Now pass this prompt to perform the one shot inference:

In [None]:
import torch
for i in range (3):
  one_shot_prompt, real_summary = make_prompt_and_return_real_summary(1)
  inputs = tokenizer(one_shot_prompt, return_tensors='pt', truncation=True, max_length=512)
  output = tokenizer.decode(
      model.generate(inputs['input_ids'], max_new_tokens=50)[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{one_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - One shot inference prompt engineering:\n{output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: I am not sure what to do to get ready for my job interview.
#Person2#: Make sure that you understand the company. Do you understand what it is that they do?
#Person1#: No, I probably need to do some more research.
#Person2#: When you've finished your research it will help you figure out whether your company is rigid in philosophy or kind of more relaxed. Does that make sense?
#Person1#: I think that their attitude is rather casual.
#Person2#: So all of that information will help you to pick out what to wear. Do you have something to wear?
#Person1#: I have absolutely nothing so far.
#Person2#: You know I could go shopping with you sometime if you need it, but can we talk about other basics?
#Person1#: Yes, where should we go from here?
#Person2#: 

### Few Shot Inference

Let's explore few shot inference by putting five full dialogue-summary pairs in the prompt.

In [None]:
import torch
for i in range (3):
  few_shot_prompt, real_summary = make_prompt_and_return_real_summary(5)
  inputs = tokenizer(few_shot_prompt, return_tensors='pt', truncation=True, max_length=512)
  output = model.generate(inputs['input_ids'], max_new_tokens=50)
  decoded_output = tokenizer.decode(
      output[0],
      skip_special_tokens=True
  )

  print(dash_line)
  print(f'Example {i + 1}')
  print(dash_line)
  print(f'Dialogue:\n{few_shot_prompt}')
  print(dash_line)
  print(f'Summary:\n{real_summary}')
  print(dash_line)
  print(f'Model Summary - Few shot inference prompt engineering:\n{decoded_output}\n')

---------------------------------------------------------------------------------------------------
Example 1
---------------------------------------------------------------------------------------------------
Dialogue:

Dialogue:

#Person1#: I've been worried that Richard is frozen.
#Person2#: What sounds to be a problem?
#Person1#: Well, he has trouble concentrating when getting along with other children. I was wondering there might be something on his mind. Some problem at home?

Summary:

#Person1# tells #Person2# Richard has trouble concentrating when getting along with other children.



Dialogue:

#Person1#: What's this then?
#Person2#: It's my geography, sir. The Map of Africa you set us.
#Person1#: But this should have been handed in last Thursday.
#Person2#: Yes, I know, sir. I'm sorry.
#Person1#: Well, what's your excuse then?
#Person2#: My mother's been ill and I had to stay at home.
#Person1#: Oh, Yes?
#Person2#: It's true, sir.

Summary:

#Person2# explains why #Person2# 

In this case, few shot inference did not improve much on one shot inference, and going beyond five or six shots typically does not help much either. You also need to make sure that the prompt does not exceed the model's input context length, which in our case is 512 tokens. Because the tokenizer is called with `truncation=True` and `max_length=512`, anything beyond that length is cut off, and with many shots this can include the dialogue you actually want summarized.

However, you can see that feeding in at least one full example (one shot) provides the model with more information and qualitatively improves the summary overall.
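Since the context limit cuts off the end of a long prompt, one option is to decide how many shots to include based on a token budget, reserving room for the final dialogue first. The following is a minimal sketch under loud assumptions: `count_tokens` is a crude whitespace-based stand-in for a real tokenizer, and `build_prompt_within_budget` is a hypothetical helper, not part of the lab code.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words.

    With a real tokenizer you would count len(tokenizer(text)['input_ids']).
    """
    return len(text.split())


def build_prompt_within_budget(examples, query_dialogue, max_tokens=512):
    """Add (dialogue, summary) shots while the whole prompt fits in max_tokens.

    The query is reserved first, so it is never the part that gets cut off.
    Returns the prompt and the number of shots that were included.
    """
    query_block = f"Dialogue:\n\n{query_dialogue}\n\nSummary:\n\n"
    used = count_tokens(query_block)
    shot_blocks = []
    for dialogue, summary in examples:
        block = f"Dialogue:\n\n{dialogue}\n\nSummary:\n\n{summary}\n\n\n"
        cost = count_tokens(block)
        if used + cost > max_tokens:
            break  # the next shot would overflow the budget
        shot_blocks.append(block)
        used += cost
    return "".join(shot_blocks) + query_block, len(shot_blocks)


# Hypothetical toy data with a tiny budget so the effect is visible.
examples = [("a b c", "s1"), ("d e f g", "s2"), ("h i", "s3")]
prompt, n_shots = build_prompt_within_budget(examples, "q1 q2", max_tokens=12)
print(n_shots)  # 1
print(prompt)
```

With the toy budget of 12 "tokens", only the first shot fits next to the query, so the prompt degrades to one shot instead of losing the query.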

## Configuration Parameters

In [None]:
import torch
generation_config = GenerationConfig(max_new_tokens=100, do_sample=True, temperature=2.0)

inputs = tokenizer(few_shot_prompt)
output = tokenizer.decode(
    model.generate(torch.tensor([inputs['input_ids']]), generation_config=generation_config)[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{real_summary}\n')

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
They are hungry but are still hungry on what to see in order for coffee while the meal passes. Person1 might see a picture taken by the waitress of some Strawberry Tart and convince me to bring that side of the dessert
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# asks #Person2# what to eat and what to drink.



Comments related to the choice of the parameters in the code cell above:
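- Choosing a small value such as `max_new_tokens=10` makes the output too short, so the dialogue summary is cut off.
- Setting `do_sample=True` makes the model sample the next token instead of always picking the most likely one, and raising `temperature` makes the output more varied. At `temperature=2.0`, as in the cell above, the text can drift well away from the dialogue.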
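
To make the effect of temperature concrete, here is a small sketch of temperature-scaled softmax: the scores for candidate tokens are divided by the temperature before being turned into probabilities. The logits are made-up numbers for three hypothetical tokens, and the function is a plain-Python illustration, not the library's internal code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

A low temperature concentrates probability on the top token, while a high temperature flattens the distribution, so sampling picks unlikely tokens more often.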

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations.