In this exercise, you will perform prompt engineering on a dialogue summarization task using [Flan-T5](https://huggingface.co/google/flan-t5-large) and the [dialogsum dataset](https://huggingface.co/datasets/knkarthick/dialogsum). You will explore how different prompts affect the output of the model, and compare zero-shot and few-shot inference.

Complete the code in the cells below.

### 1. Set up Required Dependencies

In [None]:
!pip install datasets -q

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig
from datasets import load_dataset

### 2. Explore the Dataset

In [None]:
from datasets import load_dataset

dataset = load_dataset('knkarthick/dialogsum')


Print several dialogues with their baseline summaries.

In [None]:
example_indices = [0, 42, 800]
dash_line = '-' * 100

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    # [index] selects one example from the test split by position (here 0, 42, or 800).
    dialogue = dataset['test'][index]['dialogue']
    print(dialogue)
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['test'][index]['summary'])
    print(dash_line)
    print()

----------------------------------------------------------------------------------------------------
Example 1
----------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message programs by employees during working hours is strictly prohibited.
#Person2#: Sir, does this apply to intra-office communications only? Or will it also restrict external communications?
#Person1#: It should apply to all communications, not only in this office between employees, but also any outside communications.
#Person2#: But sir, many employees use Instant Messaging to 

### 3. Summarize Dialogues without Prompt Engineering

Load the Flan-T5-large model and its tokenizer.

In [None]:
model_name = 'google/flan-t5-large'

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


**Exercise**: Use the pre-trained model to summarize the example dialogues without any prompt engineering. Use the `model.generate()` function with `max_new_tokens=50`.

In [None]:
### WRITE YOUR CODE HERE

def generate_summary(dialogue_text):
    """
    Generate summary for a given dialogue text
    Flow: dialogue text -> tokenize -> model generate -> decode -> summary
    """
    # Step 1: Tokenize the dialogue text
    # This converts text to token IDs that the model can understand
    inputs = tokenizer(dialogue_text, return_tensors="pt")

    # Step 2: Model generates output token IDs
    # Model takes input token IDs and generates summary token IDs
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=50  # Greedy decoding (default) - always picks the most probable next token
    )

    # Step 3: Decode the output token IDs back to text
    # This converts the generated token IDs back to human-readable text
    generated_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return generated_summary

# Test with different examples
example_indices = [0, 42, 800]
dash_line = '-' * 50

for i, index in enumerate(example_indices):
    print(f"Example {i + 1} (Index {index}):")
    dialogue = dataset['test'][index]['dialogue']
    summary = generate_summary(dialogue)
    print(f"Generated Summary: {summary}")
    print(f"Human Summary: {dataset['test'][index]['summary']}")
    print(dash_line)




Example 1 (Index 0):
Generated Summary: #Person1: Ms. Dawson, please take dictation for me.
Human Summary: Ms. Dawson helps #Person1# to write a memo to inform every employee that they have to change the communication method and should not use Instant Messaging anymore.
--------------------------------------------------
Example 2 (Index 42):
Generated Summary: #Person1#: Thank you, #Person2#.
Human Summary: #Person1# wants to adjust #Person1#'s life and #Person2# suggests #Person1# be positive and stay healthy.
--------------------------------------------------
Example 3 (Index 800):
Generated Summary: #Person1#: That's great.
Human Summary: #Person2# tells #Person1# about the relationships between their family and the uncle Bill's, who will visit them next year.
--------------------------------------------------


The generations are loosely related to the dialogues, but the model does not seem to know which task it is supposed to perform, and it often just makes up the next sentence in the dialogue. Prompt engineering can help here.

### 4. Summarize Dialogues with Instruction Prompts

In order to instruct the model to perform a task (e.g., summarize a dialogue), you can take the dialogue and convert it into an instruction prompt. This is often called **zero-shot inference**.

**Exercise**: Wrap the dialogues in a descriptive instruction (e.g., "Summarize the following conversation."), and examine how the generated text changes.

In [None]:
### WRITE YOUR CODE HERE

def generate_summary_with_prompt(dialogue_text, instruction_prompt="Summarize the following conversation."):
    """Generate a summary with an instruction prompt (zero-shot inference)."""

    # Create the full prompt: instruction + dialogue
    full_prompt = f"{instruction_prompt}\n\n{dialogue_text}"

    # Tokenize the full prompt
    inputs = tokenizer(full_prompt, return_tensors="pt")

    # Generate summary
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=50
    )

    # Decode the generated summary
    generated_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return generated_summary

# Test with different instruction prompts
example_indices = [0, 42, 800]
instruction_prompts = [
    "Summarize the following conversation.",
    "Please provide a brief summary of this dialogue.",
    "What is the main point of this conversation?",
    "Summarize the following conversation in one sentence."
]

dash_line = '-' * 80

# Test each example with different prompts
for i, index in enumerate(example_indices):
    print(f"\n{'='*80}")
    print(f"EXAMPLE {i + 1} (Index {index})")
    print(f"{'='*80}")

    dialogue = dataset['test'][index]['dialogue']
    print("ORIGINAL DIALOGUE:")
    print(dialogue[:200] + "..." if len(dialogue) > 200 else dialogue)
    print(dash_line)

    print("HUMAN SUMMARY:")
    print(dataset['test'][index]['summary'])
    print(dash_line)

    # Test without prompt (for comparison)
    no_prompt_summary = generate_summary(dialogue)  # Using the function from earlier
    print("WITHOUT INSTRUCTION PROMPT:")
    print(f"Generated: {no_prompt_summary}")
    print(dash_line)

    # Test with different instruction prompts
    print("WITH INSTRUCTION PROMPTS:")
    for j, prompt in enumerate(instruction_prompts):
        summary = generate_summary_with_prompt(dialogue, prompt)
        print(f"Prompt {j+1}: '{prompt}'")
        print(f"Generated: {summary}")
        print()


EXAMPLE 1 (Index 0)
ORIGINAL DIALOGUE:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Per...
--------------------------------------------------------------------------------
HUMAN SUMMARY:
Ms. Dawson helps #Person1# to write a memo to inform every employee that they have to change the communication method and should not use Instant Messaging anymore.
--------------------------------------------------------------------------------
WITHOUT INSTRUCTION PROMPT:
Generated: #Person1: Ms. Dawson, please take dictation for me.
--------------------------------------------------------------------------------
WITH INSTRUCTION PROMPTS:
Prompt 1: 'Summarize the following conversation.'
Generated: #Person1#: Ms. Dawson, please take dictation for me. #Person1#: All office communications are restricted to email correspondence and official memos. The use

This is much better! However, the model still does not pick up on the nuance of the conversations.

**Exercise:** Experiment with the prompt text and see how it influences the generated output. Do the inferences change if you end the prompt with an empty string vs. `Summary: `?
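
One reading of this exercise is that the cue should be the last thing the model sees, after the dialogue. The sketch below builds both variants for a toy dialogue so the difference is visible before any model call. `build_prompt` and its parameters are hypothetical names used for illustration, not part of the notebook.

```python
def build_prompt(dialogue_text, instruction="Summarize the following conversation.", suffix=""):
    """Place the instruction first, the dialogue second, and the optional cue last."""
    prompt = f"{instruction}\n\n{dialogue_text}"
    if suffix:
        # The cue goes after the dialogue so it is the final text in the prompt.
        prompt += f"\n\n{suffix}"
    return prompt


# Toy dialogue for illustration only (not taken from the dataset).
toy_dialogue = "#Person1#: Are you free at noon?\n#Person2#: Yes, let's have lunch."

for suffix in ["", "Summary: "]:
    print(repr(build_prompt(toy_dialogue, suffix=suffix)))
```

Each resulting string can then be passed to the `generate_summary` function from Section 3 to compare the outputs.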

In [None]:
### WRITE YOUR CODE HERE

# Experiment with different prompt endings
def test_prompt_endings(dialogue_text, base_instruction="Summarize the following conversation"):
    """
    Test how different prompt endings affect the output
    """
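    # Suffixes appended to the instruction. Note that they are inserted
    # before the dialogue, not at the very end of the full prompt.
    prompt_endings = [
        "",                    # Empty string ending
        ". Summary:",          # Explicit "Summary:" cue
        ". \nSummary: "        # Same cue on its own line
    ]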

    results = []

    for i, ending in enumerate(prompt_endings):
        full_prompt = f"{base_instruction}{ending}\n\n{dialogue_text}"

        # Tokenize and generate
        inputs = tokenizer(full_prompt, return_tensors="pt")
        outputs = model.generate(inputs["input_ids"], max_new_tokens=50)
        summary = tokenizer.decode(outputs[0], skip_special_tokens=True)

        results.append({
            'ending': ending if ending else "[empty string]",
            'full_prompt_start': f"{base_instruction}{ending}",
            'summary': summary
        })

    return results

# Test with one example dialogue
test_index = 42
dialogue = dataset['test'][test_index]['dialogue']

print("="*80)
print(f"TESTING PROMPT ENDINGS - Example {test_index}")
print("="*80)
print("DIALOGUE:")
print(dialogue[:300] + "..." if len(dialogue) > 300 else dialogue)
print("\n" + "-"*80)
print("HUMAN SUMMARY:")
print(dataset['test'][test_index]['summary'])
print("\n" + "-"*80)

# Test different endings
results = test_prompt_endings(dialogue)

print("RESULTS WITH DIFFERENT PROMPT ENDINGS:")
print("-"*80)

for i, result in enumerate(results):
    print(f"\n{i+1}. Prompt ending: '{result['ending']}'")
    print(f"   Full prompt starts with: '{result['full_prompt_start']}'")
    print(f"   Generated summary: {result['summary']}")



TESTING PROMPT ENDINGS - Example 42
DIALOGUE:
#Person1#: I don't know how to adjust my life. Would you give me a piece of advice?
#Person2#: You look a bit pale, don't you?
#Person1#: Yes, I can't sleep well every night.
#Person2#: You should get plenty of sleep.
#Person1#: I drink a lot of wine.
#Person2#: If I were you, I wouldn't drink too m...

--------------------------------------------------------------------------------
HUMAN SUMMARY:
#Person1# wants to adjust #Person1#'s life and #Person2# suggests #Person1# be positive and stay healthy.

--------------------------------------------------------------------------------
RESULTS WITH DIFFERENT PROMPT ENDINGS:
--------------------------------------------------------------------------------

1. Prompt ending: '[empty string]'
   Full prompt starts with: 'Summarize the following conversation'
   Generated summary: #Person1 is worried about his future. #Person2 gives him some advice.

2. Prompt ending: '. Summary:'
   Full prompt sta

**Exercise:** Many prompt templates for specific tasks have been published for Flan-T5 [here](https://github.com/google-research/FLAN/blob/main/flan/v2/templates.py). Try using its pre-built prompts for dialogue summarization (e.g., the ones under the `"samsum"` key) and see how they influence the outputs.


In [None]:
### WRITE YOUR CODE HERE

# Flan-T5 pre-built prompt templates for dialogue summarization (samsum dataset)
samsum_templates = [
    "{dialogue}\n\nBriefly summarize that dialogue.",
    "Here is a dialogue:\n{dialogue}\n\nWrite a short summary!",
    "Dialogue:\n{dialogue}\n\nWhat is a summary of this dialogue?",
    "{dialogue}\n\nWhat was that dialogue about, in two sentences or less?",
    "Here is a dialogue:\n{dialogue}\n\nWhat were they talking about?",
    "Dialogue:\n{dialogue}\nWhat were the main points in that conversation?",
    "Dialogue:\n{dialogue}\nWhat was going on in that conversation?",
]


def generate_summary_with_flan_template(dialogue_text, template):
    """Generate a summary using one of Flan-T5's pre-built prompt templates."""
    # Truncate long dialogues to 400 characters to keep the prompt short.
    max_dialogue_length = 400
    if len(dialogue_text) > max_dialogue_length:
        dialogue_text = dialogue_text[:max_dialogue_length] + "..."

    # str.format() replaces the {dialogue} placeholder with dialogue_text.
    full_prompt = template.format(dialogue=dialogue_text)

    inputs = tokenizer(full_prompt, return_tensors="pt")
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=50
    )
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return full_prompt, summary

test_indices = [0,42,800]

for idx, test_index in enumerate(test_indices):
    print("="*100)
    print(f"TESTING FLAN-T5 TEMPLATES - EXAMPLE {idx+1} (Index {test_index})")
    print("="*100)

    dialogue = dataset['test'][test_index]['dialogue']
    human_summary = dataset['test'][test_index]['summary']

    print("Original Dialogue:")
    print(dialogue[:400] + "..." if len(dialogue) > 400 else dialogue)
    print("\n" + "-"*80)

    print("Human Summary:")
    print(human_summary)
    print("\n" + "-"*80)

    print("FLAN-T5 TEMPLATE RESULTS:")
    print("-"*80)
    for i, template in enumerate(samsum_templates):
        full_prompt, summary = generate_summary_with_flan_template(dialogue, template)

        print(f"\nTemplate {i+1}:")
        print(f"Prompt format: '{template.replace('{dialogue}', '[DIALOGUE]')}'")
        print(f"Generated summary: {summary}")



TESTING FLAN-T5 TEMPLATES - EXAMPLE 1 (Index 0)
Original Dialogue:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to all employees by this afternoon. Are you ready?
#Person2#: Yes, sir. Go ahead.
#Person1#: Attention all staff... Effective immediately, all office communications are restricted to email correspondence and official memos. The use of Instant Message prog...

--------------------------------------------------------------------------------
Human Summary:
Ms. Dawson helps #Person1# to write a memo to inform every employee that they have to change the communication method and should not use Instant Messaging anymore.

--------------------------------------------------------------------------------
FLAN-T5 TEMPLATE RESULTS:
--------------------------------------------------------------------------------

Template 1:
Prompt format: '[DIALOGUE]

Briefly summarize that dialogue.'
Gene

Notice that the prompts from Flan-T5 did help, but the model still struggles to pick up on the nuance of the conversation in some cases. This is what you will try to solve with few-shot inferencing.
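
The solution above shortens each dialogue to 400 characters, which can cut a turn mid-word and has no fixed relation to the model's token limit. A hedged alternative is to truncate by token count. The sketch assumes a tokenizer that exposes `encode(text, add_special_tokens=False)` and `decode(ids, skip_special_tokens=True)`, as Hugging Face tokenizers do. `truncate_to_tokens` and the `WhitespaceTokenizer` stand-in are hypothetical and exist only so the example runs without downloading a model.

```python
def truncate_to_tokens(text, tokenizer, max_tokens):
    """Return `text` cut to at most `max_tokens` tokens, decoded back to a string."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    if len(token_ids) <= max_tokens:
        return text
    return tokenizer.decode(token_ids[:max_tokens], skip_special_tokens=True)


class WhitespaceTokenizer:
    """Stand-in for demonstration only: one token per whitespace-separated word."""

    def __init__(self):
        self.vocab = []

    def encode(self, text, add_special_tokens=False):
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab.append(word)
            ids.append(self.vocab.index(word))
        return ids

    def decode(self, token_ids, skip_special_tokens=True):
        return " ".join(self.vocab[i] for i in token_ids)


demo_tokenizer = WhitespaceTokenizer()
short = truncate_to_tokens("one two three four five", demo_tokenizer, max_tokens=3)
print(short)  # one two three
```

With the notebook's real tokenizer, the decoded text may differ slightly in whitespace from the original dialogue, so it is worth printing a truncated example before using it in a prompt.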

### 5. Summarize Dialogues with a Few-Shot Inference

**Few-shot inference** is the practice of providing an LLM with several example prompt-response pairs for your task before the actual prompt that you want completed. This is called "in-context learning": the examples condition the model on your specific task without changing its weights.

**Exercise:** Build a function that takes a list of `in_context_example_indices`, generates a prompt with the examples, then at the end appends the prompt that you want the model to complete (`test_example_index`). Use the same Flan-T5 prompt template from Section 4. Make sure to separate the examples with `"\n\n\n"`.

In [None]:
def make_prompt(in_context_example_indices, test_example_index):
    ### WRITE YOUR CODE HERE

    # Choose a Flan-T5 template from Section 4
    template = "Dialogue:\n{dialogue}\n\nWhat is a summary of this dialogue?"

    # Start building the prompt with examples
    prompt_parts = []

    # Add in-context examples (few-shot examples)
    for example_index in in_context_example_indices:
        # Get the dialogue and human summary for this example.
        # The dialogue is cut to 150 characters to keep the prompt short.
        dialogue = dataset['test'][example_index]['dialogue'][:150]
        summary = dataset['test'][example_index]['summary']

        # Format this example using the template
        example_prompt = template.format(dialogue=dialogue)
        # Add the human summary as the answer
        complete_example = f"{example_prompt}\n{summary}"
        prompt_parts.append(complete_example)

    # Add the test example (the one we want the model to complete),
    # cut to 200 characters to keep the prompt short.
    test_dialogue = dataset['test'][test_example_index]['dialogue'][:200]

    # For the test example, don't provide the summary, that's what we want the model to generate
    test_prompt = template.format(dialogue=test_dialogue)
    prompt_parts.append(test_prompt)

    # Join the examples and the test prompt with the required separator.
    prompt = "\n\n\n".join(prompt_parts)

    return prompt

In [None]:
in_context_example_indices = [0, 10, 20]
test_example_index = 800

few_shot_prompt = make_prompt(in_context_example_indices, test_example_index)
print(few_shot_prompt)

Dialogue:
#Person1#: Ms. Dawson, I need you to take a dictation for me.
#Person2#: Yes, sir...
#Person1#: This should go out as an intra-office memorandum to al

What is a summary of this dialogue?
Ms. Dawson helps #Person1# to write a memo to inform every employee that they have to change the communication method and should not use Instant Messaging anymore.


Dialogue:
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure

What is a summary of this dialogue?
#Person1# attends Brian's birthday party. Brian thinks #Person1# looks great and charming.


Dialogue:
#Person1#: What's wrong with you? Why are you scratching so much?
#Person2#: I feel itchy! I can't stand it anymore! I think I may be coming down with

What is a summary of this dialogue?
#Person1# thinks #Person2# has chicken pox and warns #Person2# about the possible hazards but #Person2# thinks it will be fine.


Dialogue:
#Person

Now pass this prompt to the model to perform few-shot inference:

In [None]:
### WRITE YOUR CODE HERE

inputs = tokenizer(few_shot_prompt, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(
    inputs["input_ids"],
    max_new_tokens=50
)

generated_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)


print("\n Model Generated Summary:")
print(generated_summary)
print("\nHuman Summary:")
print(dataset['test'][test_example_index]['summary'])




 Model Generated Summary:
Dad talks about his uncle Bill, his wife and two daughters in New Zealand.

Human Summary:
#Person2# tells #Person1# about the relationships between their family and the uncle Bill's, who will visit them next year.


**Exercise:** Experiment with the few-shot inferencing:
- Choose different dialogues - change the indices in the `in_context_example_indices` list and `test_example_index` value.
- Change the number of examples. Be sure to stay within the model's 512-token context length, however.

How well does few-shot inference work with other examples?
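
To stay within the token limit while varying the number of examples, one option is to add in-context examples only while the prompt still fits. The sketch below is a minimal illustration under assumptions: `fit_examples_to_budget` is a hypothetical helper, `count_tokens` is any callable returning a token count (with the notebook's tokenizer it could be `lambda s: len(tokenizer(s)["input_ids"])`), and the whitespace counter used here is only a stand-in. The toy dialogues are invented for the demonstration.

```python
TEMPLATE = "Dialogue:\n{dialogue}\n\nWhat is a summary of this dialogue?"
SEPARATOR = "\n\n\n"


def fit_examples_to_budget(examples, test_dialogue, count_tokens, max_tokens=512):
    """Build a few-shot prompt, keeping as many leading examples as fit in max_tokens.

    examples: list of (dialogue, summary) pairs used as in-context demonstrations.
    Returns (prompt, number_of_examples_used). The test part is always included.
    """
    test_part = TEMPLATE.format(dialogue=test_dialogue)
    parts = []
    for dialogue, summary in examples:
        candidate = parts + [f"{TEMPLATE.format(dialogue=dialogue)}\n{summary}"]
        prompt = SEPARATOR.join(candidate + [test_part])
        if count_tokens(prompt) > max_tokens:
            break  # adding this example would exceed the budget
        parts = candidate
    return SEPARATOR.join(parts + [test_part]), len(parts)


def count_words(text):
    # Stand-in token counter for demonstration: whitespace-separated words.
    return len(text.split())


toy_examples = [
    ("#Person1#: Hi.\n#Person2#: Hello.", "#Person1# and #Person2# greet each other."),
    ("#Person1#: Lunch?\n#Person2#: Sure.", "#Person1# and #Person2# agree to have lunch."),
]
prompt, used = fit_examples_to_budget(
    toy_examples, "#Person1#: Bye.\n#Person2#: See you.", count_words, max_tokens=40
)
print(f"Examples used: {used}")
print(prompt)
```

Dropping whole examples keeps every remaining dialogue and summary intact, which differs from the character slicing used in `make_prompt`.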

In [None]:
### WRITE YOUR CODE HERE

def experiment_few_shot_inference():

    # experiments is a list of dictionaries
    experiments = [
        {
            "name": "Different In-Context Examples",
            "in_context_indices": [1, 5, 15],
            "test_index": 800,
            "description": "Different examples, same test case"
        },
        {
            "name": "Different Test Case",
            "in_context_indices": [0, 10, 20],
            "test_index": 750,
            "description": "Same examples, different test dialogue"
        }
    ]

    for i, exp in enumerate(experiments):
        print(f"\n{i+1}. {exp['name']}")
        print("-" * 60)
        print(f"Description: {exp['description']}")
        print(f"In-context examples: {exp['in_context_indices']}")
        print(f"Test example: {exp['test_index']}")

        prompt = make_prompt(exp['in_context_indices'], exp['test_index'])

        human_summary = dataset['test'][exp['test_index']]['summary']
        test_dialogue = dataset['test'][exp['test_index']]['dialogue']

        print(f"\nTest Dialogue: {test_dialogue[:200]}...")
        print(f"Human Summary: {human_summary}")
        print()

        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        outputs = model.generate(
            inputs["input_ids"],
            max_new_tokens=50
        )
        generated_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print("\nGenerated Summary:")
        print(generated_summary)

print("Starting few-shot inference experiments...")
experiment_few_shot_inference()

Starting few-shot inference experiments...

1. Different In-Context Examples
------------------------------------------------------------
Description: Different examples, same test case
In-context examples: [1, 5, 15]
Test example: 800

Test Dialogue: #Person1#: Dad, you keep talking about family in New Zealand. Who are they?
#Person2#: Well, that's your uncle Bill, his wife and two of their daughters.
#Person1#: Is uncle Bill your brother?
#Person...
Human Summary: #Person2# tells #Person1# about the relationships between their family and the uncle Bill's, who will visit them next year.


Generated Summary:
#Person1#'s uncle Bill, his wife and two daughters live in New Zealand.

2. Different Test Case
------------------------------------------------------------
Description: Same examples, different test dialogue
In-context examples: [0, 10, 20]
Test example: 750

Test Dialogue: #Person1#: Good afternoon. I come here specially to pick up my tickets. I booked it last month. This is my r

### 6. Generative Configuration Parameters for Inference

You can change the configuration parameters of the `generate()` method to get different outputs from the LLM. So far the only parameter that you have set is `max_new_tokens=50`, which defines the maximum number of tokens to generate. A convenient way of organizing the configuration parameters is to use the `GenerationConfig` class. Setting `do_sample=True` enables sampling-based decoding, in which the next token is drawn from the probability distribution over the entire vocabulary rather than always being the most probable one. You can then adjust the outputs by changing `temperature` and other parameters (such as `top_k` and `top_p`). A full list of available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig).
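
To make the roles of `temperature`, `top_k`, and `top_p` concrete, the following is a simplified pure-Python sketch of how a next-token distribution might be reshaped before sampling. It illustrates the general technique on a toy vocabulary and is not the Transformers implementation; `filter_distribution` and the example logits are hypothetical.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token: probability} after temperature scaling, top-k and top-p filtering."""
    # Temperature: dividing logits by T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
        kept_mass = sum(p for _, p in ranked)
        ranked = [(tok, p / kept_mass) for tok, p in ranked]

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so their probabilities sum to 1.
    kept_total = sum(p for _, p in kept)
    return {tok: p / kept_total for tok, p in kept}


# Hypothetical logits over a four-token vocabulary.
toy_logits = {"sleep": 2.0, "advice": 1.5, "wine": 0.5, "banana": -1.0}

for temperature in (0.3, 1.0):
    dist = filter_distribution(toy_logits, temperature=temperature, top_k=3, top_p=0.95)
    print(temperature, {tok: round(p, 3) for tok, p in dist.items()})

# Sampling then draws one token from the filtered distribution.
rng = random.Random(0)
dist = filter_distribution(toy_logits, temperature=1.0, top_p=0.9)
print(rng.choices(list(dist), weights=list(dist.values()), k=1)[0])
```

Lower temperatures concentrate probability on the top token, so low-temperature sampling behaves more like greedy decoding, while `top_k` and `top_p` remove the unlikely tail before a token is drawn.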

**Exercise:** Change the configuration parameters to investigate their influence on the output. Analyze your results.

In [None]:
### WRITE YOUR CODE HERE

from transformers import GenerationConfig


def experiment_with_config_class():
    test_index = 42
    dialogue = dataset['test'][test_index]['dialogue']
    human_summary = dataset['test'][test_index]['summary']

    # Use a zero-shot prompt so that only the generation parameters differ between runs
    prompt = f"Dialogue:\n{dialogue}\n\nWhat is a summary of this dialogue?"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)

    print(f"Test Dialogue: {dialogue[:500]}...")
    print(f"Human Summary: {human_summary}")
    print()

    generation_configs = [
        {
            "name": "Config 1",
            "config": GenerationConfig(
                max_new_tokens=50,
                do_sample=True,
                temperature=0.3,
                top_p=0.95,
                pad_token_id=tokenizer.eos_token_id,
                repetition_penalty=1.1  # Reduce repetition
            )
        },
        {
            "name": "Config 2",
            "config": GenerationConfig(
                max_new_tokens=50,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
                pad_token_id=tokenizer.eos_token_id,
                repetition_penalty=1.05
            )
        },
        {
            "name": "Config 3",
            "config": GenerationConfig(
                max_new_tokens=50,
                do_sample=True,
                temperature=0.9,
                top_k=100,
                top_p=0.95,
                pad_token_id=tokenizer.eos_token_id,
                repetition_penalty=1.0
            )
        }

    ]

    for i, gen_config in enumerate(generation_configs):
        print(f"\n{i+1}. {gen_config['name']}")

        outputs = model.generate(
            inputs["input_ids"],
            generation_config=gen_config["config"]
            )

        generated_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"Generated Summary: {generated_summary}")

print("\nTest GenerationConfig Class")
experiment_with_config_class()



Test GenerationConfig Class
Test Dialogue: #Person1#: I don't know how to adjust my life. Would you give me a piece of advice?
#Person2#: You look a bit pale, don't you?
#Person1#: Yes, I can't sleep well every night.
#Person2#: You should get plenty of sleep.
#Person1#: I drink a lot of wine.
#Person2#: If I were you, I wouldn't drink too much.
#Person1#: I often feel so tired.
#Person2#: You better do some exercise every morning.
#Person1#: I sometimes find the shadow of death in front of me.
#Person2#: Why do you worry about your futu...
Human Summary: #Person1# wants to adjust #Person1#'s life and #Person2# suggests #Person1# be positive and stay healthy.


1. Config 1
Generated Summary: Person1 is worried about his future. He doesn't know how to adjust his life.

2. Config 2
Generated Summary: Person1 is worried about his future. He should get plenty of sleep and drink less wine.

3. Config 3
Generated Summary: Person1 wants Person2's advice on how to adjust her life.


### Analyze
**Config 1 (low temperature, high `top_p`)**
- Accurate about Person1's worry and need for life adjustment.
- Misses Person2's advice completely; too focused on the problems.

**Config 2 (medium temperature, balanced sampling)**
- Captures both the worry and the specific health advice (sleep, drinking).
- Includes both Person1's problem and Person2's solutions.

**Config 3 (high temperature, diverse sampling)**
- Correctly identifies the advice-seeking relationship.
- Gender error ("her" vs. "his"); misses the specific health details.
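
The comparison above is qualitative. A rough numerical check is a unigram-overlap F1 between each generated summary and the human summary, in the spirit of ROUGE-1 but not a substitute for a real ROUGE implementation. `unigram_f1` is a hypothetical helper, and the strings are the outputs recorded above. Because sampling is random, scores from a single run are only indicative.

```python
import re
from collections import Counter


def unigram_f1(candidate, reference):
    """F1 over lowercase word counts shared by candidate and reference."""
    cand = Counter(re.findall(r"[a-z0-9']+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9']+", reference.lower()))
    overlap = sum((cand & ref).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


human = ("#Person1# wants to adjust #Person1#'s life and #Person2# suggests "
         "#Person1# be positive and stay healthy.")
generated = {
    "Config 1": "Person1 is worried about his future. He doesn't know how to adjust his life.",
    "Config 2": "Person1 is worried about his future. He should get plenty of sleep and drink less wine.",
    "Config 3": "Person1 wants Person2's advice on how to adjust her life.",
}

for name, summary in generated.items():
    print(name, round(unigram_f1(summary, human), 3))
```

Word overlap rewards shared vocabulary rather than meaning, so it is best read alongside the manual analysis rather than in place of it.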
