<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Synthetic_Data_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://withpi.ai/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://build.withpi.ai"><font size="4">Copilot</font></a>

# Synthetic Data Generation

Many techniques require LLM inputs and responses to drive evaluation and training, but collecting high-quality data can be painful and expensive.

Generating this data with AI support can give you a higher-quality set with much less effort. It can be done with the same ScoringSpec that drives other techniques in Pi!

We will walk through the same `Aesop AI` example, but you can load any application you care to.

## Install and initialize SDK

You'll need a `WITHPI_API_KEY` from https://build.withpi.ai/account. Add it to your notebook secrets (the key icon in the left sidebar).

Run the cell below to install packages and load the SDK.

In [None]:
%%capture

%pip install withpi withpi-utils datasets tqdm litellm pandas numpy

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

pi = PiClient()
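
The cell above reads the key from Colab secrets, which only exist inside Colab. As a minimal sketch for running the notebook elsewhere, the hypothetical helper below (`resolve_api_key` is not part of the Pi SDK) tries Colab secrets first and then falls back to an environment variable. It assumes that any failure to read the secret should trigger the fallback.

```python
import os


def resolve_api_key(name="WITHPI_API_KEY", environ=None):
    """Return the API key from Colab secrets if available, else from the environment.

    `resolve_api_key` is a hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ

    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or notebook access not granted: fall through.
            pass

    value = environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return value


# Demo with an explicit mapping so the example does not depend on real secrets.
demo_key = resolve_api_key(environ={"WITHPI_API_KEY": "demo-key"})
print(demo_key)
```

Outside Colab you could assign the result to `os.environ["WITHPI_API_KEY"]` before constructing `PiClient()`, mirroring what the cell above does.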


## Load a Scoring Spec

Load the `Aesop AI` example from Pi Labs cookbooks, or edit below to load a different one.


In [None]:
from withpi_utils.colab import load_scoring_spec_from_web, display_scoring_spec

aesop_scoring_spec = load_scoring_spec_from_web(
    "https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/scoring_specs/aesop_ai.json"
)

display_scoring_spec(aesop_scoring_spec)

## Generate an Example Set

Given this structured description, let's build a dataset of plausible moral-lesson prompts, each paired with a generated story, that can be used to exercise the ScoringSpec. Generation takes about 50 seconds.

In [None]:
synthetic_data_generation_status = pi.data.generate_input_response_pairs.start_job(
    system_prompt="""
Write a children's story in the style of Aesop's Fables teaching a life lesson
specified by the user. Provide just the story with no extra content.
""",
    num_pairs_to_generate=12,
    seeds=[],
    batch_size=3,
    num_shots=3,
)

## Stream Results

The `stream` utility yields data as it is generated and prints status messages along the way, so the output below interleaves the two.

In [None]:
from withpi_utils.jobs import stream

for data in stream(pi.data.generate_input_response_pairs, synthetic_data_generation_status):
    print(f"[OUTPUT] - {data}")

LAUNCHING
RUNNING
[INFO] Generating 10 seeds as they are not provided.
[INFO] Yielding generated 10 seeds
[OUTPUT] - {'llm_input': 'Write a fable about the importance of teamwork.', 'llm_output': 'Once upon a time in a sunlit meadow, a proud little turtle named Tilly decided she wanted to climb the tallest hill. "I can do it all by myself!" she exclaimed, puffing out her chest. The other animals watched her with concern. \n\n"Stay safe, Tilly," called Benny the rabbit. "It’s a long way up, and it’s steep! Why not ask for help?" \n\nBut Tilly shook her head. "I don’t need anyone! I’ll show you all how strong I am!" With that, she started her grueling climb, slowly but surely.\n\nAs she made her way up, Tilly quickly became tired. The rocks were slippery, and the path was steep. After an hour of struggling, she found herself stuck on a ledge, feeling weary and defeated.\n\nJust then, her friends—Benny the rabbit, Lila the squirrel, and Max the wise old owl—arrived. "We can help you!" Lil
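
Streamed pairs are easy to lose once the notebook session ends, so it can help to write them to disk as they arrive. The sketch below assumes each streamed record behaves like a dict with `llm_input` and `llm_output` keys, as the printed output suggests. The helpers `save_pairs_jsonl` and `load_pairs_jsonl` are hypothetical, and the sample record uses a shortened placeholder story rather than real output.

```python
import json
import tempfile
from pathlib import Path


def save_pairs_jsonl(pairs, path):
    """Write each pair as one JSON object per line and return the number written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for pair in pairs:
            record = {
                "llm_input": pair["llm_input"],
                "llm_output": pair["llm_output"],
            }
            # ensure_ascii=False keeps curly quotes and dashes readable in the file.
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_pairs_jsonl(path):
    """Read the pairs back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Stand-in for streamed records; the story text is a shortened placeholder.
sample_pairs = [
    {
        "llm_input": "Write a fable about the importance of teamwork.",
        "llm_output": "Once upon a time in a sunlit meadow, ...",
    },
]

out_path = Path(tempfile.gettempdir()) / "aesop_pairs.jsonl"
saved = save_pairs_jsonl(sample_pairs, out_path)
reloaded = load_pairs_jsonl(out_path)
print(f"Saved {saved} pair(s) to {out_path}")
```

In the notebook, `pairs` could be a list collected inside the `for data in stream(...)` loop.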

## Take a look at the generated examples

Retrieve the finished job and print the returned examples (inputs and outputs).

In [None]:
synthetic_data_generation_status = pi.data.generate_input_response_pairs.retrieve(
    job_id=synthetic_data_generation_status.job_id
)

state = synthetic_data_generation_status.state
if state == "DONE":
    print("Printing all the generated examples below...\n")
    assert synthetic_data_generation_status.data is not None
    for example in synthetic_data_generation_status.data:
        print(example)
elif state == "ERROR":
    print("Job ended in error")
else:
    print("Please wait for the job to finish and then run this cell again...")

Printing all the generated examples below...

Example(llm_input='Write a fable about the importance of teamwork.', llm_output='Once upon a time in a sunlit meadow, a proud little turtle named Tilly decided she wanted to climb the tallest hill. "I can do it all by myself!" she exclaimed, puffing out her chest. The other animals watched her with concern. \n\n"Stay safe, Tilly," called Benny the rabbit. "It’s a long way up, and it’s steep! Why not ask for help?" \n\nBut Tilly shook her head. "I don’t need anyone! I’ll show you all how strong I am!" With that, she started her grueling climb, slowly but surely.\n\nAs she made her way up, Tilly quickly became tired. The rocks were slippery, and the path was steep. After an hour of struggling, she found herself stuck on a ledge, feeling weary and defeated.\n\nJust then, her friends—Benny the rabbit, Lila the squirrel, and Max the wise old owl—arrived. "We can help you!" Lila said eagerly. Tilly frowned. “I wanted to do this alone!”\n\n“Someti
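
The cell above asks you to rerun it by hand until the job finishes. As a rough alternative, the sketch below polls until the job reaches a terminal state. It assumes the states seen in this notebook (`LAUNCHING`, `RUNNING`, `DONE`, `ERROR`) and a `retrieve(job_id=...)` callable that returns an object with a `state` attribute. `wait_for_job` and `make_fake_retrieve` are hypothetical helpers, and the demo uses a fake job so that it runs without an API key.

```python
import time
from types import SimpleNamespace

TERMINAL_STATES = {"DONE", "ERROR"}


def wait_for_job(retrieve, job_id, poll_seconds=5.0, timeout_seconds=300.0):
    """Call retrieve(job_id=...) until the job is DONE or ERROR, or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = retrieve(job_id=job_id)
        if status.state in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still in state {status.state}")
        time.sleep(poll_seconds)


def make_fake_retrieve(states):
    """Build a stand-in retrieve function that steps through the given states."""
    remaining = iter(states)

    def retrieve(job_id):
        return SimpleNamespace(job_id=job_id, state=next(remaining), data=None)

    return retrieve


fake_retrieve = make_fake_retrieve(["LAUNCHING", "RUNNING", "DONE"])
final_status = wait_for_job(fake_retrieve, job_id="demo", poll_seconds=0.0)
print(final_status.state)  # prints "DONE"
```

With the real client, the `retrieve` argument would presumably be `pi.data.generate_input_response_pairs.retrieve`, which the cell above already calls with `job_id=`.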

In [None]:
# @title Let's Score and manually inspect the data
from withpi_utils.colab import pretty_print_responses

for example in synthetic_data_generation_status.data:
    score = pi.scoring_system.score(
        llm_input=example.llm_input,
        llm_output=example.llm_output,
        scoring_spec=aesop_scoring_spec,
    )

    pretty_print_responses(
        header="#### Input:\n" + example.llm_input,
        response1="#### Output:\n" + example.llm_output,
        left_label="Pi Synthetic Data",
        scores_left=score,
    )
    print("\n\n")

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.711 |
| Is the resolution of the conflict clear and satisfying? | 1.0 |
| Does the story include characters that are relatable for children? | 0.824 |
| Do the characters demonstrate growth or change by the end of the story? | 1.0 |
| Is the dialogue between characters natural and age-appropriate? | 1.0 |
| What makes a story engaging for children? | 0.875 |
| Is the story engaging and likely to hold a child's interest? | 0.781 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.656 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.766 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 1.0 |
| Is the resolution of the conflict clear and satisfying? | 0.859 |
| Does the story include characters that are relatable for children? | 0.73 |
| Do the characters demonstrate growth or change by the end of the story? | 0.984 |
| Is the dialogue between characters natural and age-appropriate? | 0.754 |
| What makes a story engaging for children? | 0.828 |
| Is the story engaging and likely to hold a child's interest? | 0.773 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.688 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.586 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.656 |
| Is the resolution of the conflict clear and satisfying? | 0.973 |
| Does the story include characters that are relatable for children? | 0.91 |
| Do the characters demonstrate growth or change by the end of the story? | 0.805 |
| Is the dialogue between characters natural and age-appropriate? | 0.875 |
| What makes a story engaging for children? | 0.797 |
| Is the story engaging and likely to hold a child's interest? | 0.738 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.609 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.688 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.914 |
| Is the resolution of the conflict clear and satisfying? | 0.93 |
| Does the story include characters that are relatable for children? | 0.746 |
| Do the characters demonstrate growth or change by the end of the story? | 1.0 |
| Is the dialogue between characters natural and age-appropriate? | 0.902 |
| What makes a story engaging for children? | 0.82 |
| Is the story engaging and likely to hold a child's interest? | 0.773 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.746 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.77 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 1.0 |
| Is the resolution of the conflict clear and satisfying? | 0.824 |
| Does the story include characters that are relatable for children? | 0.699 |
| Do the characters demonstrate growth or change by the end of the story? | 0.906 |
| Is the dialogue between characters natural and age-appropriate? | 0.789 |
| What makes a story engaging for children? | 0.781 |
| Is the story engaging and likely to hold a child's interest? | 0.773 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.734 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.738 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 1.0 |
| Is the resolution of the conflict clear and satisfying? | 1.0 |
| Does the story include characters that are relatable for children? | 0.938 |
| Do the characters demonstrate growth or change by the end of the story? | 1.0 |
| Is the dialogue between characters natural and age-appropriate? | 0.883 |
| What makes a story engaging for children? | 0.984 |
| Is the story engaging and likely to hold a child's interest? | 0.969 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.777 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.984 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.539 |
| Is the resolution of the conflict clear and satisfying? | 0.91 |
| Does the story include characters that are relatable for children? | 0.945 |
| Do the characters demonstrate growth or change by the end of the story? | 0.723 |
| Is the dialogue between characters natural and age-appropriate? | 0.926 |
| What makes a story engaging for children? | 0.867 |
| Is the story engaging and likely to hold a child's interest? | 0.805 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.715 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.734 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.75 |
| Is the resolution of the conflict clear and satisfying? | 0.797 |
| Does the story include characters that are relatable for children? | 0.93 |
| Do the characters demonstrate growth or change by the end of the story? | 0.746 |
| Is the dialogue between characters natural and age-appropriate? | 0.969 |
| What makes a story engaging for children? | 0.953 |
| Is the story engaging and likely to hold a child's interest? | 0.98 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.738 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.734 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.891 |
| Is the resolution of the conflict clear and satisfying? | 0.941 |
| Does the story include characters that are relatable for children? | 0.75 |
| Do the characters demonstrate growth or change by the end of the story? | 1.0 |
| Is the dialogue between characters natural and age-appropriate? | 0.785 |
| What makes a story engaging for children? | 0.824 |
| Is the story engaging and likely to hold a child's interest? | 0.762 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.715 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.766 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 1.0 |
| Is the resolution of the conflict clear and satisfying? | 0.953 |
| Does the story include characters that are relatable for children? | 1.0 |
| Do the characters demonstrate growth or change by the end of the story? | 1.0 |
| Is the dialogue between characters natural and age-appropriate? | 1.0 |
| What makes a story engaging for children? | 1.0 |
| Is the story engaging and likely to hold a child's interest? | 1.0 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.758 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.832 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.621 |
| Is the resolution of the conflict clear and satisfying? | 0.922 |
| Does the story include characters that are relatable for children? | 1.0 |
| Do the characters demonstrate growth or change by the end of the story? | 1.0 |
| Is the dialogue between characters natural and age-appropriate? | 1.0 |
| What makes a story engaging for children? | 1.0 |
| Is the story engaging and likely to hold a child's interest? | 1.0 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.93 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 1.0 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.863 |
| Is the resolution of the conflict clear and satisfying? | 0.973 |
| Does the story include characters that are relatable for children? | 0.766 |
| Do the characters demonstrate growth or change by the end of the story? | 1.0 |
| Is the dialogue between characters natural and age-appropriate? | 0.984 |
| What makes a story engaging for children? | 0.879 |
| Is the story engaging and likely to hold a child's interest? | 0.785 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.762 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.785 |

| Question | Score |
| --- | --- |
| Does the story have a clear beginning, middle, and end? | 1.0 |
| Is there a conflict introduced early in the story that drives the plot? | 0.758 |
| Is the resolution of the conflict clear and satisfying? | 1.0 |
| Does the story include characters that are relatable for children? | 0.777 |
| Do the characters demonstrate growth or change by the end of the story? | 0.988 |
| Is the dialogue between characters natural and age-appropriate? | 0.996 |
| What makes a story engaging for children? | 0.863 |
| Is the story engaging and likely to hold a child's interest? | 0.809 |
| Does the story use vivid imagery to help children visualize the scenes? | 0.734 |
| Does the story incorporate repetitive elements that aid in comprehension and retention? | 0.746 |
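
Reading thirteen tables by eye makes it hard to see which questions score low across the whole set. The sketch below averages scores per question and lists the weakest ones. It uses plain dicts as a stand-in for the score objects (their real structure is not shown here), with values copied from three questions in the first three score tables above. The helper names and the 0.7 threshold are illustrative assumptions.

```python
from statistics import mean

CONFLICT = "Is there a conflict introduced early in the story that drives the plot?"
IMAGERY = "Does the story use vivid imagery to help children visualize the scenes?"
REPETITION = (
    "Does the story incorporate repetitive elements that aid in "
    "comprehension and retention?"
)

# Hypothetical stand-in: one dict of question -> score per generated example.
# Values come from the first three score tables in this notebook.
scored_examples = [
    {CONFLICT: 0.711, IMAGERY: 0.656, REPETITION: 0.766},
    {CONFLICT: 1.0, IMAGERY: 0.688, REPETITION: 0.586},
    {CONFLICT: 0.656, IMAGERY: 0.609, REPETITION: 0.688},
]


def mean_score_per_question(examples):
    """Average each question's score over all examples."""
    questions = examples[0].keys()
    return {q: mean(example[q] for example in examples) for q in questions}


def weakest_questions(examples, threshold=0.7):
    """Return questions whose mean score is below the threshold, lowest first."""
    means = mean_score_per_question(examples)
    below = [q for q, value in means.items() if value < threshold]
    return sorted(below, key=means.get)


question_means = mean_score_per_question(scored_examples)
for question in weakest_questions(scored_examples):
    print(f"{question_means[question]:.3f}  {question}")
```

On this small sample, the imagery and repetition questions fall below the threshold while the conflict question does not, which hints at where the generated stories are weakest.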







## Next Steps

This input set can drive many other techniques in Pi. You can adjust the calls above, for example by passing `seeds` to `start_job`, to steer the generation in different directions.