<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Input_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://play.withpi.ai/logo/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# Input Generation

There is no Playground associated with this Colab, but it's coming soon!

Many techniques require input data to drive evaluation and training, but getting high-quality data can be painful and expensive.

Generating this data with AI support can give you a higher-quality set with much less effort.  And it can be done with the same Contract that drives other techniques in Pi!

We will walk through the same `Aesop AI` example, but you can load any contract here. Let's dig in!

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install packages and load the SDK.

In [2]:
%%capture

%pip install withpi withpi-utils datasets tqdm litellm

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()

## Load Scoring Spec

Load the `Aesop AI` example from Pi Labs cookbooks, or edit below to load a different one.


In [3]:
from withpi_utils.colab import load_scoring_spec_from_web, display_scoring_spec

aesop_scoring_spec = load_scoring_spec_from_web(
    "https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/scoring_specs/aesop_ai.json"
)

display_scoring_spec(aesop_scoring_spec)

## Generate an Input Set

Given this structured description, let's build a dataset of plausible inputs, here story prompts built around moral lessons, that could be used to exercise the contract.  Generation takes about 30 seconds.

In [4]:
data_generation_status = client.data.generate.start_job(
    application_description=aesop_scoring_spec.description,
    num_inputs_to_generate=10,
    seeds=[],
    batch_size=3,
    num_shots=3,
)

## Stream the messages as the inputs are generated

The messages provide detail about what is being done to generate the data.

In [8]:
from withpi_utils.colab import stream_response

data_generation_status = stream_response(
    data_generation_status.job_id, client.data.generate
)

Detailed Status for generation_jobs:09ac227c912792130a876568ea872593308c0d4b3d7c896ec7991f041cbeedd8:a695bf41-d103-4fa9-9f92-91ec94af6c9c
LAUNCHING
RUNNING
[INFO] Generating 3 seeds as they are not provided.
[INFO] Progress=> Good: 0/10 Bad: 0 Similar: 0
[INFO] Generated themes: ['Honesty and Trust', 'Kindness and Empathy', 'Responsibility and Duty', 'Courage and Bravery', 'Perseverance and Hard Work', 'Patience and Persistence', 'Teamwork and Collaboration', 'Respect and Politeness', 'Self-Control and Discipline', 'Gratitude and Appreciation']
[INFO] Using selected theme: 'Self-Control and Discipline' for this batch of generation
[INFO] Generation LLM temperature fixed or updated to 1.0
[INFO] Data Generation Ongoing => Good Inputs: 3/10. Bad Inputs: 0. Similar Inputs: 0
[INFO] Progress=> Good: 3/10 Bad: 0 Similar: 0
[INFO] Using selected theme: 'Self-Control and Discipline' for this batch of generation
[INFO] Generation LLM temperature fixed or updated to 1.0
[INFO] Data Generation O

You can also stream the generated inputs instead, as shown in the cell below.

In [9]:
from withpi_utils.colab import stream_data

data_generation_status = stream_data(
    data_generation_status.job_id, client.data.generate
)


Tell a fable about a clever fox who learns that lying can lead to trouble.
Create a story that shows how sharing can build trust among friends using animal characters.
Generate a tale that illustrates the importance of keeping promises, featuring a loyal dog and a forgetful turtle.
Create a fable about a young turtle who learns that patience can lead to great rewards in life.
Tell a story of a squirrel who discovers that resisting temptation leads to greater success in gathering food for winter.
Write a fable about a stubborn donkey who learns that self-control is essential to avoid danger in the forest.
Create a fable that illustrates how teamwork can help a group of animals achieve their goal of reaching a distant berry bush.
Write a story about two birds who learn that collaborating together makes it easier to build a nest.
Generate a tale featuring a clever fox and a slow turtle who find that working together can overcome their individual limitations.
Tell a fable about a clever fo

## Take a look at the returned data

Once the job has finished, print the full list of generated inputs.

In [10]:
# Print all the data now that the job is complete.
if data_generation_status.state not in ["ERROR", "DONE"]:
    print("Please wait for the job to finish and then run this cell again...")
elif data_generation_status.state == "DONE":
    print("Printing all the generated inputs below...")
    assert data_generation_status.data is not None
    for generated_input in data_generation_status.data:
        print(generated_input)
else:
    print("Job ended in error")

Printing all the generated inputs below...
Tell a fable about a clever fox who learns that lying can lead to trouble.
Create a story that shows how sharing can build trust among friends using animal characters.
Generate a tale that illustrates the importance of keeping promises, featuring a loyal dog and a forgetful turtle.
Create a fable about a young turtle who learns that patience can lead to great rewards in life.
Tell a story of a squirrel who discovers that resisting temptation leads to greater success in gathering food for winter.
Write a fable about a stubborn donkey who learns that self-control is essential to avoid danger in the forest.
Create a fable that illustrates how teamwork can help a group of animals achieve their goal of reaching a distant berry bush.
Write a story about two birds who learn that collaborating together makes it easier to build a nest.
Generate a tale featuring a clever fox and a slow turtle who find that working together can overcome their individual 
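The cell above asks you to re-run it by hand until the job reaches a terminal state. A minimal sketch of automating that wait follows. `wait_for_job` and `fetch_status` are hypothetical names, not part of the Pi SDK, and `fake_fetch` stands in for whatever call returns the current job status so that the sketch runs without a network connection. Only the state names `LAUNCHING`, `RUNNING`, `DONE`, and `ERROR` come from this notebook.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Terminal states taken from the cell above; anything else means "still working".
TERMINAL_STATES = {"DONE", "ERROR"}


def wait_for_job(
    fetch_status: Callable[[], Any],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
) -> Any:
    """Call fetch_status() until the returned object's .state is terminal."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status.state in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job still in state {status.state!r} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)


# Stand-in for a real status call (hypothetical): reports three states in order.
_fake_states = iter(["LAUNCHING", "RUNNING", "DONE"])


def fake_fetch() -> SimpleNamespace:
    return SimpleNamespace(state=next(_fake_states), data=["example input"])


final_status = wait_for_job(fake_fetch, poll_seconds=0.0)
print(final_status.state)  # DONE
```

Passing the status call in as a function keeps the loop independent of any particular client method; the timeout guards against a job that never finishes.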

## Augment with responses

Now run inference with your favorite LLM to generate responses to the default prompt.  We will optimize this later.

The cell below uses LiteLLM.  You can get a free Gemini key from the left side to try it out; the cell reads it from the `GOOGLE_API_KEY` notebook secret.  You may need a small amount of money in your account to run to completion.

In [21]:
# @title Generate responses
import litellm
from tqdm import tqdm

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

def generate(system: str, user: str, model: str) -> str:
    """Pass the system and user prompts to the given model via LiteLLM
    and return the text of the first completion."""
    messages = [
        {"content": system, "role": "system"},
        {"content": user, "role": "user"},
    ]
    return litellm.completion(model=model, messages=messages).choices[0].message.content

# Generate one response per generated input, using the spec description as the system prompt.
responses = []
for generated_input in tqdm(data_generation_status.data):
    responses.append(
        generate(
            system=aesop_scoring_spec.description,
            user=generated_input,
            model="gemini/gemini-1.5-flash-8b",
        )
    )

100%|██████████| 11/11 [00:25<00:00,  2.33s/it]
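Calls to a hosted model can fail transiently (rate limits, timeouts), which would abort the loop above partway through. One way to make it more robust is to wrap each call in a retry with exponential backoff. The sketch below is an assumption rather than part of the original notebook: `with_retries` is a hypothetical helper, and it catches the broad `Exception` because the specific error types LiteLLM raises are not covered here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay_seconds * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")


# Stand-in for a flaky model call (hypothetical): fails twice, then succeeds.
attempts_seen = []


def flaky_call() -> str:
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise RuntimeError("transient failure")
    return "Once upon a time..."


# Record the delays instead of actually sleeping so the demo runs instantly.
delays = []
result = with_retries(flaky_call, sleep=delays.append)
print(result, delays)
```

In the loop above, the same idea would wrap the existing call, for example `with_retries(lambda: generate(system=..., user=..., model=...))` with the arguments already used in that cell.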


In [19]:
responses[0]

'B'

In [22]:
# @title Let's Score and manually inspect the data
from withpi_utils.colab import pretty_print_responses

for i in range(len(responses)):
    score = client.scoring_system.score(
        llm_input=data_generation_status.data[i],
        llm_output=responses[i],
        scoring_spec=aesop_scoring_spec,
    )

    pretty_print_responses(
        header="#### Input:\n" + data_generation_status.data[i],
        response1="#### Output:\n" + responses[i],
        left_label="gemini/gemini-1.5-flash-8b",
        scores_left=score,
    )
    print("\n\n")

0,1,2
Story Structure,,0.871
,Plot Structure,1.0
,Conflict Introduction,0.852
,Resolution Clarity,0.762
Character Development,,0.842
,Character Presence,0.754
,Character Development,0.773
,Dialogue Quality,1.0
Narrative Engagement,,0.781
,Engaging Narrative,0.781







0,1,2
Story Structure,,1.0
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,1.0
Character Development,,1.0
,Character Presence,1.0
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,0.914
,Engaging Narrative,1.0







0,1,2
Story Structure,,0.913
,Plot Structure,1.0
,Conflict Introduction,0.758
,Resolution Clarity,0.98
Character Development,,1.0
,Character Presence,1.0
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,0.75
,Engaging Narrative,0.746







0,1,2
Story Structure,,1.0
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,1.0
Character Development,,0.944
,Character Presence,0.832
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,1.0
,Engaging Narrative,1.0







0,1,2
Story Structure,,1.0
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,1.0
Character Development,,1.0
,Character Presence,1.0
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,0.936
,Engaging Narrative,1.0







0,1,2
Story Structure,,0.928
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,0.785
Character Development,,0.927
,Character Presence,0.781
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,0.826
,Engaging Narrative,0.934







0,1,2
Story Structure,,0.85
,Plot Structure,1.0
,Conflict Introduction,0.781
,Resolution Clarity,0.77
Character Development,,0.77
,Character Presence,0.762
,Character Development,0.766
,Dialogue Quality,0.781
Narrative Engagement,,0.757
,Engaging Narrative,0.754







0,1,2
Story Structure,,1.0
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,1.0
Character Development,,0.922
,Character Presence,0.766
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,0.861
,Engaging Narrative,0.824







0,1,2
Story Structure,,0.923
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,0.77
Character Development,,0.904
,Character Presence,0.949
,Character Development,0.762
,Dialogue Quality,1.0
Narrative Engagement,,0.915
,Engaging Narrative,1.0







0,1,2
Story Structure,,1.0
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,1.0
Character Development,,1.0
,Character Presence,1.0
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,0.858
,Engaging Narrative,1.0







0,1,2
Story Structure,,1.0
,Plot Structure,1.0
,Conflict Introduction,1.0
,Resolution Clarity,1.0
Character Development,,1.0
,Character Presence,1.0
,Character Development,1.0
,Dialogue Quality,1.0
Narrative Engagement,,1.0
,Engaging Narrative,1.0
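With one score table per response, it helps to aggregate them and see which dimension is weakest overall. The sketch below is illustrative only: it copies the top-level dimension scores of the first three tables above into plain dictionaries by hand, because this notebook does not show how to read those values off the object returned by `client.scoring_system.score`. `mean_by_dimension` is a hypothetical helper.

```python
from statistics import mean

# Top-level dimension scores copied by hand from the first three tables above.
dimension_scores = [
    {"Story Structure": 0.871, "Character Development": 0.842, "Narrative Engagement": 0.781},
    {"Story Structure": 1.0, "Character Development": 1.0, "Narrative Engagement": 0.914},
    {"Story Structure": 0.913, "Character Development": 1.0, "Narrative Engagement": 0.75},
]


def mean_by_dimension(rows):
    """Average each dimension across all responses."""
    dimensions = rows[0].keys()
    return {name: mean(row[name] for row in rows) for name in dimensions}


averages = mean_by_dimension(dimension_scores)
# The dimension with the lowest average is the first place to look for improvements.
weakest = min(averages, key=averages.get)

for name, value in averages.items():
    print(f"{name}: {value:.3f}")
print("Weakest dimension:", weakest)  # Narrative Engagement
```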







## Save the set

We will come back to this in a future colab, so it's useful to capture.  Store it as a Parquet table, which you can download.

Alternatively, upload to Hugging Face.

In [25]:
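from datasets import Dataset

dataset = Dataset.from_dict({
    "input": data_generation_status.data,
    "response": responses
})

print(dataset)
# Write the table to a local Parquet file (the filename is an arbitrary choice).
dataset.to_parquet("aesop_inputs.parquet")
# dataset.push_to_hub("...")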

Dataset({
    features: ['input', 'response'],
    num_rows: 11
})


## Next Steps

This input set can drive many other techniques in Pi.  To steer generation in different directions, adjust the `start_job` call above, for example by passing your own `seeds`.
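As a sketch of the seeding idea, the snippet below assembles the keyword arguments for `client.data.generate.start_job` with a few hand-written example inputs as `seeds`. The parameter names are the ones used earlier in this notebook; treating `seeds` as a list of example input strings is an assumption, the seed texts are made up for illustration, and `build_generation_request` is a hypothetical helper that only cleans the list and builds a dictionary.

```python
from typing import Any, Dict, List


def build_generation_request(
    description: str,
    seeds: List[str],
    num_inputs: int = 10,
    batch_size: int = 3,
    num_shots: int = 3,
) -> Dict[str, Any]:
    """Build start_job keyword arguments with trimmed, de-duplicated seeds."""
    cleaned: List[str] = []
    for seed in seeds:
        seed = seed.strip()
        if seed and seed not in cleaned:
            cleaned.append(seed)
    return {
        "application_description": description,
        "num_inputs_to_generate": num_inputs,
        "seeds": cleaned,
        "batch_size": batch_size,
        "num_shots": num_shots,
    }


request = build_generation_request(
    description="Write short fables that teach a moral lesson.",  # placeholder text
    seeds=[
        "Tell a fable about a proud peacock who learns humility.",
        "  Tell a fable about a proud peacock who learns humility.  ",  # duplicate, dropped
        "Write a story about an ant who shares food during a drought.",
    ],
)
print(request["seeds"])

# In the notebook, with the real spec description (assumed usage):
# data_generation_status = client.data.generate.start_job(**request)
```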