<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Dataset_Import.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://withpi.ai/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# WithPi Input Generation

This colab assumes you have already worked through [Contract Creation](https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Contract_Creation.ipynb) and now want to build a small input set.

We will walk through the same `Aesop AI` example, but you can load any contract here. Let's dig in!

This should take about **15 minutes**, even if you're unfamiliar with Colab.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install packages and load the SDK.

In [None]:
%%capture

%pip install withpi litellm

import os
from google.colab import userdata
from litellm import completion
from withpi import PiClient

os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()

def print_contract(contract):
  # Print each dimension label, followed by its sub-dimension descriptions.
  for dimension in contract.dimensions:
    print(dimension.label)
    for sub_dimension in dimension.sub_dimensions:
      print(f"\t{sub_dimension.description}")

def generate(system: str, user: str, model: str) -> str:
  messages = [
    {
      "content": system,
      "role": "system"
    },
    {
      "content": user,
      "role": "user"
    }
  ]
  return completion(model=model,
                    messages=messages).choices[0].message.content

class printer(str):
  def __repr__(self):
    return self
def prettyprint(response: str):
  display(printer(response))

def print_scores(pi_scores):
  for dimension_name, dimension_scores in pi_scores.dimension_scores.items():
    print(f"{dimension_name}: {dimension_scores.total_score}")
    for subdimension_name, subdimension_score in dimension_scores.subdimension_scores.items():
      print(f"\t{subdimension_name}: {subdimension_score}")
    print("\n")
  print("---------------------")
  print(f"Total score: {pi_scores.total_score}")

## Load contract

Load the `Aesop AI` example from Pi Labs cookbooks, or edit below to load a different one.


In [None]:
import httpx
from withpi.types import Contract

resp = httpx.get("https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/contracts/aesop_ai.json")
resp.raise_for_status()  # Fail early if the contract could not be downloaded

aesop_contract = Contract.model_validate_json(resp.content)

for dimension in aesop_contract.dimensions:
  print(dimension.label)
  for sub_dimension in dimension.sub_dimensions:
    print(f"\t{sub_dimension.description}")
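
If you load a different contract, it can help to summarize its structure before generating inputs. The sketch below works on a plain dict rather than the SDK's `Contract` type. The field names (`description`, `dimensions`, `label`, `sub_dimensions`) are inferred from the attributes accessed in this notebook, and `summarize_contract` and the sample data are hypothetical.

```python
from typing import Any


def summarize_contract(contract: dict[str, Any]) -> dict[str, int]:
    """Count the sub-dimensions under each dimension label."""
    return {
        dimension["label"]: len(dimension.get("sub_dimensions", []))
        for dimension in contract.get("dimensions", [])
    }


# Hypothetical stand-in data, not the real Aesop AI contract.
sample_contract = {
    "description": "Write a short fable that teaches the given moral.",
    "dimensions": [
        {
            "label": "Structure",
            "sub_dimensions": [
                {"description": "Does the fable have a clear beginning, middle, and end?"},
                {"description": "Is the moral stated at the end?"},
            ],
        },
        {
            "label": "Tone",
            "sub_dimensions": [
                {"description": "Is the language suitable for children?"},
            ],
        },
    ],
}

summary = summarize_contract(sample_contract)
for label, count in summary.items():
    print(f"{label}: {count} sub-dimension(s)")
```

Because the contract is loaded with `model_validate_json`, it appears to be a Pydantic model, so `aesop_contract.model_dump()` should yield a dict you can pass in. Treat the exact field layout as an assumption and check it against your own contract.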

## Generate an Input Set

Given this structured description, let's build a dataset of 10 plausible moral lessons that can be used to exercise the contract.  Generation takes about 30 seconds.

In [None]:
import pandas as pd
from google.colab import data_table

data_generation_status = client.data.inputs.generate_seeds(
    contract_description=aesop_contract.description,
    num_inputs=10,
)

data_table.enable_dataframe_formatter()
df = pd.DataFrame({"input": data_generation_status.data})
df
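
Generated inputs can contain blanks or near-duplicates, so a quick cleanup pass before inference may be worthwhile. This is a minimal sketch using only pandas; it assumes each input is a plain string, and `clean_inputs` and the sample rows are hypothetical stand-ins rather than real generator output.

```python
import pandas as pd


def clean_inputs(df: pd.DataFrame, column: str = "input") -> pd.DataFrame:
    """Strip whitespace, then drop empty and duplicate inputs."""
    cleaned = df.copy()
    cleaned[column] = cleaned[column].astype(str).str.strip()
    cleaned = cleaned[cleaned[column] != ""]
    cleaned = cleaned.drop_duplicates(subset=column)
    return cleaned.reset_index(drop=True)


# Hypothetical stand-in for generated inputs.
sample_df = pd.DataFrame({"input": [
    "Honesty is the best policy.",
    "  Honesty is the best policy.  ",
    "",
    "Slow and steady wins the race.",
]})

cleaned_df = clean_inputs(sample_df)
print(cleaned_df)
```

In the notebook you could apply the same idea with `df = clean_inputs(df)` before running inference.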

## Augment with responses

Now run inference with your favorite LLM to generate a response for each input, using the contract description as the default system prompt.  We will optimize this prompt later.

The cell below uses LiteLLM with Gemini.  You can get a Gemini key from the left side for free to try it out; the cell reads it from a notebook secret named `GOOGLE_API_KEY`.  You may need a small amount of money in your account to run to completion.

In [None]:
from concurrent.futures import ThreadPoolExecutor
from tqdm.notebook import tqdm
from litellm import completion
import os
from google.colab import userdata

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

def generate(prompt: str, pbar) -> str:
  messages = [
      {
          "content": aesop_contract.description,
          "role": "system"
      },
      {
          "content": prompt,
          "role": "user"
      },
  ]
  result = completion(model="gemini/gemini-1.5-flash-8b-latest",
                      messages=messages).choices[0].message.content
  pbar.update(1)
  return result

def do_inference():
  futures = []
  pbar = tqdm(total=len(df))
  with ThreadPoolExecutor(max_workers=4) as executor:
    for index, row in df.iterrows():
      futures.append(executor.submit(generate, row["input"], pbar))
  return [future.result() for future in futures]

df["output"] = do_inference()
df
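
The inference cell relies on one property of the thread pool pattern: results are collected from the list of futures in submission order, so each output lines up with its input row even when calls finish out of order. The sketch below illustrates this with a hypothetical `fake_model` function standing in for the LLM call, so it runs without any API key.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; shorter prompts finish first."""
    time.sleep(0.01 * len(prompt))
    return prompt.upper()


def run_in_order(prompts: list[str], max_workers: int = 4) -> list[str]:
    """Submit every prompt to a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_model, prompt) for prompt in prompts]
    # The list keeps submission order, so results match the prompts
    # regardless of which call completed first.
    return [future.result() for future in futures]


prompts = ["a much longer prompt", "short", "medium one"]
outputs = run_in_order(prompts)
print(outputs)
```

Note that `future.result()` re-raises any exception from the worker, so a single failed call would abort the whole collection step. Wrapping the call in a retry or a try/except is one option if your provider is rate limited.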

## Save the set

We will come back to this set in a later colab, so it's worth saving.  Store it as a Parquet table, which you can download.

In [None]:
from google.colab import files

filename = "aesop_ai_examples.parquet"
df.to_parquet(filename)
files.download(filename)

## Next Steps

To put this input set to work, try optimizing your prompt in the [Prompt Optimization](https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Prompt_Optimization.ipynb) colab.