<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Prompt_Optimization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://withpi.ai/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# Prompt Optimization with Scorer

This Colab is the companion to the "Prompt optimization with scorers and diff view" Playground, which introduces the core concept of Pi, the **Contract**.

A **Contract** is a **human- and machine-readable** description of what **goodness** means to you. It is the cornerstone of our approach because it lets you measure improvements mechanically while remaining explainable. See [Key Concepts](https://code.withpi.ai/key-concepts) for more information.

This Colab walks you through generating a **Contract**, scoring a response with it, and tinkering with your application description to improve the results.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install packages and load the SDK.

In [None]:
%%capture

%pip install withpi litellm

import os
from google.colab import userdata
from litellm import completion
from withpi import PiClient

os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()

def print_contract(contract):
  # Print each dimension label, followed by its indented sub-dimension questions.
  for dimension in contract.dimensions:
    print(dimension.label)
    for sub_dimension in dimension.sub_dimensions:
      print(f"\t{sub_dimension.description}")

def generate(system: str, user: str, model: str) -> str:
  # Send the system and user messages to the chosen model and return its text reply.
  messages = [
    {
      "content": system,
      "role": "system"
    },
    {
      "content": user,
      "role": "user"
    }
  ]
  return completion(model=model,
                    messages=messages).choices[0].message.content

class printer(str):
  def __repr__(self):
    return self
def prettyprint(response: str):
  display(printer(response))

def print_scores(pi_scores):
  for dimension_name, dimension_scores in pi_scores.dimension_scores.items():
    print(f"{dimension_name}: {dimension_scores.total_score}")
    for subdimension_name, subdimension_score in dimension_scores.subdimension_scores.items():
      print(f"\t{subdimension_name}: {subdimension_score}")
    print("\n")
  print("---------------------")
  print(f"Total score: {pi_scores.total_score}")

## Make a contract

Let's say you want to build an application that generates children's stories teaching a life lesson.  Call it `AesopAI`.

Start by generating a first-cut contract from that general description, as in the following cell:


In [None]:
aesop_contract = client.contracts.generate_dimensions(
    contract_description=(
        "Write a children's story in the style of Aesop's Fables "
        "teaching a life lesson specified by the user. Provide just the "
        "story with no extra content."
    ),
)

print_contract(aesop_contract)

A contract is essentially a hierarchical rubric for grading a response. Many simple questions roll up into broader categories, which in turn yield a final score. Output will vary somewhat between runs, but the list printed above should contain reasonable grading questions for the application.
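
To make the roll-up concrete, here is a minimal sketch of how sub-dimension scores could combine into dimension scores and a total. It assumes a plain unweighted mean at each level; the labels, questions, and numbers are made up for illustration, and Pi's actual aggregation may differ.

```python
from statistics import mean

# Hypothetical rubric: dimension label -> {sub-dimension question -> score in [0, 1]}.
# These labels and numbers are illustrative only; they are not Pi output.
rubric_scores = {
    "Story structure": {
        "Does the story have a clear beginning, middle, and end?": 1.0,
        "Is the story short enough for a child to follow?": 0.5,
    },
    "Moral": {
        "Does the story teach the requested lesson?": 1.0,
        "Is the moral stated plainly at the end?": 0.0,
    },
}

def roll_up(scores):
    """Average sub-dimension scores per dimension, then average the dimensions."""
    per_dimension = {label: mean(subs.values()) for label, subs in scores.items()}
    return per_dimension, mean(per_dimension.values())

per_dimension, total = roll_up(rubric_scores)
for label, score in per_dimension.items():
    print(f"{label}: {score:.2f}")
print(f"Total: {total:.2f}")
```

Because every question is a small, checkable judgment, a low total can be traced back to the specific questions that dragged it down.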

## Generate a response

Let's see how it performs! The cell below uses Gemini to generate a response, but any suitable model will work fine.

To use a different model, change the `model` argument and supply the matching API key; see the LiteLLM docs at https://docs.litellm.ai/docs/.

You can import a Google Gemini key from AI Studio in the left pane, which populates a GOOGLE_API_KEY secret. At low request rates it's free.

In [None]:
import os
from google.colab import userdata

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

prompt = "The importance of sharing"
response = generate(system=aesop_contract.description, user=prompt, model="gemini/gemini-1.5-flash-8b")

## Score it!

Take the generated response and see how it scores with Pi.

The cell below runs Pi Scoring, which evaluates each dimension in the contract and returns a score from 0 (terrible!) to 1 (excellent!). The current contract is **uncalibrated**, meaning that all the dimensions are treated as equally important, but it's a starting point for learning which are **actually** important based on your preferences.

In [None]:
pi_scores = client.contracts.score(
    contract=aesop_contract,
    llm_input=prompt,
    llm_output=response,
)

print_scores(pi_scores)
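
To illustrate what "uncalibrated" means in practice, the sketch below compares an equal-weight total with a weighted one. The dimension names, scores, and weights are hypothetical, and a weighted average is only an assumption about how importance could be expressed; it is not a description of how Pi calibrates a contract.

```python
# Hypothetical dimension scores in [0, 1]; not real Pi output.
dimension_scores = {"Story structure": 0.75, "Moral": 0.5, "Language": 1.0}

def weighted_total(scores, weights=None):
    """Weighted average of dimension scores; equal weights when none are given."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Uncalibrated: every dimension counts the same.
uncalibrated = weighted_total(dimension_scores)

# Illustrative preference: the moral counts twice as much as the other dimensions.
preference_weights = {"Story structure": 1.0, "Moral": 2.0, "Language": 1.0}
calibrated = weighted_total(dimension_scores, preference_weights)

print(f"Equal weights:      {uncalibrated:.4f}")
print(f"Preference weights: {calibrated:.4f}")
```

The same response scores lower once the weak "Moral" dimension carries more weight, which is why learning the right weights matters before comparing prompts.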

## Save it!

Finally, save the Contract so you can come back to it later.

A contract is a simple Pydantic model, which can be serialized to JSON and stored locally.

The cell below will offer a download of the contract.

In [None]:
from pathlib import Path
from google.colab import files

filename = 'aesop_ai.json'
Path(filename).write_text(aesop_contract.model_dump_json(indent=2))
files.download(filename)
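
To inspect a saved contract later, the file can be read back with the standard `json` module. The sketch below uses a small stand-in dictionary rather than a real contract; its field names (`description`, `dimensions`, `label`, `sub_dimensions`) are assumed from the attributes used earlier in this notebook, and the temporary file name is hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a saved contract; field names are assumed from the attributes
# accessed earlier in this notebook, and the content is illustrative only.
sample_contract = {
    "description": (
        "Write a children's story in the style of Aesop's Fables "
        "teaching a life lesson specified by the user. Provide just the "
        "story with no extra content."
    ),
    "dimensions": [
        {
            "label": "Moral",
            "sub_dimensions": [
                {"description": "Does the story teach the requested lesson?"},
            ],
        },
    ],
}

# Write the stand-in to a temporary file, then read it back as plain JSON.
path = Path(tempfile.gettempdir()) / "aesop_ai_example.json"
path.write_text(json.dumps(sample_contract, indent=2))

loaded = json.loads(path.read_text())
for dimension in loaded["dimensions"]:
    print(dimension["label"])
    for sub_dimension in dimension["sub_dimensions"]:
        print(f"\t{sub_dimension['description']}")
```

Pydantic v2 models also provide a `model_validate_json` class method as the counterpart to `model_dump_json`; which SDK class to call it on is not shown in this notebook, so the sketch sticks to plain JSON.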

## Next Steps

Go back and try different system prompts to see how the outputs and their scores change. Try a different model. Manually tweak the dimensions. Get a feel for what's happening.
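
One way to organize that experimentation is a small loop that generates a response for each candidate system prompt, scores it, and ranks the results. The sketch below is hypothetical: `compare_prompts`, `fake_generate`, and `fake_score` are illustrative names, and the stand-in functions exist only so the example runs offline.

```python
from typing import Callable

def compare_prompts(
    system_prompts: list[str],
    user_prompt: str,
    generate_fn: Callable[[str, str], str],
    score_fn: Callable[[str, str], float],
) -> list[tuple[float, str]]:
    """Generate one response per system prompt, score it, and rank best first."""
    results = []
    for system_prompt in system_prompts:
        response = generate_fn(system_prompt, user_prompt)
        results.append((score_fn(user_prompt, response), system_prompt))
    return sorted(results, key=lambda item: item[0], reverse=True)

# Stand-ins so the sketch runs offline; swap in real calls inside the notebook.
def fake_generate(system_prompt: str, user_prompt: str) -> str:
    return f"{system_prompt} | {user_prompt}"

def fake_score(user_prompt: str, response: str) -> float:
    # Toy score that rewards responses mentioning a moral. Not a Pi score.
    return 1.0 if "moral" in response.lower() else 0.0

ranking = compare_prompts(
    ["Write a fable.", "Write a fable and end with a one-line moral."],
    "The importance of sharing",
    fake_generate,
    fake_score,
)
for score, system_prompt in ranking:
    print(f"{score:.2f}  {system_prompt}")
```

In this notebook, `generate_fn` could wrap the `generate` helper with a fixed model, and `score_fn` could wrap `client.contracts.score(...)` and return its `total_score`. A single response per prompt is noisy, so treat the ranking as a rough signal rather than a verdict.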

When you're ready to move beyond basic vibe checking, you'll need to take a systematic approach.  To do that, you'll need input data.  Fortunately, we have tools to help build a representative set.  Head over to the input data playground for this.