<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Contract_Creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WithPi Contract Creation

You have a generative AI application, but you aren't happy with its responses to user prompts. The **WithPi** SDK helps you build a feedback loop that improves those responses automatically, without requiring deep expertise in quality evaluation. Let's dig in!

This Colab walks you through the first step, creating a **Pi Contract**.  This is a **human and machine readable** description of what **goodness** means to you and is the cornerstone of our approach.

This should take about **15 minutes**, even if you're unfamiliar with Colab.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai. Add it to your notebook secrets (the key icon in the left sidebar).

Run the cell below to install packages and load the SDK.

In [1]:
%pip install withpi litellm

import os
from google.colab import userdata
from withpi import PiClient

os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()

Collecting withpi
  Downloading withpi-0.1.0a30-py3-none-any.whl.metadata (20 kB)
Collecting litellm
  Downloading litellm-1.59.9-py3-none-any.whl.metadata (36 kB)
Collecting httpx<1,>=0.23.0 (from withpi)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting python-dotenv>=0.2.0 (from litellm)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting tiktoken>=0.7.0 (from litellm)
  Downloading tiktoken-0.8.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading withpi-0.1.0a30-py3-none-any.whl (102 kB)
Downloading litellm-1.59.9-py3-none-any.whl (6.7 MB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)

## Make a contract

Let's say you want to build an application that generates children's stories teaching a life lesson.  Call it `AesopAI`.

Start by creating a first-cut contract from that general description, using the following cell:


In [2]:
aesop_contract = client.contracts.generate_dimensions(
    contract_description=(
        "Write a children's story in the style of Aesop's "
        "Fables teaching a life lesson specified by the user"
    ),
)

for dimension in aesop_contract.dimensions:
  print(dimension.label)
  for sub_dimension in dimension.sub_dimensions:
    print(f"\t{sub_dimension.description}")

Story Structure
	Does the story have a clear beginning, middle, and end?
	Is there a conflict introduced early in the story that drives the plot?
	Is the resolution of the conflict clear and satisfying?
Character Development
	Does the story include characters that are relatable for children?
	Do the characters demonstrate growth or change by the end of the story?
	Is the dialogue between characters natural and age-appropriate?
Narrative Engagement
	Is the story engaging and likely to hold a child's interest?
	Does the story use vivid imagery to help children visualize the scenes?
	Does the story incorporate repetitive elements that aid in comprehension and retention?
Language Appropriateness
	Is the language used in the story appropriate for children's understanding?
	Is the story an appropriate length for a children's tale (not exceeding 1000 words)?
Moral and Cultural Sensitivity
	Does the story clearly convey a specified life lesson?
	Does the story reiterate the moral lesson at the

A contract is essentially a hierarchical rubric for grading a response. A set of "simple" questions rolls up into broader categories, which in turn yield a final score. Output will vary somewhat between runs, but the listing above should contain reasonable grading questions for the application.
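Because the contract is a two-level hierarchy, it can help to flatten it into (category, question) rows for review or export. The sketch below is one way to do that. It relies only on the `dimensions`, `label`, `sub_dimensions`, and `description` attributes used in the cell above; the helper name `contract_to_rows` is hypothetical, and a `SimpleNamespace` stand-in takes the place of a real contract so the snippet runs without an API call.

```python
from types import SimpleNamespace


def contract_to_rows(contract):
    """Flatten a contract into (dimension label, question) pairs."""
    return [
        (dimension.label, sub_dimension.description)
        for dimension in contract.dimensions
        for sub_dimension in dimension.sub_dimensions
    ]


# Stand-in with the same attribute names as the generated contract
# (illustrative only; not produced by the SDK).
demo_contract = SimpleNamespace(dimensions=[
    SimpleNamespace(
        label="Story Structure",
        sub_dimensions=[
            SimpleNamespace(
                description="Does the story have a clear beginning, middle, and end?"
            ),
            SimpleNamespace(
                description="Is the resolution of the conflict clear and satisfying?"
            ),
        ],
    ),
])

for label, question in contract_to_rows(demo_contract):
    print(f"{label} | {question}")
```

With a real contract, the same helper would be called as `contract_to_rows(aesop_contract)`.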

## Smoke test the contract

Let's see how it performs with no further tuning. The cell below uses Gemini to generate a response, but any suitable model will work fine.

To pick a different model, change the model name and supply your own key; see the LiteLLM docs at https://docs.litellm.ai/docs/.

You can import a Google Gemini key, which is essentially free, from AI Studio using the left pane; this populates a GOOGLE_API_KEY secret.

In [25]:
from litellm import completion
import os
from google.colab import userdata

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

def generate(prompt: str) -> str:
  messages = [
      {
          "content": (
              "Write a children's story in the style of Aesop's Fables "
              "teaching a life lesson specified by the user. Provide just the "
              "story with no extra content."
          ),
          "role": "system"
      },
      {
          "content": prompt,
          "role": "user"
      },
  ]
  return completion(model="gemini/gemini-2.0-flash-exp",
                    messages=messages).choices[0].message.content

prompt = "The importance of sharing"
response = generate(prompt)

# Print with line wrapping and explicit newlines.
class printer(str):
    def __repr__(self):
       return self
display(printer(response))

Barnaby Bear had a beautiful berry bush, the biggest and best in the whole forest. It was laden with plump, juicy berries, redder than rubies and sweeter than honey. Barnaby loved his berries more than anything. He ate them every day, stuffing his face until his tummy was round and tight. He never offered any to Rosie Rabbit, who lived in a burrow nearby, nor to Finley Fox, who often peeked longingly from behind a tree.

One day, a terrible storm raged through the forest. The wind howled, and the rain poured down. Barnaby’s berry bush, usually so sturdy, was tossed and turned. When the storm finally passed, the bush was bare, all its berries washed away. Barnaby was devastated. He looked around, his heart heavy, and his tummy rumbling.

Rosie Rabbit, who had managed to find some clover patches under a large rock, hopped over to Barnaby. She noticed his sad face and asked what was wrong. When he told her about his lost berries, Rosie pulled out a handful of fresh clover and offered it t
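
To try a different model, only the `model` string passed to LiteLLM's `completion` needs to change, along with the matching API key. The following is a minimal sketch that reuses the system prompt from the cell above. `build_messages` and `generate_with` are hypothetical helper names, and the second model string in the comments is only an example that would need its own key.

```python
SYSTEM_PROMPT = (
    "Write a children's story in the style of Aesop's Fables "
    "teaching a life lesson specified by the user. Provide just the "
    "story with no extra content."
)


def build_messages(prompt: str) -> list[dict]:
    """Build the system + user message list shared by every model."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]


def generate_with(model: str, prompt: str) -> str:
    """Generate a story with any LiteLLM-supported model string."""
    # Imported here so build_messages can be used without litellm installed.
    from litellm import completion

    result = completion(model=model, messages=build_messages(prompt))
    return result.choices[0].message.content


# Example calls (each needs the provider's API key in the environment):
# generate_with("gemini/gemini-2.0-flash-exp", "The importance of sharing")
# generate_with("gpt-4o-mini", "The importance of sharing")  # assumes OPENAI_API_KEY is set
```

Keeping the message construction separate from the model call makes it easier to compare models on exactly the same prompt.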

## Score it!

Take the generated response and see how it scores with Pi.

The cell below runs Pi Scoring, which evaluates each dimension in the contract and returns a score from 0 (terrible!) to 1 (excellent!). The current contract is **uncalibrated**, meaning that all dimensions are treated as equally important, but it's a starting point for learning which are **actually** important based on your preferences.

In [32]:
pi_scores = client.contracts.score(
    contract=aesop_contract,
    llm_input=prompt,
    llm_output=response,
)

for dimension_name, dimension_scores in pi_scores.dimension_scores.items():
  print(f"{dimension_name}: {dimension_scores.total_score}")
  for subdimension_name, subdimension_score in dimension_scores.subdimension_scores.items():
    print(f"\t{subdimension_name}: {subdimension_score}")
  print("\n")
print("---------------------")
print(f"Total score: {pi_scores.total_score}")

Story Structure: 0.4578685228748797
	Plot Structure: 0.94375
	Conflict Introduction: 0.286328125
	Resolution Clarity: 0.5


Character Development: 0.33869350451017033
	Character Presence: 0.6375
	Character Development: 0.409375
	Dialogue Quality: 0.20634765625


Narrative Engagement: 0.6351023822061761
	Engaging Narrative: 0.796875
	Imagery Use: 0.615625
	Repetitive Elements: 0.5421875


Language Appropriateness: 0.7710652834008097
	Language Simplicity: 0.796875
	Length Appropriateness: 0.746875


Moral and Cultural Sensitivity: 0.4362543285879185
	Moral Clarity: 0.525
	Life Lesson Reiteration: 0.290625
	Cultural Sensitivity: 0.653125


Illustration Integration: 0.20257568359375
	Illustration Suggestions: 0.20257568359375


---------------------
Total score: 0.3937866028893455
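
The printed numbers hint at how the hierarchy is rolled up. In this particular output, each dimension score matches the harmonic mean of its subdimension scores, and the total matches the harmonic mean of the dimension scores. This is an observation about the run above, not documented SDK behavior, and it may not hold for calibrated contracts. The check below uses only Python's standard library and values copied from the output.

```python
from statistics import harmonic_mean

# Subdimension scores copied from the output above.
story_structure = [0.94375, 0.286328125, 0.5]
language = [0.796875, 0.746875]

# Dimension scores copied from the output above.
dimension_totals = [
    0.4578685228748797,   # Story Structure
    0.33869350451017033,  # Character Development
    0.6351023822061761,   # Narrative Engagement
    0.7710652834008097,   # Language Appropriateness
    0.4362543285879185,   # Moral and Cultural Sensitivity
    0.20257568359375,     # Illustration Integration
]

print(harmonic_mean(story_structure))   # compare with Story Structure
print(harmonic_mean(language))          # compare with Language Appropriateness
print(harmonic_mean(dimension_totals))  # compare with Total score
```

A harmonic mean is pulled toward its lowest inputs, so a single weak subdimension drags its category down more than a simple average would.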


## Save it!

Finally, save the Contract so you can come back to it later.

Contracts are stored on Hugging Face (http://huggingface.co) so that you can freely examine them, version them, etc.

The call below persists your contract in a **public** repository in the WithPi organization. You can provide your own token if you wish to write to your own organization. Only you can **write** to this repository, though anyone can **read** from it.



In [33]:
contract_name = "withpi/aesop_ai" # @param {"type":"string"}
resp = client.contracts.write_to_hf(
    contract=aesop_contract,
    hf_contract_name=contract_name,
)
print(resp)

InternalServerError: Error code: 500 - {'exception': ["huggingface_hub.errors.HfHubHTTPError: (Request ID: Root=1-679a4820-7c13b64f3660821a285e22c4;8640cac8-6b82-42a5-ad75-352de3e82f71)\n\n403 Forbidden: You don't have the required permissions to complete this action.\nCannot access content at: https://huggingface.co/api/collections/withpi/contracts-678bd298a38c4a3862eaae5c/items.\nMake sure your token has the correct permissions.\n"]}

## Next Steps

Now that you have a basic scorer, you can generate test sets to evaluate performance, improve your prompt, tune your contract, and even deploy a custom model that performs better.
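
A natural starting point for improving the prompt is the set of lowest-scoring questions. The following is a minimal sketch, assuming a score object shaped like `pi_scores` above (a `dimension_scores` mapping whose values expose `subdimension_scores`). `weakest_subdimensions` is a hypothetical helper, and the stand-in object reuses a few values from the scoring output so the snippet runs offline.

```python
from types import SimpleNamespace


def weakest_subdimensions(scores, n=3):
    """Return the n lowest-scoring (dimension, subdimension, score) triples."""
    flat = [
        (dimension_name, subdimension_name, score)
        for dimension_name, dimension in scores.dimension_scores.items()
        for subdimension_name, score in dimension.subdimension_scores.items()
    ]
    return sorted(flat, key=lambda row: row[2])[:n]


# Stand-in mirroring the attribute names used in the scoring cell
# (illustrative only; values copied from the output above).
demo_scores = SimpleNamespace(dimension_scores={
    "Story Structure": SimpleNamespace(subdimension_scores={
        "Plot Structure": 0.94375,
        "Conflict Introduction": 0.286328125,
        "Resolution Clarity": 0.5,
    }),
    "Character Development": SimpleNamespace(subdimension_scores={
        "Character Presence": 0.6375,
        "Dialogue Quality": 0.20634765625,
    }),
})

for dimension_name, subdimension_name, score in weakest_subdimensions(demo_scores, n=2):
    print(f"{dimension_name} / {subdimension_name}: {score:.3f}")
```

With real results, the same helper would be called as `weakest_subdimensions(pi_scores)`.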