<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://play.withpi.ai/logo/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://build.withpi.ai"><font size="4">Copilot</font></a>

# Quickstart

This Colab walks you through scoring something with Pi!  It is the companion to our [Getting Started](https://code.withpi.ai/introduction) guide.

You will generate some scoring questions and use Pi Scorer to score a model response against them.

## Install and initialize SDK

You'll need a `WITHPI_API_KEY` from https://build.withpi.ai/account.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install packages and load the SDK.

In [1]:
%%capture

%pip install withpi withpi-utils datasets tqdm litellm pandas numpy

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

pi = PiClient()


## Build some questions

Let's say you want to build an application that generates children's stories teaching a life lesson.  Call it `AesopAI`.


In [2]:
from withpi_utils.colab import display_scoring_spec

aesop_application_description = """
Write a children's story in the style of Aesop's Fables teaching a life lesson
specified by the user. Provide just the story with no extra content.
"""

scoring_spec = pi.scoring_system.generate(
    application_description=aesop_application_description,
)

display_scoring_spec(scoring_spec)

These questions are suggestions for ways you might score a generated story.  Pi Scorer rates each one from **0.0 to 1.0** for a given story, aggregating them into a final "goodness" score.

## Generate a response
Let's see how it performs! The cell below uses Gemini to generate a response, but any suitable model will work fine.

You can import a Google Gemini key from AI Studio in the left pane, which populates a `GOOGLE_API_KEY` secret.  At low rates it's free.  To use a different model, supply a key for it and adjust the call following the docs at https://docs.litellm.ai/docs/.



In [3]:
import litellm
from withpi_utils.colab import pretty_print_responses

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

prompt = "The importance of sharing"

response = litellm.completion(
    model="gemini/gemini-2.0-flash-lite",
    messages=[
        {"content": aesop_application_description, "role": "system"},
        {"content": prompt, "role": "user"}
    ]).choices[0].message.content

pretty_print_responses(response)
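
If you want to try a different model, as suggested above, one option is to wrap the generation call in a small helper so only the model string changes. This is a minimal sketch, not part of the Pi SDK: `build_messages`, `generate`, and the `completion_fn` parameter are hypothetical names introduced here, and the helper assumes the key for your chosen provider is already set in the environment.

```python
def build_messages(system_prompt: str, user_prompt: str) -> list:
    """Build the chat messages in the same shape used by the cell above."""
    return [
        {"content": system_prompt, "role": "system"},
        {"content": user_prompt, "role": "user"},
    ]


def generate(model: str, system_prompt: str, user_prompt: str, completion_fn=None) -> str:
    """Generate one response with the given model and return its text.

    `completion_fn` (hypothetical parameter) defaults to `litellm.completion`;
    passing your own callable lets you check the wiring without an API call.
    """
    if completion_fn is None:
        import litellm  # imported lazily so the helper can be defined without it

        completion_fn = litellm.completion
    result = completion_fn(
        model=model,
        messages=build_messages(system_prompt, user_prompt),
    )
    return result.choices[0].message.content
```

For example, `generate("gemini/gemini-2.0-flash-lite", aesop_application_description, prompt)` should behave like the cell above. Swapping in another LiteLLM model string would also require that provider's key, as described in the LiteLLM docs.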

## Score it!

Take the generated response and see how it scores with Pi.

The cell below runs Pi Scoring, evaluating each question in the scoring spec and giving each a score from 0 (terrible!) to 1 (excellent!).  The current scoring spec is **uncalibrated**, meaning that all the dimensions are treated as equally important, but it's a starting point for learning which are **actually** important based on your preferences.

In [4]:
from withpi_utils.colab import pretty_print_responses

score = pi.scoring_system.score(
    scoring_spec=scoring_spec,
    llm_input=prompt,
    llm_output=response,
)

pretty_print_responses(
    header="#### Prompt:\n" + prompt,
    response1="#### Response:\n" + response,
    scores_left=score,
)

| Question | Score |
| --- | --- |
| Does the response contain a clear beginning, middle, and end? | 1.0 |
| Does the story follow a logical progression of events? | 1.0 |
| Does the story resolve the conflict in a satisfying manner? | 0.961 |
| Is the life lesson clearly conveyed in the story? | 1.0 |
| Is the life lesson relevant to the input provided by the user? | 1.0 |
| Can the life lesson be applied to real-life situations by children? | 1.0 |
| Does the story reinforce the life lesson through its narrative? | 1.0 |
| Are the characters in the story well-defined and relevant to the lesson? | 1.0 |
| Do the characters' actions align with the lesson being taught? | 1.0 |
| Are the characters relatable to children? | 0.992 |
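
To get a feel for what "equally important" could mean for the overall score, here is a minimal sketch that averages the per-question scores shown above with equal weights. The plain arithmetic mean is an assumption made for illustration; this notebook does not state how Pi Scorer actually aggregates, and `equal_weight_score` is a hypothetical helper, not part of the SDK.

```python
# Per-question scores copied from the output above.
question_scores = {
    "Does the response contain a clear beginning, middle, and end?": 1.0,
    "Does the story follow a logical progression of events?": 1.0,
    "Does the story resolve the conflict in a satisfying manner?": 0.961,
    "Is the life lesson clearly conveyed in the story?": 1.0,
    "Is the life lesson relevant to the input provided by the user?": 1.0,
    "Can the life lesson be applied to real-life situations by children?": 1.0,
    "Does the story reinforce the life lesson through its narrative?": 1.0,
    "Are the characters in the story well-defined and relevant to the lesson?": 1.0,
    "Do the characters' actions align with the lesson being taught?": 1.0,
    "Are the characters relatable to children?": 0.992,
}


def equal_weight_score(scores: dict) -> float:
    """Unweighted mean of the scores (assumed aggregation, for illustration only)."""
    if not scores:
        raise ValueError("need at least one question score")
    return sum(scores.values()) / len(scores)


overall = equal_weight_score(question_scores)
print(f"{overall:.4f}")  # prints 0.9953
```

A calibrated spec would presumably weight some questions more heavily than others, so its overall score could differ from this simple average.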


## Next Steps

Go back and try different system prompts to see how the scores respond to the new outputs.  Try a different model.  Manually tweak the questions.  Get a feel for what's happening.