<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://play.withpi.ai/logo/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# Quickstart

This Colab walks you through scoring a model response with Pi. It is the companion to our [Getting Started](https://code.withpi.ai/introduction) guide.

You will generate a set of scoring questions, then use Pi Scorer to score a response against them.

## Install and initialize SDK


You'll need a `WITHPI_API_KEY` from your [account page](https://build.withpi.ai/account) (for now this is **free**, but in the future we expect to charge for scoring calls).  Add it to your notebook secrets (the key icon in the left sidebar).

Run the cell below to install packages and load the SDK.

In [6]:
%%capture

%pip install withpi withpi-utils litellm

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

pi = PiClient()
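
Outside Colab, `google.colab.userdata` is not available, so the key has to reach the environment some other way (for example, a shell `export`). The following is a minimal sketch of a fail-fast check for that case. `require_env` and `EXAMPLE_API_KEY` are hypothetical names used for illustration only; they are not part of the Pi SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return value


# Placeholder variable so the sketch runs anywhere; a real notebook would
# check WITHPI_API_KEY instead.
os.environ["EXAMPLE_API_KEY"] = "not-a-real-key"
print(require_env("EXAMPLE_API_KEY"))
```

Calling a check like this before constructing the client turns a missing key into an immediate, readable error rather than a failure on the first API call.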

## Build some questions

Let's say you want to build an application that generates children's stories teaching a life lesson.  Call it `AesopAI`.


In [4]:
questions = pi.scoring_system.generate(
    application_description=(
        "Write a children's story in the style of Aesop's Fables "
        "teaching a life lesson specified by the user. Provide just the "
        "story with no extra content."
    ),
)

display([q.question for q in questions])

['Does the story have a clear beginning, middle, and end?',
 'Does the story include a conflict that is resolved in a way that teaches the life lesson?',
 'Is the resolution of the story satisfying and meaningful?',
 'Are the characters in the story well-defined and relevant to the life lesson?',
 'Are the characters relatable to children?',
 "Do the characters' actions logically lead to the life lesson?",
 'Is the specified life lesson clearly conveyed in the story?',
 'Is the life lesson seamlessly integrated into the narrative?',
 'Does the story include an explicit moral statement at the end?',
 'Is the moral of the story clear and easy to understand?',
 'Is the moral relevant to the specified life lesson?',
 'Is the life lesson presented in a memorable way?',
 "Does the story emulate the style of Aesop's Fables, including anthropomorphic characters and a moral?",
 'Does the story maintain a positive and encouraging tone?',
 'Is the story consistent in tone and style throughout?',


These questions are suggestions for ways you might score a generated story.  Pi Scorer rates each one from **0.0 to 1.0** for a given story, aggregating them into a final "goodness" score.

## Generate a response
Let's see how a model performs on this task. The cell below uses Gemini to generate a response, but any suitable model will work fine.

To use a different model, change the `model` argument and supply the matching API key; the LiteLLM docs at https://docs.litellm.ai/docs/ list the supported providers.

You can import a Google Gemini key from AI Studio in the left pane, which populates a `GOOGLE_API_KEY` secret.  At low request rates it's free.

In [7]:
import litellm

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

response = litellm.completion(
    model="gemini/gemini-2.0-flash-lite",
    messages=[
        {"content": "Write a children's story in the style of Aesop's Fables "
          "teaching a life lesson specified by the user. Provide just the "
          "story with no extra content.", "role": "system"},
        {"content": "The importance of sharing", "role": "user"}
    ]).choices[0].message.content

print(response)

Once upon a time, in a lush green meadow, lived a grumpy hedgehog named Horace. Horace had a beautiful, juicy red apple tree all to himself. He would guard his apples fiercely, scowling at any creature who dared to look longingly at his bounty.

One hot summer day, a little field mouse, Millie, scurried up to Horace. "Mr. Hedgehog," she squeaked, her voice trembling. "I haven't eaten all day, and I see your lovely apples. Might I have just one, please?"

Horace grumbled, "Go away! These apples are mine! Find your own food." He turned his back and began to munch his own apple.

Next came Barnaby Badger, his tummy rumbling. "Horace, old friend," he wheezed. "My family and I haven't found anything to eat. Could we possibly have a few of your delicious apples?"

"No!" Horace snapped, guarding his tree. "They are for me, and me alone!"

The sun beat down, and as the days grew hotter, the meadow dried up. The river shrank, and the berries on the bushes withered. Horace’s apple tree remained 
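
The generation cell passes the system prompt and the user prompt inline. One way to keep them reusable, sketched below, is to hold each in a named variable and build the message list from them, so the same `prompt` string can later be passed to the scorer. `SYSTEM_PROMPT`, `prompt`, and `build_messages` are illustrative names, not part of LiteLLM or the Pi SDK, and the sketch makes no network call.

```python
SYSTEM_PROMPT = (
    "Write a children's story in the style of Aesop's Fables "
    "teaching a life lesson specified by the user. Provide just the "
    "story with no extra content."
)


def build_messages(system_prompt: str, user_prompt: str) -> list[dict[str, str]]:
    """Build a chat message list with one system turn and one user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


prompt = "The importance of sharing"
messages = build_messages(SYSTEM_PROMPT, prompt)

# Show the role of each turn and the start of its content.
for message in messages:
    print(message["role"], "->", message["content"][:40])
```

The resulting `messages` list has the same shape as the one passed to `litellm.completion` above.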

## Score it!

Take the generated response and see how it scores with Pi.

The cell below runs Pi Scoring, evaluating each question in the scoring spec and returning a score from 0 (terrible!) to 1 (excellent!).  The current scoring spec is **uncalibrated**, meaning that all the dimensions are weighted equally, but it's a starting point for learning which are **actually** important based on your preferences.

In [None]:
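from withpi_utils.colab import pretty_print_responses

# The user prompt sent to the model in the generation cell above.
prompt = "The importance of sharing"

# `pi` is the PiClient created during setup; `questions` is the scoring spec
# generated earlier (assumed here to be accepted directly as `scoring_spec`).
score = pi.scoring_system.score(
    scoring_spec=questions,
    llm_input=prompt,
    llm_output=response,
)

pretty_print_responses(
    header="#### Prompt:\n" + prompt,
    response1="#### Response:\n" + response,
    left_label="gemini/gemini-2.0-flash-lite",
    scores_left=score,
)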

| Dimension | Subdimension | Score |
|---|---|---|
| Story Structure | | 0.943 |
| | Story Completeness | 1.0 |
| | Conflict Resolution | 0.996 |
| | Narrative Flow | 1.0 |
| | Appropriate Length | 0.777 |
| Moral and Lesson | | 1.0 |
| | Life Lesson Inclusion | 1.0 |
| | Lesson Clarity | 1.0 |
| | Moral Statement Presence | 1.0 |
| | Lesson Integration | 1.0 |

## Save it!

Finally, save the ScoringSpec so you can come back to it later.

A scoring spec is a simple Pydantic model, which can be serialized to JSON and stored locally.

The cell below writes the scoring spec to `aesop_ai.json` in the notebook's working directory.

In [None]:
import json
with open("aesop_ai.json", "w") as file:
    # Assumes each generated question is a Pydantic model exposing model_dump().
    json.dump([q.model_dump() for q in questions], file, indent=2)
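
To come back to the spec later, the saved JSON can be read back in. Below is a minimal sketch of that round trip. It assumes the spec is stored as a JSON list of objects with a `question` field, and it uses plain dictionaries as stand-ins for the Pydantic objects; the two questions are copied from the generated list earlier in this notebook. It writes to a temporary directory so that it does not overwrite a real `aesop_ai.json`.

```python
import json
import tempfile
from pathlib import Path

# Stand-ins for the generated questions: plain dicts with a `question` field.
saved_questions = [
    {"question": "Does the story have a clear beginning, middle, and end?"},
    {"question": "Is the moral of the story clear and easy to understand?"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "aesop_ai.json"
    # Serialize to disk, then read the file back and parse it.
    path.write_text(json.dumps(saved_questions, indent=2))
    loaded_questions = json.loads(path.read_text())

print([q["question"] for q in loaded_questions])
```

If the loaded data needs to become SDK objects again, the dictionaries would have to be validated back into the corresponding Pydantic model, which this sketch leaves out.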

## Next Steps

Go back and try different system prompts to see how the scores respond to the new outputs.  Try a different model.  Manually tweak the dimensions. Get a feel for what's happening.

When you're ready to move beyond basic vibe checking, you'll need to take a systematic approach.  To do that, you'll need input data.  Fortunately, we have tools to help build a representative set.  Head over to the input data playground for this.