<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Scoring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://play.withpi.ai/logo/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://build.withpi.ai"><font size="4">Copilot</font></a>

# Scoring

This Colab walks you through scoring an LLM response with Pi. It is the companion to our [Getting Started](https://code.withpi.ai/introduction) guide.

## Install and initialize SDK

You'll need a `WITHPI_API_KEY` from https://build.withpi.ai/account/keys.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install the packages and load the SDK.

In [1]:
%%capture

%pip install withpi withpi-utils litellm

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

pi = PiClient()

## Set up the scoring system

Let's say we're building an AI to generate stories in the style of Aesop's Fables. In the spirit of test-driven development, we first decide what we want from the system. Define a scoring spec (a list of questions) and a `score` function:

In [2]:
scoring_spec = [{'question': q} for q in [
    "Does the response contain a clear beginning, middle, and end?",
    "Does the story follow a logical progression of events?",
    "Does the story resolve the conflict in a satisfying manner?",
    "Is the life lesson clearly conveyed in the story?",
    "Is the life lesson relevant to the input provided by the user?"
]]

def score(llm_input, llm_output):
    """Score one prompt/response pair against the questions in scoring_spec."""
    return pi.scoring_system.score(
        scoring_spec=scoring_spec,
        llm_input=llm_input,
        llm_output=llm_output,
    )

## Generate and score a response

The cell below uses Gemini to generate a response, but any suitable model will do.

You can import a Google Gemini key from AI Studio using the left pane, which populates a `GOOGLE_API_KEY` secret. At low request rates it's free. Alternatively, switch to a model of your choice, with its own key, using the docs at https://docs.litellm.ai/docs/.

In [3]:
import litellm
from withpi_utils.colab import pretty_print_responses

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

system_prompt = """Write a children's story in the style of Aesop's Fables
teaching a life lesson specified by the user. Provide just the story with no
extra content."""

test_prompt = "The importance of sharing"

response = litellm.completion(
    model="gemini/gemini-2.0-flash-lite",
    messages=[
        {"content": system_prompt, "role": "system"},
        {"content": test_prompt, "role": "user"}
    ]).choices[0].message.content

pretty_print_responses(
    header="#### Prompt:\n" + test_prompt,
    response1="#### Response:\n" + response,
    scores_left=score(test_prompt, response),
)

| Question | Score | Total |
| --- | --- | --- |
| Does the response contain a clear beginning, middle, and end? | 1.0 | |
| Does the story follow a logical progression of events? | 1.0 | |
| Does the story resolve the conflict in a satisfying manner? | 0.863 | |
| Is the life lesson clearly conveyed in the story? | 1.0 | |
| Is the life lesson relevant to the input provided by the user? | 1.0 | |
| Total score | | 0.973 |


## Next Steps

That's it! Each question gets a score between **0 and 1**, and the per-question scores are aggregated into a total **goodness** score that you can base decisions on.
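As a rough sketch of what "making decisions" on these scores might look like, the snippet below gates a response on its total score and lists the questions that scored lowest. The helper names (`passes`, `weakest_questions`) and the 0.9 thresholds are hypothetical choices for illustration, not part of the Pi SDK. The numbers are copied from the run above; in a live notebook you would read them from the object returned by `score(...)`, whose attribute names are not shown here, so check the SDK docs for those.

```python
from typing import Dict, List, Tuple


def passes(total_score: float, min_total: float = 0.9) -> bool:
    """Return True if the total score meets the (arbitrary) minimum."""
    return total_score >= min_total


def weakest_questions(
    question_scores: Dict[str, float], threshold: float = 0.9
) -> List[Tuple[str, float]]:
    """Return (question, score) pairs below the threshold, lowest score first."""
    low = [(q, s) for q, s in question_scores.items() if s < threshold]
    return sorted(low, key=lambda pair: pair[1])


# Values copied by hand from the example output above (not fetched from the API).
example_scores = {
    "Does the response contain a clear beginning, middle, and end?": 1.0,
    "Does the story follow a logical progression of events?": 1.0,
    "Does the story resolve the conflict in a satisfying manner?": 0.863,
    "Is the life lesson clearly conveyed in the story?": 1.0,
    "Is the life lesson relevant to the input provided by the user?": 1.0,
}
example_total = 0.973

print("Passes:", passes(example_total))
for question, value in weakest_questions(example_scores):
    print(f"Needs attention ({value}): {question}")
```

Separating the pass/fail gate from the per-question breakdown keeps the decision simple while still pointing at which question to improve.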

Go back and try different responses to see how the scores change.  Try a different model.  Manually tweak the questions. Get a feel for what's happening.
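
One way to make these comparisons systematic is to score several candidate responses for the same prompt and rank them. The sketch below is a minimal illustration under stated assumptions: `rank_candidates` is a hypothetical helper, and `toy_total_score` is a stand-in scorer (it only counts words) so the example runs without any API key. In the notebook you would instead pass a function that calls `score(...)` and returns its total.

```python
from typing import Callable, Dict, List, Tuple


def rank_candidates(
    prompt: str,
    candidates: Dict[str, str],
    total_score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each named candidate response and return them best-first."""
    scored = [
        (name, total_score_fn(prompt, text)) for name, text in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_total_score(prompt: str, response: str) -> float:
    """Stand-in scorer for illustration only: longer text scores higher, capped at 1.0."""
    return min(len(response.split()) / 10, 1.0)


# Hypothetical candidates, e.g. outputs from two different models.
candidates = {
    "short": "A fox shared.",
    "longer": "A fox found some grapes and shared them with a hungry crow.",
}

for name, total in rank_candidates(
    "The importance of sharing", candidates, toy_total_score
):
    print(name, total)
```

Passing the scorer in as an argument means the same ranking loop works whether the candidates come from different models, different system prompts, or hand-edited responses.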