<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Cascading.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://play.withpi.ai/logo/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# Cascading

This Colab is the companion to the Cascading playground.

Suppose you have two models: one that is expensive and slow but high quality, and one that is fast and cheap but lower quality.

You can cut cost and latency without giving up much quality by using the faster model for simple tasks and the slower model for complex ones.  The trick is knowing which input is which.  Pi scoring specs help you do that.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install packages and load the SDK.

In [5]:
%%capture

%pip install withpi withpi-utils datasets tqdm litellm

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()

## Load a Pi-Scorer and a Dataset

We'll use a pre-built scoring spec with sample inputs, but feel free to bring your own.


In [4]:
# @title Load Scoring Spec
from withpi_utils.colab import load_scoring_spec_from_web, display_scoring_spec

aesop_scoring_spec = load_scoring_spec_from_web(
    "https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/contracts/aesop_ai.json"
)

display_scoring_spec(aesop_scoring_spec)

In [3]:
# @title Load dataset
from datasets import load_dataset

aesop_dataset = load_dataset("withpi/aesop", split="train")

print(aesop_dataset)

Dataset({
    features: ['input', 'output'],
    num_rows: 23
})


## Evaluate the scoring spec on different models

Let's try generating responses from a "big" model and a "small" one to compare scores.

To try a different model, change the model name and supply the matching API key; the LiteLLM docs at https://docs.litellm.ai/docs/ list the supported providers.

Importing a Google Gemini key from AI Studio in the left pane populates a GOOGLE_API_KEY secret, and it is essentially free.

In [9]:
from withpi_utils.colab import pretty_print_responses
import litellm

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

def generate(system: str, user: str, model: str) -> str:
    """generate passes the provided system and user prompts into the given model
    via LiteLLM"""
    messages = [
        {"content": system, "role": "system"},
        {"content": user, "role": "user"},
    ]
    return litellm.completion(model=model, messages=messages).choices[0].message.content


SMALL_MODEL = "gemini/gemini-1.5-flash-8b"
BIG_MODEL = "gemini/gemini-2.0-flash"

# Generate and score a response from each model for the first five inputs.
for i in range(5):
    row = aesop_dataset[i]
    small_model_output = generate(system=aesop_scoring_spec.description, user=row["input"], model=SMALL_MODEL)
    big_model_output = generate(system=aesop_scoring_spec.description, user=row["input"], model=BIG_MODEL)

    small_score = client.scoring_system.score(llm_input=row["input"], llm_output=small_model_output, scoring_spec=aesop_scoring_spec)
    big_score = client.scoring_system.score(llm_input=row["input"], llm_output=big_model_output, scoring_spec=aesop_scoring_spec)

    # Show both outputs side by side, small model on the left, with their scores.
    pretty_print_responses(
        header="#### Input:\n" + row["input"],
        response1="#### Output:\n" + small_model_output,
        response2="#### Output:\n" + big_model_output,
        left_label=SMALL_MODEL,
        right_label=BIG_MODEL,
        scores_left=small_score,
        scores_right=big_score,
    )
    print("\n\n")

Input 1 scores:

| Dimension | Sub-dimension | gemini-1.5-flash-8b | gemini-2.0-flash |
|---|---|---|---|
| Story Structure | | 0.674 | 0.764 |
| | Plot Structure | 0.766 | 0.844 |
| | Conflict Introduction | 0.73 | 0.727 |
| | Resolution Clarity | 0.527 | 0.723 |
| Character Development | | 0.676 | 0.801 |
| | Character Presence | 0.754 | 0.875 |
| | Character Development | 0.52 | 0.75 |
| | Dialogue Quality | 0.754 | 0.777 |
| Narrative Engagement | | 0.667 | 0.743 |
| | Engaging Narrative | 0.742 | 0.75 |







Input 2 scores:

| Dimension | Sub-dimension | gemini-1.5-flash-8b | gemini-2.0-flash |
|---|---|---|---|
| Story Structure | | 0.734 | 0.918 |
| | Plot Structure | 1.0 | 1.0 |
| | Conflict Introduction | 0.441 | 0.754 |
| | Resolution Clarity | 0.762 | 1.0 |
| Character Development | | 0.84 | 0.837 |
| | Character Presence | 0.762 | 0.754 |
| | Character Development | 0.758 | 0.758 |
| | Dialogue Quality | 1.0 | 1.0 |
| Narrative Engagement | | 0.751 | 0.764 |
| | Engaging Narrative | 0.75 | 0.773 |







Input 3 scores:

| Dimension | Sub-dimension | gemini-1.5-flash-8b | gemini-2.0-flash |
|---|---|---|---|
| Story Structure | | 0.934 | 0.923 |
| | Plot Structure | 1.0 | 1.0 |
| | Conflict Introduction | 0.801 | 1.0 |
| | Resolution Clarity | 1.0 | 0.77 |
| Character Development | | 0.922 | 0.844 |
| | Character Presence | 0.766 | 0.773 |
| | Character Development | 1.0 | 0.758 |
| | Dialogue Quality | 1.0 | 1.0 |
| Narrative Engagement | | 0.802 | 0.766 |
| | Engaging Narrative | 0.758 | 0.781 |







Input 4 scores:

| Dimension | Sub-dimension | gemini-1.5-flash-8b | gemini-2.0-flash |
|---|---|---|---|
| Story Structure | | 0.828 | 0.81 |
| | Plot Structure | 1.0 | 0.891 |
| | Conflict Introduction | 0.738 | 0.797 |
| | Resolution Clarity | 0.746 | 0.742 |
| Character Development | | 0.789 | 0.742 |
| | Character Presence | 0.574 | 0.754 |
| | Character Development | 0.852 | 0.648 |
| | Dialogue Quality | 0.941 | 0.824 |
| Narrative Engagement | | 0.651 | 0.677 |
| | Engaging Narrative | 0.73 | 0.746 |







Input 5 scores:

| Dimension | Sub-dimension | gemini-1.5-flash-8b | gemini-2.0-flash |
|---|---|---|---|
| Story Structure | | 1.0 | 0.839 |
| | Plot Structure | 1.0 | 1.0 |
| | Conflict Introduction | 1.0 | 0.762 |
| | Resolution Clarity | 1.0 | 0.754 |
| Character Development | | 1.0 | 0.749 |
| | Character Presence | 1.0 | 0.77 |
| | Character Development | 1.0 | 0.482 |
| | Dialogue Quality | 1.0 | 0.996 |
| Narrative Engagement | | 0.905 | 0.682 |
| | Engaging Narrative | 1.0 | 0.758 |





Barnaby the badger was known throughout the Whispering Woods for his magnificent burrow. Tunnels branched like roots, leading to cozy chambers filled with soft moss and sparkling stones. He spent all his time expanding it, perfecting it, sure that someday, a truly special someone would appreciate its wonders.

One day, Penelope the porcupine arrived in the Whispering Woods. She wasn't looking for a magnificent burrow. She was looking for sunlight and dandelions. Barnaby, seeing her near his burrow entrance, puffed out his chest. “Welcome, Penelope! I’m Barnaby, and I must insist you explore my humble abode. It’s quite draft-proof, you know, and features a state-of-the-art pebble-sorting system.”

Penelope politely peered inside. “It looks very…organized, Barnaby. But I was hoping to find a sunny patch for a picnic. Dandelions are best enjoyed in the light, you see.”

Barnaby, crestfallen, followed her. “Dandelions? But I have the finest collection of root vegetables back inside! Car
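
The score output above can be condensed into a quick check of how often the small model is already good enough. The sketch below copies the three top-level dimension scores for each of the five inputs, assuming the first block of scores in each pair belongs to gemini-1.5-flash-8b and the second to gemini-2.0-flash. The unweighted mean and the 0.75 threshold are illustrative assumptions, not how Pi computes a total score.

```python
from statistics import mean

# Top-level scores per input, in the order
# (Story Structure, Character Development, Narrative Engagement).
# Assumption: the first score block of each pair is the small model.
SMALL_SCORES = [
    (0.674, 0.676, 0.667),
    (0.734, 0.84, 0.751),
    (0.934, 0.922, 0.802),
    (0.828, 0.789, 0.651),
    (1.0, 1.0, 0.905),
]
BIG_SCORES = [
    (0.764, 0.801, 0.743),
    (0.918, 0.837, 0.764),
    (0.923, 0.844, 0.766),
    (0.81, 0.742, 0.677),
    (0.839, 0.749, 0.682),
]

THRESHOLD = 0.75  # illustrative cutoff, not a recommended value

# Unweighted mean of the three dimensions as a stand-in for an overall score.
small_means = [mean(scores) for scores in SMALL_SCORES]
big_means = [mean(scores) for scores in BIG_SCORES]

# Inputs (0-based) where the small model falls below the threshold.
needs_fallback = [i for i, s in enumerate(small_means) if s < THRESHOLD]

# Inputs (0-based) where the small model matches or beats the big one.
small_at_least_as_good = [
    i for i, (s, b) in enumerate(zip(small_means, big_means)) if s >= b
]

for i, (s, b) in enumerate(zip(small_means, big_means)):
    print(f"input {i}: small={s:.3f} big={b:.3f}")
print("needs fallback:", needs_fallback)
print("small at least as good:", small_at_least_as_good)
```

Under these assumptions only the first input falls below the threshold, and the small model matches or beats the big one on three of the five inputs. Five inputs is far too few to generalize from, but it shows the kind of headroom a cascade tries to exploit.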

## Next Steps

From here you can build a simple cascade: always run the cheap model first, score its output, and fall back to the big model only when the score is below a threshold.
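
A minimal sketch of that score-then-fall-back loop is shown here. The `cascade` function, the `CascadeResult` type, the 0.75 default threshold, and the toy stand-in functions are all hypothetical names and values chosen for illustration; the generators and scorer are passed in as plain callables so the sketch runs without any API keys.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    output: str
    score: float
    used_big_model: bool


def cascade(
    prompt: str,
    small_generate: Callable[[str], str],
    big_generate: Callable[[str], str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.75,  # illustrative default, tune on your own data
) -> CascadeResult:
    """Try the cheap model first; fall back to the big one on a low score."""
    small_output = small_generate(prompt)
    small_score = score_fn(prompt, small_output)
    if small_score >= threshold:
        return CascadeResult(small_output, small_score, used_big_model=False)

    # The cheap answer scored too low, so pay for the big model.
    big_output = big_generate(prompt)
    big_score = score_fn(prompt, big_output)
    return CascadeResult(big_output, big_score, used_big_model=True)


# Toy stand-ins so the sketch is runnable offline.
def toy_small(prompt: str) -> str:
    return "short tale"


def toy_big(prompt: str) -> str:
    return "a longer, richer tale with a clear moral"


def toy_score(prompt: str, output: str) -> float:
    # Pretends longer outputs are better; a real scorer would call Pi.
    return min(len(output.split()) / 8, 1.0)


result = cascade("Tell a fable about a fox.", toy_small, toy_big, toy_score)
print(result.used_big_model, result.score)
```

In this notebook, `small_generate` and `big_generate` could wrap the `generate` helper with the two model names, and `score_fn` could wrap `client.scoring_system.score`. Reducing the returned score object to a single number (for example through a `total_score` attribute) is an assumption to check against the Pi SDK documentation.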

One improvement is to train a classifier that predicts whether a particular prompt is likely to need the more powerful model, and to start with the big model when it does.  That is covered in a follow-up Colab.
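
To make the classifier idea concrete, here is a deliberately tiny sketch, not the method used in the follow-up Colab. It labels each prompt by whether the small model's score fell below a threshold, then fits a one-feature decision stump on prompt word count. The feature, the function names, the prompts, and the scores are all made up for illustration; a real router would use richer features and real scored data.

```python
from typing import Sequence


def label_prompts(small_scores: Sequence[float], threshold: float) -> list[bool]:
    """True means the small model fell short, so the prompt needs the big model."""
    return [score < threshold for score in small_scores]


def fit_length_stump(prompts: Sequence[str], needs_big: Sequence[bool]) -> int:
    """Pick the word-count cutoff that classifies the most prompts correctly."""
    if not prompts:
        raise ValueError("need at least one prompt")
    lengths = [len(p.split()) for p in prompts]
    best_cutoff, best_correct = 0, -1
    # Candidate cutoffs: every observed length, plus one past the maximum.
    for cutoff in sorted(set(lengths)) + [max(lengths) + 1]:
        correct = sum((n >= cutoff) == y for n, y in zip(lengths, needs_big))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def route(prompt: str, cutoff: int) -> str:
    """Send prompts at or above the cutoff length straight to the big model."""
    return "big" if len(prompt.split()) >= cutoff else "small"


# Made-up prompts and small-model scores, for illustration only.
prompts = [
    "Tell a fable about a fox.",
    "Write a fable with three animals, a twist ending, and rhyming dialogue.",
    "A short tale about a crow.",
    "Compose a layered fable where two rival families reconcile across generations.",
]
small_scores = [0.9, 0.6, 0.85, 0.55]

labels = label_prompts(small_scores, threshold=0.75)
cutoff = fit_length_stump(prompts, labels)
print(cutoff, [route(p, cutoff) for p in prompts])
```

The useful part of this sketch is the labeling step: scores from the small model give you training labels without hand annotation, and any classifier can then be trained on them.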