<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Input_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://withpi.ai/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# Query Fanout

There is no Playground associated with this Colab, but it's coming soon!

In many LLM applications, models are asked broad questions that must be decomposed into a set of narrow queries that together satisfy the information needs of the original request. These narrow queries, or "fanouts", can be used to retrieve documents or to issue queries and commands to downstream tools or agents. They also form the foundation of multi-hop reasoning, in which each hop generates new fanouts using the context accrued so far.
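The multi-hop pattern described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not Pi's implementation: `fan_out` and `answer` stand in for an LLM-backed fanout generator and a retrieval or tool call, and the toy versions below simply return canned values.

```python
from typing import Callable

# Hypothetical interfaces: fan_out(question, context) -> narrow queries,
# answer(narrow_query) -> text returned by a retriever, tool, or agent.
FanOutFn = Callable[[str, list[str]], list[str]]
AnswerFn = Callable[[str], str]


def multi_hop(question: str, fan_out: FanOutFn, answer: AnswerFn, hops: int = 2) -> list[str]:
    """Run `hops` rounds of fanout; each round sees the context accrued so far."""
    context: list[str] = []
    for _ in range(hops):
        for sub_query in fan_out(question, context):
            # Record each narrow query with its answer so later hops can use it.
            context.append(f"{sub_query} -> {answer(sub_query)}")
    return context


# Toy stand-in for an LLM that generates fanouts.
def toy_fan_out(question: str, context: list[str]) -> list[str]:
    if not context:
        # First hop: resolve the entity the question refers to.
        return ["artist who sang 'Rolling in the Deep'"]
    # Later hop: build a narrower query from the previous answer.
    artist = context[-1].split(" -> ")[1]
    return [f"last three albums by {artist}"]


# Toy stand-in for a retrieval tool.
def toy_answer(sub_query: str) -> str:
    canned = {
        "artist who sang 'Rolling in the Deep'": "Adele",
        "last three albums by Adele": "21, 25, 30",
    }
    return canned[sub_query]


question = "What are the titles of the last three albums released by the artist who sang 'Rolling in the Deep'?"
for step in multi_hop(question, toy_fan_out, toy_answer):
    print(step)
```

The second hop can only be written once the first hop's answer is known, which is why fanout generation is repeated with the accumulated context rather than done once up front.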

Generating fanouts automatically can improve the question-answering components of your LLM system. We also demonstrate how to use Pi's contract-based evaluation to assess the quality of the generated fanout queries.

We first generate generic fanouts with the Pi SDK, then show how supplying few-shot examples can improve fanout quality for a specific vertical.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key symbol) on the left.
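If you run this notebook outside Colab, `google.colab.userdata` is not importable. A small fallback such as the hypothetical `load_withpi_api_key` helper below (a sketch, not part of the Pi SDK) reads the Colab secret when it is available and otherwise uses a `WITHPI_API_KEY` environment variable.

```python
import os


def load_withpi_api_key() -> str:
    """Hypothetical helper: Colab secret if running in Colab, else the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get("WITHPI_API_KEY")
    except ImportError:
        key = os.environ.get("WITHPI_API_KEY")
    if not key:
        raise RuntimeError("WITHPI_API_KEY is not set; add it to Colab secrets or export it.")
    return key
```

In the setup cell, the `userdata.get` line could then become `os.environ["WITHPI_API_KEY"] = load_withpi_api_key()`.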

Run the cell below to install packages and load the SDK.

In [None]:
%%capture

import os
from google.colab import files, userdata

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

%pip install withpi litellm httpx datasets jinja2 tqdm

# Import a bunch of useful libraries for later.
from concurrent.futures import ThreadPoolExecutor
import json
import logging
from pathlib import Path
import pandas as pd
import re

import datasets
import httpx
import jinja2
from litellm import completion
from tqdm.notebook import tqdm
from withpi import PiClient
from withpi.types import Contract

from rich.console import Console
from rich.table import Table
from rich.live import Live

console = Console()

# Logger used by the helper functions below; a later cell configures its output format.
logger = logging.getLogger(__name__)

client = PiClient()

def print_contract(contract: Contract):
    """print_contract pretty-prints a contract's dimensions and sub-dimensions."""
    for dimension in contract.dimensions:
        print(dimension.label)
        for sub_dimension in dimension.sub_dimensions:
            print(f"\t{sub_dimension.description}")


def print_scores(pi_scores):
    """print_scores pretty-prints a Pi score response, one block per dimension."""
    for dimension_name, dimension_scores in pi_scores.dimension_scores.items():
        print(f"{dimension_name}: {dimension_scores.total_score}")
        for subdimension_name, subdimension_score in dimension_scores.subdimension_scores.items():
            print(f"\t{subdimension_name}: {subdimension_score}")
        print("\n")
    print("---------------------")
    print(f"Total score: {pi_scores.total_score}")

def _log_http_error(e: Exception):
    """Log request/response details when an SDK error carries an HTTP response."""
    if hasattr(e, 'response'):
        logger.error(f"Status code: {e.response.status_code}")
        logger.error(f"Response headers: {e.response.headers}")
        logger.error(f"Response body: {e.response.text}")
        logger.error(f"Request URL: {e.response.request.url}")
        logger.error(f"Request headers: {e.response.request.headers}")
        logger.error(f"Request body: {e.response.request.content}")


def generate_fanout_queries(query: str, example_fanout_queries: list[dict] | None = None, num_fanout_queries: int = 5):
    """Ask Pi for `num_fanout_queries` fanouts of `query`, optionally guided by few-shot examples."""
    try:
        response = client.queries.generate_fanouts(
            queries=[query],
            # A fresh list per call instead of a shared mutable default argument.
            example_fanout_queries=example_fanout_queries or [],
            num_fanout_queries=num_fanout_queries,
        )
        return {"query": query, "fanout_queries": response[0].fanout_queries}
    except Exception as e:
        _log_http_error(e)
        raise


def score_with_contract(llm_input, llm_output, contract):
    """Score one input/output pair against `contract` with Pi."""
    # Fanouts are stored as lists; serialize them into one newline-separated string for scoring.
    if isinstance(llm_output, list):
        llm_output = "\n".join(llm_output)
    try:
        return client.contracts.score(
            contract=contract,
            llm_input=llm_input,
            llm_output=llm_output,
        )
    except Exception as e:
        _log_http_error(e)
        raise

def load_contract(url: str) -> Contract:
    """load_contract downloads a Contract JSON blob and validates it."""
    resp = httpx.get(url)
    resp.raise_for_status()  # fail loudly on a bad URL instead of with a confusing validation error
    return Contract.model_validate_json(resp.content)

## Load the Query Fanout Contract

Load the `Query Fanout` contract from the Pi Labs cookbook repository, or edit the URL below to load a different one.
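The `print_contract` helper from the setup cell relies only on a contract having `dimensions`, each with a `label` and a list of `sub_dimensions` that carry a `description`. The sketch below mirrors just that shape with plain dataclasses so you can see what gets printed; the class names and the example criteria are hypothetical and are not the contents of `query_fanout.json` or the SDK's `Contract` type.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins that mirror only the attributes print_contract reads.
@dataclass
class SubDimension:
    description: str


@dataclass
class Dimension:
    label: str
    sub_dimensions: list[SubDimension] = field(default_factory=list)


@dataclass
class ToyContract:
    dimensions: list[Dimension] = field(default_factory=list)


def contract_outline(contract) -> list[str]:
    """Same traversal as print_contract, but returns the lines instead of printing them."""
    lines = []
    for dimension in contract.dimensions:
        lines.append(dimension.label)
        for sub_dimension in dimension.sub_dimensions:
            lines.append(f"\t{sub_dimension.description}")
    return lines


toy = ToyContract(dimensions=[
    Dimension("Coverage", [SubDimension("Do the fanouts cover every part of the original query?")]),
    Dimension("Specificity", [SubDimension("Is each fanout narrow enough to answer on its own?")]),
])
print("\n".join(contract_outline(toy)))
```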


In [None]:
query_fanout_contract = load_contract("https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/contracts/query_fanout.json")
print_contract(query_fanout_contract)

## Define an Input Set

Let's build a dataset of broad queries that lend themselves to fanout generation.

In [None]:
INPUT_FIELD = 'query'
OUTPUT_FIELD = 'fanout_queries'

example_queries_for_fanout_generation = [
    "Which cities in Australia are known for their art festivals, and what are the dates for the most popular festival in each city?",
    "List the top three coffee-producing countries in 2024, and specify the unique flavor profile of coffee from each country.",
    "Identify the top five largest deserts in the world and their total areas in square kilometers.",
    "Which airport is owned by Roosevelt County and is located in a city with a population of 810?",
    "Who are the current female cabinet members of the UK Prime Minister and in which city or town were they born?",
    "What are the titles of the last three albums released by the artist who sang 'Rolling in the Deep', and what is the name of the producer for each album?"
]

example_queries_df = pd.DataFrame({INPUT_FIELD: example_queries_for_fanout_generation})


## Generate Fanouts

Now use the Pi API to generate fanout queries for each input.
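The cell below calls the API one query at a time. Since the setup cell already imports `ThreadPoolExecutor`, one option is to issue the requests concurrently. The sketch below uses a hypothetical `fake_generate` stand-in so it runs without an API key; whether concurrent requests are appropriate depends on Pi's rate limits, which this sketch does not account for.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def generate_all(queries: list[str], generate: Callable[[str], dict], max_workers: int = 4) -> list[dict]:
    """Run `generate` on every query in worker threads.

    `pool.map` yields results in input order and re-raises an exception
    from `generate` when the corresponding result is reached.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, queries))


# Hypothetical stand-in for generate_fanout_queries so the sketch runs offline.
def fake_generate(query: str) -> dict:
    return {"query": query, "fanout_queries": [f"{query} (part {i})" for i in (1, 2)]}


results = generate_all(["Largest deserts?", "Top coffee producers?"], fake_generate)
for result in results:
    print(result["query"], "->", result["fanout_queries"])
```

In the notebook you would pass `generate_fanout_queries` as `generate` and, because the order is preserved, assign `example_queries_df[OUTPUT_FIELD] = [r["fanout_queries"] for r in results]`.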

In [None]:
import logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

try:
    example_queries_df[OUTPUT_FIELD] = None
    for index, row in example_queries_df.iterrows():
      print(f"Generating fanouts for {index}: {row[INPUT_FIELD]}")
      result = generate_fanout_queries(example_queries_df.at[index, INPUT_FIELD])
      example_queries_df.at[index, OUTPUT_FIELD] = result["fanout_queries"]
except Exception as e:
    logger.error("Request failed", exc_info=True)

## Define Few-Shot Examples

Next, let's write a few examples that are specific to our fanout use case. Suppose our LLM application passes the generated fanouts to downstream tools that perform retrieval and other business logic. In that case, each fanout should be concise and end with a tag naming the tool it should be routed to, such as `[FINANCE]` or `[GENERIC_SEARCH]`.
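With fanouts tagged like `GDP in USD of France [FINANCE]`, the application has to strip the tag and dispatch the remaining text to the matching tool. A minimal routing sketch, assuming every fanout ends in exactly one upper-case tag in square brackets; the tag names come from the examples below, but the parsing convention is our assumption, not something the Pi API enforces.

```python
import re
from collections import defaultdict

# Assumed convention: "<query text> [TOOL_NAME]", with the tag at the very end.
TAG_PATTERN = re.compile(r"^(?P<text>.*?)\s*\[(?P<tool>[A-Z_]+)\]$")


def route_fanouts(fanouts: list[str]) -> dict[str, list[str]]:
    """Group tagged fanouts by tool name, stripping the tag from each query."""
    routes: dict[str, list[str]] = defaultdict(list)
    for fanout in fanouts:
        match = TAG_PATTERN.match(fanout.strip())
        if match is None:
            raise ValueError(f"Fanout lacks a well-formed [TOOL] tag: {fanout!r}")
        routes[match["tool"]].append(match["text"])
    return dict(routes)


print(route_fanouts([
    "GDP in USD of France [FINANCE]",
    "GDP in USD of Spain [FINANCE]",
    "Top tourism nations [GENERIC_SEARCH]",
]))
```

Raising on a malformed tag, rather than silently dropping the fanout, makes a typo such as a mismatched closing bracket visible immediately.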

In [None]:
# Note the trailing commas: adjacent string literals without one are silently concatenated.
example_fanout_queries = [
    {
        "query": "Identify the three most influential environmental organizations globally, and describe one major campaign each has led in the past year.",
        "fanout_queries": [
            'Greenpeace major campaigns [GENERIC_SEARCH]',
            'World Wildlife Fund major campaigns [GENERIC_SEARCH]',
            'The Nature Conservancy major campaigns [GENERIC_SEARCH]',
            'Three most influential environmental organizations [COT_REASONING]',
        ],
    },
    {
        "query": "What is the GDP in US dollars of the top three nations with the most international tourists?",
        "fanout_queries": [
            'GDP in USD of France [FINANCE]',
            'GDP in USD of Spain [FINANCE]',
            'GDP in USD of United States [FINANCE]',
            'Top tourism nations [GENERIC_SEARCH]',
            'Top five GDPs [SUMMARIZE]',
        ],
    },
    {
        "query": "Who were two of Malin Maria Akerman's costars in the comedy film '27 Dresses'?",
        "fanout_queries": [
            "Malin Akerman's female costars in '27 Dresses' [ENTERTAINMENT]",
            'Two costars [SUMMARIZE]',
        ],
    },
]


## Generate Fanouts with Few-Shot Examples

Now use the Pi API to generate fanout queries for each input, also supplying our vertical-specific few-shot examples.
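Few-shot examples are easy to get subtly wrong: in Python, two adjacent string literals without a comma between them are silently joined into one string. A quick structural check before calling the API can catch that. The sketch below is a hypothetical helper, assuming each fanout should end in exactly one `[TOOL]` tag.

```python
import re

TAG = re.compile(r"\[[A-Z_]+\]")


def validate_few_shot(examples: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the examples look well-formed."""
    problems = []
    for i, example in enumerate(examples):
        if not isinstance(example.get("query"), str):
            problems.append(f"example {i}: 'query' must be a string")
        fanouts = example.get("fanout_queries")
        if not isinstance(fanouts, list) or not all(isinstance(f, str) for f in fanouts):
            problems.append(f"example {i}: 'fanout_queries' must be a list of strings")
            continue
        for fanout in fanouts:
            # Two tags suggests a missing comma; zero tags suggests a typo in the brackets.
            if len(TAG.findall(fanout)) != 1 or not fanout.rstrip().endswith("]"):
                problems.append(f"example {i}: malformed tag in {fanout!r}")
    return problems


good = [{"query": "q", "fanout_queries": ["a [FINANCE]", "b [SUMMARIZE]"]}]
print(validate_few_shot(good))
```

Before the API call you could run `problems = validate_few_shot(example_fanout_queries)` and stop if `problems` is non-empty.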

In [None]:
OUTPUT_FIELD = 'fanout_queries_few_shot'


try:
    example_queries_df[OUTPUT_FIELD] = None
    for index, row in example_queries_df.iterrows():
      print(f"Generating fanouts for {index}: {row[INPUT_FIELD]}")
      result = generate_fanout_queries(example_queries_df.at[index, INPUT_FIELD], example_fanout_queries)
      example_queries_df.at[index, OUTPUT_FIELD] = result["fanout_queries"]
except Exception as e:
    logger.error("Request failed", exc_info=True)

## Evaluate Fanouts with the Contract
Now use the Query Fanout contract we loaded earlier to evaluate the generated fanouts with Pi's scoring system.
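The scoring cell below scores one column at a time and overwrites `Total Score` on each run. To compare the generic and few-shot fanouts, one approach is to copy the totals out after each run and summarize them. The helper below is a hypothetical sketch that averages each run's scores and skips rows where scoring failed (`None`); the example numbers are made up for illustration.

```python
from statistics import mean
from typing import Optional


def summarize_runs(scores_by_run: dict[str, list[Optional[float]]]) -> dict[str, float]:
    """Return the mean total score for each run, ignoring failed (None) rows."""
    summary = {}
    for run_name, scores in scores_by_run.items():
        valid = [score for score in scores if score is not None]
        # NaN marks a run where every row failed to score.
        summary[run_name] = mean(valid) if valid else float("nan")
    return summary


# Made-up scores purely for illustration.
runs = {
    "fanout_queries": [0.5, 0.7, None],
    "fanout_queries_few_shot": [0.8, 0.6],
}
print(summarize_runs(runs))
```

After each scoring pass you could record `runs[OUTPUT_FIELD] = example_queries_df['Total Score'].tolist()` and call `summarize_runs(runs)` once both columns are scored.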

In [None]:
INPUT_FIELD = 'query'
OUTPUT_FIELD = 'fanout_queries' # or fanout_queries_few_shot to evaluate the few-shot fanouts
try:
    example_queries_df['Total Score'] = None
    example_queries_df['Result'] = None
    for index, row in example_queries_df.iterrows():
      print(f"Processing {index}: {row[INPUT_FIELD]}")
      result = score_with_contract(example_queries_df.at[index, INPUT_FIELD], example_queries_df.at[index, OUTPUT_FIELD], query_fanout_contract)
      example_queries_df.at[index, 'Total Score'] = result.total_score
      example_queries_df.at[index, 'Result'] = result
      print(f"Scored {index}: total score {result.total_score}")
except Exception as e:
    logger.error("Request failed", exc_info=True)

In [None]:
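for index, row in example_queries_df.iterrows():
  print(f"Query {index}: {row[INPUT_FIELD]}")
  metrics = row['Result']
  # Rows whose scoring request failed have no result to print.
  if metrics is None:
    print("No scores (scoring failed for this row).")
    continue
  print_scores(metrics)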

## Next Steps

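Generated fanouts can feed retrieval, tool calls, and multi-hop reasoning, and they can drive many other techniques in Pi. To steer the results, try changing `num_fanout_queries`, editing the few-shot examples, or scoring the few-shot fanouts by setting `OUTPUT_FIELD = 'fanout_queries_few_shot'` in the evaluation cell.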