<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Query_Fanout.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://play.withpi.ai/logo/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# Query Fanout

There is no Playground associated with this Colab, but it's coming soon!

In many LLM applications, answering a broad question requires decomposing it into a set of narrower queries that together satisfy the information needs of the original request. These narrow queries, or "fan-outs", can be used to retrieve documents or to issue queries and commands to downstream tools or agents. They also form the foundation of multi-hop reasoning, with each hop generating new "fan-outs" from the additional, accrued context.
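
To make the idea concrete, here is a minimal sketch of how fan-outs might drive retrieval. Everything in it is a hypothetical stand-in rather than part of the Pi SDK: the tiny in-memory `CORPUS`, the word-overlap `retrieve` function, and the hard-coded fan-outs are assumptions chosen only for illustration.

```python
import re

# Hypothetical in-memory corpus; a real system would query a search index or tool.
CORPUS = {
    "sahara": "The Sahara is a desert in North Africa.",
    "gobi": "The Gobi is a desert in East Asia.",
    "adelaide": "Adelaide Fringe is an arts festival held in Adelaide.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, min_overlap=2):
    """Return ids of documents sharing at least `min_overlap` words with the query."""
    query_words = tokenize(query)
    return [
        doc_id
        for doc_id, text in corpus.items()
        if len(query_words & tokenize(text)) >= min_overlap
    ]


def retrieve_with_fanouts(fanouts, corpus):
    """Run each narrow fan-out query separately and merge the hits, preserving order."""
    merged = []
    for fanout in fanouts:
        for doc_id in retrieve(fanout, corpus):
            if doc_id not in merged:
                merged.append(doc_id)
    return merged


# Hand-written fan-outs for a broad question about deserts (illustrative only).
fanouts = ["Sahara desert area", "Gobi desert area"]
hits = retrieve_with_fanouts(fanouts, CORPUS)
print(hits)
```

Each narrow query matches one document, and the merged hits cover the broad request.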

Generating these fanouts with AI support can improve the question-answering components of your LLM system. This notebook also demonstrates how to use Pi's Scoring System to assess the quality of the generated fanout queries.

We will walk through the generation of more generic fanouts using the Pi SDK, before also highlighting how providing few-shot examples can improve fanout quality for a specific vertical.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install the packages and load the SDK.

In [None]:
%%capture

%pip install withpi withpi-utils datasets tqdm litellm

import os
from google.colab import userdata
from withpi import PiClient

# Load the notebook secret into the environment so the Pi Client can access it.
os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()

## Load Query Fanout Scoring Spec

Load the `Query Fanout` example from Pi Labs cookbooks, or edit below to load a different one.


In [None]:
# @title Load Scoring Spec
from withpi_utils.colab import load_scoring_spec_from_web, display_scoring_spec

query_fanout_spec = load_scoring_spec_from_web(
    "https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/scoring_specs/query_fanout.json"
)

display_scoring_spec(query_fanout_spec)

## Define an Input Set

Let's build a dataset containing some examples of broad queries that are amenable to fanout generation.

In [None]:
import pandas as pd

INPUT_FIELD = 'query'
OUTPUT_FIELD = 'fanout_queries'

example_queries_for_fanout_generation = [
    "Which cities in Australia are known for their art festivals, and what are the dates for the most popular festival in each city?",
    "List the top three coffee-producing countries in 2024, and specify the unique flavor profile of coffee from each country.",
    "Identify the top five largest deserts in the world and their total areas in square kilometers.",
    "Which airport is owned by Roosevelt County and is located in a city with a population of 810?",
    "Who are the current female cabinet members of the UK Prime Minister and in which city or town were they born?",
    "What are the titles of the last three albums released by the artist who sang 'Rolling in the Deep', and what is the name of the producer for each album?"
]

example_queries_df = pd.DataFrame.from_dict({INPUT_FIELD : example_queries_for_fanout_generation})


## Generate Fanouts

Now use the Pi API to generate fanout queries for each input.

In [None]:
example_queries_df[OUTPUT_FIELD] = None
for index, row in example_queries_df.iterrows():
    print(f"Generating fanouts for {index}: {row[INPUT_FIELD]}")
    result = client.search.query_fanout.generate(queries=example_queries_df.at[index, INPUT_FIELD])
    example_queries_df.at[index, OUTPUT_FIELD] = result.fanout_queries

In [None]:
example_queries_df[OUTPUT_FIELD].head()
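
Before passing fan-outs to downstream tools, it can help to drop blanks and near-duplicates so the same lookup is not issued twice. The sketch below assumes each row holds a plain list of strings; `dedupe_fanouts` is a hypothetical helper, not a Pi SDK function.

```python
def dedupe_fanouts(fanouts):
    """Drop empty and duplicate fan-outs, ignoring case and extra whitespace."""
    seen = set()
    unique = []
    for fanout in fanouts:
        # Normalize case and collapse whitespace to build a comparison key.
        key = " ".join(fanout.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(fanout.strip())
    return unique


sample = ["GDP of France", "gdp of  france ", "", "GDP of Spain"]
print(dedupe_fanouts(sample))
```

The first spelling of each fan-out is kept, so the original casing survives.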

## Define Few-Shot Examples

Let's augment our initial dataset with a few examples that are specific to our fanout use case. Suppose our LLM application passes the generated fanouts to downstream tools that perform retrieval and potentially other business logic. In that setting, the generated fanouts should be more concise and narrowly directed.

In [None]:
# Each example pairs a broad query with the concise fanouts we would like to see.
# Note the comma after every list item: without it, Python silently
# concatenates adjacent string literals into a single fanout.
example_fanout_queries = [
  {
      "query": "Identify the three most influential environmental organizations globally, and describe one major campaign each has led in the past year.",
      "fanout_queries" : [
        'Greenpeace major campaigns',
        'World Wildlife Fund major campaigns',
        'The Nature Conservancy major campaigns',
        'Three most influential environmental organizations',
      ]
  },
  {
      "query": "What is the GDP in US dollars of the top three nations with the most international tourists?",
      "fanout_queries" : [
        'GDP in USD of France',
        'GDP in USD of Spain',
        'GDP in USD of United States',
        'Top tourism nations',
        'Top five GDPs',
      ]
  },
  {
      "query": "Who were two of Malin Maria Akerman's costars in the comedy film '27 Dresses'?",
      "fanout_queries" : [
        "Malin Akerman's female costars in '27 Dresses'",
        'Two costars',
      ]
  }
]
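
Hand-written few-shot examples are easy to get subtly wrong, so a quick shape check before sending them can save a failed request. This is a minimal sketch that assumes only the `query` / `fanout_queries` layout used above; `validate_few_shot_examples` is a hypothetical helper, not part of the Pi SDK.

```python
def validate_few_shot_examples(examples):
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for i, example in enumerate(examples):
        query = example.get("query")
        fanouts = example.get("fanout_queries")
        if not isinstance(query, str) or not query.strip():
            problems.append(f"example {i}: 'query' must be a non-empty string")
        if not isinstance(fanouts, list) or not fanouts:
            problems.append(f"example {i}: 'fanout_queries' must be a non-empty list")
            continue
        for j, fanout in enumerate(fanouts):
            if not isinstance(fanout, str) or not fanout.strip():
                problems.append(f"example {i}: fanout {j} must be a non-empty string")
    return problems


# Illustrative inputs only.
good = [{"query": "Top tourism nations by GDP", "fanout_queries": ["Top tourism nations"]}]
bad = [{"query": "", "fanout_queries": []}, {"query": "q", "fanout_queries": ["a", ""]}]
print(validate_few_shot_examples(good))
print(validate_few_shot_examples(bad))
```

A check like this only verifies structure. It cannot tell whether two fanouts were accidentally merged into one string, so it is still worth eyeballing the list.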


## Generate Fanouts with Few-Shot Examples

Now use the Pi API to generate fanout queries for each input, also supplying our vertical-specific few-shot examples.

In [None]:
import logging

# Define the logger used in the error handler below.
logger = logging.getLogger(__name__)

OUTPUT_FIELD = 'fanout_queries_few_shot'

try:
    example_queries_df[OUTPUT_FIELD] = None
    for index, row in example_queries_df.iterrows():
        print(f"Generating fanouts for {index}: {row[INPUT_FIELD]}")
        result = client.search.query_fanout.generate(example_queries_df.at[index, INPUT_FIELD], example_fanout_queries)
        # Use attribute access, matching the earlier generation cell.
        example_queries_df.at[index, OUTPUT_FIELD] = result.fanout_queries
except Exception:
    logger.error("Request failed", exc_info=True)

In [None]:
example_queries_df[OUTPUT_FIELD].head()

## Evaluate Fanouts with Scoring Spec

Now use the Query Fanout Scoring Spec we specified earlier to evaluate the generated fanouts using the Pi scoring system.

In [None]:
import logging

# Define the logger used in the error handler below.
logger = logging.getLogger(__name__)

INPUT_FIELD = 'query'
OUTPUT_FIELD = 'fanout_queries' # or fanout_queries_few_shot to evaluate the few-shot fanouts
try:
    example_queries_df['Total Score'] = None
    example_queries_df['Result'] = None
    for index, row in example_queries_df.iterrows():
        print(f"Processing {index}: {row[INPUT_FIELD]}")
        # Serialize the list of fanouts into a single string for scoring.
        fanout_queries = ",".join(example_queries_df.at[index, OUTPUT_FIELD])
        result = client.scoring_system.score(example_queries_df.at[index, INPUT_FIELD], fanout_queries, query_fanout_spec)
        example_queries_df.at[index, 'Total Score'] = result.total_score
        example_queries_df.at[index, 'Result'] = result
        print(f"Result for {index}: {result}")
except Exception:
    logger.error("Request failed", exc_info=True)

In [None]:
for index, row in example_queries_df.iterrows():
  metrics = row['Result']
  print("Scores for fanouts for query: " + row[INPUT_FIELD])
  print()
  # No print_scores helper is defined in this notebook, so print the raw scoring result.
  print(metrics)
  print()
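
To compare the baseline and few-shot fanouts, one option is to run the scoring cell once per output column, collect the `Total Score` values, and summarize them. The sketch below shows only the summarizing step. `summarize_scores` is a hypothetical helper, and the numbers are made-up placeholders, not real Pi scores.

```python
from statistics import mean


def summarize_scores(scores_by_variant):
    """Map each variant name to its mean score, skipping variants with no scores."""
    return {
        variant: round(mean(scores), 3)
        for variant, scores in scores_by_variant.items()
        if scores
    }


# Placeholder values for illustration only; substitute the scores you collected.
placeholder_scores = {
    "fanout_queries": [0.5, 1.0],
    "fanout_queries_few_shot": [0.25, 0.75],
    "not_yet_scored": [],
}
print(summarize_scores(placeholder_scores))
```

With only six queries, a difference in means is a rough signal at best, so inspect the per-query results as well.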

## Next Steps

This query fanout generation can drive many other techniques in Pi.  You can adjust the above methods to steer the AI in different ways.