<a href="https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Feedback_Clustering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://withpi.ai"><img src="https://withpi.ai/logoFullBlack.svg" width="240"></a>

<a href="https://code.withpi.ai"><font size="4">Documentation</font></a>

<a href="https://play.withpi.ai"><font size="4">Technique Catalog</font></a>

# WithPi Feedback Clustering

This Colab assumes that you have a contract and want to see how clustering works on feedback.

We will walk through the same `Aesop AI` example, but you can load any contract here. Ideally you would use real feedback; this Colab demonstrates with synthetic feedback instead.

This should take about **15 minutes**, even if you're unfamiliar with Colab.

## Install and initialize SDK

Connect to a regular CPU Python 3 runtime.  You won't need GPUs for this notebook.

You'll need a WITHPI_API_KEY from https://play.withpi.ai.  Add it to your notebook secrets (the key symbol) on the left.

Run the cell below to install packages and load the SDK.

In [ ]:
%%capture

%pip install withpi litellm

import os
from google.colab import userdata
from withpi import PiClient

os.environ["WITHPI_API_KEY"] = userdata.get('WITHPI_API_KEY')

client = PiClient()

## Load contract and dataset

Load the `Aesop AI` contract and its example set from the Pi Labs cookbook, or edit the cell below to load different ones.


In [None]:
import httpx
import pandas as pd
from google.colab import data_table
from withpi.types import Contract

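resp = httpx.get("https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/contracts/aesop_ai_calibrated.json")
# Fail early with a clear HTTP error instead of a confusing JSON validation error.
resp.raise_for_status()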

aesop_contract = Contract.model_validate_json(resp.content)

for dimension in aesop_contract.dimensions:
  print(dimension.label)
  for sub_dimension in dimension.sub_dimensions:
    print(f"\t{sub_dimension.description}")

df = pd.read_parquet("https://raw.githubusercontent.com/withpi/cookbook-withpi/refs/heads/main/datasets/aesop_ai_examples.parquet")
data_table.enable_dataframe_formatter()
df
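
If you load your own dataset instead, the later cells read the `input` and `output` columns from each row. A minimal sketch of a check you could run first is shown here; the helper name `missing_columns` and the toy rows are hypothetical and not part of the SDK.

```python
import pandas as pd

# Columns the later cells in this notebook read from each row.
REQUIRED_COLUMNS = ("input", "output")

def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from the frame."""
    return [name for name in required if name not in frame.columns]

# Toy stand-in for a custom dataset (hypothetical rows, wrong column name on purpose).
toy_df = pd.DataFrame({
    "input": ["Write a fable about patience."],
    "response": ["Once upon a time..."],
})

print(missing_columns(toy_df))  # ['output']
```

An empty list means the dataset has the columns the rest of the notebook expects.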


## Generate Feedback

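In the real world, you would have feedback from your team, your users, or some other source. Here we generate feedback with a fairly capable LLM instead, although synthetic feedback like this is unlikely to give you useful examples to learn from. The cell below reads a `GOOGLE_API_KEY` notebook secret to call Gemini through LiteLLM, so add that secret alongside your WITHPI_API_KEY.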

In [None]:
from concurrent.futures import ThreadPoolExecutor
from tqdm.notebook import tqdm
from litellm import completion
import os
from google.colab import userdata

os.environ["GEMINI_API_KEY"] = userdata.get('GOOGLE_API_KEY')

def generate(llm_input: str, llm_output: str, pbar) -> str:
  messages = [
      {
          "content": f"You will get an input (wrapped in <input></input> tags) and an output (wrapped in <output></output> tags) produced by an LLM trying to implement an application described by <application>{aesop_contract.description}</application>. Generate one line of feedback as if you were a user rating this output. Do not include any tags, just the feedback.",
          "role": "system"
      },
      {
          "content": f"<input>{llm_input}</input><output>{llm_output}</output>",
          "role": "user"
      },
  ]
  result = completion(model="gemini/gemini-1.5-flash-8b-latest",
                      messages=messages).choices[0].message.content
  pbar.update(1)
  return result

def do_inference():
  futures = []
  pbar = tqdm(total=len(df))
  with ThreadPoolExecutor(max_workers=4) as executor:
    for index, row in df.iterrows():
      futures.append(executor.submit(generate, row["input"], row["output"], pbar))
  return [future.result() for future in futures]

df["feedback"] = do_inference()
df
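
The cell above keeps its futures in submission order, which is why the returned list lines up with the rows of `df` even when calls finish out of order. The pattern can be sketched in isolation as follows; `slow_echo` is a hypothetical stand-in for the LLM call, and the delays are invented to force out-of-order completion.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_echo(name: str, delay: float) -> str:
    """Hypothetical stand-in for an LLM call that takes `delay` seconds."""
    time.sleep(delay)
    return f"feedback for {name}"

# The first task is the slowest, so it finishes last.
rows = [("first", 0.03), ("second", 0.01), ("third", 0.02)]

with ThreadPoolExecutor(max_workers=3) as executor:
    # Futures are stored in submission order, not completion order.
    futures = [executor.submit(slow_echo, name, delay) for name, delay in rows]

# Reading results from the stored list preserves the original row order.
results = [future.result() for future in futures]
print(results)
```

Because results are read from the list of futures rather than as they complete, assigning them to a DataFrame column keeps each piece of feedback on the row it was generated for.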

## Cluster feedback

Now cluster the generated feedback.

We only generated textual feedback, but if your users can rate a response, the rating can be included as well.

In [None]:
feedback_clusters = client.feedback.cluster(
    feedbacks=[
        {"identifier": str(index),
         "llm_input": row["input"],
         "llm_output": row["output"],
         "comment": row["feedback"],
         "sources": ["Synthetic"], # Can include labels from different sources.
         "rating": "Neutral" # Clustering can include the rating if available.
         } for index, row in df.iterrows()],
)

for cluster in feedback_clusters:
  display(f"Cluster name: {cluster.topic}, Count: {len(cluster.feedback)}")

df['cluster'] = ['']*len(df)
for cluster in feedback_clusters:
  for identifier in cluster.feedback:
    df.loc[int(identifier),'cluster'] = cluster.topic
df

## Next steps

Next, break the clustered data down by topic to identify areas for improvement. If many users are saying similar things, make sure the contract has a Dimension that covers it. Responses that fail on that attribute will then be penalized, so you can return to [Prompt Optimization](https://colab.research.google.com/github/withpi/cookbook-withpi/blob/main/colabs/Prompt_Optimization.ipynb) to incorporate the feedback automatically. Coming soon, you'll be able to deploy a model steered in this direction with reinforcement learning.
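
One way to start looking for frequently repeated feedback is to rank clusters by size and read a few comments from the largest one. The sketch below uses plain pandas on a toy frame shaped like the `df` built above; the cluster names and comments are invented for illustration.

```python
import pandas as pd

# Toy frame with the same "feedback" and "cluster" columns as the notebook's df.
toy = pd.DataFrame({
    "feedback": [
        "Too long",
        "Moral is unclear",
        "Way too long",
        "No moral stated",
        "Could be shorter",
    ],
    "cluster": ["Length", "Moral clarity", "Length", "Moral clarity", "Length"],
})

# value_counts sorts clusters from most to least frequent.
cluster_sizes = toy["cluster"].value_counts()
largest = cluster_sizes.index[0]

# Pull a couple of raw comments from the largest cluster for a closer read.
samples = toy.loc[toy["cluster"] == largest, "feedback"].head(2).tolist()

print(cluster_sizes)
print(largest, samples)
```

A cluster that dominates the counts is a good candidate to compare against the Dimensions already in your contract.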

