# Use SuperPipe to experiment with and evaluate different approaches
There are many ways to build a labeling pipeline that accomplish the same result. The goal of `SuperPipe` is to enable rapid and robust experimentation so that you can understand the performance, accuracy, and cost tradeoffs between approaches.

In this example, we'll experiment with a few different approaches to a categorization pipeline we want to build. `SuperPipe` will make this experimentation quick and at the end we'll have a solid understanding of how different approaches perform. 


### Task
The task at hand is to categorize furniture items into a multi-level taxonomy based on their name and description. 

For example:

Name: `Blair Table by homestyles`

Description: `This Blair Table by homestyles is perfect for Sunday brunches or game night. The round pedestal table is available as shown, or as part of a five-piece set. Features solid hardwood construction in a black finish that can easily match a traditional or contemporary aesthetic. Measures: 30"H x 42" Diameter`

Correct classification: `Tables & Desks > Dining Tables`

### Approaches
There are two different approaches we want to try.
1. LLMs + embeddings
2. Hierarchical prompting


In [1]:
from dotenv import load_dotenv
load_dotenv()

import os
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
COHERE_API_KEY = os.getenv('COHERE_API_KEY')

In [2]:
# %pip install cohere

import pandas as pd
from superpipe import *
from pydantic import BaseModel, Field
import cohere
import os
import numpy as np
from typing import List



## Data processing
We'll start by reading in our data and building our taxonomy. Building a taxonomy is a project in and of itself, and many taxonomies are available online that you can use. In our case, we build the taxonomy from our ground truth dataset. Because the dataset is large, we can be reasonably confident that all categories are represented. As you'll see, our approach does not use the ground truth data as training data, so we can expand the taxonomy later without collecting additional data.

In [None]:
df = pd.read_csv('./furniture_clean.csv')

In [None]:
# Remove the 'Furniture > ' from each string in the 'category' column since they all start with Furniture.
df['category_new'] = df['category'].str.replace('Furniture > ', '')

For our embeddings approach we want each taxonomy entry to be a single string. We'll create the taxonomy from the ground truth data.

In [None]:
taxonomy = list(set(df['category_new']))
taxonomy[0:5]


['Outdoor Seating > Outdoor Loveseats',
 'Outdoor Tables > Outdoor Coffee Tables',
 'Nursery > Cribs',
 'Beds & Headboards > Bedframes',
 'Chairs > Dining Chairs']

However, for our hierarchical approach we need to know more about the structure of the taxonomy, so we'll create a lookup table between first and second level categories.

In [None]:
# Create a lookup table with first level taxonomy as keys and second level as values
lookup_table = df['category_new'].str.split(' > ', expand=True).groupby(0)[1].apply(list).apply(set)
lookup_table['Chairs']

{'Accent Chairs', 'Desk Chairs', 'Dining Chairs', 'Recliners'}
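
The values in this lookup table are sets, and the iteration order of a set of strings is not guaranteed to be the same from one interpreter run to the next. That matters later, when we ask the model for an index into the list. Below is a minimal sketch of one way to get stable indices, assuming the same `First > Second` string format. `build_lookup` is a hypothetical helper and not part of Superpipe.

```python
from collections import defaultdict

def build_lookup(categories):
    """Map each first-level category to a sorted list of its second-level categories."""
    lookup = defaultdict(set)
    for category in categories:
        first, second = category.split(" > ", 1)
        lookup[first].add(second)
    # Sorting gives every second-level category a stable, reproducible index.
    return {first: sorted(seconds) for first, seconds in lookup.items()}

# Toy input; duplicates are collapsed just like in the set-based version.
example = build_lookup([
    "Chairs > Dining Chairs",
    "Chairs > Accent Chairs",
    "Nursery > Cribs",
    "Chairs > Dining Chairs",
])
print(example["Chairs"])
```

With sorted lists, the same index refers to the same category in the prompt and in the step that maps the index back to a string.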

## Building our pipeline using Superpipe

### Approach 1: Embeddings
The first approach is similar to the approach we took in the `Product Categorization` example we gave in the project repo. We are omitting the Google Search step because we already have item descriptions. 
1. Write a simple description of the product given name and description
2. Vector embedding search for top N categories
3. LLM: pick the best category



In [None]:
short_description_prompt = lambda row: f"""
You are given a product name and description for a piece of furniture.
Return a single sentence describing the product.
Product name: {row['name']}
Product description: {row['description']}
"""

class ShortDescription(BaseModel):
  short_description: str = Field(description="A single sentence describing the product")
  
short_description_step = steps.LLMStructuredStep(
  prompt=short_description_prompt,
  model=models.gpt35,
  out_schema=ShortDescription,
  name="short_description"
)

We are using Cohere to embed both our description and the taxonomy, but you can substitute any embeddings provider with the `EmbeddingSearchStep`. Unlike LLMs, which are good at ignoring irrelevant information, embeddings work better with short, simple descriptions than with text that tries to include too much; this is what we've learned from experience. It is something you can and should experiment with.

In [None]:
# set your cohere api key as an env var or set it directly here
COHERE_API_KEY = os.environ.get('COHERE_API_KEY')
co = cohere.Client(COHERE_API_KEY)

def embed_fn(texts: List[str]):
  embeddings = co.embed(
    model="embed-english-v3.0",
    texts=texts,
    input_type='classification'
  ).embeddings
  return np.array(embeddings).astype('float32')

embedding_search_prompt = lambda row: row["short_description"]

embedding_search_step = steps.EmbeddingSearchStep(
  search_prompt= embedding_search_prompt,
  embed_fn=embed_fn,
  k=5,    
  candidates=taxonomy,
  name="embedding_search"
)
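
To make the idea behind the embedding search concrete, here is a minimal sketch of a top-k nearest-neighbor lookup using cosine similarity. It is an illustration only and makes no claim about how `EmbeddingSearchStep` is implemented internally. The function names and the toy two-dimensional vectors are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_candidates(query_vec, candidate_vecs, candidates, k=5):
    """Return the k candidates whose vectors are most similar to the query."""
    scored = sorted(
        zip(candidates, candidate_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,  # highest similarity first
    )
    return [name for name, _ in scored[:k]]

# Toy 2-dimensional vectors standing in for real embeddings.
labels = ["Beds", "Dining Chairs", "Cribs"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
nearest = top_k_candidates([0.9, 0.1], vectors, labels, k=2)
print(nearest)
```

In the pipeline, the query vector would come from the short description and the candidate vectors from the taxonomy strings.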

We now take the results of the embedding search and ask the LLM to pick the best one. It's important that our embedding search is optimized for recall: if the correct answer isn't among the retrieved candidates, our categorize step has no chance of succeeding.
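
One way to check this is to measure recall at k: the fraction of rows whose true category appears among the first k retrieved candidates. The sketch below assumes rows shaped like the pipeline output, with `category1` ... `categoryN` columns and a `category_new` ground truth column. `recall_at_k` is a hypothetical helper, not a Superpipe function.

```python
def recall_at_k(rows, truth_key="category_new", k=5):
    """Fraction of rows whose true category is among the first k candidates."""
    hits = 0
    for row in rows:
        retrieved = [row.get(f"category{i}") for i in range(1, k + 1)]
        hits += row[truth_key] in retrieved
    return hits / len(rows)

# Toy rows: the first contains the truth among its candidates, the second does not.
rows = [
    {"category_new": "Beds & Headboards > Beds",
     "category1": "Beds & Headboards > Headboards",
     "category2": "Beds & Headboards > Beds"},
    {"category_new": "Chairs > Recliners",
     "category1": "Chairs > Accent Chairs",
     "category2": "Chairs > Desk Chairs"},
]
recall = recall_at_k(rows, k=2)
print(recall)
```

For a pandas dataframe, the rows can be obtained with `df.to_dict("records")`. If recall at k is well below 1.0, no choice of model in the categorize step can make up the difference.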

In [None]:
def categorize_prompt(row):
    categories = ""
    i = 1
    while f"category{i}" in row:
        categories += f'{i}. {row[f"category{i}"]}\n'
        i += 1

    return f"""
    You are given a product description and {i-1} options for the product's category.
    Pick the index of the most accurate category.
    The index must be between 1 and {i-1}.
    Product description: {row['short_description']}
    Categories:
    {categories}
    """
    
class CategoryIndex(BaseModel):
    category_index: int = Field(description="The index of the most accurate category")
    
categorize_step = steps.LLMStructuredStep(
  prompt=categorize_prompt,
  model=models.gpt35,
  out_schema=CategoryIndex,
  name="categorize"
)

By returning just the index we can ensure that the actual string we use is in the taxonomy since LLMs sometimes hallucinate characters. Additionally, we don't need to waste response tokens on printing the entire string.
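
An index can still fall outside the valid range, so it may be worth guarding the lookup. The following is a minimal sketch, assuming a dict-like row with `category1` ... `categoryN` keys; `category_from_index` and its `fallback` parameter are hypothetical names for illustration.

```python
def category_from_index(row, index_key="category_index", fallback=None):
    """Return the candidate the model picked, or `fallback` if the index is invalid."""
    key = f"category{row.get(index_key)}"
    # A missing or out-of-range index yields a key that is not in the row.
    return row.get(key, fallback)

row = {"category1": "Chairs > Recliners", "category2": "Chairs > Desk Chairs"}
picked = category_from_index({**row, "category_index": 2})
invalid = category_from_index({**row, "category_index": 7})
print(picked, invalid)
```

Returning a fallback instead of raising keeps one bad response from failing the whole row; whether that is preferable to a hard failure depends on how you want errors counted.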

In [None]:
predicated_category_step = steps.CustomStep(
  transform=lambda row: row[f'category{row["category_index"]}'],
  name="predicted_category"  # must match the column read by the evaluation function
)

We'd like to test our end-to-end pipeline before we go any further. We'll make a copy of the first five rows of the dataframe and run the pipeline on it to make sure it works.

In [None]:
test_df = df.head(5).copy()

In [None]:
evaluate = lambda row: row['predicted_category'].lower() == row['category_new'].lower()

categorizer = pipeline.Pipeline([
  short_description_step, 
  embedding_search_step, 
  categorize_step,
  predicated_category_step
], evaluation_fn=evaluate)

categorizer.run(test_df)

Running step short_description...


100%|██████████| 5/5 [00:07<00:00,  1.48s/it]


Running step embedding_search...
Running step categorize...


100%|██████████| 5/5 [00:02<00:00,  1.79it/s]


Running step select_category...


100%|██████████| 5/5 [00:00<00:00, 8182.41it/s]


Unnamed: 0,name,description,category,brand.name,category_new,__short_description__,short_description,category1,category2,category3,category4,category5,__categorize__,category_index,predicted_category
0,EnGauge Deluxe Bedframe,Introducing the Engauge Deluxe Bedframe - the ...,Furniture > Beds & Headboards > Bedframes,,Beds & Headboards > Bedframes,"{'input_tokens': 313, 'output_tokens': 59, 'in...",Introducing the Engauge Deluxe Bedframe - the ...,Beds & Headboards > Bedframes,Mattresses & Box Springs > Mattresses,Mattresses & Box Springs > Box Springs & Found...,Beds & Headboards > Beds,Beds & Headboards > Headboards,"{'input_tokens': 205, 'output_tokens': 10, 'in...",1,Beds & Headboards > Bedframes
1,Sparrow & Wren Sullivan King Channel-Stitched ...,"85""L x 83""W x 56""H | Total weight: 150 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 169, 'output_tokens': 50, 'in...",The Sparrow & Wren Sullivan King Channel-Stitc...,Beds & Headboards > Beds,Beds & Headboards > Headboards,Kids Beds & Headboards > Kid's Beds,Beds & Headboards > Bedframes,Sets > Bedroom Furniture Sets,"{'input_tokens': 191, 'output_tokens': 10, 'in...",1,Beds & Headboards > Beds
2,Queen Bed With Frame,Dimensions:Head Board -49H x 63.75W x 1.5DFoot...,Furniture > Beds & Headboards > Beds,Hillsdale,Beds & Headboards > Beds,"{'input_tokens': 124, 'output_tokens': 57, 'in...",Queen Bed With Frame featuring a head board wi...,Beds & Headboards > Bedframes,Beds & Headboards > Headboards,Beds & Headboards > Beds,Kids Beds & Headboards > Kid's Beds,Sets > Bedroom Furniture Sets,"{'input_tokens': 199, 'output_tokens': 10, 'in...",3,Beds & Headboards > Beds
3,Dylan Queen Bed,Add a touch of a modern farmhouse to your bedr...,Furniture > Beds & Headboards > Beds,,Beds & Headboards > Beds,"{'input_tokens': 140, 'output_tokens': 47, 'in...",Add a touch of modern farmhouse charm to your ...,Beds & Headboards > Headboards,Beds & Headboards > Beds,Beds & Headboards > Bedframes,Sets > Bedroom Furniture Sets,Kids Beds & Headboards > Kid's Beds,"{'input_tokens': 189, 'output_tokens': 10, 'in...",2,Beds & Headboards > Beds
4,Sparrow & Wren Mara Full Diamond-Tufted Bed,"78""L x 56""W x 51""H | Total weight: 130 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 168, 'output_tokens': 72, 'in...",The Sparrow & Wren Mara Full Diamond-Tufted Be...,Beds & Headboards > Headboards,Beds & Headboards > Beds,Beds & Headboards > Bedframes,Mattresses & Box Springs > Mattresses,Kids Beds & Headboards > Kid's Beds,"{'input_tokens': 217, 'output_tokens': 10, 'in...",2,Beds & Headboards > Beds


Let's print our pipeline statistics and see how it's doing.

In [None]:
print(categorizer.statistics)

+---------------+------------------------------+
|     score     |             1.0              |
+---------------+------------------------------+
|  input_tokens | {'gpt-3.5-turbo-0125': 1915} |
+---------------+------------------------------+
| output_tokens | {'gpt-3.5-turbo-0125': 335}  |
+---------------+------------------------------+
|   input_cost  |          $0.0009575          |
+---------------+------------------------------+
|  output_cost  |          $0.0005025          |
+---------------+------------------------------+
|  num_success  |              5               |
+---------------+------------------------------+
|  num_failure  |              0               |
+---------------+------------------------------+
| total_latency |      10.184797000139952      |
+---------------+------------------------------+


Our pipeline is doing well but that's only on 5 data points. Let's try it on a few more.

In [None]:
test_df100 = df.head(100).copy()
categorizer.run(test_df100)

Running step short_description...


100%|██████████| 100/100 [02:25<00:00,  1.45s/it]


Running step embedding_search...
Running step categorize...


100%|██████████| 100/100 [03:02<00:00,  1.83s/it]


Running step select_category...


100%|██████████| 100/100 [00:00<00:00, 26109.96it/s]


Unnamed: 0,name,description,category,brand.name,category_new,__short_description__,short_description,category1,category2,category3,category4,category5,__categorize__,category_index,predicted_category
0,EnGauge Deluxe Bedframe,Introducing the Engauge Deluxe Bedframe - the ...,Furniture > Beds & Headboards > Bedframes,,Beds & Headboards > Bedframes,"{'input_tokens': 313, 'output_tokens': 89, 'in...",The EnGauge Deluxe Bedframe is the ultimate so...,Beds & Headboards > Bedframes,Beds & Headboards > Beds,Beds & Headboards > Headboards,Mattresses & Box Springs > Mattresses,Mattresses & Box Springs > Box Springs & Found...,"{'input_tokens': 235, 'output_tokens': 10, 'in...",1,Beds & Headboards > Bedframes
1,Sparrow & Wren Sullivan King Channel-Stitched ...,"85""L x 83""W x 56""H | Total weight: 150 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 169, 'output_tokens': 99, 'in...",The Sparrow & Wren Sullivan King Channel-Stitc...,Beds & Headboards > Beds,Beds & Headboards > Headboards,Beds & Headboards > Bedframes,Mattresses & Box Springs > Mattresses,Kids Beds & Headboards > Kid's Beds,"{'input_tokens': 244, 'output_tokens': 10, 'in...",1,Beds & Headboards > Beds
2,Queen Bed With Frame,Dimensions:Head Board -49H x 63.75W x 1.5DFoot...,Furniture > Beds & Headboards > Beds,Hillsdale,Beds & Headboards > Beds,"{'input_tokens': 124, 'output_tokens': 58, 'in...",The Queen Bed With Frame features a head board...,Beds & Headboards > Bedframes,Beds & Headboards > Beds,Beds & Headboards > Headboards,Kids Beds & Headboards > Kid's Beds,Sets > Bedroom Furniture Sets,"{'input_tokens': 200, 'output_tokens': 10, 'in...",1,Beds & Headboards > Bedframes
3,Dylan Queen Bed,Add a touch of a modern farmhouse to your bedr...,Furniture > Beds & Headboards > Beds,,Beds & Headboards > Beds,"{'input_tokens': 140, 'output_tokens': 41, 'in...",Add a touch of modern farmhouse charm to your ...,Beds & Headboards > Headboards,Beds & Headboards > Beds,Beds & Headboards > Bedframes,Sets > Bedroom Furniture Sets,Kids Beds & Headboards > Kid's Beds,"{'input_tokens': 183, 'output_tokens': 10, 'in...",2,Beds & Headboards > Beds
4,Sparrow & Wren Mara Full Diamond-Tufted Bed,"78""L x 56""W x 51""H | Total weight: 130 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 168, 'output_tokens': 53, 'in...",The Sparrow & Wren Mara Full Diamond-Tufted Be...,Beds & Headboards > Headboards,Beds & Headboards > Beds,Beds & Headboards > Bedframes,Kids Beds & Headboards > Kid's Beds,Sets > Bedroom Furniture Sets,"{'input_tokens': 195, 'output_tokens': 10, 'in...",2,Beds & Headboards > Beds
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Modway Melanie Tufted Button Upholstered Fabri...,"Twin | Clean lines, a straightforward profile,...",Furniture > Beds & Headboards > Beds,Modway,Beds & Headboards > Beds,"{'input_tokens': 225, 'output_tokens': 52, 'in...",The Modway Melanie Tufted Button Upholstered F...,Beds & Headboards > Headboards,Beds & Headboards > Beds,Sets > Bedroom Furniture Sets,Beds & Headboards > Bedframes,Kids Beds & Headboards > Kid's Beds,"{'input_tokens': 194, 'output_tokens': 10, 'in...",2,Beds & Headboards > Beds
96,Concord Queen Panel Bed,Looking for a new bed that has it all? Check o...,Furniture > Beds & Headboards > Beds,Daniel's Amish,Beds & Headboards > Beds,"{'input_tokens': 205, 'output_tokens': 49, 'in...",The Concord Queen Panel Bed features a contemp...,Beds & Headboards > Headboards,Beds & Headboards > Beds,Beds & Headboards > Bedframes,Kids Beds & Headboards > Kid's Beds,Sets > Bedroom Furniture Sets,"{'input_tokens': 191, 'output_tokens': 10, 'in...",2,Beds & Headboards > Beds
97,Sparrow & Wren Myers King Bed,"Dimensions: 85""L x 82""W x 56""H | Headboard hei...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 271, 'output_tokens': 73, 'in...",The Sparrow & Wren Myers King Bed is a luxurio...,Beds & Headboards > Beds,Beds & Headboards > Headboards,Beds & Headboards > Bedframes,Kids Beds & Headboards > Kid's Beds,Mattresses & Box Springs > Mattresses,"{'input_tokens': 218, 'output_tokens': 10, 'in...",1,Beds & Headboards > Beds
98,Loden Beige 3 Pc Queen Upholstered Bed with 2 ...,A classic design and sophisticated silhouette ...,Furniture > Beds & Headboards > Beds,Rooms To Go,Beds & Headboards > Beds,"{'input_tokens': 181, 'output_tokens': 62, 'in...",The Loden Beige 3 Pc Queen Upholstered Bed wit...,Beds & Headboards > Headboards,Beds & Headboards > Beds,Storage > Dressers,Beds & Headboards > Bedframes,Storage > Nightstands,"{'input_tokens': 198, 'output_tokens': 10, 'in...",2,Beds & Headboards > Beds


In [None]:
print(categorizer.statistics)

+---------------+-------------------------------+
|     score     |              0.9              |
+---------------+-------------------------------+
|  input_tokens | {'gpt-3.5-turbo-0125': 39918} |
+---------------+-------------------------------+
| output_tokens |  {'gpt-3.5-turbo-0125': 6888} |
+---------------+-------------------------------+
|   input_cost  |     $0.019959000000000005     |
+---------------+-------------------------------+
|  output_cost  |     $0.010332000000000001     |
+---------------+-------------------------------+
|  num_success  |              100              |
+---------------+-------------------------------+
|  num_failure  |               0               |
+---------------+-------------------------------+
| total_latency |       338.0330282483483       |
+---------------+-------------------------------+


At current gpt-3.5-turbo pricing, this batch of 100 requests cost $0.030291 and took about five and a half minutes to run, for 90% accuracy. Let's see how hierarchical prompting does.
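
The arithmetic behind these figures can be reproduced from the statistics table. The per-token prices below are inferred from the table itself (input cost divided by input tokens, and likewise for output), so treat them as assumptions rather than authoritative pricing.

```python
# Prices implied by the statistics table, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50

# Values copied from the statistics table for the 100-row run.
input_tokens = 39_918
output_tokens = 6_888
total_latency_s = 338.0330282483483
num_rows = 100

input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
total_cost = input_cost + output_cost

print(f"total cost: ${total_cost:.6f}")
print(f"cost per row: ${total_cost / num_rows:.6f}")
print(f"latency: {total_latency_s / 60:.1f} minutes")
```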

### Approach 2: Hierarchical prompting
Next we want to try forgoing embeddings altogether and simply stuffing all of the categories into the prompt. There are too many categories to do this in one go, but we can use the fact that our categories are hierarchical and take a step-by-step approach.
1. LLM: given product name, description, and first level categories, pick the best one.
2. LLM: given product name, description, and second level categories, pick the best one.

We may want to iterate a bit on this process. For example, we may want to use one model in step 1 and a different model in step 2. `Superpipe` makes this type of hyperparameter tuning easy and robust.

In our first step we just ask the model to pick the right top-level category. This is a relatively easy task if the categories do not overlap, but it can be very difficult if more than one category is plausibly correct. We'll only know by trying and inspecting our losses.

In [None]:
first_level_categories = list(lookup_table.keys())

def first_level_category_prompt(row):
    i = len(first_level_categories)

    return f"""
    You are given a product name, description and {i} options for the product's top level category.
    Pick the index of the most accurate category.
    The index must be between 1 and {i}.
    Product description: {row['description']}
    Product name: {row['name']}
    Categories:
    {first_level_categories}
    """
    
class FirstLevelCategoryIndex(BaseModel):
    first_category_index: int = Field(description="The index of the most accurate first level category")
    
first_level_category_step = steps.LLMStructuredStep(
  prompt=first_level_category_prompt,
  model=models.gpt35,
  out_schema=FirstLevelCategoryIndex,
  name="first_categorize"
)
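
The prompt above interpolates the Python list directly, so the model has to count positions itself to produce a 1-based index. Approach 1 numbered each option explicitly. Below is a minimal sketch of a helper that does the same for any list of options; `format_options` is a hypothetical name, and whether numbering improves accuracy here is something to test rather than assume.

```python
def format_options(options):
    """Render options as a 1-based numbered list, one option per line."""
    return "\n".join(f"{i}. {option}" for i, option in enumerate(options, start=1))

numbered = format_options(["Beds & Headboards", "Chairs", "Nursery"])
print(numbered)
```

The resulting string can be placed under `Categories:` in the prompt in place of the raw list.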

In [None]:
select_first_category_step = steps.CustomStep(
  transform=lambda row: first_level_categories[row["first_category_index"] - 1],
  name="predicted_first_category"
)

Next we'll give the second layer of the taxonomy to the model to classify. Just as before, we are trying to predict the index to make sure our final output is valid.

In [None]:
def second_level_category_prompt(row):
    second_level_categories = list(lookup_table[row['predicted_first_category']])
    i = len(second_level_categories)

    return f"""
    You are given a product name, description, first level category 
    and {i} options for the product's second level category.
    Pick the index of the most accurate category.
    The index must be between 1 and {i}.
    Product description: {row['description']}
    Product name: {row['name']}
    First level category: {row['predicted_first_category']}
    Categories:
    {second_level_categories}
    """
    
class SecondLevelCategoryIndex(BaseModel):
    second_category_index: int = Field(description="The index of the most accurate second level category")
    
second_level_category_step = steps.LLMStructuredStep(
  prompt=second_level_category_prompt,
  model=models.gpt35,
  out_schema=SecondLevelCategoryIndex,
  name="second_categorize"
)

In [None]:
select_second_category_step = steps.CustomStep(
  transform=lambda row: list(lookup_table[row['predicted_first_category']])[row["second_category_index"] - 1],
  name="predicted_second_category"
)

Let's combine our results so we can properly compare to our ground truth column. 

In [None]:
combine_taxonomy_step = steps.CustomStep(
    transform=lambda row: f"{row['predicted_first_category']} > {row['predicted_second_category']}",
    name='predicted_taxonomy'  # must match the column read by the evaluation function
)

In [None]:
test_df2 = df.head(5).copy()

evaluate2 = lambda row: row['predicted_taxonomy'].lower() == row['category_new'].lower()

categorizer_llm = pipeline.Pipeline([
  first_level_category_step, 
  select_first_category_step,
  second_level_category_step,
  select_second_category_step,
  combine_taxonomy_step
], evaluation_fn=evaluate2)

categorizer_llm.run(test_df2)

Running step first_categorize...


100%|██████████| 5/5 [00:03<00:00,  1.44it/s]


Running step select_first_category...


100%|██████████| 5/5 [00:00<00:00, 8771.02it/s]


Running step second_categorize...


100%|██████████| 5/5 [00:03<00:00,  1.59it/s]


Running step select_second_category...


100%|██████████| 5/5 [00:00<00:00, 6857.92it/s]


Running step combine_taxonomy...


100%|██████████| 5/5 [00:00<00:00, 6458.74it/s]


Unnamed: 0,name,description,category,brand.name,category_new,__first_categorize__,first_category_index,predicted_first_category,__second_categorize__,second_category_index,predicted_second_category,predicted_taxonomy
0,EnGauge Deluxe Bedframe,Introducing the Engauge Deluxe Bedframe - the ...,Furniture > Beds & Headboards > Bedframes,,Beds & Headboards > Bedframes,"{'input_tokens': 419, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 372, 'output_tokens': 11, 'in...",3,Bedframes,Beds & Headboards > Bedframes
1,Sparrow & Wren Sullivan King Channel-Stitched ...,"85""L x 83""W x 56""H | Total weight: 150 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 275, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 228, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
2,Queen Bed With Frame,Dimensions:Head Board -49H x 63.75W x 1.5DFoot...,Furniture > Beds & Headboards > Beds,Hillsdale,Beds & Headboards > Beds,"{'input_tokens': 230, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 183, 'output_tokens': 11, 'in...",3,Bedframes,Beds & Headboards > Bedframes
3,Dylan Queen Bed,Add a touch of a modern farmhouse to your bedr...,Furniture > Beds & Headboards > Beds,,Beds & Headboards > Beds,"{'input_tokens': 246, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 199, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
4,Sparrow & Wren Mara Full Diamond-Tufted Bed,"78""L x 56""W x 51""H | Total weight: 130 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 274, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 227, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds


In [None]:
print(categorizer_llm.statistics)

+---------------+------------------------------+
|     score     |             0.8              |
+---------------+------------------------------+
|  input_tokens | {'gpt-3.5-turbo-0125': 2653} |
+---------------+------------------------------+
| output_tokens | {'gpt-3.5-turbo-0125': 110}  |
+---------------+------------------------------+
|   input_cost  |          $0.0013265          |
+---------------+------------------------------+
|  output_cost  |          $0.000165           |
+---------------+------------------------------+
|  num_success  |              5               |
+---------------+------------------------------+
|  num_failure  |              0               |
+---------------+------------------------------+
| total_latency |      6.587614875927102       |
+---------------+------------------------------+


It works. Let's run it on some more data like we did before.

In [None]:
test_df2_100 = df.head(100).copy()
categorizer_llm.run(test_df2_100)

Running step first_categorize...


100%|██████████| 100/100 [01:07<00:00,  1.48it/s]


Running step select_first_category...


100%|██████████| 100/100 [00:00<00:00, 40650.36it/s]


Running step second_categorize...


100%|██████████| 100/100 [01:50<00:00,  1.10s/it]


Running step select_second_category...


100%|██████████| 100/100 [00:00<00:00, 37134.17it/s]


Running step combine_taxonomy...


100%|██████████| 100/100 [00:00<00:00, 40784.75it/s]


Unnamed: 0,name,description,category,brand.name,category_new,__first_categorize__,first_category_index,predicted_first_category,__second_categorize__,second_category_index,predicted_second_category,predicted_taxonomy
0,EnGauge Deluxe Bedframe,Introducing the Engauge Deluxe Bedframe - the ...,Furniture > Beds & Headboards > Bedframes,,Beds & Headboards > Bedframes,"{'input_tokens': 419, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 372, 'output_tokens': 11, 'in...",3,Bedframes,Beds & Headboards > Bedframes
1,Sparrow & Wren Sullivan King Channel-Stitched ...,"85""L x 83""W x 56""H | Total weight: 150 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 275, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 228, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
2,Queen Bed With Frame,Dimensions:Head Board -49H x 63.75W x 1.5DFoot...,Furniture > Beds & Headboards > Beds,Hillsdale,Beds & Headboards > Beds,"{'input_tokens': 230, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 183, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
3,Dylan Queen Bed,Add a touch of a modern farmhouse to your bedr...,Furniture > Beds & Headboards > Beds,,Beds & Headboards > Beds,"{'input_tokens': 246, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 199, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
4,Sparrow & Wren Mara Full Diamond-Tufted Bed,"78""L x 56""W x 51""H | Total weight: 130 lbs. | ...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 274, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 227, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
...,...,...,...,...,...,...,...,...,...,...,...,...
95,Modway Melanie Tufted Button Upholstered Fabri...,"Twin | Clean lines, a straightforward profile,...",Furniture > Beds & Headboards > Beds,Modway,Beds & Headboards > Beds,"{'input_tokens': 331, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 284, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
96,Concord Queen Panel Bed,Looking for a new bed that has it all? Check o...,Furniture > Beds & Headboards > Beds,Daniel's Amish,Beds & Headboards > Beds,"{'input_tokens': 311, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 264, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
97,Sparrow & Wren Myers King Bed,"Dimensions: 85""L x 82""W x 56""H | Headboard hei...",Furniture > Beds & Headboards > Beds,Sparrow & Wren,Beds & Headboards > Beds,"{'input_tokens': 377, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 330, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds
98,Loden Beige 3 Pc Queen Upholstered Bed with 2 ...,A classic design and sophisticated silhouette ...,Furniture > Beds & Headboards > Beds,Rooms To Go,Beds & Headboards > Beds,"{'input_tokens': 287, 'output_tokens': 11, 'in...",1,Beds & Headboards,"{'input_tokens': 240, 'output_tokens': 11, 'in...",1,Beds,Beds & Headboards > Beds


Let's compare approach 1 to approach 2. 

In [None]:
print(categorizer.statistics)
print(f"Total cost: ${categorizer.statistics.input_cost + categorizer.statistics.output_cost}")
print(categorizer_llm.statistics)
print(f"Total cost: ${categorizer_llm.statistics.input_cost + categorizer_llm.statistics.output_cost}")



+---------------+-------------------------------+
|     score     |              0.9              |
+---------------+-------------------------------+
|  input_tokens | {'gpt-3.5-turbo-0125': 39918} |
+---------------+-------------------------------+
| output_tokens |  {'gpt-3.5-turbo-0125': 6888} |
+---------------+-------------------------------+
|   input_cost  |     $0.019959000000000005     |
+---------------+-------------------------------+
|  output_cost  |     $0.010332000000000001     |
+---------------+-------------------------------+
|  num_success  |              100              |
+---------------+-------------------------------+
|  num_failure  |               0               |
+---------------+-------------------------------+
| total_latency |       338.0330282483483       |
+---------------+-------------------------------+
Total cost: $0.030291000000000005
+---------------+-------------------------------+
|     score     |              0.94             |
+---------------

Our hierarchical approach cost just a bit more at $0.032814 / 100 rows. It was much faster and seemed to perform better on accuracy as well. However, we're not done just yet. The power of `SuperPipe` is that we can easily try many different permutations of our pipeline using a grid search. There might be a better pipeline out there.

## Grid search

Our first pipeline has three steps we want to search over.
1. Short description: vary the model
2. Embedding search: vary the number of results
3. Categorize: vary the model

It's not clear which permutation will work the best so we'll try all of them.
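
To see how many pipelines "all of them" means, the sketch below enumerates the Cartesian product of a parameter grid with the standard library. It only illustrates the counting and is not how Superpipe's `GridSearch` is implemented. The string model names are placeholders standing in for the real model objects.

```python
from itertools import product

# Placeholder grid: two models x three values of k x two models.
grid = {
    "short_description": {"model": ["gpt35", "gpt4"]},
    "embedding_search": {"k": [3, 5, 7]},
    "categorize": {"model": ["gpt35", "gpt4"]},
}

# Flatten to one axis per (step, parameter) pair.
axes = [
    (step, param, values)
    for step, params in grid.items()
    for param, values in params.items()
]

# Every combination picks exactly one value from each axis.
combinations = [
    {(step, param): value for (step, param, _), value in zip(axes, choice)}
    for choice in product(*(values for _, _, values in axes))
]
print(len(combinations))
```

The count grows multiplicatively with every parameter added, which is why the search runs on a small sample of rows.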

In [None]:
from superpipe import grid_search

params_grid = {
    short_description_step.name: {
        'model': [models.gpt35, models.gpt4], 
    },
    embedding_search_step.name: {
        'k': [3, 5, 7],  
    },
    categorize_step.name: {
        'model': [models.gpt35, models.gpt4], 
    },
}

small_df = df.head(30).copy()


search_embeddings = grid_search.GridSearch(categorizer, params_grid)
search_embeddings.run(small_df)

Iteration 1 of 12
Params:  {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 3}, 'categorize': {'model': 'gpt-3.5-turbo-0125'}}
Result:  {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 3, 'categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.8333333333333334, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 11315}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 2108}), 'input_cost': 0.005657499999999999, 'output_cost': 0.003162, 'num_success': 30, 'num_failure': 0, 'total_latency': 103.46415858116234, 'index': -7791233023527820859}
Iteration 2 of 12
Params:  {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 3}, 'categorize': {'model': 'gpt-4-turbo-preview'}}
Result:  {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 3, 'categorize__model': 'gpt-4-turbo-preview', 'score': 0.9333333333333333, 'input_tokens': defaultdict(<class 'int'>, {'gpt

Unnamed: 0,short_description__model,embedding_search__k,categorize__model,score,input_tokens,output_tokens,input_cost,output_cost,num_success,num_failure,total_latency,index
0,gpt-3.5-turbo-0125,3,gpt-3.5-turbo-0125,0.833333,{'gpt-3.5-turbo-0125': 11315},{'gpt-3.5-turbo-0125': 2108},0.005657,0.003162,30,0,103.464159,-7791233023527820859
1,gpt-3.5-turbo-0125,3,gpt-4-turbo-preview,0.933333,"{'gpt-3.5-turbo-0125': 5852, 'gpt-4-turbo-prev...","{'gpt-3.5-turbo-0125': 1837, 'gpt-4-turbo-prev...",0.057896,0.011756,30,0,82.123847,-1229872059569985205
2,gpt-3.5-turbo-0125,5,gpt-3.5-turbo-0125,0.9,{'gpt-3.5-turbo-0125': 11824},{'gpt-3.5-turbo-0125': 1998},0.005912,0.002997,30,0,60.67743,-2156008638839003309
3,gpt-3.5-turbo-0125,5,gpt-4-turbo-preview,0.966667,"{'gpt-3.5-turbo-0125': 5852, 'gpt-4-turbo-prev...","{'gpt-3.5-turbo-0125': 1792, 'gpt-4-turbo-prev...",0.063456,0.011688,30,0,85.082716,-373516568509500608
4,gpt-3.5-turbo-0125,7,gpt-3.5-turbo-0125,0.9,{'gpt-3.5-turbo-0125': 12575},{'gpt-3.5-turbo-0125': 2141},0.006287,0.003211,30,0,149.574122,5513717612912975259
5,gpt-3.5-turbo-0125,7,gpt-4-turbo-preview,0.966667,"{'gpt-3.5-turbo-0125': 5852, 'gpt-4-turbo-prev...","{'gpt-3.5-turbo-0125': 1733, 'gpt-4-turbo-prev...",0.069126,0.011599,30,0,78.444735,2766483574959374285
6,gpt-4-turbo-preview,3,gpt-3.5-turbo-0125,0.866667,"{'gpt-4-turbo-preview': 5852, 'gpt-3.5-turbo-0...","{'gpt-4-turbo-preview': 1836, 'gpt-3.5-turbo-0...",0.06126,0.055532,30,0,138.30416,7602228094953899657
7,gpt-4-turbo-preview,3,gpt-4-turbo-preview,0.866667,{'gpt-4-turbo-preview': 11298},{'gpt-4-turbo-preview': 2095},0.11298,0.06285,30,0,164.999652,-6892174709507839108
8,gpt-4-turbo-preview,5,gpt-3.5-turbo-0125,0.866667,"{'gpt-4-turbo-preview': 5852, 'gpt-3.5-turbo-0...","{'gpt-4-turbo-preview': 1803, 'gpt-3.5-turbo-0...",0.061548,0.054541,30,0,140.513508,-8924542522527535100
9,gpt-4-turbo-preview,5,gpt-4-turbo-preview,0.966667,{'gpt-4-turbo-preview': 11977},{'gpt-4-turbo-preview': 2158},0.11977,0.06474,30,0,178.206688,-9078237607708088845


The results of our grid search are conveniently put into a dataframe for us to review.

It seems that GPT-3.5 is more than sufficient for our description step, and that 5 embedding results are enough as well. For the last step, we have a cost/latency vs. accuracy tradeoff to make between the two models.
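To put numbers on that tradeoff, here is a small sketch that compares the two runs that use GPT-3.5 descriptions and `k=5` and differ only in the categorize model. The values are copied from rows 2 and 3 of the table above; treating `input_cost + output_cost` as the total cost of a run is an assumption made for this example.

```python
# Rows 2 and 3 of the results table: GPT-3.5 descriptions, k=5,
# differing only in the model used for the categorize step.
runs = {
    "gpt-3.5-turbo-0125": {
        "score": 0.9, "input_cost": 0.005912,
        "output_cost": 0.002997, "total_latency": 60.67743,
    },
    "gpt-4-turbo-preview": {
        "score": 0.966667, "input_cost": 0.063456,
        "output_cost": 0.011688, "total_latency": 85.082716,
    },
}

def total_cost(run):
    # Assumption: the cost of a run is input cost plus output cost.
    return run["input_cost"] + run["output_cost"]

cheap = runs["gpt-3.5-turbo-0125"]
strong = runs["gpt-4-turbo-preview"]

score_gain = strong["score"] - cheap["score"]
cost_ratio = total_cost(strong) / total_cost(cheap)
extra_latency = strong["total_latency"] - cheap["total_latency"]

print(f"score gain:    {score_gain:+.3f}")
print(f"cost ratio:    {cost_ratio:.1f}x")
print(f"extra latency: {extra_latency:+.1f}")
```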

This search was only run on 30 rows, so we'd want to run it more extensively before making decisions for production, but at least now we can narrow down our search space with reasonable confidence.
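One way to see why 30 rows is thin evidence is to put an approximate confidence interval around each score. The sketch below uses the Wilson score interval, a standard formula for binomial proportions; it assumes that `score` is the fraction of the 30 rows labeled correctly, so 0.9 corresponds to 27/30 and 0.9667 to 29/30.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n = 30
# Assumption: score is accuracy over the 30 rows (27/30 = 0.9, 29/30 ~ 0.967).
intervals = {correct: wilson_interval(correct, n) for correct in (27, 29)}

for correct, (low, high) in intervals.items():
    print(f"{correct}/{n} = {correct / n:.3f}, 95% interval ~ ({low:.3f}, {high:.3f})")
```

Under that assumption the two intervals overlap substantially, so a gap of two rows out of 30 is suggestive rather than conclusive.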

Let's do the same for our hierarchical prompting approach. This time we'll just vary the model selection for each step.

In [None]:
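# Assumption: update_params takes a dict keyed by step name, in the same shape as the params grid.
categorizer_llm.update_params({first_level_category_step.name: {'model': models.gpt35}})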

In [None]:
params_grid = {
    first_level_category_step.name: {
        'model': [models.gpt35, models.gpt4],  
    },
    second_level_category_step.name: {
        'model': [models.gpt35, models.gpt4],  
    },
}

small_df2 = df.head(30).copy()

search_llm = grid_search.GridSearch(categorizer_llm, params_grid)
search_llm.run(small_df2)

Iteration 1 of 4
Params:  {'first_categorize': {'model': 'gpt-3.5-turbo-0125'}, 'second_categorize': {'model': 'gpt-3.5-turbo-0125'}}
Result:  {'first_categorize__model': 'gpt-3.5-turbo-0125', 'second_categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.9, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 16648}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 660}), 'input_cost': 0.008323999999999998, 'output_cost': 0.0009900000000000004, 'num_success': 30, 'num_failure': 0, 'total_latency': 32.440515251946636, 'index': 7093490588454389251}
Iteration 2 of 4
Params:  {'first_categorize': {'model': 'gpt-3.5-turbo-0125'}, 'second_categorize': {'model': 'gpt-4-turbo-preview'}}
Result:  {'first_categorize__model': 'gpt-3.5-turbo-0125', 'second_categorize__model': 'gpt-4-turbo-preview', 'score': 0.9333333333333333, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 9032, 'gpt-4-turbo-preview': 7616}), 'output_tokens': defaultdict(<class 'in

Unnamed: 0,first_categorize__model,second_categorize__model,score,input_tokens,output_tokens,input_cost,output_cost,num_success,num_failure,total_latency,index
0,gpt-3.5-turbo-0125,gpt-3.5-turbo-0125,0.9,{'gpt-3.5-turbo-0125': 16648},{'gpt-3.5-turbo-0125': 660},0.008324,0.00099,30,0,32.440515,7093490588454389251
1,gpt-3.5-turbo-0125,gpt-4-turbo-preview,0.933333,"{'gpt-3.5-turbo-0125': 9032, 'gpt-4-turbo-prev...","{'gpt-3.5-turbo-0125': 330, 'gpt-4-turbo-previ...",0.080676,0.010395,30,0,55.998379,6690483959441912481
2,gpt-4-turbo-preview,gpt-3.5-turbo-0125,0.866667,"{'gpt-4-turbo-preview': 9032, 'gpt-3.5-turbo-0...","{'gpt-4-turbo-preview': 330, 'gpt-3.5-turbo-01...",0.094135,0.010395,30,0,88.018512,6375515903791300472
3,gpt-4-turbo-preview,gpt-4-turbo-preview,0.9,{'gpt-4-turbo-preview': 16663},{'gpt-4-turbo-preview': 660},0.16663,0.0198,30,0,84.081201,6691335389976999983


These results highlight the importance of experimentation and optimization. Among the hierarchical pipelines, the GPT-3.5 + GPT-4 combination performs best (0.93) with relatively low latency, while the GPT-3.5-only pipeline scores about as well (0.90) as the GPT-3.5-only embedding pipeline with 5 results.

If we only care about accuracy, an embeddings-based approach looks like our best bet (0.97 versus 0.93 for the best hierarchical pipeline). However, we may have other considerations: we're faced with a cost, accuracy, and latency tradeoff with no clear "best" option, and the approach we choose depends on which metric we care about most. This is a decision we're now empowered to make with our Superpipe pipeline results.
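One way to make "no clear best option" concrete is to keep only the runs that no other run beats on score, cost, and latency at once (a Pareto frontier). This is an illustrative analysis added here, not something Superpipe does for us. It uses only the rows visible in the two tables above, and it again assumes that total cost is `input_cost + output_cost`. In the labels, `d`, `c`, `l1`, and `l2` are shorthand for the description, categorize, first-level, and second-level models.

```python
# (label, score, input_cost, output_cost, total_latency), copied from the
# result rows shown in the two tables above.
rows = [
    ("emb d=3.5 k=3 c=3.5", 0.833333, 0.005657, 0.003162, 103.464159),
    ("emb d=3.5 k=3 c=4", 0.933333, 0.057896, 0.011756, 82.123847),
    ("emb d=3.5 k=5 c=3.5", 0.9, 0.005912, 0.002997, 60.67743),
    ("emb d=3.5 k=5 c=4", 0.966667, 0.063456, 0.011688, 85.082716),
    ("emb d=3.5 k=7 c=3.5", 0.9, 0.006287, 0.003211, 149.574122),
    ("emb d=3.5 k=7 c=4", 0.966667, 0.069126, 0.011599, 78.444735),
    ("emb d=4 k=3 c=3.5", 0.866667, 0.06126, 0.055532, 138.30416),
    ("emb d=4 k=3 c=4", 0.866667, 0.11298, 0.06285, 164.999652),
    ("emb d=4 k=5 c=3.5", 0.866667, 0.061548, 0.054541, 140.513508),
    ("emb d=4 k=5 c=4", 0.966667, 0.11977, 0.06474, 178.206688),
    ("hier l1=3.5 l2=3.5", 0.9, 0.008324, 0.00099, 32.440515),
    ("hier l1=3.5 l2=4", 0.933333, 0.080676, 0.010395, 55.998379),
    ("hier l1=4 l2=3.5", 0.866667, 0.094135, 0.010395, 88.018512),
    ("hier l1=4 l2=4", 0.9, 0.16663, 0.0198, 84.081201),
]

runs = [
    {"label": label, "score": score, "cost": cost_in + cost_out, "latency": latency}
    for label, score, cost_in, cost_out, latency in rows
]

def dominates(a, b):
    """True if run a is no worse than b on every metric and better on at least one."""
    no_worse = (a["score"] >= b["score"] and a["cost"] <= b["cost"]
                and a["latency"] <= b["latency"])
    better = (a["score"] > b["score"] or a["cost"] < b["cost"]
              or a["latency"] < b["latency"])
    return no_worse and better

# A run is on the frontier if no other run dominates it.
frontier = [r for r in runs if not any(dominates(o, r) for o in runs)]

for r in sorted(frontier, key=lambda r: r["cost"]):
    print(f'{r["label"]:<22} score={r["score"]:.3f} '
          f'cost={r["cost"]:.4f} latency={r["latency"]:.1f}')
```

Several runs from both approaches survive this filter, which is another way of saying that the choice depends on how we weigh the three metrics.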