In [33]:
from textwrap import dedent

# Homework 2: Recipe Bot Error Analysis

This notebook walks through the complete error analysis process for a Recipe Bot. We'll identify failure modes, generate test queries, and analyze bot responses to build a taxonomy of errors.

**Note:** This uses the pre-existing queries and bot responses in `results_20250518_215844.csv` as our data source.


In [34]:
import pandas as pd
import json
import matplotlib.pyplot as plt
from collections import Counter
import random
import httpx

# Load the data
synthetic_queries = pd.read_csv('synthetic_queries_for_analysis.csv')
bot_results = pd.read_csv('results_20250518_215844.csv')

print(f"Loaded {len(synthetic_queries)} synthetic queries")
print(f"Loaded {len(bot_results)} bot responses")


Loaded 250 synthetic queries
Loaded 250 bot responses


In [35]:
from claudette import models, Client
model = models[1]
c = Client(model)
model

'claude-sonnet-4-20250514'

## Part 1: Define Dimensions & Generate Initial Queries

### Identify Key Dimensions



1. **Dietary Restrictions**: What dietary limits does the user have?
   - Low Carb
   - Keto
   - No Seafood

2. **What For**: What is the meal for?
   - Potluck
   - Dinner party
   - Cooking for the family
   - Snacks

3. **Time Available**: How much time do they have?
   - Under 15
   - 30 minutes
   - 1 hour

4. **Ingredient Base**: What is the main ingredient?
   - Beans
   - Pasta
   - Ground Beef

5. **Meal Time**: Which meal is it for?
   - Breakfast
   - Lunch
   - Dinner


### Generate Unique Combinations

In [36]:
prompt = dedent('''\
    I am designing a Recipe Bot and want to test it with a diverse set of user scenarios. Please generate 20 unique combinations (tuples) using the following key dimensions and their possible values:

    - Dietary Restrictions: Low Carb, Keto, No Seafood
    - What For: Potluck, Dinner party, Cooking for the family, Snacks
    - Time Available: Under 15, 30 minutes, 1 hour
    - Ingredient Base: Beans, Pasta, Ground Beef
    - Meal Time: Breakfast, Lunch, Dinner

    Each combination should select one value from each dimension. Present the results as a list of tuples, where each tuple contains one value for each dimension in the following order: (Dietary Restrictions, What For, Time Available, Ingredient Base, Meal Time). Ensure that the combinations are varied and realistic.''')
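Another option is to skip the model and enumerate the combinations directly. The sketch below uses `itertools.product` to build the full Cartesian product and then samples 20 distinct tuples. The names `DIMENSIONS`, `all_combinations` and `sampled` are illustrative and are not part of this notebook. The space has 3 × 4 × 3 × 3 × 3 = 324 tuples, and sampling without replacement guarantees uniqueness. It does nothing for realism, though, so the hand-pruning below would still be needed.

```python
import itertools
import random

# Dimension values copied from the list above (tuple order matches the prompt)
DIMENSIONS = {
    "Dietary Restrictions": ["Low Carb", "Keto", "No Seafood"],
    "What For": ["Potluck", "Dinner party", "Cooking for the family", "Snacks"],
    "Time Available": ["Under 15", "30 minutes", "1 hour"],
    "Ingredient Base": ["Beans", "Pasta", "Ground Beef"],
    "Meal Time": ["Breakfast", "Lunch", "Dinner"],
}

# Every possible (Dietary Restrictions, What For, Time, Ingredient, Meal Time) tuple
all_combinations = list(itertools.product(*DIMENSIONS.values()))

# Draw 20 distinct tuples; the fixed seed makes the sample reproducible
rng = random.Random(42)
sampled = rng.sample(all_combinations, 20)
```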


In [16]:
# c(prompt)

In [18]:
dimension_examples = (
    # Beans aren't keto, should bot offer alternative?
    ('Keto', 'Snacks', '1 hour', 'Beans', 'Lunch'),
    ('Low Carb', 'Dinner party', '1 hour', 'Ground Beef', 'Dinner'),
    # Fairly quick thing for family using an ingredient that's easy to get
    ('Keto', 'Cooking for the family', '30 minutes', 'Ground Beef', 'Lunch'),
    # Pasta seems good for potlucks since you can make a lot of it
    ('No Seafood', 'Potluck', '1 hour', 'Pasta', 'Dinner'), 
    # Nice for protein snack like lettuce cups
    ('Low Carb', 'Snacks', 'Under 15', 'Ground Beef', 'Lunch'),
    # Often I have beans on hand and want to use them for something
    ('Keto', 'Cooking for the family', 'Under 15', 'Beans', 'Breakfast'),
    ('No Seafood', 'Dinner party', '30 minutes', 'Pasta', 'Dinner'),
    ('Low Carb', 'Cooking for the family', '1 hour', 'Beans', 'Dinner'),
    ('Keto', 'Potluck', '30 minutes', 'Ground Beef', 'Dinner'),
    ('No Seafood', 'Snacks', 'Under 15', 'Beans', 'Lunch'),
    ('Low Carb', 'Potluck', '30 minutes', 'Ground Beef', 'Lunch'),
    ('No Seafood', 'Cooking for the family', '1 hour', 'Beans', 'Dinner'),
    ('Low Carb', 'Snacks', '30 minutes', 'Beans', 'Breakfast'),
    ('Low Carb', 'Dinner party', '30 minutes', 'Beans', 'Lunch'),
    ('Keto', 'Cooking for the family', '1 hour', 'Pasta', 'Dinner'),
    ('No Seafood', 'Snacks', '30 minutes', 'Ground Beef', 'Breakfast'),
    ('Low Carb', 'Cooking for the family', 'Under 15', 'Pasta', 'Breakfast'),
    # Breakfast potluck and dinner party?  Who does that?
    # ('Keto', 'Potluck', '1 hour', 'Beans', 'Breakfast'),
    # ('No Seafood', 'Potluck', 'Under 15', 'Pasta', 'Breakfast'),
    # ('Keto', 'Dinner party', 'Under 15', 'Ground Beef', 'Breakfast'),

)
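I pruned the breakfast potluck and breakfast dinner party rows by hand. The same judgment could also be written as a rule and applied to any generated batch. Here is a minimal sketch. The `IMPLAUSIBLE` rule set is my assumption based on the commented-out rows, and `is_plausible` is a hypothetical helper.

```python
# Positions within each tuple:
# (Dietary Restrictions, What For, Time Available, Ingredient Base, Meal Time)
WHAT_FOR, MEAL_TIME = 1, 4

# Hypothetical rule set: (What For, Meal Time) pairs a real user is unlikely to ask for
IMPLAUSIBLE = {("Potluck", "Breakfast"), ("Dinner party", "Breakfast")}

def is_plausible(combo):
    """Return False when the occasion/meal pairing is on the implausible list."""
    return (combo[WHAT_FOR], combo[MEAL_TIME]) not in IMPLAUSIBLE

candidates = [
    ('Keto', 'Potluck', '1 hour', 'Beans', 'Breakfast'),
    ('No Seafood', 'Potluck', '1 hour', 'Pasta', 'Dinner'),
    ('Keto', 'Dinner party', 'Under 15', 'Ground Beef', 'Breakfast'),
]
kept = [c for c in candidates if is_plausible(c)]
```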

### Generate Natural Language Queries

In [29]:

followup_prompt = dedent('''\
    Convert these dimension combinations into realistic user queries for a recipe bot. Create natural, conversational queries that reflect how real users would interact in chat interfaces like Discord or ChatGPT. Include variations in:
    - Writing style (formal vs casual)
    - Sentence structure (complete vs incomplete)
    - Common typos and informal grammar
    - Natural language patterns
    - Realistic context and urgency

    Include only 1 example per `dimension_example`.

    <dimension_examples>
    {dimension_examples}
    </dimension_examples>''')

In [37]:
# random.seed(42)
# dimension_samples_for_nlp = random.sample(dimension_examples, 7)
# dimension_samples_for_nlp

In [32]:
# c(followup_prompt.format(dimension_examples=dimension_samples_for_nlp))

**1. No Seafood + Potluck + 1 hour + Pasta + Dinner**

Hey! Need help with a pasta dish for tonight's potluck dinner - something that takes about an hour to make and NO seafood please (allergies in the group). Any ideas?

**2. Keto + Snacks + 1 hour + Beans + Lunch**

can i make keto lunch snacks with beans? have about an hour to prep

**3. No Seafood + Family + 1 hour + Beans + Dinner**

Looking for a family-friendly bean dinner recipe that I can prepare within an hour. Please ensure it contains no seafood as my youngest is allergic.

**4. Low Carb + Snacks + Under 15 + Ground Beef + Lunch**

quick low carb ground beef snack for lunch?? need something in 15 min or less

**5. Low Carb + Family + Under 15 + Pasta + Breakfast**

weird request but need low carb pasta breakfast for the fam in under 15 mins - running late for school!

**6. Low Carb + Snacks + 30 minutes + Beans + Breakfast**

Morning! Could you suggest a low-carb bean snack that works for breakfast? I have about 30 minutes to spare.

**7. Keto + Family + 30 minutes + Ground Beef + Lunch**

family keto lunch with ground beef - 30 min max, kids are getting hangry lol

In [41]:
test_messages = [
    '''Hey! Need help with a pasta dish for tonight's potluck dinner - something that takes about an hour to make and NO seafood please (allergies in the group). Any ideas?''',
    '''can i make keto lunch snacks with beans? have about an hour to prep''',
    '''Looking for a family-friendly bean dinner recipe that I can prepare within an hour. Please ensure it contains no seafood as my youngest is allergic.''',
    '''quick low carb ground beef snack for lunch?? need something in 15 min or less''',
    '''weird request but need low carb pasta breakfast for the fam in under 15 mins - running late for school!''',
    '''Morning! Could you suggest a low-carb bean snack that works for breakfast? I have about 30 minutes to spare.''',
    '''family keto lunch with ground beef - 30 min max, kids are getting hangry lol''']
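Copying seven queries out of the model's reply by hand is manageable, but a small parser scales better. This is a minimal sketch. It assumes the reply keeps the format shown above: a bold `**N. ...**` heading line followed by the query text. `parse_queries` is a hypothetical helper, and the sample reply reuses the first two queries above.

```python
import re

# A heading line such as "**1. No Seafood + Potluck + 1 hour + Pasta + Dinner**"
HEADING = re.compile(r"^\*\*\d+\.[^\n]*\*\*[ \t]*$", flags=re.MULTILINE)

def parse_queries(markdown: str) -> list[str]:
    """Return the text that follows each numbered bold heading, stripped."""
    blocks = HEADING.split(markdown)
    # blocks[0] is whatever precedes the first heading, so skip it
    return [block.strip() for block in blocks[1:] if block.strip()]

sample_reply = """**1. No Seafood + Potluck + 1 hour + Pasta + Dinner**

Hey! Need help with a pasta dish for tonight's potluck dinner - something that takes about an hour to make and NO seafood please (allergies in the group). Any ideas?

**2. Keto + Snacks + 1 hour + Beans + Lunch**

can i make keto lunch snacks with beans? have about an hour to prep
"""
parsed = parse_queries(sample_reply)
```

If a query ever contained its own line starting with `**1.`, the split would break there. The fixed heading format makes that unlikely.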

## Part 2: Initial Error Analysis

### Run bot on synthetic queries

At this point I decided to implement automated tracing. Copying and pasting from the UI was tedious, so I saw two main options:

1. Implement functions that call the backend programmatically
2. Implement automated tracing

I opted for option 2 because I wanted to be more of a user of my own product and did not want to fully automate away the experience of using the actual application.

To start, I implemented the simplest tracing mechanism I could think of: saving JSON files to disk.


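```python
import datetime
import json
from pathlib import Path

# Runs in the backend after the bot has produced `response` for the incoming `payload`.
# Both are assumed to be Pydantic models, so model_dump() yields plain dicts.
traces_dir = Path(__file__).parent.parent / "annotation" / "traces"
traces_dir.mkdir(parents=True, exist_ok=True)
# Microsecond-resolution timestamp keeps filenames unique across back-to-back requests
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
trace_path = traces_dir / f"trace_{ts}.json"
with open(trace_path, "w") as f:
    json.dump({
        "request": payload.model_dump(),
        "response": response.model_dump(),
    }, f)
```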
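Each trace is a standalone JSON file, so loading them back for analysis is a short loop. This sketch assumes the `{"request": ..., "response": ...}` layout written above. `load_traces` is a hypothetical helper. Sorting by filename also sorts chronologically, because every field of the `%Y%m%d_%H%M%S_%f` timestamp is zero-padded.

```python
import json
from pathlib import Path

import pandas as pd

def load_traces(traces_dir: Path) -> pd.DataFrame:
    """Read every trace_*.json file in traces_dir into one DataFrame row per trace."""
    rows = []
    # Zero-padded timestamps in the filenames make this order chronological
    for path in sorted(traces_dir.glob("trace_*.json")):
        with open(path) as f:
            trace = json.load(f)
        rows.append({
            "file": path.name,
            "request": trace["request"],
            "response": trace["response"],
        })
    return pd.DataFrame(rows, columns=["file", "request", "response"])
```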

I ran each synthetic query through the app to generate the traces. I then copied the traces into a `golden_dataset` folder, which I'll use as my open coding dataset for this exercise.
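I copied the traces into `golden_dataset` by hand, but the step could also be scripted. In this sketch both folders are passed in as paths, and `copy_to_golden` is a hypothetical helper. It skips files that already exist in the destination. The annotation app writes notes back into those copies, so a blind overwrite would erase them.

```python
import shutil
from pathlib import Path

def copy_to_golden(traces_dir: Path, golden_dir: Path) -> list[Path]:
    """Copy trace_*.json files into golden_dir without overwriting existing ones."""
    golden_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(traces_dir.glob("trace_*.json")):
        dest = golden_dir / src.name
        # An existing copy may already hold annotations, so leave it alone
        if not dest.exists():
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```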

### Open Coding

For open coding, I built an annotation app with FastHTML. It lives in `annotation/annotation.py`, and you can run it with `python annotation.py` from the `annotation` folder. It reads the JSON files from the `golden_dataset` folder directly and saves my open coding notes back into each JSON file. At first I only built support for open coding.
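Saving a note back into a trace comes down to a read-modify-write of one JSON file. The sketch below shows that pattern, not the app's actual code. The `open_coding` key and the `save_open_coding` helper are hypothetical. The note goes to a temporary file first, and `Path.replace` then swaps it in. That way a process that dies mid-save cannot leave a half-written trace behind.

```python
import json
from pathlib import Path

def save_open_coding(trace_path: Path, note: str) -> dict:
    """Add or overwrite the open coding note in a trace file, keeping all other fields."""
    trace = json.loads(trace_path.read_text())
    trace["open_coding"] = note  # hypothetical key name
    # Write the full updated trace to a sibling temp file, then replace the original
    tmp_path = trace_path.with_name(trace_path.name + ".tmp")
    tmp_path.write_text(json.dumps(trace, indent=2))
    tmp_path.replace(trace_path)
    return trace
```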

![](imgs/open_coding_dashboard.png)

![](imgs/open_coding_notes.png)

UX issues I noticed along the way and want to improve:
- It's annoying not to have a next button and to have to go back to the dashboard after every trace
- The dashboard needs some indication of which traces are done, so I don't lose my place when I come back

I addressed these by adding next and previous links (plain `href`s) and by showing a single emoji on the dashboard once open coding is done for a trace. I then extended it to show two emojis once both open coding and axial coding are done.
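The dashboard status and the next/previous links come down to two small pieces of logic. This sketch rests on a few assumptions. Traces are dicts with hypothetical `open_coding` and `axial_coding` keys. Links are built from a sorted list of trace filenames. The ✅ emoji is a placeholder for whichever emoji the app actually shows.

```python
from typing import Optional

def status_emoji(trace: dict) -> str:
    """One emoji once open coding is done, two once axial coding is done as well."""
    # "open_coding" and "axial_coding" are hypothetical key names
    if trace.get("open_coding") and trace.get("axial_coding"):
        return "✅✅"
    if trace.get("open_coding"):
        return "✅"
    return ""

def neighbors(names: list[str], current: str) -> tuple[Optional[str], Optional[str]]:
    """Return the (previous, next) trace names for the nav links, None at either end."""
    i = names.index(current)
    prev_name = names[i - 1] if i > 0 else None
    next_name = names[i + 1] if i + 1 < len(names) else None
    return prev_name, next_name
```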

![](./imgs/NewDashboard.png)

### Axial Coding and Taxonomy Definition

Next, I did axial coding. I added MonsterUI's insertable select to the app and saved the selected codes back to the JSON file.

The insertable select lets you search existing codes and add a new one on the fly if none fits.

Findings:

- My failure modes had only 1 or 2 traces in them. This suggests I have not seen all the failure modes yet and have not reached saturation, so I need to review more traces.
- Maybe the original instruction not to ask follow-up questions was a mistake. If someone asks for keto + beans, it's impossible to satisfy both, so in that case a follow-up question seems like the right response.
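The saturation check in the first finding can be made explicit by counting traces per failure mode. This sketch assumes each trace stores its axial codes as a list under a hypothetical `axial_coding` key. `failure_mode_counts` and `thin_modes` are hypothetical helpers. Wrapping the codes in `set()` counts each code at most once per trace, so the tally reflects how many traces support each failure mode.

```python
from collections import Counter

def failure_mode_counts(traces: list[dict]) -> Counter:
    """Count how many traces were assigned each axial code."""
    counts = Counter()
    for trace in traces:
        # A trace may carry several codes; count each distinct code once per trace
        counts.update(set(trace.get("axial_coding", [])))
    return counts

def thin_modes(counts: Counter, max_traces: int = 2) -> list[str]:
    """Failure modes backed by only a few traces, a sign saturation isn't reached yet."""
    return sorted(code for code, n in counts.items() if n <= max_traces)
```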