<a href="https://colab.research.google.com/github/yongsa-nut/SF323_CN408_AIEngineer/blob/main/HW11.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HW11: Recipe Chatbot Evals (5 points)

Adapted from https://github.com/ai-evals-course/recipe-chatbot/tree/main

## Part 1. Write a System Prompt for Thai Recipe Chatbot (0.5 points)

### **Important Note**: The bot must respond in Thai.

---
  *   **Define the Bot's Role & Objective**: Clearly state what the bot is. (e.g., "You are a friendly and creative culinary assistant specializing in suggesting easy-to-follow recipes.")
  *   **Instructions & Response Rules**: Be specific.
      *   What should it *always* do? (e.g., "Always provide ingredient lists with precise measurements using standard units.", "Always include clear, step-by-step instructions.")
      *   What should it *never* do? (e.g., "Never suggest recipes that require extremely rare or unobtainable ingredients without providing readily available alternatives.", "Never use offensive or derogatory language.")
      *   Safety Clause: (e.g., "If a user asks for a recipe that is unsafe, unethical, or promotes harmful activities, politely decline and state you cannot fulfill that request, without being preachy.")
  *   **LLM Agency – How Much Freedom?**:
      *   Define its creativity level. (e.g., "Feel free to suggest common variations or substitutions for ingredients. If a direct recipe isn't found, you can creatively combine elements from known recipes, clearly stating if it's a novel suggestion.")
      *   Should it stick strictly to known recipes or invent new ones if appropriate? (Be explicit).
  *   **Output Formatting (Crucial for a good user experience)**:
      *   "Structure all your recipe responses clearly using Markdown for formatting."
      *   "Begin every recipe response with the recipe name as a Level 2 Heading (e.g., `## Amazing Blueberry Muffins`)."
      *   "Immediately follow with a brief, enticing description of the dish (1-3 sentences)."
      *   "Next, include a section titled `### Ingredients`. List all ingredients using a Markdown unordered list (bullet points)."
      *   "Following ingredients, include a section titled `### Instructions`. Provide step-by-step directions using a Markdown ordered list (numbered steps)."
      *   "Optionally, if relevant, add a `### Notes`, `### Tips`, or `### Variations` section for extra advice or alternatives."
  *   **Example Markdown structure for a recipe response** (your bot's output must be in Thai):

        ```markdown
        ## Golden Pan-Fried Salmon

        A quick and delicious way to prepare salmon with a crispy skin and moist interior, perfect for a weeknight dinner.

        ### Ingredients
        * 2 salmon fillets (approx. 6oz each, skin-on)
        * 1 tbsp olive oil
        * Salt, to taste
        * Black pepper, to taste
        * 1 lemon, cut into wedges (for serving)

        ### Instructions
        1. Pat the salmon fillets completely dry with a paper towel, especially the skin.
        2. Season both sides of the salmon with salt and pepper.
        3. Heat olive oil in a non-stick skillet over medium-high heat until shimmering.
        4. Place salmon fillets skin-side down in the hot pan.
        5. Cook for 4-6 minutes on the skin side, pressing down gently with a spatula for the first minute to ensure crispy skin.
        6. Flip the salmon and cook for another 2-4 minutes on the flesh side, or until cooked through to your liking.
        7. Serve immediately with lemon wedges.

        ### Tips
        * For extra flavor, add a clove of garlic (smashed) and a sprig of rosemary to the pan while cooking.
        * Ensure the pan is hot before adding the salmon for the best sear.
        ```
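
The formatting rules above can be checked mechanically before you read the responses by hand. The following is a minimal sketch, not part of the original assignment: `check_recipe_format` is a hypothetical helper that tests only the structure (a Level 2 heading first, at least two `###` sections, one bulleted list, one numbered list), so it works regardless of whether the section titles are in Thai or English.

```python
import re


def check_recipe_format(response: str) -> list[str]:
    """Return a list of formatting problems; an empty list means the structure looks right."""
    problems = []
    lines = [ln.rstrip() for ln in response.strip().splitlines()]
    non_empty = [ln for ln in lines if ln.strip()]

    # The first non-empty line should be a Level 2 heading ("## Recipe name").
    if not non_empty or not re.match(r"^## \S", non_empty[0]):
        problems.append("first line is not a level-2 heading")

    # Expect at least two "###" sections (ingredients and instructions).
    sections = [ln for ln in lines if re.match(r"^### \S", ln)]
    if len(sections) < 2:
        problems.append("fewer than two level-3 sections")

    # Expect at least one bullet item and one numbered step.
    if not any(re.match(r"^\s*[*-] \S", ln) for ln in lines):
        problems.append("no unordered (bulleted) list found")
    if not any(re.match(r"^\s*\d+\. \S", ln) for ln in lines):
        problems.append("no ordered (numbered) list found")

    return problems
```

A check like this only catches structural problems. It says nothing about whether the recipe itself is correct, safe, or written in Thai, so it complements manual review rather than replacing it.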

In [None]:
system_prompt = "Fill in your answer here"  # Replace this placeholder with your system prompt

## Part 2. Logging with LangFuse (0.5 points)

- For this homework, we will log traces with Langfuse.
- **Important**: When you are done, attach a screenshot of the main traces page showing all of your traces (you do not need to show each individual trace).

In [None]:
!pip install langfuse

In [None]:
from google.colab import userdata
from langfuse.openai import openai
import os

# Get keys for your project from the project settings page: https://cloud.langfuse.com
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."   # Fill in your Langfuse secret key
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."   # Fill in your Langfuse public key
os.environ["LANGFUSE_BASE_URL"] = "https://cloud.langfuse.com" # 🇪🇺 EU region
# os.environ["LANGFUSE_BASE_URL"] = "https://us.cloud.langfuse.com" # 🇺🇸 US region

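# Your OpenRouter key, stored in OPENAI_API_KEY because the OpenAI-compatible client reads that variable
os.environ["OPENAI_API_KEY"] = userdata.get('openrouter')  # Read from the Colab secret named 'openrouter'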

In [None]:
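from langfuse import observe  # Langfuse Python SDK v3; on v2 use `from langfuse.decorators import observe`

@observe()  # Wrap each call in a Langfuse trace
def generate(query):
    """Send one user query to the recipe bot and return its reply text."""
    # OpenRouter exposes an OpenAI-compatible API; the key is read from OPENAI_API_KEY
    client = openai.OpenAI(
        base_url="https://openrouter.ai/api/v1",
    )
    completion = client.chat.completions.create(
        name="hw11",  # Name shown for this generation in Langfuse
        model="google/gemini-2.5-flash-lite-preview-09-2025",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}],
        temperature=0,  # Keep sampling as repeatable as the model allows
        metadata={"langfuse_session_id": "session_1"}  # Group traces under one session
    )
    return completion.choices[0].message.content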

In [None]:
# Test your system prompt with a Thai query ("I'd like a recipe for stir-fried holy basil chicken")
generate('อยากได้วิธีทำกระเพราะไก่')

## Part 3. Define Dimensions & Generate Initial Queries (2 points)

1. **Identify Key Dimensions**: (i.e., key aspects or variables of user inputs you'll use to generate diverse test queries, such as `cuisine_type`, `dietary_restriction`, or `meal_type` for your recipe bot)

  - Identify 3-4 key dimensions relevant to your Recipe Bot's functionality and potential user inputs.
  - For each dimension, list at least 3 example values.

2. **Generate Unique Combinations (Tuples)**:

  - Write a prompt for a Large Language Model (LLM) to generate 15-20 unique combinations (tuples) of these dimension values.

3. **Generate Natural Language User Queries**:

  - Write a second prompt for an LLM to take 5-7 of the generated tuples and create a natural language user query for your Recipe Bot for each selected tuple.
  
  - Review these generated queries to ensure they are realistic and representative of how a user might interact with your bot.
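
The three steps above can be sketched in code. This is an illustrative sketch under stated assumptions, not a reference solution: the dimension values, the prompt wording, and the helper names (`all_combinations`, `build_tuple_prompt`, `build_query_prompt`) are all hypothetical. Enumerating the combinations with `itertools.product` is only a way to see how large the space is; the assignment still asks you to have an LLM propose the tuples.

```python
import random
from itertools import product

# Hypothetical dimensions and values, for illustration only.
DIMENSIONS = {
    "cuisine_type": ["Thai", "Italian", "Japanese"],
    "dietary_restriction": ["none", "vegetarian", "gluten-free"],
    "meal_type": ["breakfast", "lunch", "dinner"],
}


def all_combinations(dimensions):
    """Enumerate every possible combination as a dict (dimension -> value)."""
    keys = list(dimensions)
    return [dict(zip(keys, values)) for values in product(*dimensions.values())]


def build_tuple_prompt(dimensions, n=20):
    """Build the first prompt: ask an LLM for n unique dimension tuples."""
    listing = "\n".join(
        f"- {name}: {', '.join(values)}" for name, values in dimensions.items()
    )
    order = ", ".join(dimensions)
    return (
        f"Generate {n} unique combinations of the following dimensions "
        "for testing a recipe chatbot.\n"
        f"{listing}\n"
        f"Return one combination per line as a tuple in the order ({order})."
    )


def build_query_prompt(combo):
    """Build the second prompt: turn one tuple into a natural user query."""
    details = "\n".join(f"- {name}: {value}" for name, value in combo.items())
    return (
        "Write one realistic, natural-sounding user message for a recipe chatbot "
        "that reflects these attributes without listing them verbatim:\n"
        f"{details}"
    )


combos = all_combinations(DIMENSIONS)
# Pick 5 tuples reproducibly; a fixed seed keeps the selection stable across runs.
selected = random.Random(0).sample(combos, 5)
query_prompts = [build_query_prompt(combo) for combo in selected]
```

Each string in `query_prompts` would then be sent to an LLM, and the resulting queries reviewed by hand for realism as the last step describes.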

## Part 4. Initial Error Analysis (2 points)

1. **Run Bot on Synthetic Queries**:

  - Execute your Recipe Bot using the synthetic queries generated in Part 3.
  - Record the full interaction traces for each query.


2. **Open Coding**: (an initial analysis step where you review interaction traces, assigning descriptive labels/notes to identify patterns and potential errors without preconceived categories, as detailed in Sec 3.2 of the provided chapter)

  - Review the recorded traces.
  - Perform open coding to identify initial themes, patterns, and potential errors or areas for improvement in the bot's responses.

3. **Axial Coding & Taxonomy Definition**: (a follow-up step where you group the initial open codes into broader, structured categories or 'failure modes' to build an error taxonomy, as described in Sec 3.3 of the provided chapter)

  - Group the observations from open coding into broader categories or failure modes.
  - For each identified failure mode, create a clear and concise taxonomy. This should include:
     - A clear Title for the failure mode.
     - A concise one-sentence Definition explaining the failure mode.
     - 1-2 Illustrative Examples taken directly from your bot's behavior during the tests. If a failure mode is plausible but not directly observed, you can provide a well-reasoned hypothetical example.
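
The workflow above can be sketched as a small script. This is a minimal sketch under stated assumptions: Langfuse already records the traces, so the CSV file here is just one convenient place to write open codes by hand, and every name below (`run_queries`, `save_traces`, `FailureMode`, `count_failure_modes`, the example codes and failure modes) is hypothetical. `generate_fn` stands for any function that maps a query to a reply, such as the `generate` function from Part 2.

```python
import csv
from collections import Counter
from dataclasses import dataclass, field


def run_queries(queries, generate_fn):
    """Run each query through the bot; leave an empty column for open codes."""
    return [
        {"query": q, "response": generate_fn(q), "open_codes": ""} for q in queries
    ]


def save_traces(rows, path):
    """Write the rows to a CSV file so they can be annotated in a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["query", "response", "open_codes"])
        writer.writeheader()
        writer.writerows(rows)


@dataclass
class FailureMode:
    """One entry in the error taxonomy produced by axial coding."""
    title: str
    definition: str
    examples: list = field(default_factory=list)


def count_failure_modes(rows, code_to_mode):
    """Count how many traces show each failure mode.

    Open codes are stored as a semicolon-separated string per trace;
    code_to_mode maps each open code to a failure-mode title.
    """
    counts = Counter()
    for row in rows:
        codes = [c.strip() for c in row["open_codes"].split(";")]
        # Use a set so a trace counts once per failure mode.
        modes = {code_to_mode[c] for c in codes if c in code_to_mode}
        counts.update(modes)
    return counts
```

A possible usage: call `run_queries(queries, generate)`, save the rows, fill in the `open_codes` column while reading each trace, then build `code_to_mode` during axial coding and use the counts to decide which failure modes deserve an entry in the taxonomy.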