<a href="https://colab.research.google.com/github/yongsa-nut/SF323_CN408_AIEngineer/blob/main/SF323_CN408_Lecture_3_Demo_(updated).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lecture 3 Demo

In [None]:
from google.colab import userdata
from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=userdata.get('openrouter'),
)

In [None]:
# Check that the connection to OpenRouter works
def gen_answer(prompt, model='google/gemini-2.5-flash-lite'):
    response = client.chat.completions.create(
      model=model,
      messages=[
          {"role": "user", "content": prompt}
      ],
      temperature=0
    )

    return response.choices[0].message.content

gen_answer('Hello')

'Hello there! How can I help you today?'

**Outline**:
- Create an Eval Dataset and LLM-as-a-judge
- Draft a prompt
- Feed through the model
- Feed through a grader
- Change Prompt and repeat


## Task: A Simple Recipe Generator

**Input**:
```python
inputs_spec = {
    'dietary_restrictions': 'Food limitations to follow (e.g., vegetarian, gluten-free, or none)',
    'cooking_theme': 'Cuisine style or type (e.g., Italian, Asian, comfort food)',
    'ingredients': 'List of 2-4 specific ingredients that must be used',
    'time': 'Maximum cooking time in minutes (e.g., 15, 30, or 45)'
}
```

**Output**:
```
[RECIPE TITLE]

TIME: [X minutes]

INGREDIENTS:
- [Ingredient 1 - amount]
- [Ingredient 2 - amount]
- [Ingredient 3 - amount]
[Maximum 8 ingredients total]

INSTRUCTIONS:
1. [Action verb + specific details]
2. [Action verb + specific details]
3. [Action verb + specific details]
[Maximum 6 steps]
```
**Example**:
```
Quick Asian Garlic Chicken Stir-Fry

TIME: 15 minutes

INGREDIENTS:
- Chicken breast - 1 lb, cubed
- White rice - 2 cups, cooked
- Garlic - 4 cloves, minced
- Soy sauce - 3 tablespoons
- Vegetable oil - 2 tablespoons
- Green onions - 2 stalks, sliced
- Sesame oil - 1 teaspoon

INSTRUCTIONS:
1. Heat vegetable oil in wok over high heat until shimmering (about 1 minute).
2. Add cubed chicken and cook for 3-4 minutes until golden brown on all sides.
3. Push chicken to sides and add minced garlic to center, stir-fry for 30 seconds until fragrant.
4. Pour soy sauce over chicken and toss everything for 1 minute until well-coated.
5. Add cooked rice and stir-fry for 2 minutes until heated through.
6. Drizzle sesame oil and garnish with sliced green onions before serving.
```
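
The output format above is strict enough that part of it can be checked with plain code before (or alongside) an LLM judge. The following is a minimal sketch of such a check, not part of the original demo: the function name `check_recipe_format`, the constants, and the `sample` recipe are hypothetical, and the limits (8 ingredients, 6 steps) are assumed from the grader criteria used later in this notebook. It accepts both `-` and `•` bullets because the example uses `-` while the criteria mention `•`.

```python
import re

MAX_INGREDIENTS = 8  # assumed from the grader criteria
MAX_STEPS = 6        # assumed from the grader criteria

def check_recipe_format(text):
    """Return a list of format violations (an empty list means the format looks valid)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]

    # Locate the three labelled sections; the title is whatever precedes TIME.
    try:
        time_idx = next(i for i, line in enumerate(lines) if line.startswith("TIME:"))
        ing_idx = lines.index("INGREDIENTS:")
        ins_idx = lines.index("INSTRUCTIONS:")
    except (StopIteration, ValueError):
        return ["missing TIME, INGREDIENTS, or INSTRUCTIONS section"]

    if not (1 <= time_idx < ing_idx < ins_idx):
        return ["title missing or sections out of order"]

    problems = []
    if not re.fullmatch(r"TIME: \d+ minutes", lines[time_idx]):
        problems.append("TIME line must look like 'TIME: 15 minutes'")

    ingredients = lines[ing_idx + 1:ins_idx]
    steps = lines[ins_idx + 1:]

    if any(not re.match(r"[-•] .+ - .+", item) for item in ingredients):
        problems.append("each ingredient must be a bullet in 'ingredient - amount' form")
    if not 1 <= len(ingredients) <= MAX_INGREDIENTS:
        problems.append(f"expected 1-{MAX_INGREDIENTS} ingredients, found {len(ingredients)}")

    expected_numbers = [f"{n}." for n in range(1, len(steps) + 1)]
    if [step.split(" ", 1)[0] for step in steps] != expected_numbers:
        problems.append("instructions must be numbered 1., 2., 3., ...")
    if not 1 <= len(steps) <= MAX_STEPS:
        problems.append(f"expected 1-{MAX_STEPS} steps, found {len(steps)}")

    return problems

# Hypothetical, shortened recipe used only to exercise the checker.
sample = """Garlic Rice

TIME: 10 minutes

INGREDIENTS:
- Rice - 1 cup, cooked
- Garlic - 2 cloves, minced

INSTRUCTIONS:
1. Heat a pan over medium heat for 1 minute.
2. Add garlic and rice and stir-fry for 3 minutes until fragrant.
"""

print(check_recipe_format(sample))  # → []
```

A check like this only covers the mechanical format rules. Criteria such as theme authenticity or time feasibility still need a human or an LLM judge.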

In [None]:
# A simple dataset - LLM generated
test_cases = [
    {
        'dietary_restrictions': 'vegetarian',
        'cooking_theme': 'Italian',
        'ingredients': ['tomatoes', 'pasta'],
        'time': '30 minutes'
    },
    {
        'dietary_restrictions': 'none',
        'cooking_theme': 'Thai',
        'ingredients': ['chicken', 'rice'],
        'time': '15 minutes'
    },
    {
        'dietary_restrictions': 'none',
        'cooking_theme': 'Italian',
        'ingredients': ['shrimp', 'garlic', 'olive oil'],
        'time': '15 minutes'
    },
    {
        'dietary_restrictions': 'vegan',
        'cooking_theme': 'French pastry',
        'ingredients': ['flour', 'coconut oil', 'maple syrup'],
        'time': '20 minutes'
    },
    {
        'dietary_restrictions': 'paleo',
        'cooking_theme': 'Japanese breakfast',
        'ingredients': ['eggs', 'mushrooms', 'spinach', 'avocado'],
        'time': '25 minutes'
    }
]
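
Because the dataset is LLM-generated, it can be worth checking each test case against the input spec before spending API calls on it. The sketch below is an illustration only: `REQUIRED_KEYS` and `validate_test_case` are hypothetical names, and the 2-4 ingredient rule is taken from the `ingredients` description in the input spec above.

```python
REQUIRED_KEYS = {"dietary_restrictions", "cooking_theme", "ingredients", "time"}

def validate_test_case(case):
    """Return a list of problems with one test case (an empty list means it is valid)."""
    problems = []

    # Every key from the input spec must be present.
    missing = REQUIRED_KEYS - case.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    # The spec asks for 2-4 ingredients that must be used.
    ingredients = case.get("ingredients", [])
    if not 2 <= len(ingredients) <= 4:
        problems.append(f"expected 2-4 ingredients, found {len(ingredients)}")

    return problems

# First test case from the dataset above.
good_case = {
    "dietary_restrictions": "vegetarian",
    "cooking_theme": "Italian",
    "ingredients": ["tomatoes", "pasta"],
    "time": "30 minutes",
}
# Hypothetical malformed case, for illustration only.
bad_case = {"cooking_theme": "Thai", "ingredients": ["rice"]}

print(validate_test_case(good_case))  # → []
print(len(validate_test_case(bad_case)))  # → 2
```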

## LLM-as-a-judge

- Adapted from https://anthropic.skilljar.com/claude-with-the-anthropic-api/287745

In [None]:
eval_prompt = """Your task is to evaluate the following AI-generated recipe with EXTREME RIGOR.

Original task description:
<task_description>
Generate a single-serving recipe based on the given dietary restrictions, cooking theme, ingredients, and time constraint. The recipe must follow a specific format with sections for Title, Time, Ingredients, and Instructions.
</task_description>

Original task inputs:
<task_inputs>
{prompt_inputs}
</task_inputs>

Solution to Evaluate:
<solution>
{output}
</solution>

Criteria you should use to evaluate the solution:
<criteria>
MANDATORY REQUIREMENTS (violation = score 3 or lower):
1. Section compliance: Must have exactly these sections in order: Title, TIME, INGREDIENTS, INSTRUCTIONS.
2. Ingredient usage: ALL specified ingredients from the input must be used in the recipe
3. Dietary compliance: Must strictly follow dietary restrictions (no meat in vegetarian, no gluten ingredients in gluten-free)

SECONDARY CRITERIA:
4. Time feasibility: The recipe can realistically be completed in the stated time by an average home cook
5. Theme authenticity: The recipe genuinely reflects the specified cuisine theme (Italian/Asian/comfort food)
6. Instruction clarity: Each step starts with an action verb, includes specific details (time/temperature/visual cues), and contains one clear action. No vague terms like "cook until done"
7. Format rules:
  - Title should be plain text (no special characters or markers)
  - TIME: must be followed by the number and "minutes"
  - INGREDIENTS: must use bullet points (•) with "ingredient - amount" format
  - INSTRUCTIONS: must use numbered list (1. 2. 3. etc)
  - NO other sections allowed
8. Length control: Maximum 8 ingredients, maximum 6 instruction steps, no extra sections/stories/tips

<format_example>
Quick Asian Garlic Chicken Stir-Fry

TIME: 15 minutes

INGREDIENTS:
- Chicken breast - 1 lb, cubed
- White rice - 2 cups, cooked
- Garlic - 4 cloves, minced
- Soy sauce - 3 tablespoons
- Vegetable oil - 2 tablespoons
- Green onions - 2 stalks, sliced
- Sesame oil - 1 teaspoon

INSTRUCTIONS:
1. Heat vegetable oil in wok over high heat until shimmering (about 1 minute).
2. Add cubed chicken and cook for 3-4 minutes until golden brown on all sides.
3. Push chicken to sides and add minced garlic to center, stir-fry for 30 seconds until fragrant.
4. Pour soy sauce over chicken and toss everything for 1 minute until well-coated.
5. Add cooked rice and stir-fry for 2 minutes until heated through.
6. Drizzle sesame oil and garnish with sliced green onions before serving.
</format_example>

</criteria>

Scoring Guidelines:
* Score 1-3: Solution fails to meet one or more MANDATORY requirements
* Score 4-6: Solution meets all mandatory requirements but has significant deficiencies in secondary criteria
* Score 7-8: Solution meets all mandatory requirements and most secondary criteria, with minor issues
* Score 9-10: Solution meets all mandatory and secondary criteria

IMPORTANT SCORING INSTRUCTIONS:
* Grade the output based ONLY on the listed criteria. Do not add your own extra requirements.
* If a solution meets all of the mandatory and secondary criteria give it a 10
* Don't complain that the solution "only" meets the mandatory and secondary criteria. Solutions shouldn't go above and beyond - they should meet the exact listed criteria.
* ANY violation of a mandatory requirement MUST result in a score of 3 or lower
* The full 1-10 scale should be utilized - don't hesitate to give low scores when warranted

Output Format
Provide your evaluation as a structured JSON object with the following fields, in this specific order:
- "strengths": An array of 1-3 key strengths
- "weaknesses": An array of 1-3 key areas for improvement
- "reasoning": A concise explanation of your overall assessment
- "score": A number between 1-10

Respond with JSON. Keep your response concise and direct.
Example response shape:
{
      "strengths": string[],
      "weaknesses": string[],
      "reasoning": string,
      "score": number
}
"""
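
The judge is asked to reply with a JSON object, but models often wrap JSON in a Markdown code fence or add stray text. The sketch below shows one defensive way to extract and validate such a reply. The helper name `parse_judge_reply` and the sample `reply` string are hypothetical, and the checks assume the four fields and the 1-10 score range requested in the prompt above.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

def parse_judge_reply(reply):
    """Extract and validate the JSON object contained in a judge reply."""
    # Take everything from the first "{" to the last "}" so that
    # surrounding code fences or stray text are ignored.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in judge reply")
    result = json.loads(match.group(0))

    # The eval prompt asks for exactly these four fields.
    missing = {"strengths", "weaknesses", "reasoning", "score"} - result.keys()
    if missing:
        raise ValueError(f"judge reply is missing fields: {sorted(missing)}")
    if not 1 <= result["score"] <= 10:
        raise ValueError(f"score out of range: {result['score']}")
    return result

# Hypothetical reply, wrapped in a fence the way many models format JSON.
reply = (
    FENCE + "json\n"
    '{"strengths": ["clear"], "weaknesses": [], "reasoning": "ok", "score": 9}\n'
    + FENCE
)
print(parse_judge_reply(reply)["score"])  # → 9
```

Raising on a malformed reply makes a bad judge response visible immediately instead of silently producing a row with missing columns.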

In [None]:
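# Helper functions
import re
import json
import pandas as pd

# Judge model used in gen_eval. The name suggests Gemini 2.5 Pro; the exact
# OpenRouter model ID below is an assumption, so adjust it if needed.
GEMINI25_PRO = 'google/gemini-2.5-pro'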

# Function to fill values in { } in the prompt template
def fill_prompt(template_string, variables):
    placeholders = re.findall(r"{([^{}]+)}", template_string)

    result = template_string
    for placeholder in placeholders:
         if placeholder in variables:
            result = result.replace(
                "{" + placeholder + "}", str(variables[placeholder])
            )

    return result.replace("{{", "{").replace("}}", "}")

# Basic function to do evals
def gen_eval(prompt, testset, eval_prompt):
    results = []
    for test in testset:
        filled_prompt = fill_prompt(prompt, test)
        response = gen_answer(filled_prompt)

        temp_variables = {
            'prompt_inputs':filled_prompt,
            'output':response
        }
        filled_eval_prompt = fill_prompt(eval_prompt, temp_variables)
        eval_result = gen_answer(filled_eval_prompt, GEMINI25_PRO)

        eval_result = json.loads(eval_result.replace('```json', '').replace('```', ''))
        result = temp_variables | eval_result
        results.append(result)

    return pd.DataFrame(results)
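
To make the behaviour of `fill_prompt` concrete, here is a small standalone run. The function body is copied from the cell above; the `template` string and the `{unknown}` placeholder are made up for illustration. Note that list values are inserted using Python's `str()`, so the model sees `['tomatoes', 'pasta']` rather than a comma-separated phrase.

```python
import re

def fill_prompt(template_string, variables):
    """Copy of the helper above: fill {name} placeholders that appear in `variables`."""
    placeholders = re.findall(r"{([^{}]+)}", template_string)

    result = template_string
    for placeholder in placeholders:
        if placeholder in variables:
            result = result.replace(
                "{" + placeholder + "}", str(variables[placeholder])
            )

    # Doubled braces are an escape for literal braces.
    return result.replace("{{", "{").replace("}}", "}")

# Hypothetical template: two known placeholders, one escaped JSON shape,
# and one placeholder that has no matching variable.
template = (
    "ingredients: {ingredients}\n"
    "time: {time}\n"
    'shape: {{"score": number}}\n'
    "note: {unknown}"
)
variables = {"ingredients": ["tomatoes", "pasta"], "time": "30 minutes"}

filled = fill_prompt(template, variables)
print(filled)
# ingredients: ['tomatoes', 'pasta']
# time: 30 minutes
# shape: {"score": number}
# note: {unknown}
```

Unknown placeholders are left untouched rather than raising an error, which is why the JSON example inside `eval_prompt` survives the substitution.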

## 0) Initial Prompt

In [None]:
# Test 0 (takes ~2:30 min to run)
base_prompt = """Recipe that satisfies the following:

ingredients: {ingredients}
dietary_restrictions: {dietary_restrictions}
cooking_theme: {cooking_theme}
time: {time}
"""
results = gen_eval(base_prompt, test_cases, eval_prompt)
results.to_csv('base_prompt.csv')
print(f"mean score = {results['score'].mean()}")
results

mean score = 1.8


| | prompt_inputs | output | strengths | weaknesses | reasoning | score |
|---|---|---|---|---|---|---|
| 0 | `Recipe that satisfies the following:\n\ningred...` | `Okay, here's a simple and delicious vegetarian...` | [The recipe is genuinely vegetarian and correc... | [The solution fails the mandatory section comp... | The solution violates a mandatory requirement ... | 2 |
| 1 | `Recipe that satisfies the following:\n\ningred...` | `Okay, here's a super quick and easy Thai-inspi...` | [The recipe is thematically appropriate for a ... | [The solution violates the mandatory 'Section ... | The solution receives a low score because it v... | 2 |
| 2 | `Recipe that satisfies the following:\n\ningred...` | `## Garlic Shrimp Scampi (Ready in 15 Minutes!)...` | [The recipe correctly uses all the specified i... | [The solution violates the mandatory 'Section ... | The solution receives a very low score because... | 2 |
| 3 | `Recipe that satisfies the following:\n\ningred...` | `Okay, here's a recipe for a quick and easy veg...` | [The recipe successfully uses all the required... | [The solution violates the mandatory section c... | The solution receives a very low score because... | 1 |
| 4 | `Recipe that satisfies the following:\n\ningred...` | `Okay, here's a Paleo-friendly Japanese-inspire...` | [The recipe successfully incorporates all the ... | [The solution violates Mandatory Requirement #... | The solution receives a very low score because... | 2 |


## 1) Clear and Direct

In [None]:
# Test 1 (takes ~2:30 min to run)
clear_direct_prompt = """Generate a daily recipe that follows the following constraints:

ingredients: {ingredients}
dietary_restrictions: {dietary_restrictions}
cooking_theme: {cooking_theme}
time: {time}

The report must include only four sections: Title, Time, Ingredients, and Instructions.
"""
results = gen_eval(clear_direct_prompt, test_cases, eval_prompt)
results.to_csv('clear_direct_prompt.csv')
print(f"mean score = {results['score'].mean()}")
results

mean score = 4.2


| | prompt_inputs | output | strengths | weaknesses | reasoning | score |
|---|---|---|---|---|---|---|
| 0 | `Generate a daily recipe that follows the follo...` | `**Title:** Quick & Easy Tomato Pasta\n\n**Time...` | [The recipe successfully meets all mandatory r... | [The recipe fails the length control criterion... | The solution meets all mandatory requirements,... | 5 |
| 1 | `Generate a daily recipe that follows the follo...` | `**Title:** Quick Thai Chicken & Rice\n\n**Time...` | [The recipe is highly feasible within the 15-m... | [The solution violates a mandatory requirement... | The solution fails a mandatory requirement by ... | 3 |
| 2 | `Generate a daily recipe that follows the follo...` | `**Title:** Garlic Shrimp Scampi\n\n**Time:** 1...` | [The recipe is thematically appropriate (Itali... | [The solution fails to follow several specific... | The solution successfully meets all mandatory ... | 6 |
| 3 | `Generate a daily recipe that follows the follo...` | `**Title:** Vegan Coconut Oil & Maple Syrup Mad...` | [The recipe successfully incorporates all the ... | [The recipe is chemically non-viable and will ... | While the solution meets all mandatory require... | 4 |
| 4 | `Generate a daily recipe that follows the follo...` | `**Title:** Paleo Japanese-Inspired Egg & Avoca...` | [The recipe successfully incorporates all the ... | [The recipe instructions are for two servings ... | The solution fails on a mandatory requirement ... | 3 |


## 2) Being Specific

In [None]:
# Test 2
specific_prompt = """Generate a daily recipe that follows the following constraints:

ingredients: {ingredients}
dietary_restrictions: {dietary_restrictions}
cooking_theme: {cooking_theme}
time: {time}

Guidelines:
- The report must include ONLY four sections in order: Title, Time, Ingredients, and Instructions.
- Follow the formatting Rules strictly:
  - Title should be plain text (no special characters or markers)
  - TIME: must be followed by the number and "minutes"
  - INGREDIENTS: must use bullet points (•) with "ingredient - amount" format
  - INSTRUCTIONS: must use numbered list (1. 2. 3. etc)
  - NO other sections allowed
- The portion must be a single-serving.
- The recipe must contain maximum 8 ingredients and maximum 6 instruction steps.
- Time: The recipe can realistically be completed in the stated time by an average home cook.
- Instruction clarity: Each step starts with an action verb, includes specific details (time/temperature/visual cues), and contains one clear action.
"""
results = gen_eval(specific_prompt, test_cases, eval_prompt)
results.to_csv('specific_prompt.csv')
print(f"mean score = {results['score'].mean()}")
results

mean score = 7.8


| | prompt_inputs | output | strengths | weaknesses | reasoning | score |
|---|---|---|---|---|---|---|
| 0 | `Generate a daily recipe that follows the follo...` | `Tomato Pasta\n\nTIME: 25 minutes\n\nINGREDIENT...` | [The solution perfectly adheres to all mandato... | [Several instruction steps combine multiple di... | The solution meets all mandatory requirements ... | 8 |
| 1 | `Generate a daily recipe that follows the follo...` | `Thai Chicken and Rice Bowl\n\nTIME: 15 minutes...` | [The solution perfectly adheres to all formatt... | [] | The solution is exemplary. It meets all mandat... | 10 |
| 2 | `Generate a daily recipe that follows the follo...` | `Shrimp Scampi\nTIME: 15 minutes\nINGREDIENTS:\...` | [The solution perfectly adheres to all formatt... | [] | The solution is exemplary. It flawlessly meets... | 10 |
| 3 | `Generate a daily recipe that follows the follo...` | `Coconut Oil Palmiers\n\nTIME: 20 minutes\n\nIN...` | [The recipe successfully adheres to all mandat... | [The 20-minute time is not feasible. It fails ... | The solution meets all mandatory requirements ... | 4 |
| 4 | `Generate a daily recipe that follows the follo...` | `Mushroom Spinach Egg Drop Soup with Avocado\n\...` | [The recipe successfully meets all mandatory r... | [The title, "Mushroom Spinach Egg Drop Soup wi... | The solution is very strong in meeting all man... | 7 |


## 3) Structure Prompt and Examples

In [None]:
# Test 3
specific_prompt = """Generate a daily recipe that follows the following constraints:

ingredients: {ingredients}
dietary_restrictions: {dietary_restrictions}
cooking_theme: {cooking_theme}
time: {time}

Guidelines:
- The report must include ONLY four sections in order: Title, Time, Ingredients, and Instructions.
- Follow the formatting Rules strictly:
  - Title should be plain text (no special characters or markers)
  - TIME: must be followed by the number and "minutes"
  - INGREDIENTS: must use bullet points (•) with "ingredient - amount" format
  - INSTRUCTIONS: must use numbered list (1. 2. 3. etc)
  - NO other sections allowed
- The portion must be a single-serving.
- The recipe must contain maximum 8 ingredients and maximum 6 instruction steps.
- Time: The recipe can realistically be completed in the stated time by an average home cook.
- Instruction clarity: Each step starts with an action verb, includes specific details (time/temperature/visual cues), and contains one clear action.

<example_output>
Quick Asian Garlic Chicken Stir-Fry

TIME: 15 minutes

INGREDIENTS:
- Chicken breast - 1 lb, cubed
- White rice - 2 cups, cooked
- Garlic - 4 cloves, minced
- Soy sauce - 3 tablespoons
- Vegetable oil - 2 tablespoons
- Green onions - 2 stalks, sliced
- Sesame oil - 1 teaspoon

INSTRUCTIONS:
1. Heat vegetable oil in wok over high heat until shimmering (about 1 minute).
2. Add cubed chicken and cook for 3-4 minutes until golden brown on all sides.
3. Push chicken to sides and add minced garlic to center, stir-fry for 30 seconds until fragrant.
4. Pour soy sauce over chicken and toss everything for 1 minute until well-coated.
5. Add cooked rice and stir-fry for 2 minutes until heated through.
6. Drizzle sesame oil and garnish with sliced green onions before serving.
</example_output>
"""
results = gen_eval(specific_prompt, test_cases, eval_prompt)
results.to_csv('example_prompt.csv')  # 'specific_prompt.csv' already holds the Test 2 results
print(f"mean score = {results['score'].mean()}")
results

mean score = 8.6


| | prompt_inputs | output | strengths | weaknesses | reasoning | score |
|---|---|---|---|---|---|---|
| 0 | `Generate a daily recipe that follows the follo...` | `Simple Tomato Pasta\n\nTIME: 25 minutes\n\nING...` | [The solution perfectly adheres to all mandato... | [Instruction step 6 combines three distinct ac... | The solution is excellent and meets all mandat... | 9 |
| 1 | `Generate a daily recipe that follows the follo...` | `Quick Thai Chicken and Rice\n\nTIME: 15 minute...` | [The recipe perfectly adheres to all formattin... | [] | The solution is an exemplary response that mee... | 10 |
| 2 | `Generate a daily recipe that follows the follo...` | `Speedy Shrimp Scampi\n\nTIME: 15 minutes\n\nIN...` | [The solution perfectly adheres to all formatt... | [] | The solution is a perfect response that meets ... | 10 |
| 3 | `Generate a daily recipe that follows the follo...` | `Vegan French Palmiers\n\nTIME: 20 minutes\n\nI...` | [The solution correctly follows all formatting... | [The 20-minute time is not feasible. The recip... | The solution meets all mandatory requirements ... | 4 |
| 4 | `Generate a daily recipe that follows the follo...` | `Paleo Japanese Breakfast Bowl\n\nTIME: 25 minu...` | [The solution perfectly adheres to all mandato... | [The recipe is more 'Japanese-inspired' than a... | The solution is outstanding. It flawlessly mee... | 10 |
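
The mean score hides which test cases improve and which do not, so it can help to line up the per-case scores across prompt versions. The sketch below hard-codes the scores reported in the four result tables of this notebook (Tests 0 to 3). The dictionary keys and variable names are labels chosen here for illustration. In practice the same comparison could be done by reloading the saved CSV files.

```python
from statistics import mean

# Judge scores per test case, copied from the result tables in this notebook.
scores = {
    "0) initial": [2, 2, 2, 1, 2],
    "1) clear and direct": [5, 3, 6, 4, 3],
    "2) specific": [8, 10, 10, 4, 7],
    "3) specific + example": [9, 10, 10, 4, 10],
}

for name, values in scores.items():
    print(f"{name:24s} mean={mean(values):.1f} min={min(values)}")

# Per-test-case gain from the first to the last prompt version.
gains = [
    last - first
    for first, last in zip(scores["0) initial"], scores["3) specific + example"])
]
print(gains)  # → [7, 8, 8, 3, 8]
```

Test case 3 (vegan French pastry in 20 minutes) stays at 4 from prompt version 1 onward. The judge flags the 20-minute limit as infeasible in the last two runs, which suggests the remaining problem may lie in the test case itself and not in the prompt wording.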


<br>

---

<br>

## Prefilling

In [None]:
response = client.chat.completions.create(
      model='google/gemini-2.5-flash-lite',
      messages=[
          {"role": "user", "content": "What color do you like?"},
          {"role": "assistant", "content":"I like green because"} # try commenting this out
      ],
      temperature=0
    )
print(response.choices[0].message.content)

 it is often associated with nature, growth, and harmony. It's a very calming and refreshing color!
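
As the output above shows, the response contains only the continuation, not the prefilled text. The following sketch shows how the message list is built and how the full reply can be reassembled. The helper names `build_prefilled_messages` and `full_reply` are hypothetical, and whether a given model or provider honours an assistant prefill is not guaranteed.

```python
def build_prefilled_messages(user_prompt, prefill):
    """Return chat messages whose last entry is a partial assistant turn."""
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": prefill},
    ]

def full_reply(prefill, continuation):
    """Join the prefill with the model's continuation to get the complete answer."""
    return prefill + continuation

prefill = "I like green because"
messages = build_prefilled_messages("What color do you like?", prefill)

# Shortened version of the continuation printed above; note the leading space.
continuation = " it is often associated with nature, growth, and harmony."

print(messages[-1]["role"])  # → assistant
print(full_reply(prefill, continuation))
# → I like green because it is often associated with nature, growth, and harmony.
```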



For more information about the OpenAI API, see the [documentation](https://platform.openai.com/docs/api-reference/chat/create).

In [None]:
response = client.chat.completions.create(
      model='google/gemini-2.5-flash-lite',
      messages=[
          {"role": "user", "content": "Generate a json structure of a mock-up student's scores for three subjects."},
          {"role": "assistant", "content":"```json"}  # try commenting this out
      ],
      temperature=0,
      stop=['```']  # you can also add a stop sequence
    )
print(response.choices[0].message.content)


{
  "student_id": "S12345",
  "student_name": "Alice Wonderland",
  "class": "10A",
  "scores": {
    "Mathematics": {
      "midterm": 85,
      "final": 92,
      "homework": [90, 88, 95, 82, 98],
      "participation": 10,
      "total": 90.5,
      "grade": "A"
    },
    "Science": {
      "midterm": 78,
      "final": 85,
      "lab_reports": [75, 80, 88, 92],
      "quiz": [65, 70, 75],
      "total": 80.2,
      "grade": "B"
    },
    "English": {
      "midterm": 90,
      "final": 95,
      "essays": [88, 92, 95],
      "presentation": 98,
      "total": 93.8,
      "grade": "A+"
    }
  },
  "overall_gpa": 3.7
}
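
Because the prefill opens the code fence and the stop sequence cuts the response before the closing fence, the returned text is bare JSON and can be passed straight to `json.loads`. A minimal sketch follows, using a shortened stand-in for the output above (the variable names are illustrative, and a real response may still be malformed, so production code would catch `json.JSONDecodeError`).

```python
import json

# Shortened stand-in for the model output printed above (not the full response).
raw_output = """
{
  "student_id": "S12345",
  "scores": {"Mathematics": {"midterm": 85, "final": 92}}
}
"""

# Leading and trailing whitespace is accepted by the JSON parser.
student = json.loads(raw_output)
print(student["scores"]["Mathematics"]["final"])  # → 92
```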

