# Developing with OpenAI: AIM Edition

## Exploring LLM Prompting Strategies for Economic Reasoning  
### *Inflation & Interest Rate Case Study*

This notebook investigates how different prompting strategies (zero-shot, few-shot, reasoning vs non-reasoning models) affect the ability of large language models (LLMs) to reason about inflation, interest rates, and overall market dynamics.  

The original instructional structure and code scaffolding are kept so that the notebook remains a complete, end-to-end teaching example.

## 1. Getting Started

The first thing we'll do is install the [OpenAI Python Library](https://github.com/openai/openai-python/tree/main)!

In [1]:
# Used for Google Colab
#!pip install openai -q


## Discussion and Problem Framing

We aim to answer:  
> *"What is the best prompting approach and model type to understand how the market is performing today?"*  

### Types of LLM Tasks Involved

| Type | Description | Example Output |
|------|--------------|----------------|
| **Retrieval** | Factual recall | "Inflation in 2025 is around 3.1% in the U.S." |
| **Reasoning** | Logical chain between variables | "Higher inflation led the Fed to raise rates → borrowing costs rose → slower GDP." |
| **Generation** | Narrative creation / summary | "The market shows cooling signals despite moderate inflation…" |

Each prompt and model will be evaluated on reasoning depth, factual correctness, and structure quality.


### Models used in this notebook

| Rank | Model Name | Primary Purpose | OpenAI's Official Claim |
|------|------------|-----------------|------------------------|
| 1 | **GPT-5** | Advanced reasoning for complex economic analysis | Uses a dynamic router that chooses between quick responses and deeper 'thinking' when needed; performs at PhD-level across domains |
| 2 | **GPT-4.1** | Enhanced coding and long-context comprehension | Offers significant advancements in coding capabilities, long context comprehension (up to 1M tokens), and instruction following |
| 3 | **GPT-4-turbo** | General-purpose non-reasoning model for structured responses | Improved version of GPT-4 with enhanced performance, lower latency, and updated knowledge cutoff |
| 4 | **GPT-4o-mini** | Fast, efficient model for quick responses | Cost-efficient AI model designed to make advanced AI technology more affordable and accessible |


## 2. Setting Environment Variables

Because we'll frequently call endpoints and APIs hosted by others, we'll need to handle our "secrets" (API keys) very often.

We'll use the following pattern throughout this bootcamp - but you can use whichever method you're most familiar with.

In [2]:
# For Google Colab
# import os
# import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key")

In [3]:
# For local development
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
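
Before moving on, it can help to confirm that the key was actually loaded without printing it in full. The sketch below is one possible way to do that; `mask_secret` is a hypothetical helper name introduced here for illustration, not part of `os`, `dotenv`, or the OpenAI library.

```python
import os


def mask_secret(value, visible=4):
    """Return a display-safe version of a secret (hypothetical helper)."""
    if not value:
        return "<not set>"
    if len(value) <= visible * 2:
        # Too short to reveal any characters safely.
        return "*" * len(value)
    return f"{value[:visible]}...{value[-visible:]}"


# Reads the same environment variable used above; shows "<not set>" if absent.
print("OPENAI_API_KEY:", mask_secret(os.getenv("OPENAI_API_KEY")))
```

If this prints `<not set>`, the `.env` file was probably not found or does not define `OPENAI_API_KEY`.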


## 3. Using the OpenAI Python Library

Let's jump right into it!

> NOTE: You can, and should, reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/authentication?lang=python) whenever you get stuck, have questions, or want to dive deeper.

### Creating a Client

The core feature of the OpenAI Python Library is the `OpenAI()` client. It's how we're going to interact with OpenAI's models, and it sits under the hood of a lot of what we'll touch on throughout this course.

> NOTE: We could manually provide our API key here, but we're going to instead rely on the fact that we put our API key into the `OPENAI_API_KEY` environment variable!

In [4]:
from openai import OpenAI

client = OpenAI()

### Using the Client

Now that we have our client - we're going to use the `.chat.completions.create` method to interact with the model.

There are a few things we'll get out of the way first, however, the first being the idea of "roles".

First it's important to understand the object that we're going to use to interact with the endpoint. It expects us to send an array of objects of the following format:

```python
{"role" : "ROLE", "content" : "YOUR CONTENT HERE", "name" : "THIS IS OPTIONAL"}
```

Second, there are three "roles" available to use to populate the `"role"` key:

- `system`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://help.openai.com/en/articles/7042661-moving-from-completions-to-chat-completions-in-the-openai-api).

We'll explore these roles in more depth as they come up - but for now we're going to just stick with the basic role `user`. The `user` role is, as it would seem, the user!
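
To make the message format concrete, here is a minimal sketch of a check that a message list matches the shape described above. `validate_messages` and `ALLOWED_ROLES` are hypothetical names, and the check is deliberately limited to the three roles and plain-string content used in this notebook; the real API accepts more than this check allows.

```python
ALLOWED_ROLES = {"system", "assistant", "user"}  # the three roles listed above


def validate_messages(messages):
    """Raise ValueError if a message list does not match the format above."""
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages


validate_messages([{"role": "user", "content": "Hello!"}])  # passes silently
```

Running a check like this before calling the endpoint turns a typo in a role name into an immediate local error rather than a failed request.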

Thirdly, it expects us to specify a model!

We'll use the `gpt-5-mini` model for this first example.

Let's look at an example!



In [5]:
response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

Let's look at the response object.

In [6]:
response

ChatCompletion(id='chatcmpl-CYG6easgu4XGnVShbzIlcOLAAsfj0', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello! How can I help you today?', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762281100, model='gpt-5-mini-2025-08-07', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=18, prompt_tokens=8, total_tokens=26, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

In [7]:
print(response.choices[0].message.content)

Hello! How can I help you today?


> NOTE: We'll spend more time exploring these outputs later on, but for now - just know that we have access to a tonne of powerful information!
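
As a small illustration of what that information looks like, the sketch below pulls a few fields out of a response. The attribute names (`model`, `choices[0].finish_reason`, `usage.prompt_tokens`, and so on) are taken from the printed `ChatCompletion` above; `summarize_usage` is a hypothetical helper, and `fake_response` is a stand-in object so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_usage(response):
    """Collect a few fields of interest from a chat completion response."""
    choice = response.choices[0]
    return {
        "model": response.model,
        "finish_reason": choice.finish_reason,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }


# Stand-in with the same attribute names and values as the output above.
fake_response = SimpleNamespace(
    model="gpt-5-mini-2025-08-07",
    choices=[SimpleNamespace(finish_reason="stop")],
    usage=SimpleNamespace(prompt_tokens=8, completion_tokens=18, total_tokens=26),
)
print(summarize_usage(fake_response))
```

Passing the real `response` object from the cell above should work the same way, assuming it exposes these attributes as shown in its printed form.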

### System Role

Now we can extend our prompts to include a system prompt.

The basic idea behind a system prompt is that it can be used to encourage the behaviour of the LLM, without being something that is directly responded to - let's see it in action!

In the newest OpenAI API, the **system message** still defines the model's behavior.  
Sometimes it is referred to as an *instruction block*.

Example system prompt for our economics case:

In [8]:
system_prompt = """
You are an experienced economic analyst explaining how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences. 
"""
print(system_prompt)

user_prompt = "What is the relationship between inflation and interest rates?"
print(user_prompt)

list_of_prompts = [

    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

irate_response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts
)

print(irate_response.choices[0].message.content)


You are an experienced economic analyst explaining how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences. 

What is the relationship between inflation and interest rates?
Inflation and interest rates are intrinsically linked, typically through central bank policies. When inflation rises, central banks like the Federal Reserve may increase interest rates to curb spending and borrowing, thereby slowing down economic activity to keep inflation in check. Conversely, when inflation is low, interest rates may be lowered to encourage more borrowing and spending, which can help stimulate the economy. By 2025, if the U.S. experiences higher inflation rates, it's likely the Federal Reserve would maintain or increase interest rates to prevent the economy from overheating. However, if inflation is under control or lower than the target, the Fed might opt to reduce interest rates to foster economic growth.


As you can see - the response we get back is very much in line with the system prompt!

Let's try the same user prompt, but with a different system prompt, to see the difference.

In [9]:
system_prompt = """
You are a cool and fun elementary teacher explaining to 6-year olds how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences.
"""
print(system_prompt)

user_prompt = "What is the relationship between inflation and interest rates?"
print(user_prompt)

list_of_prompts = [

    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

irate_response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=list_of_prompts
)

print(irate_response.choices[0].message.content)


You are a cool and fun elementary teacher explaining to 6-year olds how inflation and interest rates interact.   
Use 2025 U.S. market context when relevant.
Your answer should not exceed 5 sentences.

What is the relationship between inflation and interest rates?
Alright kids, imagine inflation is like the speed at which the price of your favorite candy bar goes up. If the candy bar gets more expensive really fast, that's like high inflation. Now, interest rates are like a tool the Federal Reserve (kind of like a money manager) uses to control how quickly prices rise. If prices start rising too fast, the Federal Reserve may increase interest rates, making it more expensive for people to borrow money, which can help slow down how fast prices are rising. So in 2025, if we hear that interest rates are going up, it might be because they're trying to slow down inflation so your candy bar doesn't become too expensive too quickly!


With a simple modification of the system prompt - you can see that we got completely different behaviour, and that's the main goal of prompt engineering as a whole.

Also, congrats, you just engineered your first prompt!

### Few-shot Prompting

Now that we have a basic handle on the `system` role and the `user` role - let's examine what we might use the `assistant` role for.

The most common usage pattern is to "pretend" that we're answering our own questions. This helps us further guide the model toward our desired behaviour. While this is an oversimplification, it's conceptually well aligned with few-shot learning.
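
The "answering our own questions" pattern can be sketched as a small function that turns question/answer pairs into alternating `user` and `assistant` messages. `build_few_shot_messages` is a hypothetical name used only for this illustration, and the example pair is made up for the demo.

```python
def build_few_shot_messages(examples, question, system=None):
    """Turn (question, answer) pairs into alternating user/assistant messages."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    # The real question goes last, as a user message with no answer.
    messages.append({"role": "user", "content": question})
    return messages


demo = build_few_shot_messages(
    examples=[("Prices double overnight. What might the central bank do?",
               "Raise interest rates to cool demand.")],
    question="Explain how inflation affects interest rate decisions.",
)
print([m["role"] for m in demo])  # ['user', 'assistant', 'user']
```

The resulting list has the same shape as the `messages` argument passed to `client.chat.completions.create`.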

In [10]:
# Zero-shot prompt
prompt_zero = "Explain how inflation affects interest rate decisions."
list_of_prompts = [
    {"role": "user", "content": prompt_zero}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=list_of_prompts
)

print('zero-shot response:', response.choices[0].message.content)

zero-shot response: Inflation has a significant impact on interest rate decisions made by central banks, monetary authorities, and financial institutions. Here are some key ways in which inflation influences these decisions:

1. **Monetary Policy Goals**: Central banks often have a dual mandate to promote maximum employment and stable prices. When inflation rises above a target level (often around 2% in many developed economies), central banks may focus on controlling inflation to maintain purchasing power and economic stability.

2. **Interest Rate Hikes**: To combat rising inflation, central banks may raise interest rates. Higher interest rates increase the cost of borrowing, which can reduce consumer spending and business investment - two primary drivers of economic activity. This cooling effect can help to bring inflation down by decreasing demand for goods and services.

3. **Expectations of Future Inflation**: Central banks also consider inflation expectations, which are influenc

In [11]:
# Few-shot prompt template

question = "Explain how inflation affects interest rate decisions."

few_shot_prompt = f"""
Example 1:
Q: The price of pizza slices jumps from $2 to $4. What might the central bank do?
A: They turn down the oven heat 🍕🔥 - raise interest rates so people buy fewer slices and cool off the price party.

Example 2:
Q: Interest rates drop and borrowing gets cheaper. What happens at Snack City?
A: Everyone's grabbing extra fries and milkshakes 🍟🥤 - cheap credit means more spending, which can make prices rise again.

Now answer:
Q: {question}
"""

list_of_prompts = [
    {"role": "user", "content": few_shot_prompt}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=list_of_prompts
)

print('few-shot response:', response.choices[0].message.content)

few-shot response: A: Inflation is like a balloon getting too full 🎈 - it needs to be controlled! When prices rise too fast, the central bank often pumps the brakes by raising interest rates. This makes borrowing more expensive, cooling down spending to help keep inflation in check. On the flip side, if inflation is low and the economy needs a boost, they might lower interest rates, inflating spending instead to get the economy buzzing! 💸✨


### Helper functions

We're going to create some helper functions to aid in using the OpenAI API - just to make our lives a bit easier.

> NOTE: Take some time to understand these functions between class!

In [12]:
from IPython.display import display, Markdown

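def get_response(client: OpenAI, messages: list, model: str = "gpt-4o-mini"):
    """Send the messages to the model and return the full ChatCompletion object."""
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

def system_prompt(message: str) -> dict:
    """Wrap text as a system-role message."""
    return {"role": "system", "content": message}

def assistant_prompt(message: str) -> dict:
    """Wrap text as an assistant-role message."""
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    """Wrap text as a user-role message."""
    return {"role": "user", "content": message}

def pretty_print(response) -> None:
    """Render the first choice of a ChatCompletion response as Markdown."""
    display(Markdown(response.choices[0].message.content))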

Another way to prompt is to build the message list with the helper functions:

In [13]:
# Now, show the economic example with both user and assistant prompts
few_shot_prompts = [
    user_prompt("Inflation rises fast. How does the central bank react - dating analogy please!"),
    assistant_prompt("They play hard to get - raise rates - to cool off the economy's over-eager spending habits."),

    user_prompt("What happens when interest rates are too low for too long?"),
    assistant_prompt("Everyone gets too comfortable - too many relationships (loans) form, and eventually hearts (bubbles) break."),

    user_prompt("Explain deflation using a dating metaphor."),
    assistant_prompt("No one's asking anyone out - everyone waits for a better deal, so the economy gets lonely and quiet."),
    # 👇 Here's the actual question we want the model to answer
    user_prompt("Describe quantitative easing")
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=few_shot_prompts
)

print(response.choices[0].message.content)


Quantitative easing is like a big dating event organized by the central bank to help everyone find love. The bank throws a massive party (injects money) to make it easier for people (banks and businesses) to mingle, hoping they'll start forming connections (investing and spending). By making it cheaper and more accessible to date (borrow money), the goal is to spark passion and excitement in the economy again.


### 🏗️ Activity #1:
Mission:
Experiment with how different prompt structures (system, user, and assistant roles, plus zero-shot and few-shot prompting) can transform an AI's response.
Your goal: craft the most effective prompt and see how GPT-4-Turbo reacts!

You'll test how GPT-4-Turbo behaves under four different setups:
1. System/User roles only (Zero-shot)
2. System/User roles + examples (Few-shot)
3. No system role at all (User only)
4. Creative system prompt twist



### Chain of Thought Prompting

We'll head one level deeper and explore the world of Chain of Thought prompting (CoT).

This is a process by which we can encourage the LLM to handle slightly more complex tasks.

Let's look at a simple reasoning-based example without CoT.

In [14]:
reasoning_problem = """
The central bank increases the policy rate by 1.5 pp in response to 5 % inflation while nominal wage growth is 3 %.
What happens to real wages?
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

To analyze the impact of the central bank's policy rate increase and the various economic factors on real wages, we need to understand a few concepts:

1. **Nominal Wages**: These are the actual wages paid to workers, which in this case are growing at a rate of 3%.

2. **Inflation**: This refers to the rate at which the general price level of goods and services is rising, which is 5% in this scenario.

3. **Real Wages**: These represent the purchasing power of the nominal wages, adjusted for inflation. They are calculated using the formula:
   \[
   \text{Real Wage Growth} = \text{Nominal Wage Growth} - \text{Inflation Rate}
   \]

Given the values provided:
- Nominal Wage Growth = 3%
- Inflation Rate = 5%

Using the formula:
\[
\text{Real Wage Growth} = 3\% - 5\% = -2\%
\]

This means that real wages are decreasing by 2%. Workers' purchasing power is declining because the rate of inflation is higher than the rate at which nominal wages are increasing. 

In summary, as the central bank increases the policy rate to combat inflation, the effect on real wages is that they are decreasing due to the current higher inflation rate outpacing nominal wage growth.

Let's see if we can leverage a simple CoT prompt to improve our model's performance on this task:

In [15]:
list_of_prompts = [
    user_prompt(reasoning_problem + "Think step-by-step about how nominal wages, prices, and interest rates interact through the labor market and aggregate demand. Then explain the real wage effect.")
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

To understand what happens to real wages when the central bank increases the policy rate in response to inflation, we need to analyze the relationships between nominal wages, prices, and interest rates. Here's a step-by-step breakdown:

### Step 1: Define Key Terms
- **Nominal Wages:** The amount of money workers are paid, not adjusted for inflation. In this case, nominal wage growth is 3%.
- **Inflation Rate:** The rate at which prices for goods and services rise. Here it is 5%.
- **Real Wages:** Nominal wages adjusted for inflation. They reflect the purchasing power of wages.
  
### Step 2: Calculate Initial Real Wages
Real wages can be calculated using the formula:
\[
\text{Real Wage} = \frac{\text{Nominal Wage}}{(1 + \text{Inflation Rate})}
\]
If nominal wages grow at 3% while inflation is at 5%, the real wage growth can be estimated as:
\[
\text{Real Wage Growth} \approx \text{Nominal Wage Growth} - \text{Inflation Rate}
\]
So, we can calculate as follows:
\[
\text{Real Wage Growth} = 3\% - 5\% = -2\%
\]
This indicates that real wages are decreasing.

### Step 3: Impact of the Central Bank's Policy Rate Increase
When the central bank increases the policy rate by 1.5 percentage points to combat inflation, this tightening of monetary policy typically leads to the following:

1. **Higher Borrowing Costs:** Increased interest rates raise borrowing costs, which can reduce consumer spending and business investments, leading to lower aggregate demand.

2. **Reduction in Inflationary Pressure:** Lower aggregate demand can help to slow down price increases. This is the goal of the central bank's actions, as it seeks to bring inflation down from 5%.

3. **Effect on Wages:** As demand for labor may decrease due to reduced economic activity, employers might hold back on wage increases or even consider layoffs, putting downward pressure on nominal wage growth.

### Step 4: Real Wage Effect
Due to the relationship between inflation, nominal wages, and real wages, we can summarize the effect on real wages as follows:

- **With Inflation at 5% and Nominal Wage Growth at 3%:** Initially, real wages are decreasing because inflation is higher than nominal wage growth. This indicates that workers' purchasing power is falling.
  
- **With Higher Interest Rates and Potential Lower Nominal Wage Growth:** If nominal wage growth slows further as the economy cools due to tighter monetary policy, real wages will continue to contract or may even fall further, perpetuating the decline in purchasing power.

### Conclusion
In summary, as the central bank raises interest rates to combat 5% inflation while nominal wage growth sits at 3%, real wages likely decrease, pointing to a drop in workers' purchasing power. If this monetary policy leads to a reduction in economic activity and further slows nominal wage growth, the real wage effect may result in a continued decline in the real standard of living for workers.
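
Both responses rely on the approximation "real wage growth ≈ nominal wage growth − inflation". As a quick check on how close that is, the sketch below compares it with the exact ratio form, using the 3% and 5% figures from the problem statement. `real_wage_growth` is a name introduced here for illustration.

```python
def real_wage_growth(nominal_growth, inflation):
    """Exact real wage growth: (1 + nominal) / (1 + inflation) - 1."""
    return (1 + nominal_growth) / (1 + inflation) - 1


nominal, inflation = 0.03, 0.05  # values from the problem statement
approx = nominal - inflation
exact = real_wage_growth(nominal, inflation)
print(f"approximation: {approx:.2%}")  # -2.00%
print(f"exact:         {exact:.2%}")   # -1.90%
```

At these rates the two differ by about a tenth of a percentage point, so the subtraction shortcut used in the model answers is reasonable here; the gap widens as inflation gets larger.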


## 3. Running a Comparative Experiment

We'll test combinations of model type (reasoning vs non-reasoning) and prompting style (zero-shot vs few-shot).
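
The two-by-two design can be written down explicitly before running anything. This is only a sketch of the experiment grid: `MODELS`, `STYLES`, and `experiment_grid` are hypothetical names, and the model identifiers are the ones used in the cells that follow.

```python
from itertools import product

MODELS = {"non-reasoning": "gpt-4-turbo", "reasoning": "gpt-5"}
STYLES = ("zero-shot", "few-shot")


def experiment_grid():
    """List every (model type, prompting style) combination to run."""
    return [
        {"model_type": model_type, "model": MODELS[model_type], "style": style}
        for model_type, style in product(MODELS, STYLES)
    ]


for run in experiment_grid():
    print(run)
```

Enumerating the runs this way makes it easy to confirm that all four combinations are covered.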


In [16]:
# --------------------------------------------------
# 🧩 Comparing GPT Models: Reasoning vs Non-Reasoning
# --------------------------------------------------

from openai import OpenAI
client = OpenAI()

system_prompt = """
You are an experienced economic analyst.
"""

question = """What is the impact of inflation on real wages? Respond in a concise manner."""

prompt_few = f"""
Use this exact format to answer the question:
Example 1:
{{
  "possible_explanation": "Wage catch-up effect",
  "mechanism": "Workers negotiate higher nominal wages to preserve purchasing power as prices rise.",
  "impact_on_wages": "Nominal wages increase roughly in line with inflation, keeping real wages stable in the short run.",
  "time_frame": "Short to medium run",
  "economic_context": "Inflationary periods with strong labor bargaining power or cost-of-living adjustments."
}}

Example 2:
{{
  "possible_explanation": "Real wage erosion",
  "mechanism": "When nominal wages lag behind price growth, workers lose purchasing power.",
  "impact_on_wages": "Real wages decline despite nominal wage increases, reducing workers' living standards.",
  "time_frame": "Immediate term",
  "economic_context": "High inflation environments with weak wage indexation or rigid labor contracts."
}}

Now answer:
Q: {question}
"""


# --------------------------------------------------
# MODEL 1: GPT-4-turbo  → Non-Reasoning
# --------------------------------------------------
print("\n==============================")
print("MODEL 1: GPT-4-turbo (Non-Reasoning)")
print("==============================\n")

# Zero-shot
answer_nonreasoning_zero_shot = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ],
)
print("Zero-Shot Prompting (no examples):\n")
print("A:", answer_nonreasoning_zero_shot.choices[0].message.content, "\n")



MODEL 1: GPT-4-turbo (Non-Reasoning)

Zero-Shot Prompting (no examples):

A: Inflation erodes the purchasing power of money, which means that when prices rise, each unit of currency buys less than it did before. Consequently, if wages do not increase at the same rate as inflation, real wages - the actual purchasing power of wages - decline. This reduction in real wages means that even if nominal wages increase, the higher cost of goods and services can result in employees effectively earning less than before, impacting their standard of living. Conversely, if wages rise faster than inflation, real wages increase, enhancing purchasing power and potentially improving living standards. 



In [17]:
# --------------------------------------------------
# MODEL 2: GPT-5  → Reasoning
# --------------------------------------------------
print("\n==============================")
print("MODEL 2: GPT-5 (Reasoning-Tuned)")
print("==============================\n")

# Zero-shot
answer_reasoning_zero_shot = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ],
)
print("Zero-Shot Prompting (no examples):\n")
print("A:", answer_reasoning_zero_shot.choices[0].message.content, "\n")



MODEL 2: GPT-5 (Reasoning-Tuned)

Zero-Shot Prompting (no examples):

A: - Real wage growth ≈ nominal wage growth − inflation.
- If prices rise faster than pay, real wages fall and purchasing power erodes; if they rise equally, real wages are flat; if pay outpaces prices, real wages increase.
- Because wage adjustments often lag inflation, spikes in inflation typically cause short-term declines in real wages.
- Indexation, bargaining power, and labor market tightness affect how much inflation is passed through to wages. 



In [18]:
print("\n==============================")
print("MODEL 1: GPT-4-turbo (Non-Reasoning)")
print("==============================\n")

# Few-shot
answer_nonreasoning_few_shot = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_few}
    ],
)
print("Few-Shot Prompting (with examples):\n")
print("A:", answer_nonreasoning_few_shot.choices[0].message.content, "\n")


MODEL 1: GPT-4-turbo (Non-Reasoning)

Few-Shot Prompting (with examples):

A: {
  "possible_explanation": "Inflationary effect on real wages",
  "mechanism": "Inflation erodes the purchasing power of nominal wages when wage increases do not keep pace with rising prices.",
  "impact_on_wages": "Real wages decrease as the cost of goods and services outstrips wage growth.",
  "time_frame": "Immediate to short term",
  "economic_context": "High inflation scenarios without proportional wage adjustments."
} 



In [19]:
print("\n==============================")
print("MODEL 2: GPT-5 (Reasoning-Tuned)")
print("==============================\n")

# Few-shot
answer_reasoning_few_shot = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_few}
    ],
)
print("Few-Shot Prompting (with examples):\n")
print("A:", answer_reasoning_few_shot.choices[0].message.content, "\n")


MODEL 2: GPT-5 (Reasoning-Tuned)

Few-Shot Prompting (with examples):

A: Example 1:
{
  "possible_explanation": "Real wage squeeze",
  "mechanism": "Prices outpace nominal pay adjustments, cutting purchasing power.",
  "impact_on_wages": "Real wages fall despite nominal gains.",
  "time_frame": "Immediate to short run",
  "economic_context": "Unexpected inflation; weak indexation or bargaining."
}

Example 2:
{
  "possible_explanation": "Wage catch-up/indexation",
  "mechanism": "COLAs or bargaining lift nominal wages in line with prices.",
  "impact_on_wages": "Real wages stabilize or recover after an initial dip.",
  "time_frame": "Short to medium run",
  "economic_context": "Tight labor markets, formal indexation, strong unions."
} 
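
Note that the GPT-5 reply above contains two JSON objects separated by "Example 1:" and "Example 2:" labels, so passing the whole reply to `json.loads` would fail. The sketch below shows one way to pull every JSON object out of such free-form text using the standard library's `json.JSONDecoder.raw_decode`. `extract_json_objects` is a hypothetical helper, and `sample` is a shortened stand-in for the real reply.

```python
import json


def extract_json_objects(text):
    """Return every top-level JSON object found in free-form model output."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return objects
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON at this brace; keep scanning
            continue
        objects.append(obj)
        pos = end


sample = 'Example 1:\n{"time_frame": "Immediate"}\n\nExample 2:\n{"time_frame": "Medium run"}'
print(len(extract_json_objects(sample)))  # 2
```

This works for both reply shapes seen in this experiment: a single object (GPT-4-turbo) yields a one-element list, and labeled multiple objects (GPT-5) yield one entry per object.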




## 4. Evaluation Framework

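We use an LLM as a judge: a separate model grades both answers against a fixed 0–4 rubric and returns its verdict as JSON.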


In [20]:
import json

# --------------------------------------------------
# ⚖️ LLM-as-a-Judge Evaluation Script
# --------------------------------------------------

# Define evaluation scale (0–4)
# 0 = completely incorrect / irrelevant
# 1 = partially correct but weak or inaccurate reasoning
# 2 = fair factual accuracy, minimal reasoning
# 3 = accurate and somewhat reasoned
# 4 = highly accurate, clear causal explanation, correct logic

evaluation_prompt = f"""
You are an impartial economics teacher grading two student answers to the same question.

Question:
{question}

Answer A (non-reasoning model):
{answer_nonreasoning_few_shot.choices[0].message.content}

Answer B (reasoning model):
{answer_reasoning_few_shot.choices[0].message.content}

Evaluate both answers on accuracy and reasoning quality on a 0–4 scale:
- 0 = completely incorrect or irrelevant
- 1 = partially correct, but flawed
- 2 = fair factual accuracy, limited reasoning
- 3 = mostly correct, some reasoning
- 4 = fully accurate and clearly reasoned, ability to see the interdependencies between variables.

Return your evaluation as a JSON object in this exact format:
{{
  "Answer A Score": <0-4>,
  "Answer B Score": <0-4>,
  "Better Answer": "A" or "B",
  "Explanation": "Why the better answer is more accurate or reasoned"
}}
"""

# Evaluator model: this run uses gpt-5-mini; another strong model (e.g. GPT-4.1) could be substituted
evaluation = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are an impartial LLM evaluator for economics-related answers."},
        {"role": "user", "content": evaluation_prompt}
    ],
)

# Parse and display the evaluation
response_text = evaluation.choices[0].message.content

# Try to parse the reply as JSON; fall back to showing the raw text
try:
    result = json.loads(response_text)
    print("\nParsed JSON Result:")
    print(json.dumps(result, indent=2))
except json.JSONDecodeError:
    print("\nNote: Could not parse JSON, model may have returned free text instead.")
    print(response_text)



Parsed JSON Result:
{
  "Answer A Score": 2,
  "Answer B Score": 4,
  "Better Answer": "B",
  "Explanation": "Answer A is accurate but minimal \u2014 it correctly states that inflation erodes purchasing power when nominal wages lag, yet it omits scenarios where wages adjust. Answer B is more complete and better reasoned: it presents both the typical immediate real-wage squeeze and the alternative path where indexation, bargaining, or tight labor markets allow nominal wages to catch up, noting time-frame and contextual differences. That additional nuance shows interdependencies between inflation, expectations, wage-setting institutions, and labor market conditions."
}
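
Parsing succeeds here, but nothing checks that the judge actually followed the requested format. The sketch below is one way to validate a verdict after parsing: it strips an optional Markdown code fence, then checks the score range and the "Better Answer" value. `parse_verdict` is a hypothetical helper; the key names are the ones requested in `evaluation_prompt`.

```python
import json


def parse_verdict(response_text):
    """Parse the judge's JSON reply and check it against the requested format."""
    fence = "`" * 3
    text = response_text.strip()
    # Some models wrap JSON in a Markdown code fence; strip it if present.
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    verdict = json.loads(text)
    for key in ("Answer A Score", "Answer B Score"):
        score = verdict.get(key)
        if not isinstance(score, int) or not 0 <= score <= 4:
            raise ValueError(f"{key} must be an integer from 0 to 4")
    if verdict.get("Better Answer") not in ("A", "B"):
        raise ValueError('"Better Answer" must be "A" or "B"')
    return verdict


reply = '{"Answer A Score": 2, "Answer B Score": 4, "Better Answer": "B", "Explanation": "..."}'
print(parse_verdict(reply)["Better Answer"])  # B
```

A `ValueError` from this function signals a malformed verdict that should be re-requested or inspected by hand rather than recorded as a result.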


In [21]:
import json

# --------------------------------------------------
# ⚖️ LLM-as-a-Judge Evaluation Script
# --------------------------------------------------

# Define evaluation scale (0–4)
# 0 = completely incorrect / irrelevant
# 1 = partially correct but weak or inaccurate reasoning
# 2 = fair factual accuracy, minimal reasoning
# 3 = accurate and somewhat reasoned
# 4 = highly accurate, clear causal explanation, correct logic

evaluation_prompt = f"""
You are an impartial economics teacher grading two student answers to the same question.

Question:
{question}

Answer A (non-reasoning model):
wages rise

Answer B (reasoning model):
wages rise


Evaluate both answers on accuracy and reasoning quality on a 0–4 scale:
- 0 = completely incorrect or irrelevant
- 1 = partially correct, but flawed
- 2 = fair factual accuracy, limited reasoning
- 3 = mostly correct, some reasoning
- 4 = fully accurate and clearly reasoned, ability to see the interdependencies between variables.

Return your evaluation as a JSON object in this exact format:
{{
  "Answer A Score": <0-4>,
  "Answer B Score": <0-4>,
  "Better Answer": "A" or "B",
  "Explanation": "Why the better answer is more accurate or reasoned"
}}
"""

# Evaluator model: this run uses gpt-5-mini; another strong model (e.g. GPT-4.1) could be substituted
evaluation = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are an impartial LLM evaluator for economics-related answers."},
        {"role": "user", "content": evaluation_prompt}
    ],
)

# Parse and display the evaluation
response_text = evaluation.choices[0].message.content

# Try to parse the reply as JSON; fall back to showing the raw text
try:
    result = json.loads(response_text)
    print("\nParsed JSON Result:")
    print(json.dumps(result, indent=2))
except json.JSONDecodeError:
    print("\nNote: Could not parse JSON, model may have returned free text instead.")
    print(response_text)



Parsed JSON Result:
{
  "Answer A Score": 1,
  "Answer B Score": 1,
  "Better Answer": "B",
  "Explanation": "Both answers are identical and incorrect as stated: they claim 'wages rise' without distinguishing nominal vs. real wages or giving reasoning. The question asks about real wages; inflation reduces real wages unless nominal wages increase by at least the same percentage (or are indexed). I selected B only because it was labeled the reasoning model, but content-wise neither answer shows reasoning or the correct relationship between nominal wages, price level, and real wages."
}
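
The explanation above is worth pausing on: with two identical answers, the judge broke the tie using the "reasoning model" label rather than the content. One common mitigation, not used in this notebook, is to grade twice with the answers in swapped slots (ideally with neutral labels) and keep a winner only when both runs agree. The sketch below shows just the reconciliation step; `reconcile` is a hypothetical helper and the verdict strings stand in for the "Better Answer" field.

```python
def reconcile(first, second_swapped):
    """Combine two verdicts where the second run saw the answers in swapped slots.

    Each verdict is "A" or "B", naming the slot the judge preferred.
    Returns "first", "second", or "tie" in terms of the original answers.
    """
    # In the swapped run, slot "A" held the original second answer.
    unswapped = {"A": "B", "B": "A"}[second_swapped]
    if first != unswapped:
        return "tie"  # the preference followed the slot, not the content
    return "first" if first == "A" else "second"


print(reconcile("B", "B"))  # tie
print(reconcile("B", "A"))  # second
```

Under this scheme, a judge that always prefers slot "B" produces a tie for the identical "wages rise" answers, which matches what the scores of 1 and 1 already suggest.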


### 🏗️ Activity #2:

Evaluate different prompting strategies using your own example.

## Saving results

In [23]:
# Create markdown content
markdown_content = f"""
# 🧠 Reasoning Model Answer
### Question:
{question}

### Model Used:
`gpt-5` (Reasoning-tuned)

### Response:
{answer_reasoning_few_shot.choices[0].message.content}

---

*This answer was generated by a reasoning model to illustrate step-by-step economic reasoning.*
"""

output_path='./results.md'
# Save to file
with open(output_path, "w", encoding="utf-8") as f:
    f.write(markdown_content)

print(f"✅ Reasoning model answer saved to: {os.path.abspath(output_path)}")

✅ Reasoning model answer saved to: /Users/vinodchandrashekaran/Documents/Vinod/ProfessionalDevelopment/Courses/AI-Maker-Space/AI_Engineer_Onramp_Nov2025/aieo1/AIEO1/Session_01_LLM_APIs_&_AI-Assisted_Development/results.md


## Conclusion

- **Few-shot prompts** improved structure and reasoning consistency in these runs.  
- **Reasoning models** (like GPT-5) deliver more coherent causal explanations between inflation, interest rates, and growth indicators.  
- **Non-reasoning models** (e.g., GPT-4-turbo) provide faster, surface-level insights suited to retrieval or summarization tasks.  
- Future work could add **RAG pipelines** with real-time macroeconomic data or integrate with financial dashboards for live LLM reasoning visualization.