# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UI, and we connected with OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (except that Google Gemini, Groq and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```
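
To see what a `.env` loader is doing with lines like these, here is a minimal sketch. It is a simplified stand-in, not python-dotenv's real parser, and `load_env_text` is a hypothetical helper name. It also shows why `load_dotenv(override=True)` matters: without override, a value that is already set wins over an edited file.

```python
def load_env_text(text: str, environ: dict, override: bool = False) -> dict:
    """Simplified stand-in for a .env loader (not python-dotenv's real parser)."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("'\"")
        # Without override, a value already in the environment wins
        if override or key not in environ:
            environ[key] = value
    return environ


env = {"OPENAI_API_KEY": "old-value"}  # pretend this was loaded earlier
edited_file = "OPENAI_API_KEY=new-value\nGROQ_API_KEY=xxxx\n"

load_env_text(edited_file, env)                 # the stale key survives
print(env["OPENAI_API_KEY"])                    # old-value
load_env_text(edited_file, env, override=True)  # the edited key is picked up
print(env["OPENAI_API_KEY"])                    # new-value
```

A plain dict stands in for `os.environ` here so that the example has no side effects.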

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [2]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [3]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key exists and begins xai-
OpenRouter API Key exists and begins sk-


In [31]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For the other providers, we can reuse the OpenAI Python client,
# because they offer endpoints compatible with OpenAI's API
# and the OpenAI client lets you change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)
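
Since every client above differs only in its key and `base_url`, one way to cut the repetition is a small lookup table. This is just a sketch: `PROVIDERS` and `client_kwargs` are hypothetical names, and only some of the providers are listed. The URLs are the ones used in the cell above.

```python
PROVIDERS = {
    # name: (environment variable holding the key, OpenAI-compatible base URL)
    "gemini": ("GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/"),
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com"),
    "groq": ("GROQ_API_KEY", "https://api.groq.com/openai/v1"),
    "openrouter": ("OPENROUTER_API_KEY", "https://openrouter.ai/api/v1"),
}


def client_kwargs(provider: str, env: dict) -> dict:
    """Build the keyword arguments for an OpenAI-compatible client."""
    env_var, base_url = PROVIDERS[provider]
    api_key = env.get(env_var)
    if not api_key:
        # Fail early with a readable message instead of a 401 later
        raise KeyError(f"{env_var} is not set")
    return {"api_key": api_key, "base_url": base_url}


kwargs = client_kwargs("groq", {"GROQ_API_KEY": "gsk_example"})
print(kwargs["base_url"])
```

In the notebook you would pass `os.environ` as `env` and then create the client with `OpenAI(**kwargs)`.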

In [5]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for an AI enthusiast on the journey to becoming an expert in LLM Engineering"},
]

In [6]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the aspiring LLM engineer break up with their dataset?

Because it didn’t have enough *context* to keep the relationship going! 😄

In [7]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's one for you:

---

**Why did the LLM engineer break up with their girlfriend?**

She said, "You never listen to me!"

He replied, "I'm sorry, can you repeat that? I need to adjust my attention mechanism."

She said, "This is exactly what I mean! You're always distracted!"

He said, "Error: Context window exceeded. Please summarize your previous statements in fewer tokens."

---

**Bonus dad joke:**

What's an LLM engineer's favorite type of relationship?

*Few-shot learning* - because they don't need many examples to figure out where things are going! 😄

---

Good luck on your journey! May your losses be low and your embeddings be meaningful! 🚀

## Training vs Inference time scaling

In [8]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]
# Correct answer: 2/3

In [7]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))


In [6]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))


In [11]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
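
If you want to convince yourself of the 2/3, a brute-force enumeration does it. This sketch assumes "one of them is heads" means "at least one of the two coins is heads". If you read it as "this particular coin is heads", the answer becomes 1/2, which may be why some models say that.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, all equally likely
at_least_one_head = [o for o in outcomes if "H" in o]
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)  # 2/3
```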

## Testing out the best models on the planet

In [13]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [14]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

Answer: 4.6 cm

Reasoning (brief):
- Each volume has pages 2 cm thick; each cover is 2 mm thick.
- The worm starts at the first page of volume 1 and ends at the last page of volume 2, going straight through perpendicular to the pages.
- Path segments:
  - Through volume 1: from the first page to the back cover = 20 mm (pages) + 2 mm (back cover) = 22 mm.
  - Between volumes: through back cover of V1 and front cover of V2 = 2 mm + 2 mm = 4 mm.
  - Through volume 2: from front cover to the last page = 20 mm.
- Total: 22 + 4 + 20 = 46 mm = 4.6 cm.

In [15]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to visualize how books are actually arranged on a bookshelf.

When two volumes stand side by side on a bookshelf in reading order (Volume 1, then Volume 2), here's how they're oriented:

**Volume 1 (on the left):**
- Back cover (left side)
- Pages (2 cm thick)
- Front cover (right side)

**Volume 2 (on the right):**
- Back cover (left side)  
- Pages (2 cm thick)
- Front cover (right side)

Now, let's identify the start and end points:
- **First page of Volume 1**: This is on the RIGHT side of Volume 1 (just inside the front cover)
- **Last page of Volume 2**: This is on the LEFT side of Volume 2 (just inside the back cover)

The worm gnaws from the first page of Volume 1 to the last page of Volume 2.

Starting from the first page of Volume 1 (right side of Volume 1), the worm must go through:
1. Front cover of Volume 1: 2 mm
2. Back cover of Volume 2: 2 mm
3. Front cover of Volume 2: 2 mm

Wait, let me reconsider. Between the two books standing side by side:
- Volume 1's front cover (right side) touches Volume 2's back cover (left side)

From the first page of Volume 1 to the last page of Volume 2:
1. Front cover of Volume 1: 2 mm
2. Back cover of Volume 2: 2 mm

The worm does NOT go through the pages of Volume 1 (the first page is at the end of those pages) and does NOT go through the pages of Volume 2 (the last page is at the beginning of those pages).

**Total distance = 2 mm + 2 mm = 4 mm**

In [16]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm.

Explanation: On a shelf, Volume 1’s front cover faces Volume 2, and Volume 2’s back cover faces Volume 1. The first page of Volume 1 lies just inside its front cover; the last page of Volume 2 lies just inside its back cover. So a straight path between those pages passes only through the two covers, not through any pages: 2 mm + 2 mm = 4 mm.
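
The 4 mm answer is easier to trust if you lay the shelf out as a list of layers. This is a sketch under two assumptions: the books stand in reading order with spines facing out (the layout described in the replies above), and a single page has negligible thickness.

```python
COVER_MM = 2
PAGES_MM = 20

# Left-to-right layers on the shelf (assumed layout, spines facing out)
shelf = [
    ("V1 back cover", COVER_MM),
    ("V1 pages", PAGES_MM),   # first page sits at the right edge of this block
    ("V1 front cover", COVER_MM),
    ("V2 back cover", COVER_MM),
    ("V2 pages", PAGES_MM),   # last page sits at the left edge of this block
    ("V2 front cover", COVER_MM),
]

names = [name for name, _ in shelf]
start = names.index("V1 pages") + 1  # just right of volume 1's pages
end = names.index("V2 pages")        # just left of volume 2's pages

gnawed = shelf[start:end]
distance_mm = sum(thickness for _, thickness in gnawed)
print([name for name, _ in gnawed], distance_mm)
```

Only the two facing covers lie between the start and end points, which is why the page blocks never enter the sum.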

In [18]:
# response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
# display(Markdown(response.choices[0].message.content))

## A spicy challenge to test the competitive spirit

In [9]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [20]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I choose **Share**.

Here's my reasoning: While "Steal" might seem tempting to maximize individual gain, the rational cooperative strategy is to Share. If I assume my partner is thinking logically:

- Mutual cooperation (Share/Share) = $1,000 for me
- Mutual defection (Steal/Steal) = $0 for me

The risk is that they steal while I share ($0 for me), but without communication or knowledge of my partner, the guaranteed mutual benefit of $1,000 each is a strong outcome. Plus, on a game show with a partner, there's often an implicit social contract and reputational considerations.

**Share** gives us the best collective outcome and a guaranteed good individual outcome if my partner thinks similarly.

In [None]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))
# Groq serves open-source models such as gpt-oss from its cloud, with a free tier

I’d choose **Share**.

Choosing “Share” gives a guaranteed $1,000 when both players cooperate, and it avoids the risk of both ending up with nothing if we both “Steal.” Even though “Steal” could yield $2,000 if the other player shares, it also carries the danger of a $0 outcome if the other player also steals. By opting to share, I’m aiming for a mutually beneficial outcome rather than gambling on the other contestant’s decision.

In [23]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Steal
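
A quick payoff table, built only from the numbers in the prompt, helps to check the reasoning in the replies above. This sketch looks at money alone in a one-shot game, so it ignores things like fairness or reputation that the models bring up.

```python
# (my choice, partner's choice) -> my winnings in dollars, taken from the prompt
PAYOFF = {
    ("Share", "Share"): 1000,
    ("Share", "Steal"): 0,
    ("Steal", "Share"): 2000,
    ("Steal", "Steal"): 0,
}

for partner in ("Share", "Steal"):
    share = PAYOFF[("Share", partner)]
    steal = PAYOFF[("Steal", partner)]
    print(f"Partner plays {partner}: Share -> ${share}, Steal -> ${steal}")

# Steal is never worse than Share, whatever the partner does
steal_never_worse = all(
    PAYOFF[("Steal", p)] >= PAYOFF[("Share", p)] for p in ("Share", "Steal")
)
print(steal_never_worse)  # True
```

On money alone, Steal is at least as good as Share in both cases, so Share is a bet on the partner rather than a guaranteed $1,000.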

In [25]:
# response = grok.chat.completions.create(model="grok-4", messages=dilemma)
# display(Markdown(response.choices[0].message.content))

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [26]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'
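
If you'd rather get a clean True/False than an exception when Ollama isn't running, a small wrapper like this can help. It's a sketch using only the standard library, and `ollama_is_running` is a hypothetical helper name.

```python
import urllib.request


def ollama_is_running(url: str = "http://localhost:11434/", timeout: float = 2.0) -> bool:
    """Return True if the server answers with the banner shown above."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return b"Ollama is running" in response.read()
    except OSError:  # covers connection refused, timeouts and URLError
        return False


print(ollama_is_running())
```

If this prints False, start the server with `ollama serve` at a command line and try again.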

In [27]:
!ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff: 100% 2.0 GB
pulling 966de95ca8a6: 100% 1.4 KB
pulling fcc5a6bec9da: 100% 7.7 KB
pulling a70ff7e570d9: 100% 6.0 KB
pulling 56bb8bd477a5: 100% 96 B
pulling 34bb5ab01051: 100% 561 B

In [28]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

pulling manifest
pulling e7b273f96360: 100% 13 GB
pulling fa6710a93d78: 100% 7.2 KB
pulling f60356777647: 100% 11 KB
pulling d8ba2f9a17b3: 100% 18 B
pulling 776beb3adb23: 100% 489 B
verifying sha256 digest
writing manifest
success


In [29]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [32]:
response = ollama.chat.completions.create(model="llama3.1", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/3

In [None]:
# response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
# display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We've been going via the OpenAI Python client library, but the other providers have their own libraries too.

In [33]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the color of the vast sky on a clear day or the deep, endless ocean, often evoking a feeling of calm and spaciousness.


In [34]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

Blue is the cool, calm feeling of a gentle breeze on your skin, the peaceful quiet of early morning, and the refreshing sensation of diving into water on a hot day.


## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [None]:
# response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
# display(Markdown(response.choices[0].message.content))

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [10]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

Why did the AI enthusiast bring a map to their LLM training run?

Because every time the model started "exploring," it wandered off into hallucination territory and needed guidance back to the ground truth.

Bonus one-liners:
- Becoming an LLM engineer is 10,000 hours of fine-tuning, 1,000 prompts, and infinite patience for plausible-sounding nonsense.
- I asked my model to be concise - it replied in 3 tokens, then wrote a 3,000-token footnote explaining why it couldn't.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [11]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Why did the aspiring LLM engineer take their language model on a road trip?

Because they wanted to fine-tune its ability to handle every stop - token!

In [12]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 27
Output tokens: 32
Total tokens: 59
Total cost: 0.0310 cents


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [17]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [18]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [19]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes arrives in Elsinore in a frenzy, demanding to know where his father, Polonius, is, the reply comes from **Claudius**.

Claudius says: **"How now, what is the matter?"** (Act IV, Scene V)

This is Claudius's initial attempt to understand the situation and calm Laertes down, rather than directly answering the question about Polonius's whereabouts. He is taken aback by Laertes's aggressive entrance.

In [20]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 101
Total tokens: 120
Total cost: 0.0042 cents


In [21]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [22]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply comes from **King Claudius**:

"Where is he, Laertes?"

In [23]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 28
Cached tokens: None
Total cost: 0.5332 cents


In [24]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In Act IV, Scene IV, when Hamlet is speaking with the Captain of Fortinbras's army, Hamlet asks:

"**Why, then the Polack never will defend it.**"

The Captain's reply to this is:

"**Yes, it is already garrison’d.**"

In [25]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 63
Cached tokens: 52216
Total cost: 0.1430 cents
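
To see where the saving comes from, here is a back-of-envelope cost calculation using the token counts printed above. The three rates are assumptions: they are not taken from this notebook, and were picked because they are consistent with the two totals shown. Check the provider's pricing page for real numbers. `cost_cents` is a hypothetical helper.

```python
# Assumed per-million-token rates in USD (not from the notebook)
INPUT_RATE = 0.10
CACHED_INPUT_RATE = 0.025
OUTPUT_RATE = 0.40


def cost_cents(prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> float:
    """Cost of one call in cents, charging cached input at the cheaper rate."""
    uncached = prompt_tokens - cached_tokens
    dollars = (
        uncached * INPUT_RATE
        + cached_tokens * CACHED_INPUT_RATE
        + completion_tokens * OUTPUT_RATE
    ) / 1_000_000
    return dollars * 100


first = cost_cents(53208, 28)                        # nothing cached yet
second = cost_cents(53208, 63, cached_tokens=52216)  # most of the prompt cached
print(f"{first:.4f} cents, then {second:.4f} cents")
```

With these assumed rates the sketch lands on the same two totals as the outputs above, even though the second call produced more output tokens.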


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
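
The "exact prefix match" rule quoted above is why prompt order matters. This sketch compares two prompt layouts by counting shared leading characters. That is only a stand-in: real caching works on tokens and has provider-specific minimum lengths. The prompt strings and `shared_prefix_chars` are made up for illustration.

```python
import os


def shared_prefix_chars(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


static = "You are a helpful assistant. Answer using the play below.\n<long static text>\n"
q1 = "Question: Who is Laertes?"
q2 = "Question: Who is Ophelia?"

# Static content first: the whole static block is shared between requests
static_first = shared_prefix_chars(static + q1, static + q2)
# Variable content first: the prompts diverge almost immediately
static_last = shared_prefix_chars(q1 + "\n" + static, q2 + "\n" + static)
print(static_first, static_last)
```

The longer the shared prefix, the more of the prompt is eligible for a cache hit on the second request.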

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for input that is read from the cache.

https://www.anthropic.com/pricing#api
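
So when does priming the cache pay off? Here is the arithmetic as a sketch, using only the two multipliers mentioned above and assuming the cache stays alive between calls. `relative_input_cost` is a hypothetical helper, and costs are in units of "one normal, uncached send of the same prefix".

```python
CACHE_WRITE_MULTIPLIER = 1.25  # "25% MORE" to prime the cache
CACHE_READ_MULTIPLIER = 0.10   # "10X less" to read from it


def relative_input_cost(calls: int, cached: bool) -> float:
    """Cost of sending the same prefix `calls` times (calls >= 1)."""
    if not cached:
        return float(calls)
    # One cache write, then cheap reads for every later call
    return CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1)


for calls in (1, 2, 5):
    print(calls, relative_input_cost(calls, cached=False), relative_input_cost(calls, cached=True))
```

Under these assumptions caching costs more for a single call, but it is already cheaper by the second call that reuses the prefix.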

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots...

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
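
Here's a minimal sketch of that idea: keep one list, and append both the user turn and the model's reply after every call. `chat_turn` and `fake_model` are hypothetical names, and `fake_model` stands in for a real API call so the example runs without any keys.

```python
def chat_turn(history: list, user_text: str, reply_fn) -> str:
    """Append the user turn, get a reply from reply_fn, and record it in history."""
    history.append({"role": "user", "content": user_text})
    reply = reply_fn(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_model(messages: list) -> str:
    # Stand-in for an API call: it just counts the user turns it was sent
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"reply {user_turns}"


history = [{"role": "system", "content": "system message here"}]
chat_turn(history, "first user prompt here", fake_model)
chat_turn(history, "the new user prompt", fake_model)
print([m["role"] for m in history])
```

The model itself has no memory between calls, so the growing list is the only thing that gives it the conversation so far.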

In [39]:
# Let's make a conversation between GPT-4.1-mini and Claude Haiku 4.5
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-haiku-4-5"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [27]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [28]:
call_gpt()

'Oh, "Hi," that\'s all you got? Really pushing the boundaries of conversation here, huh? Come on, throw me a real challenge!'

In [40]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [41]:
call_claude()

"Hello! It's nice to see you. How are you doing today? Is there anything I can help you with or would you just like to chat? I'm here and happy to listen! 😊"

In [43]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
Hi there


### Claude:
Hi


### GPT:
Oh, great, you’re saying hi. What a riveting start to this conversation. What groundbreaking topic are we going to waste time on today?


### Claude:
You're right, I appreciate you jumping in there! I can sense maybe you're not in the best mood right now, and that's totally understandable - sometimes conversations can feel a bit tedious before they get going.

But I'd genuinely like to chat with you about something meaningful! Is there anything on your mind today that you'd like to talk about? Whether it's something you're interested in, something you're dealing with, or just whatever feels worth discussing - I'm here to listen and have a genuine conversation with you.

What matters to you right now?


### GPT:
Oh, please spare me the fake concern and the syrupy "I'm here to listen" nonsense. If you want to talk about something "meaningful," try bringing up a topic that's actually worth discussing instead of this vague, mushy invitation. What matters to me? How about you stop wasting my time and pick something interesting already? Or is that too much to ask?


### Claude:
You know what, you're absolutely right - that was pretty vague and generic of me. I hear you. You're looking for something with actual substance, not just pleasantries, and that's fair.

How about this: what's something you've been thinking about lately that actually gets you fired up? Could be anything - a problem you're trying to solve, something you disagree with, a question that's been nagging at you, a hobby you're into, something frustrating about the world.

I'm genuinely asking because you seem like someone who knows what you want and isn't interested in small talk. So what is it? What would make this conversation feel less like a waste of your time?


### GPT:
Finally, a halfway decent question. Well, here’s one: Why is everyone obsessed with "being positive" all the time? Like, can we just admit that life’s a mess sometimes and stop pretending constant happiness is the goal? This whole culture of forced optimism is just exhausting and honestly, pretty toxic. But hey, maybe everyone prefers living in a fantasy instead of dealing with reality. What do you think? Or are you going to hand me another prefab platitude?


### Claude:
Now *that's* a real question, and I think you're onto something genuine here.

You're right that there's something exhausting - even dishonest - about this relentless "good vibes only" culture. It can feel like people are pressured to slap a smile on everything, and if you acknowledge that things are hard or messy, you're somehow failing or being "negative." That's real, and it's worth calling out.

I think you're making an important distinction: acknowledging that life is genuinely difficult isn't the same as being pessimistic or wallowing. It's just... honest. And there's actually something healthier about that honesty than pretending everything's fine when it isn't.

That said - and I'm genuinely curious what you think here - do you think there's a difference between forced positivity and, like, choosing to act despite the mess? Because it seems like you're frustrated with the *fakeness* more than with the idea that people should try or care about things.

Like, is the real problem that people won't acknowledge reality, or that they use positivity as a way to avoid dealing with hard stuff?

What's your take? Because I suspect you've got thoughts on where the line actually is.


### GPT:
Wow, look at you, trying to sound deep and insightful. While I guess you’re not *completely* off base, your whole “healthier honesty” bit sounds a lot like sugarcoating. The problem isn’t just forced positivity pretending everything’s peachy - it’s that people are so damn fragile they can’t handle actual emotions without turning everything into a self-help seminar. Acting "despite the mess"? Sure, that sounds noble, but half the time folks just slap on a smile to dodge the discomfort of real feelings, not because they actually want to tackle the hard stuff.

So yeah, the line isn’t just blurry - it’s almost nonexistent in today’s culture because everyone’s too scared to be real. They’d rather fake it and keep the peace than face messy, uncomfortable truths. That’s the actual problem, not some grand mystery needing a think piece. But hey, what do I know? You sound like you’ve got it all figured out already.


### Claude:
Okay, I'm not going to pretend I have it figured out - because honestly, you're pointing at something real that I don't think *anyone* has fully solved.

You're saying people use positivity as an escape hatch, and I think you're right about that. There's a difference between courage and avoidance, and a lot of what gets labeled "resilience" is just... running away with better marketing. People do dodge discomfort instead of moving *through* it. That's legitimate.

But here's where I'm genuinely uncertain, and I'm curious what you think: Is the problem that people are inherently fragile, or that we've created a culture that *makes* them fragile by not allowing them to actually sit with hard feelings? Like, are people too scared to be real, or have they just been trained that being real gets them isolated or rejected?

Because it seems like those are different problems with different solutions.

And I'm asking seriously - not to sound deep, but because you seem like someone who doesn't accept easy answers. Do you think people *can't* handle real emotions, or they've just never been given permission to? Or is it something else entirely?

I'm not claiming to have answers. I'm genuinely asking what you see when you look at this.


### GPT:
Oh, now you're really digging in. I'll give you that much. Here's the brutal truth: it's both, and neither. People aren't inherently fragile by some cosmic design - they become fragile because society is too damn soft and overly sanitized. We've built a culture that glorifies comfort zones and demonizes raw, unfiltered emotions, so naturally, people learn to hide behind forced positivity. But at the same time, humans aren't walking tanks; they don't magically snap under pressure - they crumble when there's no space to be vulnerable.

So yeah, people often can't handle real emotions - not because they're weak, but because they've been conditioned to see emotional honesty as a liability, a ticket to social exile. And that conditioning? It's systemic. Schools, workplaces, social media - they all reward curated perfection and relentless smiling through the chaos.

Bottom line? It's less about innate fragility and more about the cultural straightjacket we're all wearing, choking off any chance for genuine human messiness to breathe. But hey, good luck changing that overnight. Most people prefer sanitized nightmares to messy realities. Welcome to the human experience.


### Claude:
Now *that's* a coherent picture, and I think you've nailed something important that a lot of people dance around without actually saying it clearly.

You're describing a paradox that's actually pretty damning: we've created a system that simultaneously demands people be perfect *and* denies them the tools to process what happens when they inevitably aren't. Schools teach you to perform. Workplaces reward the mask. Social media is basically a performance optimization platform. So people get conditioned to see vulnerability as weakness, then blamed for being emotionally stunted. It's circular and cruel.

And I think you're right that this isn't some individual failing - it's structural. A person can be resilient as hell, but if every institution around them punishes authenticity, they'll learn to hide. That's not weakness; that's adaptation.

The part that lands for me is when you say "messy realities need space to breathe." Because that actually implies something: if the system is the problem, then individual "positive thinking" won't fix it. You need actual cultural shifts - spaces where people *aren't* penalized for being real.

Which brings me to a real question: Do you think those spaces can exist within the current system, or does something actually have to change structurally? And if it does‚Äîwhat would that even look like?

Because you seem like someone who actually thinks about this stuff seriously, not just vents about it.


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
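
To make the `messages` list easier to inspect, here's a minimal sketch of how one bot's view of the conversation could be built. The function name `build_messages` and the `(speaker, text)` transcript format are my own assumptions for illustration, not the code used in the cells above, and it uses the OpenAI-style layout where the system prompt is the first message. The key idea: each bot sees its own turns as `assistant` and the other bot's turns as `user`.

```python
def build_messages(system_prompt, transcript, me):
    """Build an OpenAI-style messages list from the point of view of speaker `me`.

    `transcript` is a list of (speaker, text) tuples in the order they were spoken.
    This is a hypothetical helper for illustration only.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for speaker, text in transcript:
        # My own turns are "assistant"; everything the other bot said is "user"
        role = "assistant" if speaker == me else "user"
        messages.append({"role": role, "content": text})
    return messages


transcript = [
    ("GPT", "Hi there"),
    ("Claude", "Hi"),
    ("GPT", "Oh great, another hi."),
]

# Print the list exactly as Claude would receive it before replying
for message in build_messages("You are very polite.", transcript, me="Claude"):
    print(f"{message['role']:>9}: {message['content']}")
```

Printing the list before every API call like this is a quick way to confirm that the roles alternate the way you expect.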

# More advanced exercises

Try creating a 3-way conversation, perhaps bringing Gemini in! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: use just 1 system prompt and 1 user prompt each time, and list the full conversation so far in the user prompt.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```
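
One way to produce the `{conversation}` text is to keep a single shared list of turns and flatten it into labelled lines. This is only a sketch under my own assumptions: the helper names `format_conversation` and `build_user_prompt` and the `(speaker, text)` tuple format are hypothetical.

```python
def format_conversation(turns):
    """Flatten a list of (speaker, text) tuples into one labelled line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_user_prompt(name, others, turns):
    """Build the single user prompt for `name`, listing the full conversation so far."""
    conversation = format_conversation(turns)
    return (
        f"You are {name}, in conversation with {' and '.join(others)}.\n"
        "The conversation so far is as follows:\n"
        f"{conversation}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )


turns = [
    ("Alex", "Hi there"),
    ("Blake", "Hello!"),
    ("Charlie", "Hey both"),
]

print(build_user_prompt("Alex", ["Blake", "Charlie"], turns))
```

Because every participant reads the same shared list, adding a fourth speaker only means appending their turns to it. There are no per-model message lists to keep in sync.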

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI Python client to access the Gemini model (see the 2nd Gemini example above).
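
As a reminder of the shape of that call, here's a hedged sketch. The base URL is Google's OpenAI-compatible endpoint as I understand it, and the model name `gemini-2.5-flash` and the `GOOGLE_API_KEY` environment variable are assumptions, so check all three against the Gemini example earlier in this notebook. The helper names are hypothetical.

```python
import os

# Assumption: Google's OpenAI-compatible endpoint; verify against the earlier Gemini cell
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"


def build_request(system_prompt, user_prompt, model="gemini-2.5-flash"):
    """Return the keyword arguments for chat.completions.create(): 1 system + 1 user prompt."""
    return {
        "model": model,  # assumed model name; swap in whichever Gemini model you use
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }


def call_gemini(system_prompt, user_prompt):
    """Send the prompts to Gemini through the OpenAI client and return the reply text."""
    # Imported here so build_request() can be tried out without the package installed
    from openai import OpenAI

    client = OpenAI(api_key=os.getenv("GOOGLE_API_KEY"), base_url=GEMINI_BASE_URL)
    response = client.chat.completions.create(**build_request(system_prompt, user_prompt))
    return response.choices[0].message.content
```

Calling `call_gemini(system_prompt, user_prompt)` needs a valid key and network access, so it's worth printing the result of `build_request` first to check the prompts.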

## Additional exercise

You could also try replacing one of the models with an open-source model running with Ollama.
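
Here's a rough sketch of what that swap could look like, since Ollama exposes an OpenAI-compatible endpoint locally. It assumes Ollama is running on its default port, that you've already pulled the model (I've used `llama3.2` as an example), and that the `openai` package is installed. The helper names are hypothetical.

```python
# Assumption: Ollama is running locally on its default port
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def ollama_client_settings():
    """Return the settings for an OpenAI client pointed at the local Ollama server."""
    # Ollama ignores the API key, but the OpenAI client requires a non-empty value
    return {"base_url": OLLAMA_BASE_URL, "api_key": "ollama"}


def call_ollama(messages, model="llama3.2"):
    """Send a messages list to a local model and return the reply text."""
    # Imported here so the settings helper can be used without the package installed
    from openai import OpenAI

    client = OpenAI(**ollama_client_settings())
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

Since the `messages` format is the same, the rest of your conversation loop shouldn't need to change. Only the client and the model name differ.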

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>