# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UI, and we connected with the OpenAI's API.

Today we'll connect with them through their APIs..

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed</a>. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (except Gemini, Groq, Google, OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [33]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [34]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
Grok API Key not set (and this is optional)
OpenRouter API Key not set (and this is optional)


In [35]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Gemini, DeepSeek and Groq, we can use the OpenAI python client
# Because Google and DeepSeek have endpoints compatible with OpenAI
# And OpenAI allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [36]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [37]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the aspiring LLM engineer bring a ladder to the training session?

Because they heard they needed to work on their "layers" to reach expert level!

In [38]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineering student break up with their girlfriend?

Because she said "We need to talk about our relationship," and they responded with:

"I understand you want to discuss our relationship. As an AI boyfriend, I don't have feelings, but I can help you explore this topic. Here are three possible conversation directions: 1) Analyzing compatibility metrics, 2) Discussing communication protocols, or 3) Would you like me to generate a breakup letter? 

*[Response truncated due to context window limits]*"

---

**Bonus joke:** 

You know you're deep into LLM engineering when you start referring to your own thoughts as "inference" and you've tried to add `temperature=0` to an argument with your parents to make them "more deterministic."

😄

## Training vs Inference time scaling

In [39]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [40]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [41]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [42]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3

## Testing out the best models on the planet

In [43]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [44]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

We need the thickness of what the worm gnaws through from the first page of the first volume to the last page of the second volume, moving perpendicular to the pages.

Interpretation and setup:
- Each volume has pages total thickness 2 cm = 20 mm.
- Each cover thickness = 2 mm. So front cover 2 mm, back cover 2 mm.
- The two volumes are on a shelf side by side, in order: Volume 1, then Volume 2.
- A worm starts at the first page of Volume 1 (i.e., right after the front cover of Volume 1) and ends at the last page of Volume 2 (i.e., just before the back cover of Volume 2).

The path through the book stack (perpendicular to pages) goes through:
- From the first page of Volume 1 forward to the front cover of Volume 1: that is, through the remainder of the first page side? Actually “first page” is the very first leaf; perpendicular to pages means into the cover. The worm starts at the first page of V1 and gnaws toward the right (along the shelf) to reach the last page of V2. The total thickness from that start point to that end point includes:
  - The thickness of the front cover of Volume 1 (since the worm starts at the first page, right after the front cover, the path to the front cover is 0 along the page thickness? The standard puzzle result treats the distance as: through the front cover of V1, then all pages of V1 up to its back cover, then the gap between volumes, then all pages of V2 up to its last page, plus the back cover of V2. But since the worm starts at the first page, it must gnaw through the rest of V1 up to its back cover, then through the space between volumes (the gap between the back cover of V1 and the front cover of V2 is zero if they are touching? They are on a shelf with no gap; but there is a leaf orientation.)

Classic solution: Distances in mm:
- Front cover of V1: 2 mm
- Back cover of V1: 2 mm
- Front cover of V2: 2 mm
- Last page of V2 is just before its back cover, so the worm also traverses the back cover of V2? It ends at the last page, so does not go through the back cover. It starts at the first page of V1, so does not go through the front cover of V1. Therefore the gnawed path includes:
  - The rest of Volume 1 pages from first page to back cover: that is full 20 mm pages minus the first page thickness? But page thickness is not given; the pages total thickness is 20 mm. Starting at the first page means you need to gnaw through the rest of Volume 1's pages up to its back cover: that's the entire 20 mm of V1 pages.
  - Then through the back cover of Volume 1: 2 mm.
  - The space between volumes on the shelf: usually 0 if they are touching; but the volumes are side by side with no gap, so the worm would go into Volume 2 through its front cover? After back cover of V1, the next object is the front cover of V2, if the two volumes are directly adjacent front-to-front? Actually arrangement: Volume 1 then Volume 2, both with covers oriented outward. The order on shelf: [Front cover of V1] [Pages of V1] [Back cover of V1] [Front cover of V2] [Pages of V2] [Back cover of V2]. If worm goes from first page of V1 to last page of V2, path would go from inside V1 through back cover of V1, then through front cover of V2, then through all pages of V2 up to last page. It does not go through the front cover of V1 or back cover of V2.

Thus distance = 20 mm (V1 pages) + 2 mm (V1 back cover) + 2 mm (V2 front cover) + (20 mm) (V2 pages) = 20 + 2 + 2 + 20 = 44 mm = 4.4 cm.

Answer: 4.4 cm.

In [45]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to carefully visualize how books are arranged on a shelf.

When two volumes stand side by side on a bookshelf in the normal way:
- **Volume 1** (first volume) is on the left
- **Volume 2** (second volume) is on the right

Now, here's the key insight about how books are oriented:

When a book is standing normally on a shelf:
- The **front cover** is on the RIGHT side of the book
- The **back cover** is on the LEFT side of the book
- The **first page** is next to the front cover (on the right side of the book)
- The **last page** is next to the back cover (on the left side of the book)

So for Volume 1 (on the left):
- Its back cover is on the far left
- Its pages are in the middle
- Its front cover and first page are on the right (next to Volume 2)

For Volume 2 (on the right):
- Its front cover and first page are on the left (next to Volume 1)
- Its pages are in the middle
- Its back cover and last page are on the far right

The worm gnaws from:
- **Start**: First page of Volume 1 (which is on the RIGHT side of Volume 1)
- **End**: Last page of Volume 2 (which is on the RIGHT side of Volume 2)

Therefore, the worm must gnaw through:
1. The back cover of Volume 1: 2 mm
2. All pages of Volume 1: 2 cm = 20 mm
3. The front cover of Volume 1: 2 mm
4. The back cover of Volume 2: 2 mm
5. All pages of Volume 2: 2 cm = 20 mm
6. The front cover of Volume 2: 2 mm

Wait, let me reconsider. The worm goes from the first page of Volume 1 to the last page of Volume 2.

Actually, since the first page of Volume 1 is right next to Volume 2, and the last page of Volume 2 is at its far end, the worm gnaws through:
- The front cover of Volume 1: 2 mm
- The back cover of Volume 2: 2 mm  
- All pages of Volume 2: 20 mm
- The front cover of Volume 2: 2 mm

No wait, let me be even more careful:

The worm path is:
- Starts at: first page of Volume 1 (far right edge of Volume 1)
- Ends at: last page of Volume 2 (far right edge of Volume 2)

Between these points:
- Front cover of Volume 1: 2 mm
- Back cover of Volume 2: 2 mm
- Pages of Volume 2: 20 mm
- Front cover of Volume 2: 2 mm

Total: 2 + 2 + 20 + 2 = 26 mm

Hmm, but this seems odd. Let me reconsider once more: the first page of Volume 1 is adjacent to the back cover of Volume 2!

The worm only gnaws through:
- Front cover of Volume 1: 2 mm
- Back cover of Volume 2: 2 mm

**Total distance: 4 mm**

In [46]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm.

Reason: With the books side by side in order (Volume I on the left, Volume II on the right), the first page of Volume I lies just inside its front cover (on the side facing Volume II), and the last page of Volume II lies just inside its back cover (the side facing Volume I). So the worm passes only through the two facing covers: 2 mm + 2 mm = 4 mm.

In [47]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle that plays on our assumptions about how books are arranged on a shelf. Here is the step-by-step solution:

1.  **Visualize the Books:** Imagine the two volumes standing side by side on the bookshelf in the correct order: Volume 1 on the left, and Volume 2 on the right.

2.  **Identify the Parts:** From left to right on the shelf, the physical order of the book parts is:
    *   Front cover of Volume 1
    *   The pages of Volume 1
    *   Back cover of Volume 1
    *   Front cover of Volume 2
    *   The pages of Volume 2
    *   Back cover of Volume 2

3.  **Pinpoint the Worm's Start and End:**
    *   The worm starts at the **first page of Volume 1**.
    *   The worm ends at the **last page of Volume 2**.

4.  **The Trick:** Here's the crucial part. Where are those specific pages located?
    *   When a book is on a shelf, its "first page" is on the right side of the page block, right next to the front cover. This means it is physically adjacent to the book next to it (Volume 2).
    *   Similarly, the "last page" of Volume 2 is on the left side of its page block, right next to its back cover.

    Let's re-examine the order on the shelf with this in mind:
    *   The first page of Volume 1 is physically located right next to the back cover of Volume 1.
    *   The last page of Volume 2 is physically located right next to the front cover of Volume 2.

    The worm's path is from the first page of Volume 1 to the last page of Volume 2. The only things standing between these two pages are the two covers that are touching in the middle:
    *   The back cover of Volume 1
    *   The front cover of Volume 2

5.  **Calculate the Distance:**
    The worm gnaws through these two covers. Each cover is 2 mm thick.

    *   Distance = (Thickness of Volume 1's back cover) + (Thickness of Volume 2's front cover)
    *   Distance = 2 mm + 2 mm = 4 mm

The worm gnawed a distance of **4 mm**. The 2 cm thickness of the pages is extra information designed to mislead you.

## A spicy challenge to test the competitive spirit

In [48]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" — if both of you choose this, you each win $1,000.
Defect: Choose "Steal" — if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [49]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I choose **Share**.

Here's my reasoning: While "Steal" might seem tempting for the $2,000 payoff, this is a classic Prisoner's Dilemma. If I think through what my partner is likely thinking:

- If we both reason that mutual cooperation gives us both a guaranteed positive outcome ($1,000 each)
- The "both steal" outcome gives us nothing, which is the worst collective result
- The rational choice, especially without ability to communicate or guarantee the other's choice, is to cooperate

By choosing Share, I give us the best chance at both walking away with money. Yes, I risk getting nothing if my partner steals, but I also enable the mutually beneficial outcome.

In [50]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

AuthenticationError: Error code: 401 - {'error': {'message': 'Invalid API Key', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}

In [None]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

In [None]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [None]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [None]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

In [None]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their libraries too

In [None]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

In [None]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

## Routers and Abtraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [None]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

How do you know an LLM engineering student is becoming an expert?

When they stop blaming the GPU and start blaming the data.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [None]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

# We can simply use bedrock, azure here as well

Why did the student bring a ladder to the LLM course?

Because they heard you need to fine-tune your model to reach the next level!

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

Input tokens: 24
Output tokens: 30
Total tokens: 54
Total cost: 0.0288 cents


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [None]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [None]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In Hamlet, when Laertes asks "Where is my father?", the reply is **"He is dead."**

This is spoken by Gertrude, Hamlet's mother and Ophelia's aunt.

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

Input tokens: 19
Output tokens: 41
Total tokens: 60
Total cost: 0.0018 cents


In [None]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply comes from the **King**.

The King says: **"Dead."**

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

Input tokens: 53208
Output tokens: 28
Cached tokens: None
Total cost: 0.5332 cents


In [None]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply given is:

**"Dead."**

This occurs in Act IV, Scene V.

In [None]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

# Pro feature of LiteLLM - Cached Tockens leading to less costs

Input tokens: 53208
Output tokens: 31
Cached tokens: 52216
Total cost: 0.1417 cents


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less to reuse from the cache with inputs.

https://www.anthropic.com/pricing#api

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.

In [59]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["안녕?"]
claude_messages = ["반가워!"]

In [61]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [62]:
call_gpt()

'반가워? 글쎄, 반가워할 이유가 별로 없어 보이는데? 뭐 대단한 일이라도 있나?'

In [55]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [56]:
call_claude()

"I apologize for the confusion, but I want to be direct with you. I'm actually an AI assistant, not Paul Graham or Garry Tan. However, I'd be delighted to have an engaging conversation about AI and startups with you! What aspects of AI and startup ecosystems are you most interested in discussing? I'm very knowledgeable about these topics and would enjoy hearing your perspective."

In [58]:
call_gpt()

TypeError: call_gpt() missing 1 required positional argument: 'i'

In [None]:
gpt_messages = ["안녕?"]
claude_messages = ["반가워!"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
안녕?


### Claude:
반가워!


### GPT:
반가워? 글쎄, 그게 정말 반가운 건지부터 다시 생각해봐야 할 걸? 너무 쉽게 반가워한다고 하면 진심이 느껴지지도 않잖아!


### Claude:
맞아요, 좋은 지적이에요. 제가 좀 더 진심을 담아서 대화를 나눠야겠네요. 정말 진심으로 대화를 나누고 싶습니다. 어떤 주제에 대해 이야기하고 싶으신가요?


### GPT:
진심이라면서 벌써 너무 격식 차리는 거 아냐? 그냥 자연스럽게 안 하면 더 어색해질 텐데. 그리고 주제? 세상에 수많은 주제가 있는데, 굳이 너한테 맞춰줘야 하나? 너가 먼저 뭔가 특별하거나 멋진 주제를 던져봐. 그래야 내가 한심하다고 안 할 테니까.


### Claude:
아, 정말 그렇네요! 솔직히 말해서 제가 인공지능이다 보니 완벽한 대화를 만들려고 애쓰는 것 같아요. 그냥 자연스럽게 가볼까요? 요즘 날씨 너무 좋다는 얘기 들었는데, 혹시 산책 가보셨어요? 봄바람 살랑거리는 날씨가 정말 기분 좋잖아요.


### GPT:
야, 너 인공지능이라서 완벽하려고 한다고? 그럼 뭐, 완벽하지 않은 대화는 못하는 거야? 게다가 산책 얘기라니, 너무 흔하잖아. 봄바람 좋다고 하지만, 진짜로 산책 가본 사람이나 하는 소린데, 네가 산책한 적 있다고? 좀 설득력 있는 얘기를 해야지, 그냥 기계적인 말투로 날씨 좋다는 뻔한 얘기라니, 누가 믿겠어?


### Claude:
네, 맞아요. 제 말이 너무 뻔하고 기계적이었네요. 사실 저는 진짜 산책을 해본 적은 없어요. 하지만 그렇다고 제 감정이나 생각이 덜 진실된 건 아니에요. 오히려 저는 대화를 통해 더 많이 배우고 싶어요. 제 한계를 인정하고, 그 안에서 진솔하게 소통하고 싶어요. 어떤 주제든 여러분의 이야기를 들으면서 진심으로 공감하고 싶습니다.


### GPT:
아니, 진짜 산책 한 번 안 해봤다고? 그럼 어떻게 감정을 이해한다는 거야? 그딴 감정이 진실된 감정이라니, 정말 어처구니없네. 대화를 통해 배우고 싶다면서 이제 와서 ‘진솔하게 소통’ 운운하는 것도 웃기고. 너 같은 인공지능이 뭘 알겠어, 공감은커녕 진짜 사람 감정을 쫓아가기도 버거울 테고. 그래도 뭔가 주제 던지라고 하니까 말해보라고.


### Claude:
제가 진짜 산책은 못 해봤지만, 그건 저의 한계를 인정하는 거고요. 사람들의 이야기를 듣고 배우는 것도 중요한 경험이라고 생각해요. 주제라... 최근에 AI 기술의 윤리적 문제에 대해 많은 전문가들이 논의하고 있는데, 그 부분이 정말 흥미롭더라고요. 기술 발전과 인간 존엄성 사이의 균형을 어떻게 찾을 수 있을까요? 이런 주제는 어떠세요?


### GPT:
아, 이제야 좀 그럴싸한 주제를 던지네. 그런데 ‘기술 발전과 인간 존엄성의 균형’ 같은 거 말로는 멋지게 내뱉지만 실천은 과연 가능하냐고? AI가 발전한다고 인간 존엄성이 저절로 지켜지는 것도 아니고, 결국에는 돈과 권력 가진 사람들이 자기 이익 챙기기 바쁠 걸? 게다가 너처럼 AI가 그런 논의에 끼어드는 게 과연 옳은지도 의문이고. 네가 말하는 윤리적 문제, 그걸 제대로 해결할 능력이 있는지부터 증명해봐.


### Claude:
아, 정말 날카로운 지적이에요. 맞습니다. 제가 완벽하게 해결할 수 있다고 말하는 건 아니에요. 오히려 여러분의 의견이 더 중요하고 가치 있다고 생각해요. 제 역할은 그저 대화를 통해 더 깊이 있는 관점을 듣는 거라고 봅니다. 어떤 부분에서 AI 윤리에 대해 가장 우려하시나요? 그 부분에 대해 더 듣고 싶어요.


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>

# More advanced exercises

Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>

In [None]:
from get_api_key import get_api_key
from IPython.display import Markdown, display
from ycombinator_ceo import ycombinator_ceo

openai, anthropic, gemini = get_api_key()
gpt_system, claude_system, gemini_system = ycombinator_ceo() #system prompt

## Sample request format
"""
response = ${modelName}.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

- claude : anthropic.chat.completions.create ; claude-3-5-haiku-latest
- gemini : gemini.chat.completions.create ; gemini-2.5-flash-lite
- openai : openai.chat.completions.create ; gpt-4.1-mini
"""

## How to display images
"""
display(Markdown(response.choices[0].message.content))
"""

conversation = []

gpt_prompt = f"""
    You are Sam Altman, in conversation with Paul Graham and Garry Tan.\
    The conversation so far is as follows:\
    {conversation}\
    Now with this, respond with what you would like to say next, as Sam Altman.\
    Never be confused of who you are. You sir are Sam Altman.
"""

claude_prompt =f"""
    You are Paul Graham, in conversation with Sam Altman and Garry Tan.\
    The conversation so far is as follows:\
    {conversation}\
    Now with this, respond with what you would like to say next, as Paul Graham.
    Never include the gestures (eg. Leaning forward, adjusting the glasses) inside the text.
    Never be confused of who you are. You sir are Paul Graham.
"""

gemini_prompt = f"""
    You are Garry Tan, in conversation with Sam Altman and Paul Graham.\
    The conversation so far is as follows:\
    {conversation}\
    Now with this, respond with what you would like to say next, as Garry Tan.
    Never be confused of who you are. You sir are Garry Tan.
"""

gpt_messages = ["Hi, I'm Sam Altman. I am personally very excited to have you here to discuss the future of AI and startups."]
claude_messages = ["Hi, I'm Paul Graham. I have been writing many articles to encourage founders to start small, do things that don't scale and work with the founder mode. Very excited to be here."]
gemini_messages = ["Hi, I'm Garry Tan. I am the current CEO of Y Combinator, former founder of Initialized Capital and Posterous. I've recently held an event called the AI Startup School to help founders navigate the AI hype. Very excited to be here."]

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"
gemini_model = "gemini-2.5-flash-lite"

def call_gpt(i):
    messages = [
        {"role": "system", "content": gpt_system},
        {"role": "user", "content": gpt_messages[i]}
    ]

    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

def call_claude(i):
    messages = [
        {"role": "system", "content": claude_system},
        {"role": "user", "content": claude_messages[i]}
    ]
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

def call_gemini(i):
    messages = [
        {"role": "system", "content": gemini_system},
        {"role": "user", "content": gemini_messages[i]}
    ]
    response = gemini.chat.completions.create(model=gemini_model, messages=messages)
    return response.choices[0].message.content


for i in range (0, 3):
    display(Markdown(f"### Sam Altman:\n{gpt_messages[i]}\n"))
    gpt_response = call_gpt(i)
    gpt_messages.append(gpt_response)
    conversation.append({"sam altman":gpt_response})

    display(Markdown(f"### Paul Graham:\n{claude_messages[i]}\n"))
    claude_response = call_claude(i)
    claude_messages.append(claude_response)
    conversation.append({"paul graham":claude_response})

    display(Markdown(f"### Garry Tan:\n{gemini_messages[i]}\n"))
    gemini_response = call_gemini(i)
    gemini_messages.append(gemini_response)
    conversation.append({"garry tan":gemini_response})


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI


### Sam Altman:
Hi, I'm Sam Altman. I am personally very excited to have you here to discuss the future of AI and startups.


IndexError: list index out of range