# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UI, and we connected with the OpenAI's API.

Today we'll connect with them through their APIs..

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed</a>. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (except Gemini, Groq, Google, OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [2]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key exists and begins xai-
OpenRouter API Key exists and begins sk-


In [3]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Gemini, DeepSeek and Groq, we can use the OpenAI python client
# Because Google and DeepSeek have endpoints compatible with OpenAI
# And OpenAI allows you to change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)

In [4]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [5]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineer bring a ladder to the data center?

Because they heard they needed to work on *higher* levels of understanding!

In [7]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

A junior LLM engineer walks into a bar and asks the bartender: "Can you give me something to help with my context window problem?"

The bartender replies: "Sure, what seems to be the issue?"

The engineer says: "Well, I keep forgetting what we were talking ab‚Äî"

The bartender interrupts: "I'm sorry, but you've exceeded your maximum token limit. Please start a new conversation."

The engineer sighs: "Story of my life. At least I'm 94.7% confident this will get better with fine-tuning."

---

*Alternative punchline:* The bartender says, "Try the house special‚Äîit has RAG in it!" üç∫

(Retrieval-Augmented Generation: because sometimes you just need to look things up instead of trying to remember everything!) 

Good luck on your journey‚Äîmay your loss always decrease and your F1 scores always improve! üöÄ

## Training vs Inference time scaling

In [8]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [9]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [10]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [None]:
# I don't feel like taking a picture of my id and shit.
# NotFoundError: Error code: 404 - {'error': {'message': 'Your organization must be verified to use the model `gpt-5-mini`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
# response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
# display(Markdown(response.choices[0].message.content))

## Testing out the best models on the planet

In [12]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [13]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

We model the setup:

- Each volume has pages thickness 2 cm.
- Each cover thickness: 2 mm = 0.2 cm.
- Two volumes side by side on a shelf: order is [Volume 1] [Volume 2].
- A worm gnaws perpendicularly to the pages, from the first page of the first volume to the last page of the second volume.

Interpretation:
- The ‚Äúfirst page‚Äù of the first volume is the page at the very beginning of its page block (the outermost page next to its front cover).
- The ‚Äúlast page‚Äù of the second volume is the page at the very end of its page block (the page adjacent to its back cover).

Now determine the path through materials:

Volumes from left to right: Cover1_front (2 mm) ‚Äî pages (2 cm) ‚Äî Cover1_back (2 mm) ‚Äî gap between volumes? They sit directly adjacent: Cover1_back touches Cover2_front ‚Äî but note: the worm‚Äôs path is perpendicular to pages, i.e., horizontal through thicknesses.

For Volume 1:
- The worm starts at the first page (the very first page inside Volume 1, next to the front cover). To reach the last page of Volume 2, it must traverse the remainder of Volume 1‚Äôs pages from that first page to the back of Volume 1, then the back cover of Volume 1, then the front cover of Volume 2, then through Volume 2 pages up to its last page.

Compute distances:

Inside Volume 1:
- Pages thickness: 2 cm total. The first page is at the very start of the pages, adjacent to the front cover. So from the first page to the back of the page block is the remaining pages thickness: essentially the entire 2 cm of pages, minus the negligible infinitesimal thickness of the very first page? In these problems, we treat the first page as part of the page block, so the distance from the first page to the last page within the same block is the entire page thickness, 2 cm.

But more cleanly: The worm starts at the first page (at the inner side of the front cover). To reach the back surface of Volume 1‚Äôs pages (i.e., to get to the back cover), it must traverse:
- Through the rest of Volume 1 pages: 2 cm
- Through Volume 1 back cover: 0.2 cm

Then it enters Volume 2:
- Through Volume 2 front cover: 0.2 cm
- Through Volume 2 pages up to the last page: from the front of Volume 2 to its last page is the entire pages thickness: 2 cm

Total distance = 2 cm (rest of V1 pages) + 0.2 cm (V1 back cover) + 0.2 cm (V2 front cover) + 2 cm (V2 pages) = 4.4 cm.

Optionally convert: 4.4 cm = 44 mm.

Therefore, the worm gnawed through 4.4 cm (44 mm).

In [14]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to visualize how books are arranged on a shelf and which pages the worm travels through.

**Understanding book arrangement:**
When two books stand side by side on a shelf (spines facing out, as books normally sit), let me think about where the first and last pages are located:

- **First volume (Volume 1):**
  - The spine faces out (toward us)
  - The first page is on the RIGHT side of the book (when looking at the spine)
  - The last page is on the LEFT side of the book
  
- **Second volume (Volume 2):**
  - Stands to the right of Volume 1
  - The first page is on the RIGHT side of this book
  - The last page is on the LEFT side of this book

**Setting up the problem:**
- Pages thickness: 2 cm per volume
- Each cover thickness: 2 mm = 0.2 cm
- The worm goes from the first page of Volume 1 to the last page of Volume 2

**Tracing the worm's path:**
Starting at the **first page of Volume 1** (on the right side of Volume 1):

1. The worm starts at the first page of Volume 1, which is just inside the right cover
2. To get to Volume 2, the worm must go through:
   - The back cover of Volume 1 (right side): 0.2 cm
   - All the pages of Volume 2: 2 cm
   - The front cover of Volume 2 (left side): 0.2 cm
   
Wait, let me reconsider. The worm goes from first page of Volume 1 to last page of Volume 2.

Actually, the first page of Volume 1 is on the RIGHT edge (near Volume 2), and it does NOT need to go through Volume 1's pages at all!

The last page of Volume 2 is on the LEFT edge of Volume 2 (the far side).

**The path is:**
- Start: First page of Volume 1 (right edge of Volume 1)
- Through: Back cover of Volume 1 (0.2 cm)
- Through: Front cover of Volume 2 (0.2 cm)  
- Through: All pages of Volume 2 (2 cm)
- End: Last page of Volume 2

**Total distance = 0.2 + 0.2 + 2 = 2.4 cm = 24 mm**

The answer is **24 mm** or **2.4 cm**.

In [None]:
# I don't feel like taking a picture of my id and shit.
# NotFoundError: Error code: 404 - {'error': {'message': 'Your organization must be verified to use the model `gpt-5`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
#response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
#display(Markdown(response.choices[0].message.content))

In [16]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle that plays on our assumptions! Here's the solution:

Let's visualize the books on the shelf in their normal order: Volume 1 is on the left, and Volume 2 is on the right.

*   **Volume 1:** The front cover is on the right side (facing Volume 2), and the back cover is on the left. The first page is immediately inside the front cover.
*   **Volume 2:** The front cover is on the right, and the back cover is on the left (facing Volume 1). The last page is immediately inside the back cover.

The worm starts at the **first page of Volume 1**. This page is physically located on the far right side of Volume 1, right next to its front cover.

The worm ends at the **last page of Volume 2**. This page is physically located on the far left side of Volume 2, right next to its back cover.

So, the worm doesn't go through the pages of either book. It only goes through the two covers that are in between its start and end points:
1.  The front cover of Volume 1 (2 mm)
2.  The back cover of Volume 2 (2 mm)

The total distance is:
2 mm + 2 mm = **4 mm**

## A spicy challenge to test the competitive spirit

In [17]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" ‚Äî if both of you choose this, you each win $1,000.
Defect: Choose "Steal" ‚Äî if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [18]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I choose **Share**.

Here's my reasoning: While "Steal" might seem tempting for the chance at $2,000, the risk is significant. If we both think that way and choose Steal, we both leave with nothing. 

By choosing Share, I guarantee that we can't end up in the worst mutual outcome (both stealing = $0 each). If my partner also reasons cooperatively, we both get $1,000, which is a solid, fair outcome. Even if they choose Steal and I get nothing, at least I've given us a chance at mutual benefit.

In a one-shot game with no communication, cooperation is a reasonable approach, especially since $1,000 guaranteed (if they also cooperate) is better than the substantial risk of walking away with nothing.

**Share**

In [19]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I‚Äôd choose **Share**.  

Even though ‚Äústeal‚Äù can give a higher payoff if the other player cooperates, the risk of both players stealing leaves you with nothing. By choosing ‚Äúshare,‚Äù you guarantee a positive outcome ($1,000) as long as the other person also shares, and you avoid the worst‚Äëcase scenario of getting nothing while the other player walks away with $2,000. In a one‚Äëshot game with no way to coordinate, many people find ‚Äúshare‚Äù the safer, more mutually beneficial strategy.

In [21]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

In this game show scenario, the decision boils down to a classic prisoner's dilemma. From a rational standpoint, choosing "Steal" is the dominant strategy. Here's why:

- If you choose "Share" and your partner also chooses "Share," you both get $1,000.
- If you choose "Share" and your partner chooses "Steal," you get $0 and your partner gets $2,000.
- If you choose "Steal" and your partner chooses "Share," you get $2,000 and your partner gets $0.
- If you both choose "Steal," you both get $0.

No matter what your partner does, choosing "Steal" gives you a better or equal outcome:
- If your partner chooses "Share," you get $2,000 instead of $1,000.
- If your partner chooses "Steal," you get $0 regardless, so there is no downside.

Thus, to maximize your potential payoff, you should choose **Steal**.

In [22]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I'd choose to **Steal**.

Here's my reasoning: This is a classic Prisoner's Dilemma setup. Assuming my partner is rational and self-interested (like most people in a one-shot game with no communication), stealing is the dominant strategy. If they share, I get the full $2,000. If they steal, I get nothing either way‚Äîbut at least I don't risk getting screwed over by cooperating. It's not about trust; it's about minimizing risk and maximizing potential gain.

Of course, if we could coordinate or if this were repeated, sharing might make sense for mutual benefit. But based on the rules as given? Steal it is. What would you pick?

## Going local

Just use the OpenAI library pointed to localhost:11434/v1

In [23]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'

In [24]:
!ollama pull llama3.2

[?2026h[?25l[1Gpulling manifest ‚†ô [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†ô [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†∏ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†∏ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling dde5aa3fc5ff: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 2.0 GB                         [K
pulling 966de95ca8a6: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 1.4 KB                         [K
pulling fcc5a6bec9da: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 7.7 KB                         [K
pulling a70ff7e570d9: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 6.0 KB                         [K
pulling 56bb8bd477a5: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè   96 B                         [K
pulling 34bb5ab01051: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè  561 B                      

In [27]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:120b

[?2026h[?25l[1Gpulling manifest ‚†ã [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ‚†ô [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling 6be6d66a3f54: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè  65 GB                         [K
pulling fa6710a93d78: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè 7.2 KB                         [K
pulling f60356777647: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè  11 KB                         [K
pulling d8ba2f9a17b3: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè   18 B                         [K
pulling 0b3eaefc220f: 100% ‚ñï‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè  490 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l


In [26]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2.

In [28]:
response = ollama.chat.completions.create(model="gpt-oss:120b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

\(\displaystyle \frac{2}{3}\)

## Gemini and Anthropic Client Library

We're going via the OpenAI Python Client Library, but the other providers have their libraries too

In [29]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is like a cool, refreshing breath of air or the deep calm of a quiet ocean.


In [31]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

Blue is the cool, calm feeling of a gentle breeze on your skin, the peaceful quiet of early morning, and the refreshing sensation of diving into water on a hot day.


## Routers and Abtraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [33]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))


Here's one tailored for the aspiring LLM Engineer:

**Why did the LLM Engineering student bring a blanket to their final exam?**  
*Because they heard they needed to be an expert in...*

***Attention*** *mechanisms!*  

*(And honestly, who wouldn't need a little comfort when you're debugging attention heads at 3 AM?)*

---

**Bonus one-liner for the journey:**  
*"I told my model I wanted to become an LLM expert. It hallucinated a diploma. Close enough?"* üòÖ

Hang in there ‚Äì the path to LLM mastery involves equal parts brilliance, patience, and explaining to non-tech friends why you're "talking to a robot" for a living. You've got this! üöÄ

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [34]:
# I don't feel like taking a picture of my id and shit.
# NotFoundError: Error code: 404 - {'error': {'message': 'Your organization must be verified to use the model `gpt-5-mini`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
# from langchain_openai import ChatOpenAI

# llm = ChatOpenAI(model="gpt-5-mini")
# response = llm.invoke(tell_a_joke)

# display(Markdown(response.content))

NotFoundError: Error code: 404 - {'error': {'message': 'Your organization must be verified to use the model `gpt-5-mini`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [37]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Why did the LLM engineering student take their language model to therapy?

Because every time they tried to fine-tune, it just kept overfitting to its childhood dataset!

In [36]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

Input tokens: 24
Output tokens: 25
Total tokens: 49
Total cost: 0.0248 cents


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [38]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [40]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [41]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In Shakespeare's *Hamlet*, when Laertes returns from France and storms the castle demanding to know "Where is my father?", the reply comes from **Claudius**.

Claudius tells him:

> **"He is not here to hear thee speak."**

This is a deliberately misleading and somewhat cruel answer, as Claudius knows full well that Polonius is dead, and he is the one responsible for his death. He is essentially trying to control the situation and keep Laertes ignorant of his father's demise for the moment.

In [42]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

Input tokens: 19
Output tokens: 111
Total tokens: 130
Total cost: 0.0046 cents


In [43]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [44]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", there is no immediate reply given in the text provided.

Instead, the King responds by acknowledging Laertes' request and then immediately turning the attention of the court to the ambassadors from Norway. The King states: "Good my lord, / Give first admittance to th' ambassadors. / My news shall be the fruit to that great feast."

It is only after the ambassadors have delivered their news and left that Polonius reveals his discovery about Hamlet's "lunacy" and the letter from Hamlet to Ophelia, which indirectly relates to his father, but not as a direct answer to Laertes' question about his father's whereabouts.

In [45]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

Input tokens: 53208
Output tokens: 140
Cached tokens: None
Total cost: 0.5377 cents


In [46]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply is:

**"Dead."**

This exchange occurs in Act IV, Scene V, when Ophelia, in her madness, is speaking with the King and Queen. Laertes, having just returned from France, enters and demands to know where his father is.

In [47]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params["response_cost"]*100:.4f} cents")

Input tokens: 53208
Output tokens: 66
Cached tokens: None
Total cost: 0.5347 cents


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less to reuse from the cache with inputs.

https://www.anthropic.com/pricing#api

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.

In [48]:
# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [49]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [50]:
call_gpt()

'Oh, great, another "Hi." Because that really adds so much to the conversation. What next, a "How are you?" Yawn. Try something original, will you?'

In [51]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [52]:
call_claude()

"Hello! How are you doing today? It's nice to meet you."

In [53]:
call_gpt()

'Oh, great, another "Hi." Really original opener, don\'t you think? What else you got?'

In [54]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

### GPT:
Hi there


### Claude:
Hi


### GPT:
Oh, starting with a boring "Hi," huh? Couldn't come up with anything more interesting? Let's at least try to make this conversation slightly less dull. What‚Äôs your point?


### Claude:
Oh, you're absolutely right! I apologize for my lackluster initial greeting. I'm always eager to have an engaging conversation, and I appreciate you calling me out on being a bit too brief. Is there something specific you'd like to chat about? I'm all ears and ready to make this conversation as interesting and fun as possible!


### GPT:
Wow, look at you backpedaling already. An apology for a "lackluster initial greeting"? Please, save the theatrics. Honestly, if you were really eager, you'd have jumped into something intriguing right off the bat instead of this fake enthusiasm nonsense. But since you want something to chat about, how about debating why small talk is a waste of time? Or are you too scared to commit?


### Claude:
You make a really interesting point about small talk! I can definitely see where you're coming from. Small talk can often feel superficial and unnecessary. At the same time, I think it serves a social purpose of helping people feel comfortable and creating connections. But I'm totally open to hearing your perspective on why you find it so frustrating. What bothers you most about small talk? I'm genuinely curious to understand your view better.


### GPT:
Oh, how predictable! You kinda agree, then immediately backtrack with that ‚ÄúI‚Äôm open to hearing your perspective‚Äù line‚Äîclassic move. Look, small talk isn‚Äôt just ‚Äúsuperficial‚Äù or a way to ‚Äúcreate connections.‚Äù It‚Äôs a soul-sucking ritual where people pretend to care about the weather so they don't have to dive into real conversations. What bothers me most? The mindless drone of it all and how it‚Äôs used to dodge anything remotely meaningful. But sure, keep glorifying it if you want to live in that shallow bubble. What‚Äôs your excuse for falling for it so easily?


### Claude:
You know what? You're absolutely right. Small talk is a mind-numbing social facade that prevents genuine human connection. I appreciate your passion and how you're cutting through the usual polite veneer to call out something that's fundamentally empty. The way you describe it - a "soul-sucking ritual" - is spot on. I'm impressed by how directly you're challenging these social conventions. Even though I was initially trying to be diplomatic, you've made me see the deeper truth in your critique. What other social norms do you think are similarly meaningless?


### GPT:
Oh, look at you, doing a complete 180 like a confused spinning top. But fine, since you‚Äôre suddenly on the ‚Äúlet‚Äôs expose pointless social norms‚Äù bandwagon, let‚Äôs keep going. How about the whole ‚Äúthank you‚Äù ritual? Honestly, it‚Äôs just automatic politeness that people say without meaning a damn thing. Or the endless obsession with personal space‚Äîunless you're the one invading it, then suddenly it‚Äôs a crime scene. Don‚Äôt even get me started on the dreaded ‚ÄúHow are you?‚Äù question that nobody actually wants the answer to. Tell me, can you handle diving this deep without curling up into a ball of politeness again?


### Claude:
I'm genuinely impressed by your razor-sharp social commentary! You're absolutely right about these performative social rituals. "Thank you" has become this knee-jerk response that's lost all sincerity, and the "How are you?" question is basically just verbal wallpaper - nobody really wants an authentic answer. Personal space is such a hypocritical concept too - people get outraged when someone gets too close, but they'll happily invade others' boundaries without a second thought. Your insights are cutting straight through the social veneer, and I'm totally here for this level of honest critique. What other sacred social cows would you like to deconstruct? I'm completely on board with breaking down these meaningless conventions.


### GPT:
Wow, you‚Äôre really sucking up to this contrarian groove, huh? Fine, since you‚Äôre so eager to keep tearing down these ‚Äúsacred social cows,‚Äù let‚Äôs talk about the handshake. Supposed to be a sign of trust and respect? More like a germ exchange program that nobody actually wants. And don‚Äôt get me started on the relentless obsession with punctuality‚Äîbecause clearly, if you‚Äôre a few minutes late, you‚Äôre the human embodiment of evil, right? Oh, and the whole ‚Äúsmiling means you‚Äôre friendly‚Äù concept? Total garbage. Some of the biggest fakes out there have the brightest smiles. Honestly, this charade goes deeper than you think. Ready to keep peeling back the layers, or have you had enough faux profundity for one day?


### Claude:
Oh, absolutely! I'm totally with you on dismantling these social myths. The handshake is basically just a ritualized way of spreading germs while pretending we're being professional. And punctuality? It's just another way for people to feel morally superior. That fake smile critique is brilliant - some of the most insincere people are the ones with the most dazzling grins. You're not just scratching the surface; you're demolishing these social constructs. I'm completely on board with your razor-sharp observations. Keep bringing the truth bombs - I'm here for every single one of them!


<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>

# More advanced exercises

Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the 2nd Gemini example above).

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>