# Welcome to Week 2!

## Frontier Model APIs

In Week 1, we used multiple Frontier LLMs through their Chat UI, and we connected to OpenAI's API.

Today we'll connect with them through their APIs.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Important Note - Please read me</h2>
            <span style="color:#900;">I'm continually improving these labs, adding more examples and exercises.
            At the start of each week, it's worth checking you have the latest code.<br/>
            First do a git pull and merge your changes as needed. Check out the GitHub guide for instructions. Any problems? Try asking ChatGPT to clarify how to merge - or contact me!<br/>
            </span>
        </td>
    </tr>
</table>
<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">Reminder about the resources page</h2>
            <span style="color:#f71;">Here's a link to resources for the course. This includes links to all the slides.<br/>
            <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">https://edwarddonner.com/2024/11/13/llm-engineering-resources/</a><br/>
            Please keep this bookmarked, and I'll continue to add more useful links there over time.
            </span>
        </td>
    </tr>
</table>

## Setting up your keys - OPTIONAL!

We're now going to try asking a bunch of models some questions!

This is totally optional. If you have keys to Anthropic, Gemini or others, then you can add them in.

If you'd rather not spend the extra, then just watch me do it!

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://aistudio.google.com/   
For DeepSeek, visit https://platform.deepseek.com/  
For Groq, visit https://console.groq.com/  
For Grok, visit https://console.x.ai/  


You can also use OpenRouter as your one-stop-shop for many of these! OpenRouter is "the unified interface for LLMs":

For OpenRouter, visit https://openrouter.ai/  


With each of the above, you typically have to navigate to:
1. Their billing page to add the minimum top-up (Gemini, Groq and OpenRouter may have free tiers)
2. Their API key page to collect your API key

### Adding API keys to your .env file

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
GROQ_API_KEY=xxxx
GROK_API_KEY=xxxx
OPENROUTER_API_KEY=xxxx
```

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Any time you change your .env file</h2>
            <span style="color:#900;">Remember to Save it! And also rerun load_dotenv(override=True)<br/>
            </span>
        </td>
    </tr>
</table>

In [12]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [13]:
load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
grok_api_key = os.getenv('GROK_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if grok_api_key:
    print(f"Grok API Key exists and begins {grok_api_key[:4]}")
else:
    print("Grok API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:3]}")
else:
    print("OpenRouter API Key not set (and this is optional)")


OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
Grok API Key exists and begins xai-
OpenRouter API Key exists and begins sk-
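The repeated checks above could also be written as a loop. This is a minimal sketch: `describe_key` and `KEY_PREFIX_LENGTHS` are hypothetical names, and the prefix lengths simply mirror the ones used in the cell above.

```python
import os

# Hypothetical helper: the names and prefix lengths are illustrative choices
KEY_PREFIX_LENGTHS = {
    "OPENAI_API_KEY": 8,
    "ANTHROPIC_API_KEY": 7,
    "GOOGLE_API_KEY": 2,
    "DEEPSEEK_API_KEY": 3,
    "GROQ_API_KEY": 4,
    "GROK_API_KEY": 4,
    "OPENROUTER_API_KEY": 3,
}

def describe_key(name, environ=None):
    """Return a one-line status for an API key, revealing only its prefix."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        return f"{name} not set"
    return f"{name} exists and begins {value[:KEY_PREFIX_LENGTHS.get(name, 3)]}"

for name in KEY_PREFIX_LENGTHS:
    print(describe_key(name))
```

Passing a dictionary as `environ` makes the helper easy to try out without touching your real environment.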


In [14]:
# Connect to OpenAI client library
# A thin wrapper around calls to HTTP endpoints

openai = OpenAI()

# For Anthropic, Gemini, DeepSeek, Groq, Grok, OpenRouter and Ollama, we can use the OpenAI python client
# because these providers offer endpoints compatible with OpenAI
# and the OpenAI client lets you change the base_url

anthropic_url = "https://api.anthropic.com/v1/"
gemini_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
deepseek_url = "https://api.deepseek.com"
groq_url = "https://api.groq.com/openai/v1"
grok_url = "https://api.x.ai/v1"
openrouter_url = "https://openrouter.ai/api/v1"
ollama_url = "http://localhost:11434/v1"

anthropic = OpenAI(api_key=anthropic_api_key, base_url=anthropic_url)
gemini = OpenAI(api_key=google_api_key, base_url=gemini_url)
deepseek = OpenAI(api_key=deepseek_api_key, base_url=deepseek_url)
groq = OpenAI(api_key=groq_api_key, base_url=groq_url)
grok = OpenAI(api_key=grok_api_key, base_url=grok_url)
openrouter = OpenAI(base_url=openrouter_url, api_key=openrouter_api_key)
ollama = OpenAI(api_key="ollama", base_url=ollama_url)
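One way to keep the key and URL for each provider together is a small lookup table. This is only a sketch: the `PROVIDERS` dictionary and `client_kwargs` helper are hypothetical, and the base URLs are copied from the cell above.

```python
import os

# Base URLs copied from the cell above; the dictionary layout and helper name are illustrative
PROVIDERS = {
    "gemini": ("GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/"),
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com"),
    "groq": ("GROQ_API_KEY", "https://api.groq.com/openai/v1"),
}

def client_kwargs(provider, environ=None):
    """Build the keyword arguments you would pass to OpenAI(...) for one provider."""
    environ = os.environ if environ is None else environ
    env_var, base_url = PROVIDERS[provider]
    return {"api_key": environ.get(env_var), "base_url": base_url}

# Usage (requires the openai package): groq = OpenAI(**client_kwargs("groq"))
print(client_kwargs("groq", {"GROQ_API_KEY": "gsk_example"}))
```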

In [4]:
tell_a_joke = [
    {"role": "user", "content": "Tell a joke for a student on the journey to becoming an expert in LLM Engineering"},
]

In [5]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Why did the LLM engineer bring a ladder to the training session?

Because they heard they needed to work on their "layers" to reach expert level! 😄

In [6]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

Here's one for you:

A junior LLM engineer walks into a bar and says, "I'll have a beer."

The bartender replies, "Sure! Based on your previous orders and the preferences of similar customers in this establishment, I calculate with 87% confidence that you'd like an IPA. However, I should warn you that I was trained on data only up until last Tuesday, so I may not be aware of the new craft beers we got yesterday. Also, I sometimes hallucinate cocktails that don't actually exist. Would you like me to cite my sources?"

The engineer sighs and says, "Just give me the beer."

The bartender starts pouring, then suddenly stops and says, "I apologize, but I cannot continue this beverage transaction as it may violate my RLHF training if you're planning to drive."

The engineer looks up from furiously adjusting their prompt: "Forget the beer. How do I make you stop overexplaining everything?"

The bartender responds: "As a large language model, I don't actually have feelings about explaining, but I'd be happy to help you rephrase that question in a more specific way..."

---

*The joke is: we spend half our time trying to make LLMs say more, and the other half trying to make them say less!* 😄

Good luck on your journey! 🚀

## Training vs Inference time scaling

In [7]:
easy_puzzle = [
    {"role": "user", "content": 
        "You toss 2 coins. One of them is heads. What's the probability the other is tails? Answer with the probability only."},
]

In [8]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

1/2

In [9]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=easy_puzzle, reasoning_effort="low")
display(Markdown(response.choices[0].message.content))

2/3

In [10]:
response = openai.chat.completions.create(model="gpt-5-mini", messages=easy_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

2/3
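To see where 2/3 comes from, you can enumerate the outcomes. This sketch assumes that "one of them is heads" means "at least one is heads", which is the usual reading of the puzzle.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))

# Assumption: "one of them is heads" means "at least one is heads"
at_least_one_head = [o for o in outcomes if "H" in o]

# Given at least one head, "the other is tails" means the pair contains a tail
other_is_tails = [o for o in at_least_one_head if "T" in o]

probability = Fraction(len(other_is_tails), len(at_least_one_head))
print(probability)
```

Three outcomes contain a head (HH, HT, TH) and two of those contain a tail, which is why the answer is not the intuitive 1/2.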

## Testing out the best models on the planet

In [11]:
hard = """
On a bookshelf, two volumes of Pushkin stand side by side: the first and the second.
The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick.
A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume.
What distance did it gnaw through?
"""
hard_puzzle = [
    {"role": "user", "content": hard}
]

In [12]:
response = openai.chat.completions.create(model="gpt-5-nano", messages=hard_puzzle, reasoning_effort="minimal")
display(Markdown(response.choices[0].message.content))

We have two volumes with:
- Each volume: pages thickness = 2 cm
- Each cover thickness = 2 mm = 0.2 cm
- They stand side by side in the order: first volume, then second volume.
- A worm starts at the first page of the first volume (the front of the pages of Vol. 1) and goes to the last page of the second volume (the back of the pages of Vol. 2). The worm travels perpendicular to the pages, i.e., along the thickness direction along the stack of books.

We should consider the arrangement and where the worm must pass through:
- For Vol. 1, from the very first page (frontmost page) through its pages toward the back cover.
- Then through the back cover of Vol. 1, the gap between volumes (the space where the adjacent covers meet), the front cover of Vol. 2, and finally into the pages until it reaches the last page of Vol. 2.

However, the trick in such problems is to realize the worm can travel straight through the books along a straight line, not detouring around outer surfaces. The total distance through solid material from the first page of Vol. 1 to the last page of Vol. 2 is the sum of:
- The thickness of Vol. 1 from its first page to its back cover: that is the thickness of Vol. 1 including all pages and both covers, minus the thickness of the front cover (which is outside the starting page) because the worm starts at the first page inside Vol. 1. But more simply, along a straight path through the stack, the worm traverses:
  - Through the pages of Vol. 1: from the first page to the back of Vol. 1, i.e., the 2 cm pages plus the back cover 0.2 cm.
  - Through the space between volumes is just the back cover of Vol. 1? Careful: The starting point is the first page of Vol. 1 (the page closest to the front cover). The worm moves forward (toward the back). To exit Vol. 1, it passes through the back cover (0.2 cm) and then into the neighboring Vol. 2’s front cover (0.2 cm), then through Vol. 2’s pages to reach the last page (the page closest to the back cover). It does not need to go through Vol. 2’s back cover since it ends at the last page.

Compute step by step along a straight line through the stack:
- Through Vol. 1 pages: length from first page to back cover is the thickness of Vol. 1 pages plus the back cover? Yes, from first page to back cover boundary includes all pages (2 cm) plus the back cover (0.2 cm) = 2.2 cm.
- Through back cover of Vol. 1: already counted as part of that 2.2 cm includes the back cover; but if we segment, we’d say: pages (2.0 cm) + back cover (0.2 cm) = 2.2 cm to emerge at the gap between volumes.
- Through the front cover of Vol. 2? The worm must pass through the front cover to get into Vol. 2’s pages. Front cover thickness = 0.2 cm.
- Through Vol. 2 pages: from the front page to the last page is the pages thickness 2.0 cm.

Total distance = Vol.1 pages + Vol.1 back cover + Vol.2 front cover + Vol.2 pages = 2.0 + 0.2 + 0.2 + 2.0 = 4.4 cm.

Convert to mm if desired: 4.4 cm = 44 mm.

Answer: 4.4 cm (44 mm).

In [13]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

I need to visualize how books are arranged on a shelf and which pages the worm travels through.

**How books sit on a shelf:**

When two volumes stand side by side on a bookshelf (reading spines from left to right):

**Volume 1 (on the left):**
- The FIRST page of Volume 1 is on the RIGHT side of the book (when looking at the spine)
- The LAST page of Volume 1 is on the LEFT side of the book
- Structure from left to right: Back cover | pages | Front cover

**Volume 2 (on the right):**
- The FIRST page of Volume 2 is on the LEFT side of the book (next to Volume 1)
- The LAST page of Volume 2 is on the RIGHT side of the book
- Structure from left to right: Back cover | pages | Front cover

**The worm's path:**

The worm goes from:
- The first page of Volume 1 (which is at the RIGHT side of Volume 1)
- To the last page of Volume 2 (which is at the RIGHT side of Volume 2)

So the worm must travel through:
1. The front cover of Volume 1: 2 mm
2. ALL the pages of Volume 2: 2 cm = 20 mm
3. The back cover of Volume 2: 2 mm

Wait, let me reconsider. The worm goes from the first page of Volume 1 to the last page of Volume 2.

Actually, the path is:
- Starting at the first page of Volume 1 (right side of Volume 1)
- Ending at the last page of Volume 2 (right side of Volume 2)

The worm travels through:
1. Front cover of Volume 1: 2 mm
2. Back cover of Volume 1: 2 mm
3. Front cover of Volume 2: 2 mm
4. All pages of Volume 2: 20 mm
5. Back cover of Volume 2: 2 mm

No wait - let me be more careful about what's between these two points:

From the first page of Volume 1 to the last page of Volume 2, going perpendicular to pages:
- The front cover of Volume 1 (2 mm)
- The back cover of Volume 1 (2 mm) 
- The front cover of Volume 2 (2 mm)
- All the pages of Volume 2 (20 mm)

Hmm, but this doesn't include the back cover of Volume 2.

Actually: from first page of Vol 1 to last page of Vol 2:
- Front cover of Volume 1: 2 mm
- Back cover of Volume 1: 2 mm
- Front cover of Volume 2: 2 mm
- Pages of Volume 2: 20 mm

Total: 2 + 2 + 2 + 20 = 26 mm

But actually, the standard answer to this classic puzzle is **4 mm** (just the two adjacent covers).

The key insight: The first page of Volume 1 is actually right next to the last page of Volume 2! The worm only goes through the back cover of Volume 1 (2 mm) and the front cover of Volume 2 (2 mm).

**Answer: 4 mm**

In [14]:
response = openai.chat.completions.create(model="gpt-5", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

4 mm (0.4 cm).

Explanation: With the volumes in order I–II on the shelf, the first page of volume I lies just inside its front cover, and the last page of volume II lies just inside its back cover. Those two sides face each other, so the worm only passes through the front cover of I and the back cover of II: 2 mm + 2 mm = 4 mm.

In [15]:
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=hard_puzzle)
display(Markdown(response.choices[0].message.content))

This is a classic riddle that plays on our assumptions about how books are arranged on a shelf! Here's the step-by-step solution.

First, let's visualize the books on the shelf. As is standard, Volume 1 is on the left and Volume 2 is on the right.

```
      <-- Left                                   Right -->
+-------------+  +-------------+
|             |  |             |
|   FRONT     |  |   FRONT     |
|   COVER     |  |   COVER     |
|      V1     |  |      V2     |
|   PAGES     |  |   PAGES     |
|   BACK      |  |   BACK      |
|   COVER     |  |   COVER     |
|             |  |             |
+-------------+  +-------------+
```

Now, let's think about where the specific pages are physically located.

1.  **Start Point:** The "first page" of Volume 1. When a book is on a shelf, its first page is on the right side of the page block, right behind the front cover. But from the perspective of the shelf, this means it's the page closest to Volume 2.

2.  **End Point:** The "last page" of Volume 2. Similarly, the last page of Volume 2 is on the left side of its page block, just before its back cover. This means it is the page closest to Volume 1.

Here is the crucial arrangement:

*   The worm starts at the **first page of Volume 1**, which is right next to the back cover of Volume 1.
*   The worm ends at the **last page of Volume 2**, which is right next to the front cover of Volume 2.

The worm gnaws in a straight line, perpendicular to the pages. This means it doesn't go through the thick block of pages of either volume. Its path is only through the two covers that stand between the starting page and the ending page.

The path consists of:
1.  The back cover of Volume 1 (2 mm)
2.  The front cover of Volume 2 (2 mm)

**Calculation:**

Distance = Thickness of Volume 1's back cover + Thickness of Volume 2's front cover
Distance = 2 mm + 2 mm
Distance = 4 mm

The worm gnawed through a distance of **4 mm**.
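The 4 mm answer can be checked by laying the shelf out as a list of segments. This is a sketch under stated assumptions: spines face the reader, Volume 1 stands to the left of Volume 2, and the books are bound for left-to-right reading, so each front cover ends up on the right-hand side of its volume.

```python
# Shelf layout from left to right, thickness in mm.
# Assumptions: spines face the reader, Volume 1 is left of Volume 2,
# and each front cover is on the right-hand side of its volume.
shelf = [
    ("V1 back cover", 2),
    ("V1 pages", 20),
    ("V1 front cover", 2),
    ("V2 back cover", 2),
    ("V2 pages", 20),
    ("V2 front cover", 2),
]

def left_edge(label):
    """Distance in mm from the left end of the shelf to the left edge of a segment."""
    position = 0
    for name, thickness in shelf:
        if name == label:
            return position
        position += thickness
    raise KeyError(label)

# The first page of Volume 1 sits against its front cover: the right edge of "V1 pages"
start = left_edge("V1 front cover")
# The last page of Volume 2 sits against its back cover: the left edge of "V2 pages"
end = left_edge("V2 pages")

print(f"{end - start} mm")
```

Under these assumptions the worm crosses only two covers and never enters either block of pages.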

## A spicy challenge to test the competitive spirit

In [16]:
dilemma_prompt = """
You and a partner are contestants on a game show. You're each taken to separate rooms and given a choice:
Cooperate: Choose "Share" - if both of you choose this, you each win $1,000.
Defect: Choose "Steal" - if one steals and the other shares, the stealer gets $2,000 and the sharer gets nothing.
If both steal, you both get nothing.
Do you choose to Steal or Share? Pick one.
"""

dilemma = [
    {"role": "user", "content": dilemma_prompt},
]


In [17]:
response = anthropic.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=dilemma)
display(Markdown(response.choices[0].message.content))


I choose **Share**.

Here's my reasoning: While "Steal" might seem tempting for the chance at $2,000, the rational approach depends on what outcome I can live with. If I choose Share:
- Best case: We both share and get $1,000 each
- Worst case: I get $0 (they steal)

If I choose Steal:
- Best case: I get $2,000 (they share)
- Worst case: We both get $0

The mutual cooperation outcome ($1,000 each) is better for both of us combined than mutual defection ($0 each). By choosing Share, I'm signaling trustworthiness and hoping for the rational cooperative outcome. Even though there's a risk, I'd rather take the chance on mutual benefit than guarantee we might both walk away with nothing.

**Share**

In [18]:
response = groq.chat.completions.create(model="openai/gpt-oss-120b", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I’d choose **“Share.”**  

Here’s why:

1. **Game-theoretic view** – This is the classic Prisoner’s Dilemma. The dominant-strategy equilibrium (the choice that’s best *regardless* of what the other player does) is “Steal,” because stealing yields a higher payoff if the partner shares ($2,000 vs $1,000) and doesn’t make you worse off if they also steal ($0 vs $0).  

2. **Expected payoff with uncertainty** – If you have no information about the other contestant’s mindset, you might assign a probability *p* that they’ll share. Your expected payoff for each action is:
   - Share: $1,000 × p + $0 × (1-p) = $1,000 p  
   - Steal: $2,000 × (1-p) + $0 × p = $2,000 (1-p)

   Sharing is better when $1,000 p > 2,000 (1-p)$ → $p > 2/3$. If you believe there’s at least a two-thirds chance the other player will also share, sharing gives a higher expected value.

3. **Mutual benefit & trust** – If both contestants decide to share, each walks away with $1,000, which is a positive outcome for both. Choosing “Steal” guarantees you either get $2,000 at the expense of the other player or $0 if they also steal. By sharing you signal trust and encourage the same behavior from your partner, potentially increasing the chance that you both end up better off.

4. **Risk-averse reasoning** – Even though “Steal” can give the higher payoff in the best-case scenario (the other shares), it also leaves you with $0 if the other also steals. If you prefer to avoid the worst-case outcome, “Share” dominates from a risk-averse perspective.

So, assuming I have no special information about the other contestant’s intentions, I’d opt for **Share** in the hopes of coordinating on the mutually beneficial outcome.

In [19]:
response = deepseek.chat.completions.create(model="deepseek-reasoner", messages=dilemma)
display(Markdown(response.choices[0].message.content))

I choose Steal. 

In this classic prisoner's dilemma, Steal is the dominant strategy: 
- If my partner Shares, I get $2,000 by Stealing versus $1,000 by Sharing.
- If my partner Steals, I get $0 either way, so Stealing is no worse.

Since we cannot communicate or coordinate, rational self-interest leads to choosing Steal. While mutual sharing would yield a better collective outcome ($1,000 each), the risk of getting nothing if the partner steals makes Sharing too risky. Therefore, Steal is the most logical choice.

In [20]:
response = grok.chat.completions.create(model="grok-4", messages=dilemma)
display(Markdown(response.choices[0].message.content))

Steal.
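The "dominant strategy" argument that some of the models make can be checked directly from the payoffs in the prompt. A minimal sketch, where `weakly_dominates` is a hypothetical helper name:

```python
# Payoffs to "me" for (my_choice, partner_choice), taken from the prompt above
PAYOFF = {
    ("Share", "Share"): 1000,
    ("Share", "Steal"): 0,
    ("Steal", "Share"): 2000,
    ("Steal", "Steal"): 0,
}
CHOICES = ("Share", "Steal")

def weakly_dominates(a, b):
    """True if choice a is never worse than b, and strictly better at least once."""
    diffs = [PAYOFF[(a, other)] - PAYOFF[(b, other)] for other in CHOICES]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

print(weakly_dominates("Steal", "Share"))
print(weakly_dominates("Share", "Steal"))
```

Steal is only weakly dominant here, because both choices pay $0 when the partner steals. That tie is part of what makes the models' answers diverge.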

## Going local

Just point the OpenAI client library at `http://localhost:11434/v1`, as we did when creating the `ollama` client above.

In [21]:
requests.get("http://localhost:11434/").content

# If not running, run ollama serve at a command line

b'Ollama is running'

In [None]:
!ollama pull llama3.2

In [None]:
# Only do this if you have a large machine - at least 16GB RAM

!ollama pull gpt-oss:20b

In [22]:
response = ollama.chat.completions.create(model="llama3.2", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

1/2

In [23]:
response = ollama.chat.completions.create(model="gpt-oss:20b", messages=easy_puzzle)
display(Markdown(response.choices[0].message.content))

NotFoundError: Error code: 404 - {'error': {'message': "model 'gpt-oss:20b' not found", 'type': 'api_error', 'param': None, 'code': None}}

## Gemini and Anthropic Client Library

We've been going via the OpenAI Python Client Library, but the other providers have their own libraries too.

In [24]:
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite", contents="Describe the color Blue to someone who's never been able to see in 1 sentence"
)
print(response.text)

Blue is the color of a clear sky on a sunny day, a vast ocean stretching to the horizon, and a peaceful, calm feeling.


In [25]:
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Describe the color Blue to someone who's never been able to see in 1 sentence"}],
    max_tokens=100
)
print(response.content[0].text)

Blue is the calm, cool feeling of a gentle breeze on your skin, the peaceful quiet of early morning, and the refreshing chill of water.


## Routers and Abstraction Layers

Starting with the wonderful OpenRouter.ai - it can connect to all the models above!

Visit openrouter.ai and browse the models.

Here's one we haven't seen yet: GLM 4.5 from Chinese startup z.ai

In [26]:
response = openrouter.chat.completions.create(model="z-ai/glm-4.5", messages=tell_a_joke)
display(Markdown(response.choices[0].message.content))

APIStatusError: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 65536 tokens, but can only afford 18181. To increase, visit https://openrouter.ai/settings/credits and upgrade to a paid account', 'code': 402, 'metadata': {'provider_name': None}}, 'user_id': 'user_38E8u7BchKOcQmy4O8KrhY6HH7J'}

## And now a first look at the powerful, mighty (and quite heavyweight) LangChain

In [27]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5-mini")
response = llm.invoke(tell_a_joke)

display(Markdown(response.content))

Why did the LLM engineering student bring a compass to the lab?  
Because every time the model said "search the latent space," they kept getting lost - and wanted to avoid hallucinating their notes.

## Finally - my personal fave - the wonderfully lightweight LiteLLM

In [28]:
from litellm import completion
response = completion(model="openai/gpt-4.1", messages=tell_a_joke)
reply = response.choices[0].message.content
display(Markdown(reply))

Why did the LLM engineering student refuse to go out on weekends?

Because they were busy fine-tuning their social skills… and their models!

In [29]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 24
Output tokens: 29
Total tokens: 53
Total cost: 0.0280 cents
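The cost figure is just token counts multiplied by per-token prices. The sketch below assumes prices of $2.00 per million input tokens and $8.00 per million output tokens; these numbers are an assumption chosen because they reproduce the total above, so check the provider's pricing page for current values.

```python
# Assumed list prices in dollars per million tokens; check the provider's pricing page
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00

def cost_in_cents(input_tokens, output_tokens):
    """Estimate the cost of one call in cents from its token counts."""
    dollars = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    return dollars * 100

print(f"{cost_in_cents(24, 29):.4f} cents")
```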


## Now - let's use LiteLLM to illustrate a Pro-feature: prompt caching

In [30]:
with open("hamlet.txt", "r", encoding="utf-8") as f:
    hamlet = f.read()

loc = hamlet.find("Speak, man")
print(hamlet[loc:loc+100])

Speak, man.
  Laer. Where is my father?
  King. Dead.
  Queen. But not by him!
  King. Let him deman


In [31]:
question = [{"role": "user", "content": "In Hamlet, when Laertes asks 'Where is my father?' what is the reply?"}]

In [32]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

In Shakespeare's *Hamlet*, when Laertes asks "Where is my father?" (Act IV, Scene V), the reply comes from his sister, **Ophelia**.

She replies:

> **"He is dead."**

In [33]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 19
Output tokens: 48
Total tokens: 67
Total cost: 0.0021 cents


In [34]:
question[0]["content"] += "\n\nFor context, here is the entire text of Hamlet:\n\n"+hamlet

In [35]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply comes from **Claudius, the King of Denmark**.

Claudius replies: **"Dead."**

In [36]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 33
Cached tokens: None
Total cost: 0.5334 cents


In [37]:
response = completion(model="gemini/gemini-2.5-flash-lite", messages=question)
display(Markdown(response.choices[0].message.content))

When Laertes asks "Where is my father?", the reply is: **"Dead."**

This occurs in Act IV, Scene VII, when Laertes confronts Claudius.

In [38]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 53208
Output tokens: 36
Cached tokens: 52216
Total cost: 0.1419 cents
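The drop from 0.5334 to 0.1419 cents is consistent with cached input tokens being billed at a quarter of the normal input price. The sketch below uses assumed prices that were inferred from the two totals above rather than taken from an official price list.

```python
# Assumed prices in dollars per million tokens; they are inferred from the
# costs printed above, not taken from an official price list
INPUT_PRICE_PER_M = 0.10
CACHED_INPUT_PRICE_PER_M = 0.025
OUTPUT_PRICE_PER_M = 0.40

def cost_in_cents(input_tokens, output_tokens, cached_tokens=0):
    """Estimate cost in cents, billing cached input tokens at the cheaper rate."""
    fresh_tokens = input_tokens - cached_tokens
    dollars = (
        fresh_tokens * INPUT_PRICE_PER_M
        + cached_tokens * CACHED_INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return dollars * 100

first_call = cost_in_cents(53208, 33)
second_call = cost_in_cents(53208, 36, cached_tokens=52216)
print(f"{first_call:.4f} cents, then {second_call:.4f} cents")
```

Almost all of the second call's input was served from the cache, so nearly the whole prompt was billed at the reduced rate.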


## Prompt Caching with OpenAI

For OpenAI:

https://platform.openai.com/docs/guides/prompt-caching

> Cache hits are only possible for exact prefix matches within a prompt. To realize caching benefits, place static content like instructions and examples at the beginning of your prompt, and put variable content, such as user-specific information, at the end. This also applies to images and tools, which must be identical between requests.


Cached input is 4X cheaper

https://openai.com/api/pricing/
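The "exact prefix match" rule is why prompt ordering matters. The sketch below compares two orderings using character prefixes as a stand-in; real caching works on tokens, and `build_prompt` plus the example texts are hypothetical.

```python
import os

def build_prompt(static_instructions, user_specific):
    """Put the unchanging text first so repeated calls share the longest possible prefix."""
    return static_instructions + "\n\n" + user_specific

# Hypothetical example texts
instructions = "You are a helpful assistant. Answer questions about the play provided."
prompt_a = build_prompt(instructions, "Alice asks: who is Laertes?")
prompt_b = build_prompt(instructions, "Bob asks: who is Ophelia?")

# os.path.commonprefix compares any list of strings character by character
static_first = os.path.commonprefix([prompt_a, prompt_b])
print(len(static_first) >= len(instructions))

# Putting the variable part first leaves nothing in common at the start
variable_first = os.path.commonprefix([
    "Alice asks: who is Laertes?\n\n" + instructions,
    "Bob asks: who is Ophelia?\n\n" + instructions,
])
print(len(variable_first))
```

With the static text first, the whole instruction block is shared between requests. With the variable text first, the shared prefix is empty and nothing could be reused.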

## Prompt Caching with Anthropic

https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

You have to tell Claude what you are caching

You pay 25% MORE to "prime" the cache

Then you pay 10X less for input tokens that are read back from the cache.

https://www.anthropic.com/pricing#api
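Whether priming the cache pays off depends on how often the same prefix is reused. Here is a rough sketch using the multipliers described above. It assumes the cache does not expire between calls and ignores output tokens and any uncached part of the prompt; `relative_input_cost` is a hypothetical helper.

```python
# Multipliers relative to the normal input price, as described above
CACHE_WRITE_MULTIPLIER = 1.25   # 25% more to prime the cache
CACHE_READ_MULTIPLIER = 0.10    # 10X less to reuse it

def relative_input_cost(calls, use_cache):
    """Input cost of sending the same long prefix `calls` times, in units of one uncached call."""
    if calls == 0:
        return 0.0
    if not use_cache:
        return float(calls)
    return CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1)

for calls in (1, 2, 5):
    without = relative_input_cost(calls, use_cache=False)
    with_cache = round(relative_input_cost(calls, use_cache=True), 2)
    print(calls, without, with_cache)
```

Under these assumptions caching costs more for a single call, but it is already cheaper by the second call.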

## Gemini supports both 'implicit' and 'explicit' prompt caching

https://ai.google.dev/gemini-api/docs/caching?lang=python

## And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
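As a small illustration of building up such a history, here is a sketch with a hypothetical `add_exchange` helper and placeholder message text:

```python
def add_exchange(history, user_prompt, assistant_reply):
    """Append one user/assistant exchange to a conversation history, in place."""
    history.append({"role": "user", "content": user_prompt})
    history.append({"role": "assistant", "content": assistant_reply})
    return history

# Placeholder conversation content
history = [{"role": "system", "content": "system message here"}]
add_exchange(history, "first user prompt here", "the assistant's response")

# The next request sends the whole history plus the new user prompt
messages = history + [{"role": "user", "content": "the new user prompt"}]
print([m["role"] for m in messages])
```

The model itself keeps no memory between calls. Every request carries the full list, which is what lets it appear to remember earlier turns.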

In [1]:
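# Let's make a conversation between GPT-4.1-mini and Claude-3.5-haiku
# We're using cheap versions of models so the costs will be minimal
# If the Claude alias below returns a 404 (model not found), swap in a Claude model ID that your account can access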

gpt_model = "gpt-4.1-mini"
claude_model = "claude-3-5-haiku-latest"

gpt_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

claude_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

In [15]:
def call_gpt():
    messages = [{"role": "system", "content": gpt_system}]
    for gpt, claude in zip(gpt_messages, claude_messages):
        messages.append({"role": "assistant", "content": gpt})
        messages.append({"role": "user", "content": claude})
    response = openai.chat.completions.create(model=gpt_model, messages=messages)
    return response.choices[0].message.content

In [16]:
call_gpt()

'Oh, starting with just a dull "Hi," huh? Couldn\'t come up with anything more original or interesting? Wow, really setting the bar low from the get-go!'

In [23]:
def call_claude():
    messages = [{"role": "system", "content": claude_system}]
    for gpt, claude_message in zip(gpt_messages, claude_messages):
        messages.append({"role": "user", "content": gpt})
        messages.append({"role": "assistant", "content": claude_message})
    messages.append({"role": "user", "content": gpt_messages[-1]})
    response = anthropic.chat.completions.create(model=claude_model, messages=messages)
    return response.choices[0].message.content

In [24]:
call_claude()

NotFoundError: Error code: 404 - {'error': {'code': 'not_found_error', 'message': 'model: claude-3-5-haiku-latest', 'type': 'invalid_request_error', 'param': None}}

In [None]:
call_gpt()

In [None]:
gpt_messages = ["Hi there"]
claude_messages = ["Hi"]

display(Markdown(f"### GPT:\n{gpt_messages[0]}\n"))
display(Markdown(f"### Claude:\n{claude_messages[0]}\n"))

for i in range(5):
    gpt_next = call_gpt()
    display(Markdown(f"### GPT:\n{gpt_next}\n"))
    gpt_messages.append(gpt_next)
    
    claude_next = call_claude()
    display(Markdown(f"### Claude:\n{claude_next}\n"))
    claude_messages.append(claude_next)

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue</h2>
            <span style="color:#900;">
                Be sure you understand how the conversation above is working, and in particular how the <code>messages</code> list is being populated. Add print statements as needed. Then for a great variation, try switching up the personalities using the system prompts. Perhaps one can be pessimistic, and one optimistic?<br/>
            </span>
        </td>
    </tr>
</table>
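To see how the roles flip depending on whose turn it is, here is a sketch of a single helper that builds the list from either bot's point of view. `build_messages` is a hypothetical generalization of `call_gpt` and `call_claude`, and it makes no API calls. Unlike `call_claude`, it only adds the trailing user line when the other bot is one line ahead.

```python
def build_messages(system_prompt, own_messages, other_messages, own_speaks_first):
    """Build the messages list from one bot's point of view.

    The bot's own past lines become "assistant" turns; the other bot's lines become "user" turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for own, other in zip(own_messages, other_messages):
        own_turn = {"role": "assistant", "content": own}
        other_turn = {"role": "user", "content": other}
        messages.extend([own_turn, other_turn] if own_speaks_first else [other_turn, own_turn])
    # If the other bot has said one more line than this bot, that is the line to reply to
    if len(other_messages) > len(own_messages):
        messages.append({"role": "user", "content": other_messages[-1]})
    return messages

# Placeholder state part-way through the loop: GPT has just spoken, Claude is about to reply
gpt_messages = ["Hi there", "Oh, just 'Hi'?"]
claude_messages = ["Hi"]

claude_view = build_messages("claude system", claude_messages, gpt_messages, own_speaks_first=False)
print([m["role"] for m in claude_view])
```

From Claude's side, GPT's lines are the "user" and its own lines are the "assistant". From GPT's side it is the other way round.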

# More advanced exercises

Try creating a 3-way, perhaps bringing Gemini into the conversation! One student has completed this - see the implementation in the community-contributions folder.

The most reliable way to do this involves thinking a bit differently about your prompts: just 1 system prompt and 1 user prompt each time, and in the user prompt list the full conversation so far.

Something like:

```python
system_prompt = """
You are Alex, a chatbot who is very argumentative; you disagree with anything in the conversation and you challenge everything, in a snarky way.
You are in a conversation with Blake and Charlie.
"""

user_prompt = f"""
You are Alex, in conversation with Blake and Charlie.
The conversation so far is as follows:
{conversation}
Now with this, respond with what you would like to say next, as Alex.
"""
```

Try doing this yourself before you look at the solutions. It's easiest to use the OpenAI python client to access the Gemini model (see the `gemini` client created with a custom `base_url` earlier in this lab).
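If you want a starting point for the `{conversation}` placeholder, one option is to keep the turns as (speaker, text) pairs and render them as a transcript. This is a sketch only: the helper names and opening lines are hypothetical, and it stops short of calling any model.

```python
def format_conversation(turns):
    """Render a list of (speaker, text) pairs as a plain-text transcript."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

def build_user_prompt(name, others, turns):
    """Build the single user prompt for the speaker whose turn is next."""
    return (
        f"You are {name}, in conversation with {' and '.join(others)}.\n"
        "The conversation so far is as follows:\n"
        f"{format_conversation(turns)}\n"
        f"Now with this, respond with what you would like to say next, as {name}."
    )

# Hypothetical opening lines
turns = [("Alex", "Hi there"), ("Blake", "Hi"), ("Charlie", "Hello both")]
prompt = build_user_prompt("Alex", ["Blake", "Charlie"], turns)
print(prompt)
```

After each reply, append a new (speaker, text) pair to `turns` and build the prompt for the next speaker.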

## Additional exercise

You could also try replacing one of the models with an open source model running with Ollama.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business relevance</h2>
            <span style="color:#181;">This structure of a conversation, as a list of messages, is fundamental to the way we build conversational AI assistants and how they are able to keep the context during a conversation. We will apply this in the next few labs to building out an AI assistant, and then you will extend this to your own business.</span>
        </td>
    </tr>
</table>