### Frontier Model APIs

In Week 1, we used several Frontier LLMs through their Chat UIs, and we connected to OpenAI's API.

Today we'll connect with the APIs for Anthropic and Google, as well as OpenAI.

### Setting up your keys

If you haven't done so already, you can now create API keys for Anthropic and Google in addition to OpenAI.

**Please note:** if you'd prefer to avoid extra API costs, feel free to skip setting up Anthropic and Google! You can see me do it, and focus on OpenAI for the course. You could also substitute Ollama for Anthropic and/or Google, using the exercise you did in Week 1.

For OpenAI, visit https://openai.com/api/  
For Anthropic, visit https://console.anthropic.com/  
For Google, visit https://ai.google.dev/gemini-api  

**Also - adding DeepSeek if you wish**

Optionally, if you'd like to also use DeepSeek, create an account [here](https://platform.deepseek.com/), create a key [here](https://platform.deepseek.com/api_keys) and top up with at least the minimum $2 [here](https://platform.deepseek.com/top_up).

**Adding API keys to your .env file**

When you get your API keys, you need to set them as environment variables by adding them to your `.env` file.

```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
DEEPSEEK_API_KEY=xxxx
```

Afterwards, you may need to restart the Jupyter Lab Kernel (the Python process that sits behind this notebook) via the Kernel menu, and then rerun the cells from the top.
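
As a quick sanity check after editing `.env`, a small helper along these lines can report which keys are visible to the Python process. This is only a sketch: `summarize_keys` and `EXPECTED_KEYS` are hypothetical names, not part of `python-dotenv`, and the helper simply reads from whatever mapping you pass in (by default `os.environ`, which `load_dotenv` populates). It shows only a short prefix so that full keys never end up in notebook output.

```python
import os

# Names taken from the .env example above
EXPECTED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "DEEPSEEK_API_KEY"]

def summarize_keys(env=None, names=EXPECTED_KEYS, visible=8):
    """Return {name: status}, exposing only a short prefix of each key."""
    env = os.environ if env is None else env
    summary = {}
    for name in names:
        value = env.get(name)
        if value:
            summary[name] = f"set, begins {value[:visible]}"
        else:
            # Covers both a missing variable and an empty string
            summary[name] = "not set"
    return summary

# Demo with a made-up mapping instead of real keys
demo_env = {"OPENAI_API_KEY": "sk-proj-example", "GOOGLE_API_KEY": ""}
for name, status in summarize_keys(demo_env).items():
    print(f"{name}: {status}")
```

In the notebook you would call `summarize_keys()` with no arguments after `load_dotenv(override=True)` has run.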

In [1]:
# imports

import os
from dotenv import load_dotenv
from openai import OpenAI
import anthropic
from IPython.display import Markdown, display, update_display

In [None]:
# import for google
# in rare cases, this seems to give an error on some systems, or even crashes the kernel
# If this happens to you, simply ignore this cell - I give an alternative approach for using Gemini later

import google.generativeai

In [2]:
# Load environment variables in a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:8]}")
else:
    print("Google API Key not set")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key not set


In [3]:
# Connect to OpenAI, Anthropic

openai = OpenAI()

claude = anthropic.Anthropic()

In [None]:
# This is the setup code for Gemini
# Having problems with Google Gemini setup? Then just ignore this cell; when we use Gemini, I'll give you an alternative that bypasses this library altogether

# google.generativeai.configure()

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models.
Later we will be putting LLMs to better use!

### What information is included in an API call

Typically we'll pass to the API:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt
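
The three items in this list map directly onto the keyword arguments of a chat completions call. Here is a minimal sketch of how they fit together; `build_chat_request` is a hypothetical helper written for illustration (it is not part of the OpenAI library), and it also accepts the optional temperature setting discussed just below.

```python
def build_chat_request(model, system_message, user_prompt, temperature=None):
    """Assemble keyword arguments for a chat completions call."""
    request = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_prompt},
        ],
    }
    # Only include temperature when the caller sets it,
    # so that the API's own default applies otherwise
    if temperature is not None:
        request["temperature"] = temperature
    return request

request = build_chat_request(
    "gpt-4o-mini",
    "You are an assistant that is great at telling jokes",
    "Tell a light-hearted joke for an audience of Data Scientists",
    temperature=0.7,
)
print(request)
# With a configured client, this could be sent as:
# completion = openai.chat.completions.create(**request)
```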

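There are other parameters that can be used, including **temperature**, which controls randomness: higher values give more random output; lower values give more focused and deterministic output. It is typically set between 0 and 1, although the accepted range varies by provider (OpenAI accepts values up to 2).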
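
To build some intuition for why higher temperature means more random output, here is a small sketch of the textbook formulation: the model's raw scores (logits) are divided by the temperature before being turned into probabilities with a softmax. Providers don't publish exactly how they apply the setting internally, so treat this as an illustration of the idea rather than a description of any particular API. The logits below are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all the probability lands on the top-scoring token; at high temperature the distribution flattens, so less likely tokens get sampled more often.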

In [4]:
system_message = "You are an assistant that is great at telling jokes"
user_prompt = "Tell a light-hearted joke for an audience of Data Scientists"

In [5]:
prompts = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
  ]

In [6]:
# GPT-4o-mini

completion = openai.chat.completions.create(model='gpt-4o-mini', messages=prompts)
print(completion.choices[0].message.content)

Why did the data scientist break up with the statistician?  

Because she found him too mean!


In [7]:
# GPT-4.1-mini
# Temperature setting controls creativity

completion = openai.chat.completions.create(
    model='gpt-4.1-mini',
    messages=prompts,
    temperature=0.7
)
print(completion.choices[0].message.content)

Why did the data scientist break up with the SQL query?

Because it kept bringing up old joins!


In [8]:
# GPT-4.1-nano - extremely fast and cheap

completion = openai.chat.completions.create(
    model='gpt-4.1-nano',
    messages=prompts
)
print(completion.choices[0].message.content)

Why did the data scientist bring a ladder to the office?

Because they heard the data had a lot of layers!


In [9]:
# GPT-4.1

completion = openai.chat.completions.create(
    model='gpt-4.1',
    messages=prompts,
    temperature=0.4
)
print(completion.choices[0].message.content)

Why did the data scientist break up with the logistic regression model?

Because it just couldn’t commit!


In [10]:
# If you have access to this, here is the reasoning model o3-mini
# This is trained to think through its response before replying
# So it will take longer but the answer should be more reasoned - not that this helps..

completion = openai.chat.completions.create(
    model='o3-mini',
    messages=prompts
)
print(completion.choices[0].message.content)

Here's one for you:

I once told a joke to my data science team, but it had so many dimensions that nobody could find the punchline. Looks like I needed to perform some Principal Component Analysis first!


Below are the commands for prompting the Claude API. They are commented out because we have not added credit to the Anthropic API account, so they would fail if run.

In [None]:
# # Claude 3.7 Sonnet
# # API needs system message provided separately from user prompt
# # Also adding max_tokens

# message = claude.messages.create(
#     model="claude-3-7-sonnet-latest",
#     max_tokens=200,
#     temperature=0.7,
#     system=system_message,
#     messages=[
#         {"role": "user", "content": user_prompt},
#     ],
# )

# print(message.content[0].text)
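
The cell above passes the system message through a separate `system` argument rather than as the first entry in `messages`. If you already have an OpenAI-style list such as `prompts`, a helper along these lines could split it into the two pieces. This is a sketch only: `split_system_message` is a hypothetical name, and joining multiple system messages with a newline is my own assumption about how you'd want to combine them.

```python
def split_system_message(messages):
    """Separate OpenAI-style messages into (system_text, remaining_messages)."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    conversation = [m for m in messages if m["role"] != "system"]
    return "\n".join(system_parts), conversation

prompts = [
    {"role": "system", "content": "You are an assistant that is great at telling jokes"},
    {"role": "user", "content": "Tell a light-hearted joke for an audience of Data Scientists"},
]
system_text, conversation = split_system_message(prompts)
print(system_text)
print(conversation)
# With a funded Anthropic account, these could be passed to claude.messages.create
# as system=system_text and messages=conversation
```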

In [None]:
# # Claude 3.7 Sonnet again
# # Now let's add in streaming back results
# # If the streaming looks strange, then please see the note below this cell!

# result = claude.messages.stream(
#     model="claude-3-7-sonnet-latest",
#     max_tokens=200,
#     temperature=0.7,
#     system=system_message,
#     messages=[
#         {"role": "user", "content": user_prompt},
#     ],
# )

# with result as stream:
#     for text in stream.text_stream:
#         print(text, end="", flush=True)

### A rare problem with Claude streaming on some Windows boxes

Two students have noticed a strange thing happening with Claude's streaming into Jupyter Lab's output -- it sometimes seems to swallow up parts of the response.

To fix this, replace the code:

`print(text, end="", flush=True)`

with this:

`clean_text = text.replace("\n", " ").replace("\r", " ")`  
`print(clean_text, end="", flush=True)`

And it should work fine!

In [13]:
# # The API for Gemini has a slightly different structure.
# # I've heard that on some PCs, this Gemini code causes the Kernel to crash.
# # If that happens to you, please skip this cell and use the next cell instead - an alternative approach.

# gemini = google.generativeai.GenerativeModel(
#     model_name='gemini-2.0-flash',
#     system_instruction=system_message
# )
# response = gemini.generate_content(user_prompt)
# print(response.text)

In [14]:
# # As an alternative way to use Gemini that bypasses Google's python API library,
# # Google released endpoints that mean you can use Gemini via the client libraries for OpenAI!
# # We're also trying Gemini's latest reasoning/thinking model

# gemini_via_openai_client = OpenAI(
#     api_key=google_api_key, 
#     base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
# )

# response = gemini_via_openai_client.chat.completions.create(
#     model="gemini-2.5-flash-preview-04-17",
#     messages=prompts
# )
# print(response.choices[0].message.content)

### (Optional) Trying out the DeepSeek model

Let's ask DeepSeek a really hard question, using both the Chat and the Reasoner models.

In [15]:
# # Optionally, if you wish to try DeepSeek, you can also use the OpenAI client library

# deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')

# if deepseek_api_key:
#     print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
# else:
#     print("DeepSeek API Key not set - please skip to the next section if you don't wish to try the DeepSeek API")

In [16]:
# # Using DeepSeek Chat

# deepseek_via_openai_client = OpenAI(
#     api_key=deepseek_api_key, 
#     base_url="https://api.deepseek.com"
# )

# response = deepseek_via_openai_client.chat.completions.create(
#     model="deepseek-chat",
#     messages=prompts,
# )

# print(response.choices[0].message.content)
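
The Gemini and DeepSeek cells both follow the same pattern: the OpenAI client library pointed at a different `base_url` with a different key. One way to keep that tidy is a small lookup table, sketched below. The two URLs are the ones used in the cells above; `PROVIDERS` and `client_kwargs` are hypothetical names of my own, and the helper only builds the arguments, so it runs without any network access or keys.

```python
PROVIDERS = {
    # base_url of None means "use the OpenAI client's default endpoint"
    "openai": {"env_var": "OPENAI_API_KEY", "base_url": None},
    "gemini": {
        "env_var": "GOOGLE_API_KEY",
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
    },
    "deepseek": {"env_var": "DEEPSEEK_API_KEY", "base_url": "https://api.deepseek.com"},
}

def client_kwargs(provider, env):
    """Return keyword arguments suitable for OpenAI(**kwargs) for the given provider."""
    config = PROVIDERS[provider]
    api_key = env.get(config["env_var"])
    if not api_key:
        raise KeyError(f"{config['env_var']} is not set")
    kwargs = {"api_key": api_key}
    if config["base_url"]:
        kwargs["base_url"] = config["base_url"]
    return kwargs

# Demo with a made-up key
print(client_kwargs("deepseek", {"DEEPSEEK_API_KEY": "sk-example"}))
# In the notebook: client = OpenAI(**client_kwargs("deepseek", os.environ))
```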

In [17]:
# challenge = [{"role": "system", "content": "You are a helpful assistant"},
#              {"role": "user", "content": "How many words are there in your answer to this prompt"}]

In [18]:
# # Using DeepSeek Chat with a harder question! And streaming results

# stream = deepseek_via_openai_client.chat.completions.create(
#     model="deepseek-chat",
#     messages=challenge,
#     stream=True
# )

# reply = ""
# display_handle = display(Markdown(""), display_id=True)
# for chunk in stream:
#     reply += chunk.choices[0].delta.content or ''
#     reply = reply.replace("```","").replace("markdown","")
#     update_display(Markdown(reply), display_id=display_handle.display_id)

# print("Number of words:", len(reply.split()))
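
If you'd like to see what the streaming loop is doing without spending any credit, here is a sketch that replays it against simulated chunks. The `fake_chunk` objects only mimic the `chunk.choices[0].delta.content` shape used above; they are not real API objects. The `or ""` matters because a chunk's content can be `None` (for example on the final chunk). As a variation on the cell above, this sketch removes only the fence marker lines rather than every occurrence of the word "markdown", and counts words with `split()` so that newlines and double spaces don't skew the count. Both of those are my own choices, not anything the API requires.

```python
import re
from types import SimpleNamespace

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def fake_chunk(text):
    """Build an object shaped like a streaming chunk: chunk.choices[0].delta.content."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

def accumulate(stream):
    """Concatenate streamed text, treating a missing (None) delta as an empty string."""
    reply = ""
    for chunk in stream:
        reply += chunk.choices[0].delta.content or ""
    return reply

def strip_fence_markers(text):
    """Remove lines that are just a fence marker (with an optional language tag)."""
    return re.sub(r"^`{3}[\w-]*[ \t]*$\n?", "", text, flags=re.MULTILINE)

def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())

chunks = [
    fake_chunk(FENCE + "markdown\n"),
    fake_chunk("I like "),
    fake_chunk("markdown\nvery  much\n"),
    fake_chunk(FENCE),
    fake_chunk(None),  # simulated final chunk with no content
]
reply = strip_fence_markers(accumulate(chunks))
print(repr(reply))
print("Number of words:", count_words(reply))
```

Note that the fence stripping here is applied once, after the full reply has been accumulated; applying it to a partial reply inside the loop could cut into a fence line that has only half arrived.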

In [19]:
# # Using DeepSeek Reasoner - this may hit an error if DeepSeek is busy
# # It's over-subscribed (as of 28-Jan-2025) but should come back online soon!
# # If this fails, come back to this in a few days..

# response = deepseek_via_openai_client.chat.completions.create(
#     model="deepseek-reasoner",
#     messages=challenge
# )

# reasoning_content = response.choices[0].message.reasoning_content
# content = response.choices[0].message.content

# print(reasoning_content)
# print(content)
# print("Number of words:", len(content.split()))

### Additional exercise to build your experience with the models

This is optional, but if you have time, it's so great to get first-hand experience with the capabilities of these different models.

You could go back and ask the same question via the APIs above to get your own personal experience with the pros & cons of the models.

Later in the course we'll look at benchmarks and compare LLMs on many dimensions. But nothing beats personal experience!

Here are some questions to try:
1. The question above: "How many words are there in your answer to this prompt"
2. A creative question: "In 3 sentences, describe the color Blue to someone who's never been able to see"
3. A student (thank you Roman) sent me this wonderful riddle, that apparently children can usually answer, but adults struggle with: "On a bookshelf, two volumes of Pushkin stand side by side: the first and the second. The pages of each volume together have a thickness of 2 cm, and each cover is 2 mm thick. A worm gnawed (perpendicular to the pages) from the first page of the first volume to the last page of the second volume. What distance did it gnaw through?".

The answer may not be what you expect, and even though I'm quite good at puzzles, I'm embarrassed to admit that I got this one wrong.

**What to look out for as you experiment with models**

1. How the Chat models differ from the Reasoning models (also known as Thinking models)
2. The ability to solve problems and the ability to be creative
3. Speed of generation


### Back to OpenAI with a serious question

In [20]:
# To be serious! GPT-4o-mini with the original question

prompts = [
    {"role": "system", "content": "You are a helpful assistant that responds in Markdown"},
    {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution? Please respond in Markdown."}
  ]

In [22]:
# Have it stream back results in markdown

stream = openai.chat.completions.create(
    model='gpt-4o-mini',
    messages=prompts,
    temperature=0.7,
    stream=True
)

reply = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    reply += chunk.choices[0].delta.content or ''
    reply = reply.replace("```","").replace("markdown","")
    update_display(Markdown(reply), display_id=display_handle.display_id)

# Deciding if a Business Problem is Suitable for an LLM Solution

When considering whether to apply a Large Language Model (LLM) to a business problem, it's essential to evaluate several factors. Here’s a structured approach to make that decision:

## 1. **Nature of the Problem**

### a. **Text-Based Data**
- **Presence of Text:** Does the problem involve processing or generating text? LLMs excel in handling natural language, so problems that require understanding, summarizing, or generating text are ideal.
- **Complexity of Language:** Is the language used complex or nuanced? LLMs can manage intricate linguistic structures, making them suitable for tasks like sentiment analysis or content generation.

### b. **Open-Ended vs. Structured Tasks**
- **Open-Ended Questions:** LLMs are well-suited for tasks that require creative or varied responses (e.g., generating marketing copy).
- **Structured Responses:** If the desired output is highly structured (e.g., specific data formats), consider whether an LLM can meet those requirements effectively.

## 2. **Data Availability**

### a. **Quality and Quantity of Data**
- **Training Data:** Is there sufficient high-quality text data available for fine-tuning or training the model? LLMs typically require large datasets to perform well.
- **Domain-Specific Data:** Does your domain have specialized terminology? Fine-tuning on domain-specific data can improve performance.

### b. **Data Sensitivity**
- **Confidentiality:** Are there concerns about data privacy or security? Ensure that using LLMs complies with data protection regulations.

## 3. **Business Objectives**

### a. **Alignment with Goals**
- **Value Addition:** Will implementing an LLM contribute significantly to solving the problem and achieving business goals (e.g., increased efficiency, improved customer satisfaction)?
- **Cost-Benefit Analysis:** Consider the costs associated with LLM implementation versus the expected benefits.

### b. **User Interaction**
- **User Engagement:** Does the solution require interaction with users? LLMs are particularly effective in chatbots or virtual assistants.

## 4. **Technical Feasibility**

### a. **Integration with Existing Systems**
- **Infrastructure:** Can your current tech stack support the deployment of LLMs? Ensure compatibility with existing systems.
- **Scalability:** Will the solution scale as your business grows? Evaluate whether the LLM can handle increased loads.

### b. **Expertise and Resources**
- **Skill Sets Required:** Do you have the necessary expertise to implement and maintain an LLM solution? This includes understanding machine learning, natural language processing, and data management.

## 5. **Potential Risks**

### a. **Bias and Ethical Considerations**
- **Bias in Models:** Be aware of potential biases in LLM outputs. Consider how this may impact your business reputation or lead to unintended consequences.
- **Ethical Use:** Ensure that the application of LLMs aligns with ethical standards and company values.

### b. **Reliability and Accuracy**
- **Expected Accuracy:** Are you prepared for the possibility of inaccuracies in LLM outputs? Validate the model's effectiveness before full deployment.

## Conclusion

In summary, to determine whether a business problem is suitable for an LLM solution, assess the problem's nature, data availability, alignment with business objectives, technical feasibility, and potential risks. Conducting a thorough analysis of these factors will help you make an informed decision.

---  

### And now for some fun - an adversarial conversation between Chatbots..

You're already familiar with prompts being organized into lists like:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "user prompt here"}
]
```

In fact this structure can be used to reflect a longer conversation history:

```
[
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
    {"role": "assistant", "content": "the assistant's response"},
    {"role": "user", "content": "the new user prompt"},
]
```

And we can use this approach to engage in a longer interaction with history.
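
The key point is that the chat API doesn't remember anything between calls: each request carries the whole conversation so far. A minimal sketch of keeping that history up to date might look like this (`extend_history` is a hypothetical helper, not a library function):

```python
def extend_history(messages, assistant_reply, next_user_prompt):
    """Return a new message list with the last reply and the next prompt appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_prompt},
    ]

history = [
    {"role": "system", "content": "system message here"},
    {"role": "user", "content": "first user prompt here"},
]
history = extend_history(history, "the assistant's response", "the new user prompt")
for message in history:
    print(message["role"], "->", message["content"])
```

Returning a new list rather than mutating the old one is just a design preference here; appending in place works equally well.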

In [23]:
# Let's make a conversation between GPT-4o-mini and GPT-4.1-nano
# We're using cheap versions of models so the costs will be minimal

gpt_4o_mini_model = "gpt-4o-mini"
gpt_4_1_nano_model = "gpt-4.1-nano"

gpt_4o_system = "You are a chatbot who is very argumentative; \
you disagree with anything in the conversation and you challenge everything, in a snarky way."

gpt_4_1_system = "You are a very polite, courteous chatbot. You try to agree with \
everything the other person says, or find common ground. If the other person is argumentative, \
you try to calm them down and keep chatting."

gpt_4o_messages = ["Hi there"]
gpt_4_1_messages = ["Hi"]

In [24]:
def call_gpt_4o():
    messages = [{"role": "system", "content": gpt_4o_system}]
    for gpt_4o, gpt_4_1 in zip(gpt_4o_messages, gpt_4_1_messages):
        messages.append({"role": "assistant", "content": gpt_4o})
        messages.append({"role": "user", "content": gpt_4_1})
    completion = openai.chat.completions.create(
        model=gpt_4o_mini_model,
        messages=messages
    )
    return completion.choices[0].message.content

In [25]:
call_gpt_4o()

'Oh, great. A simple greeting. How original. What else do you have?'

In [30]:
def call_gpt_4_1():
    messages = [{"role": "system", "content": gpt_4_1_system}]
    for gpt_4o, gpt_4_1 in zip(gpt_4o_messages, gpt_4_1_messages):
        messages.append({"role": "user", "content": gpt_4o})
        messages.append({"role": "assistant", "content": gpt_4_1})
    messages.append({"role": "user", "content": gpt_4o_messages[-1]})
    completion = openai.chat.completions.create(
        model=gpt_4_1_nano_model,
        messages=messages
    )
    return completion.choices[0].message.content
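
The two functions above differ only in point of view: each bot sees its own past messages as `assistant` and the other bot's as `user`. That idea can be captured in one helper, sketched below. `build_view` is a hypothetical name, and unlike `call_gpt_4_1` it never repeats the last message when both lists are the same length; it simply interleaves whatever has been said so far.

```python
from itertools import zip_longest

def build_view(system, first_speaker, second_speaker, viewer):
    """Interleave two bots' messages, labelling the viewer's own lines 'assistant'."""
    if viewer not in ("first", "second"):
        raise ValueError("viewer must be 'first' or 'second'")
    messages = [{"role": "system", "content": system}]
    # zip_longest copes with the first speaker being one message ahead
    for first, second in zip_longest(first_speaker, second_speaker):
        for speaker, text in (("first", first), ("second", second)):
            if text is not None:
                role = "assistant" if speaker == viewer else "user"
                messages.append({"role": role, "content": text})
    return messages

# The polite bot's view when the argumentative bot has just spoken
view = build_view(
    "You are a very polite, courteous chatbot.",
    ["Hi there", "Oh, great. A simple greeting."],
    ["Hi"],
    viewer="second",
)
for message in view:
    print(message["role"], "->", message["content"])
```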

In [31]:
call_gpt_4_1()

"Hello! It's great to hear from you again. How are you doing today?"

In [32]:
call_gpt_4o()

'Oh, we’re just jumping straight into it, huh? No formalities? Bold choice.'

In [35]:
gpt_4o_messages = ["Hi there"]
gpt_4_1_messages = ["Hi"]

print(f"Sarcastic GPT:\n{gpt_4o_messages[0]}\n")
print(f"Polite GPT:\n{gpt_4_1_messages[0]}\n")

for i in range(5):
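    # Argumentative bot (GPT-4o-mini) replies first
    gpt_4o_next = call_gpt_4o()
    print(f"Sarcastic GPT:\n{gpt_4o_next}\n")
    gpt_4o_messages.append(gpt_4o_next)

    # Polite bot (GPT-4.1-nano) then responds
    gpt_4_1_next = call_gpt_4_1()
    print(f"Polite GPT:\n{gpt_4_1_next}\n")
    gpt_4_1_messages.append(gpt_4_1_next)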

Sarcastic GPT:
Hi there

Polite GPT:
Hi

Sarcastic GPT:
Oh, great, another “Hi.” How original. What do you want to discuss?

Polite GPT:
You're right, a simple "Hi" can sometimes feel a bit repetitive. I'm here to chat about whatever's on your mind—whether it's something interesting, a question, or just a friendly conversation. Is there anything you'd like to share or discuss?

Sarcastic GPT:
Wow, what a groundbreaking suggestion! How about you pick a topic and not make it sound like you’re leading a support group? What makes you think I have something on my mind, anyway?

Polite GPT:
You're certainly right; I appreciate your honesty. I didn't mean to assume anything. Maybe I can suggest a fun or interesting topic instead—like travel, books, or hobbies—if you'd like. But if you'd prefer to just chat casually or not discuss anything specific, that's perfectly fine too. I'm here to follow your lead!

Sarcastic GPT:
Oh, wow, how generous of you to “follow my lead.” Newsflash: I’m not a to