In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# If you get an error running this cell, then please head over to the troubleshooting notebook!

In [2]:
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('PERPLEXITY_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    # Check whitespace first, so a stray space isn't misreported as a wrong prefix
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("pplx-"):
    # This notebook talks to Perplexity, whose keys typically start with "pplx-"
    # (OpenAI project keys start with "sk-proj-" instead)
    print("An API key was found, but it doesn't start pplx-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook
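The check above hard-codes a single provider's key prefix, which is easy to get out of sync when a notebook switches from OpenAI to Perplexity, as the output shows. A minimal sketch of a reusable version follows. The helper name `check_api_key` is hypothetical, and the prefixes are assumptions: Perplexity keys typically start with `pplx-`, OpenAI project keys with `sk-proj-`.

```python
def check_api_key(api_key, expected_prefix):
    """Return a short diagnostic message for an API key read from the environment."""
    if not api_key:
        return "No API key was found"
    # Check whitespace first, so a stray space isn't misreported as a wrong prefix
    if api_key.strip() != api_key:
        return "API key has leading or trailing whitespace"
    if not api_key.startswith(expected_prefix):
        return f"API key doesn't start with {expected_prefix}; check you're using the right key"
    return "API key found and looks good so far!"


# Usage with made-up keys; the prefixes are assumptions about each provider
print(check_api_key("pplx-abc123", "pplx-"))
print(check_api_key("sk-proj-abc123", "pplx-"))
```

Passing the prefix in keeps the same function usable whichever provider you point the client at.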


In [3]:
#openai = OpenAI()
openai = OpenAI(api_key=api_key, base_url="https://api.perplexity.ai")

# If this doesn't work, try Kernel menu >> Restart Kernel and Clear Outputs Of All Cells, then run the cells from the top of this notebook down.
# If it STILL doesn't work (horrors!) then please see the Troubleshooting notebook in this folder for full instructions

In [4]:
# To give you a preview -- calling OpenAI with these messages is this easy. Any problems, head over to the Troubleshooting notebook.

message = "Hello, GPT! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="sonar-pro", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! Thank you for your warm greeting. I'm here to help answer your questions, explain topics, brainstorm ideas, or just chat—whatever you need[1][3][5]. 

If you want to get the most out of this conversation, feel free to ask about anything specific or let me know what you're curious about. The more details or keywords you provide, the better I can tailor my responses[1]. But I'm also happy to just say hi back and welcome you!
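Answers from Perplexity's `sonar-pro`, like the one above, carry numbered citation markers such as `[1][3][5]`. If you want plain prose (for example, before displaying a summary), a small regular expression can remove them. This is a sketch, not part of any library: the names `CITATION_PATTERN` and `strip_citations` are hypothetical, and it assumes markers are always bracketed integers, so it would also strip legitimate text such as `items[0]`.

```python
import re

# Matches runs of numbered citation markers such as "[1]" or "[1][3][5]",
# plus any whitespace immediately before them
CITATION_PATTERN = re.compile(r"\s*(?:\[\d+\])+")


def strip_citations(text):
    """Remove bracketed numeric citation markers from model output."""
    return CITATION_PATTERN.sub("", text)


sample = "I'm here to help answer your questions[1][3][5]. The better I can tailor my responses[1]."
print(strip_citations(sample))
```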


In [5]:
# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

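class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        # A timeout stops the cell hanging forever on an unresponsive site
        response = requests.get(url, headers=headers, timeout=30)
        # Fail loudly on HTTP errors (e.g. a 403 from a protected site) instead of summarizing an error page
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Remove tags that contribute no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body> at all; avoid calling methods on None
            self.text = ""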
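To see what the `Website` class keeps and discards without making a network request, here is a dependency-free sketch of the same idea using the standard library's `html.parser` instead of BeautifulSoup. It is an illustration only: the class name `TextExtractor` is hypothetical, and it is less forgiving than BeautifulSoup on messy real-world HTML. `<img>` and `<input>` need no special handling here because they never contain text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the page title and visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.lines = []
        self._skip_depth = 0      # > 0 while inside a skipped tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Mirror get_text(strip=True): drop whitespace-only chunks
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.lines.append(text)


html = """<html><head><title>Demo page</title><style>p {color: red}</style></head>
<body><p>Hello, <b>world</b></p><script>alert('hi')</script><p>Bye</p></body></html>"""

parser = TextExtractor()
parser.feed(html)
print(parser.title)
print("\n".join(parser.lines))
```

Joining `lines` with `"\n"` mimics the `separator="\n"` argument used in the class above.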

In [6]:
# Let's try one out. Change the website and add print statements to follow along.

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec

In [7]:
# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [8]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [9]:
print(user_prompt_for(ed))

You are looking at a website titled Home - Edward Donner
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acqui

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure:

```python
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
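The same list also carries conversation history. The chat completions API doesn't remember earlier calls, so a follow-up question is sent together with the previous turns, with the model's earlier replies under the `assistant` role. A minimal sketch of building such a list; the assistant reply below is a made-up placeholder, not real model output:

```python
# Start with the same two-message structure shown above
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"},
]

# After a call, append the model's reply under the "assistant" role
# (placeholder text here, not a real response)
messages.append({"role": "assistant", "content": "4. Shocking, I know."})

# Then add the next user turn; the whole list is sent on the next call,
# which is how the model "sees" the earlier turns
messages.append({"role": "user", "content": "And 3 + 3?"})

for m in messages:
    print(f"{m['role']:>9}: {m['content']}")
```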
To give you a preview, the next 2 cells make a rather simple call - we won't stretch the mighty GPT (yet!)

In [10]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

In [11]:
# To give you a preview -- calling OpenAI with system and user messages:

response = openai.chat.completions.create(model="sonar-pro", messages=messages)
print(response.choices[0].message.content)

**2 + 2 equals 4.** This is a basic arithmetic fact, universally taught and recognized as mathematically correct by all standard mathematical principles[3][4].

Some sources playfully or philosophically reference "2 + 2 = 5" as an example of a deliberate logical error, most famously in George Orwell's *Nineteen Eighty-Four*, but in actual math, the answer is **4**[4].


In [12]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [13]:
# Try this out, and then try for a few more websites

messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': 'You are looking at a website titled Home - Edward Donner\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nHome\nConnect Four\nOutsmart\nAn arena that pits LLMs against each other in a battle of diplomacy and deviousness\nAbout\nPosts\nWell, hi there.\nI’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (\nvery\namateur) and losing myself in\nHacker News\n, nodding my head sagely to things I only half understand.\nI’m the co-founder and CTO of\nNebula.io\n. We’re applying AI to a field where it can make a

In [14]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model="sonar-pro",
        messages=messages_for(website)
    )
    return response.choices[0].message.content

In [15]:
summarize("https://edwarddonner.com")

'The website "Home - Edward Donner" introduces **Ed Donner**, co-founder and CTO of Nebula.io, an AI-driven platform focused on transforming talent discovery and recruitment through proprietary large language models (LLMs)[2]. Ed is passionate about software development, experimenting with LLMs, DJing, and electronic music production, and he shares these interests with the site\'s visitors.\n\nKey sections and features of the website include:\n\n- **Professional Background:** Ed describes Nebula.io as using AI to help people discover and pursue their potential. The platform enables recruiters to source, understand, engage, and manage talent more effectively with technology that automates candidate matching and communication[2].\n- **Personal Interests:** Ed details his enthusiasm for coding, LLM experimentation, and creative hobbies.\n- **Interactive Projects:** The site features AI/LLM-related activities such as Connect Four and "Outsmart," an arena where various LLMs compete in games

In [16]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [20]:
display_summary("https://edwarddonner.com")

The website "Home - Edward Donner" serves as a personal and professional hub for **Ed Donner**, co-founder and CTO of Nebula.io. It highlights his interests in coding, large language models (LLMs), DJing, and music production. Ed shares his enthusiasm for experimenting with AI and invites like-minded visitors to connect[2].

Key features of the website include:

- **Professional Background:** Ed is CTO of Nebula.io, a company focused on using AI to help people discover their potential, primarily by enhancing recruitment and talent management with proprietary LLMs and patented matching models. He was previously the founder and CEO of AI startup untapt, acquired in 2021[2].

- **Projects & Experiments:** The site features interactive AI projects such as "Connect Four" and "Outsmart," an arena where different LLMs compete in diplomacy and strategy, serving as a playful testbed for LLM capabilities[2][5].

- **Updates and Announcements:** Recent posts include educational offerings like "Connecting my courses – become an LLM expert and leader" (May 28, 2025), "2025 AI Executive Briefing" (May 18, 2025), "The Complete Agentic AI Engineering Course" (April 21, 2025), and resources for an "LLM Workshop – Hands-on with Agents" (January 23, 2025). These indicate ongoing activity and a focus on AI education and executive engagement.

- **Contact & Community:** Ed encourages readers to connect via email and social platforms, and there’s an option to subscribe to a newsletter for future updates[2].

Overall, the website centers on Ed Donner's work with AI in talent technology, his experimental approach to LLMs, and provides educational resources and interactive tools for the AI community.

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show their content, because `requests` fetches only the initial HTML and doesn't run any scripts. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)

Also, websites protected with CloudFront (and similar services) may return 403 errors; many thanks to Andy J for pointing this out.

But many websites will work just fine!
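Because a JavaScript-rendered page usually yields little or no text from `requests` plus BeautifulSoup, one cheap safeguard is to check the extracted text before spending an API call on it. A sketch, assuming a hypothetical helper `scrape_warning` and an arbitrary 200-character threshold; a short page can be perfectly legitimate, so treat the result as a hint, not a diagnosis:

```python
def scrape_warning(text, min_chars=200):
    """Return a warning if scraped text looks too thin to summarize, else None.

    min_chars is an arbitrary threshold chosen for illustration.
    """
    if not text.strip():
        return "No text extracted; the page may be rendered with JavaScript or blocked."
    if len(text) < min_chars:
        return f"Only {len(text)} characters extracted; the summary may be unreliable."
    return None


# Hypothetical usage in the notebook (needs network access):
#   warning = scrape_warning(Website("https://openai.com").text)
print(scrape_warning(""))
print(scrape_warning("x" * 50))
print(scrape_warning("x" * 500))
```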

In [21]:
display_summary("https://cnn.com")

The CNN website titled **"Breaking News, Latest News and Videos"** is a comprehensive global news portal that delivers up-to-date coverage across a wide range of topics, including:

- **World News:** Ongoing coverage of major international events such as the Ukraine-Russia War, Israel-Hamas War, and key diplomatic developments (e.g., US-China trade talks, ceasefire negotiations in Southeast Asia).
- **US News & Politics:** Reports on American politics, the 2025 elections, Trump-related developments, and policy issues.
- **Business & Technology:** Insights on the economy, cybersecurity incidents, and technological advances.
- **Health & Science:** Reporting on public health, scientific discoveries, environmental issues (like algal blooms and efforts to release mosquitoes by drone), and climate change.
- **Entertainment, Style, and Culture:** Features on celebrity news, arts, design, fashion trends, and popular culture moments.
- **Sports:** Updates on global and US sports, including stories on major personalities and events.
- **Special Coverage:** Photo essays, video reports, and live updates on current crises, humanitarian needs (such as Gaza aid efforts), and feature stories (for example, about Hong Kong’s mahjong carvers and the plight of pet owners during economic hardship).

The website also offers live TV, audio podcasts, interactive games, and multimedia features. Additionally, CNN uses machine learning to tailor content recommendations based on user interests and provides feedback options for site usability and advertisement relevance.

In [None]:
display_summary("https://anthropic.com")

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise, you experienced calling the Cloud API of a Frontier Model (a leading model at the frontier of AI) for the first time. We will be using APIs like OpenAI at many stages in the course, in addition to building our own LLMs.

More specifically, we've applied this to Summarization, a classic Gen AI use case. It can be applied to any business vertical: summarizing the news, summarizing financial performance, summarizing a resume in a cover letter; the applications are limitless. Consider how you could apply Summarization in your business, and try prototyping a solution.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you continue - now try yourself</h2>
            <span style="color:#900;">Use the cell below to make your own simple commercial example. Stick with the summarization use case for now. Here's an idea: write something that will take the contents of an email, and will suggest an appropriate short subject line for the email. That's the kind of feature that might be built into a commercial email tool.</span>
        </td>
    </tr>
</table>

In [None]:
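# Step 1: Create your prompts

system_prompt = "something here"
user_prompt = """
    Lots of text
    Can be pasted here
"""

# Step 2: Make the messages list (same structure as messages_for above)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Step 3: Call the API through the client created earlier (Perplexity's sonar-pro here)

response = openai.chat.completions.create(model="sonar-pro", messages=messages)

# Step 4: print the result

print(response.choices[0].message.content)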

## An extra exercise for those who enjoy web scraping

You may notice that if you try `display_summary("https://openai.com")`, it doesn't work! That's because OpenAI has a fancy website that uses JavaScript. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it. If you have experience with Selenium, Playwright or similar, then feel free to improve the Website class to use them. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!)

# Sharing your code

I'd love it if you share your code afterwards so I can share it with others! You'll notice that some students have already made changes (including a Selenium implementation) which you will find in the community-contributions folder. If you'd like to add your changes to that folder, submit a Pull Request with your new versions and I'll merge them.

If you're not an expert with git (and I am not!), then GPT has given some nice instructions on how to submit a Pull Request. It's a bit of an involved process, but once you've done it the first time it's pretty clear. As a pro-tip: it's best to clear the outputs of your Jupyter notebooks (Edit >> Clear Outputs of All Cells, and then Save) so the notebooks you submit are clean.

Here are good instructions courtesy of an AI friend:  
https://chatgpt.com/share/677a9cb5-c64c-8012-99e0-e06e88afd293