## Basic Web Scraper Using BeautifulSoup

In [16]:
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
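        # The 10-second timeout is an arbitrary choice; without one, a stalled server can hang the cell
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body>; avoid an AttributeError on None
            self.text = ""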

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec
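
The line breaks around "Hacker News" and "Nebula.io" in the output above follow from how the text is extracted: with `separator="\n"`, each text node lands on its own line, so text inside an inline element (presumably a link here) is separated from the sentence around it. The block below is a small standard-library sketch of that same idea, not BeautifulSoup's implementation. `TextExtractor`, `visible_text`, and the sample HTML are hypothetical names invented for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents should not appear in the extracted text.
# (img and input are void elements with no text, so they need no handling here.)
SKIPPED_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect visible text nodes, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text of `html`, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<body><style>p {color: red;}</style>"
    '<script>console.log("hi")</script>'
    '<p>I like <a href="#">Hacker News</a>, a lot.</p></body>'
)
print(visible_text(sample))
```

The script and style contents are dropped, and the link text ends up on a line of its own, matching the shape of the scraped output.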

## Basic System and User Prompting for Testing

In [17]:
import os
from dotenv import load_dotenv
from groq import Groq

load_dotenv()

messages = [{
    "role": "system",
    "content":
    "You are a helpful assistant. You reply with very short answers."
}, {"role": "user", "content": 'How are you'}]

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
response = client.chat.completions.create(model="llama3-70b-8192",
                                        messages=messages,
                                        max_tokens=100,
                                        temperature=1.2)

print("Assistant:", response.choices[0].message.content)

Assistant: I'm good!
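
`os.environ.get("GROQ_API_KEY")` returns `None` when the variable is missing, so a forgotten `.env` entry tends to surface later and further from its cause. A minimal sketch of checking for the key up front follows. `require_api_key` is a hypothetical helper, not part of `groq` or `python-dotenv`, and the key in the demo line is a made-up placeholder.

```python
import os


def require_api_key(name="GROQ_API_KEY", env=None):
    """Return the API key stored under `name`, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return key


# Demo with a fake environment so the example runs without real credentials
print(require_api_key(env={"GROQ_API_KEY": "placeholder-key"}))
```

In the cell above, this could be used as `Groq(api_key=require_api_key())` after `load_dotenv()` has run.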


## Basic Static Website Summariser

In [18]:
from IPython.display import display, Markdown

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

def summarize(url):
    website = Website(url)

    response = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=messages_for(website))
    return response.choices[0].message.content

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
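
Large pages can yield more text than fits comfortably in a single request. The sketch below caps the scraped text before it is appended to the user prompt. The 12,000-character default is an arbitrary assumption chosen for illustration, not a documented limit of the model, and `truncate_text` is a hypothetical helper that does not appear in the notebook.

```python
def truncate_text(text, max_chars=12_000, marker="\n[truncated]"):
    """Return `text` unchanged if it is short enough, else a shortened copy.

    max_chars is an assumed budget, not a value taken from any API documentation.
    """
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Prefer cutting at the last line break so no line is split in half
    last_newline = cut.rfind("\n")
    if last_newline > 0:
        cut = cut[:last_newline]
    return cut + marker


# Demo on a small input with a deliberately tiny budget
page_text = "\n".join(["line"] * 10)
print(truncate_text(page_text, max_chars=12))
```

One possible place to apply it would be `user_prompt += truncate_text(website.text)` inside `user_prompt_for`.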

In [22]:
display_summary("https://edwarddonner.com")

**Summary**
### About the Author
Ed Donner is a co-founder and CTO, with a passion for writing code, experimenting with LLMs, and DJing. He's previously founded another AI startup, which was acquired.

### Recent Posts

* **The Complete Agentic AI Engineering Course** (April 21, 2025)
* **LLM Workshop – Hands-on with Agents – resources** (January 23, 2025)
* **Welcome, SuperDataScientists!** (December 21, 2024)
* **Mastering AI and LLM Engineering – Resources** (November 13, 2024)

### Projects
* **Nebula.io**: Applying AI to help people discover their potential and pursue their reason for being.
* **Outsmart**: An arena where LLMs compete against each other in diplomacy and strategy.

In [21]:
display_summary("https://cnn.com")

**Breaking News, Latest News and Videos | CNN**
===========================================================

CNN provides up-to-date news from around the world, covering various topics including:

### Top Stories

* Australia's center-left Labor Party wins election
* Trump slams Canada, Rubio spars with German Foreign Ministry
* Prince Harry says King Charles won't speak to him due to security issues
* Migrant scales tree to avoid ICE arrest in Texas
* Elon Musk's ethical minefield, mapped

### Featured Sections

* **Space and Science**: First image from world's largest solar telescope, massive molecular cloud close to Earth
* **Global Travel**: 1,600-year-old megastructure, hedonistic party capital of Europe
* **Global Business**: A massive tariff on millions of Americans' purchases, S&P 500 posts longest winning streak in 20 years
* **Style**: The artisan making warrior prints for modern Japan, why young people are getting into watchmaking
* **Sports**: Breaking records, Olympic gold medalist talks to CNN, Harry Maguire scores goal of a lifetime

### In Case You Missed It

* 2025 nominees, translating Chinese food names into English, Ukrainian journalist's body returned with signs of torture
* **Fact Check**: Trump lies about serving three terms, Amazon considered breaking out a tariff charge
* **Photos You Should See**: Defining photos of Trump's first 100 days, people we've lost in 2025

In [20]:
display_summary("https://anthropic.com")

**Anthropic: AI Research and Products for Safety and Well-being**
============================================================

Anthropic is a company focused on building AI that serves humanity's long-term well-being, with a focus on safety and responsible scaling.

### News and Announcements
--------------------

* **Claude 3.7 Sonnet**: The latest AI model from Anthropic, offering improved performance and capabilities.
* **Research and Development**: Recent papers and releases on topics such as interpretability, alignment science, and the introduction of the Model Context Protocol.

### Products and Solutions
---------------------------

* **Claude**: A large language model for building AI-powered applications and custom experiences.
* **API Platform**: A platform for developers to build with Claude, with documentation and pricing plans.
* **Solutions**: AI agents, coding, customer support, and more.

### Research and Commitments
-----------------------------

* **Research Overview**: Papers and publications on AI safety, economic index, and Claude models.
* **Commitments**: Transparency, responsible scaling policy, security, and compliance.