## Basic Web Scraper using BeautifulSoup

In [22]:
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
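        # A timeout keeps the cell from hanging on an unresponsive server
        response = requests.get(url, headers=headers, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body is None:
            # html.parser does not synthesize a <body>; avoid an AttributeError
            self.text = ""
            return
        # Remove elements that contribute no readable text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)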

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec
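
The cleaning step inside `Website.__init__` can be tried without a network request. The sketch below applies the same calls to a small inline page; `SAMPLE_HTML` and `extract_title_and_text` are hypothetical names introduced only for this illustration and are not part of the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <script>console.log("tracking");</script>
    <style>p { color: red; }</style>
    <h1>Hello</h1>
    <p>Visible <b>text</b> here.</p>
    <img src="logo.png" alt="logo">
  </body>
</html>
"""


def extract_title_and_text(html):
    """Mirror the parsing steps of Website.__init__ on an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    if soup.body is None:
        return title, ""
    # Calling a tag with a list of names is shorthand for find_all()
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    return title, soup.body.get_text(separator="\n", strip=True)


title, text = extract_title_and_text(SAMPLE_HTML)
print(title)
print(text)
```

Because `get_text(separator="\n")` puts every text node on its own line, inline elements such as `<b>` or `<a>` split a sentence across lines. That is why the scraped output above breaks around linked words like "Hacker News".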

## Test the local LLM
ollama pull llama3.2
ollama run llama3.2

In [23]:
import os
from dotenv import load_dotenv
import requests

load_dotenv()

messages = [{
    "role": "system",
    "content":
    "You are a helpful assistant. You reply with very short answers."
}, {"role": "user", "content": 'How are you'}]

payload = {
    "model": "llama3.2",
    "messages": messages,
    "stream": False,
}

response = requests.post(os.environ.get("OLLAMA_API"),
                         headers={"Content-Type": "application/json"},
                         json=payload)

print("Assistant:", response.json()['message']['content'])

Assistant: I'm functioning properly.
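
The cell above indexes straight into `response.json()['message']['content']`, which raises a bare `KeyError` if the server replies with something else. Below is a minimal sketch of a more defensive reader. `extract_reply` is a hypothetical helper, and the assumption that a failed request comes back as JSON with an `"error"` field should be checked against the server actually in use.

```python
def extract_reply(data):
    """Return the assistant text from a parsed, non-streaming chat response.

    Assumes the success shape used in this notebook:
    {"message": {"role": "assistant", "content": "..."}}
    """
    # Assumption: failures are reported as {"error": "..."}
    if isinstance(data, dict) and "error" in data:
        raise RuntimeError(f"Server reported an error: {data['error']}")
    try:
        return data["message"]["content"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {data!r}") from exc


# Hand-written sample mirroring the response shape read above
sample = {"message": {"role": "assistant", "content": "I'm functioning properly."}}
print("Assistant:", extract_reply(sample))
```

In the notebook this could be called as `extract_reply(response.json())` in place of the direct indexing.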


## Basic Static Website Summariser using a local LLM

In [24]:
from IPython.display import display, Markdown

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

def summarize(url):
    website = Website(url)

    payload = {
        "model": "llama3.2",
        "messages": messages_for(website),
        "stream": False,
    }

    response = requests.post(os.environ.get("OLLAMA_API"),
                             headers={"Content-Type": "application/json"},
                             json=payload)
    return response.json()['message']['content']

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
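
`user_prompt_for` appends the full page text to the prompt, and a large page may exceed what a small local model can usefully attend to. One possible mitigation, sketched here under assumptions, is to cap the text before building the prompt. `truncate_text` is a hypothetical helper and the 8000-character default is an arbitrary illustrative value, not a documented limit of any model.

```python
def truncate_text(text, max_chars=8000):
    """Cap text at max_chars, preferring to cut at a line boundary."""
    if len(text) <= max_chars:
        return text
    # Look for the last newline before the limit so no line is cut in half
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        # No usable line break: fall back to a hard cut
        cut = max_chars
    return text[:cut]


page_text = "line one\nline two\nline three"
print(truncate_text(page_text, max_chars=12))
```

It could be applied inside `user_prompt_for`, for example as `user_prompt += truncate_text(website.text)`.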

In [25]:
display_summary("https://edwarddonner.com")

**Summary**
===============

### About the Author and Organization

The website is about Edward Donner, a co-founder and CTO of Nebula.io, an AI startup that applies AI to help people discover their potential. He also shares his personal interests, including coding, DJing, and reading Hacker News.

### Recent Announcements and News

* **April 21, 2025:** The Complete Agentic AI Engineering Course
* **January 23, 2025:** LLM Workshop – Hands-on with Agents – resources
* **December 21, 2024:** Welcome, SuperDataScientists!
* **November 13, 2024:** Mastering AI and LLM Engineering – Resources

### Featured Content

* **Outsmart**: An arena that pits LLMs against each other in a battle of diplomacy and deviousness

In [26]:
display_summary("https://cnn.com")

# Summary of CNN Website

## Overview
The CNN website is a news and information portal that provides comprehensive coverage of global events, politics, business, health, entertainment, and more.

## Latest News
- The United States has imposed new tariffs on millions of Americans' purchases.
- China's electric vehicle industry is preparing to take on the world.
- Mondo Duplantis talks exclusively with CNN about his record-breaking year.
- Andrew Trump proposes a $1 trillion defense budget that slashes education, foreign aid, environment, health, and public assistance.

## Topics
The website features a wide range of topics, including:
- Space and science
- Global travel
- Business
- Health and wellness
- Technology
- Politics
- Entertainment

## Live TV and Audio
- Users can watch live CNN TV channels.
- Access various podcasts, including "Chasing Life with Dr. Sanjay Gupta" and "The Assignment with Audie Cornish".

## Interactive Features
- The website offers interactive games like Crosswords, Sudoku, and Jumble Crossword.
- Viewers can explore the latest news in photos.

## Subscription and Settings
- Users can sign up for newsletters and topics they follow.
- Adjust settings to customize their viewing experience.

In [27]:
display_summary("https://anthropic.com")

# Anthropic Website Summary
=====================================

## Mission and Values

Anthropic aims to build AI that serves humanity's long-term well-being. They prioritize responsible development, focusing on tools with human benefit at their foundation.

## Products and Research

*   **Claude 3.7 Sonnet**: Their most intelligent AI model, available for use.
*   **Research Overview**: Daily research, policy work, and product design aimed at demonstrating responsible AI development.
*   **Alignment Science**: Studies on aligning large language models with human values.

## News and Announcements

### Recent Articles

*   **Mar 27, 2025: Anthropic Economic Index** - Societal impacts of AI
*   **Mar 27, 2025: Tracing the thoughts of a large language model** - Interpretability research
*   **Feb 24, 2025: Claude's extended thinking** - Product announcement

### Open Roles

Anthropic is seeking help to build the future of safe AI. Check out open roles on their website.

## FAQs and Resources

*   **Claude Code**: Available for use.
*   **API Platform**: Developer documentation and pricing plans.
*   **Research Overview**: Learn more about Anthropic's research efforts.

### Partnerships

Anthropic has partnerships with Google Cloud's Vertex AI and Amazon Bedrock.