# Summarizing a web page

## Use Case
This notebook uses one of OpenAI's frontier models to summarize a given web page.

```python
# Import packages
import os

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI
```

```python
# Load environment variables.
# The OpenAI API key is read from a .env file instead of being hardcoded in the code.
load_dotenv(override=True)
api_key = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-4o-mini"  # The OpenAI model to use

# Check the key
if not api_key:
    print("No API key was found, please check that .env file exists.")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-; please check you're using the right key")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them")
else:
    print("API key found, and looks good so far!")
```

API key found, and looks good so far!
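The key checks above can be wrapped in a function so they are easy to reuse and test. This is a minimal sketch: `check_api_key` is a hypothetical helper name, and the `sk-proj-` prefix test simply mirrors the cell above rather than an official key-format rule. This version tests for surrounding whitespace first, so a key with a stray leading space gets the more specific message instead of failing the prefix check.

```python
def check_api_key(api_key):
    """Return a human-readable verdict on an API key string (hypothetical helper)."""
    if not api_key:
        return "No API key was found, please check that .env file exists."
    # Check whitespace first: a leading space would otherwise fail the prefix test
    if api_key.strip() != api_key:
        return "An API key was found, but it has whitespace at the start or end - please remove it"
    if not api_key.startswith("sk-proj-"):
        return "An API key was found, but it doesn't start with sk-proj-; please check you're using the right key"
    return "API key found, and looks good so far!"


print(check_api_key(None))
print(check_api_key(" sk-proj-abc"))
print(check_api_key("sk-proj-abc"))
```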


```python
# Create the OpenAI client; by default it reads OPENAI_API_KEY from the environment
openai = OpenAI()
```

```python
# Some websites reject requests that lack a browser-like User-Agent header
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


# Define a class to represent a web page
class Website:

    def __init__(self, url):
        """
        Create this Website object from the given URL using the BeautifulSoup library.
        """
        self.url = url
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # Fail early on HTTP errors (4xx/5xx)
        soup = BeautifulSoup(response.content, "html.parser")
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body is None:
            # Some responses have no <body>; keep an empty text instead of crashing
            self.text = ""
            return
        # Remove elements that carry no readable text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)
```
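The cleaning step in `Website.__init__` can be illustrated without a network call or third-party packages. This sketch swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, so it approximates the same idea rather than reproducing the code above: it drops `<script>` and `<style>` contents, keeps the `<title>` separately, and emits one stripped text fragment per line, roughly what `get_text(separator="\n", strip=True)` produces. `TextExtractor` and `extract` are hypothetical names, the sample HTML is made up, and unlike the class above this parser does not restrict itself to `<body>`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.lines = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # Ignore whitespace-only runs and script/style bodies
        if self._in_title:
            self.title = text
        else:
            self.lines.append(text)


def extract(html):
    """Return (title, text) for an HTML string, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.lines)


html = """<html><head><title>Demo page</title><style>p {color: red}</style></head>
<body><h1>Hello</h1><script>alert('hi')</script><p>Some <b>bold</b> text.</p>
<img src="x.png"><input type="text"></body></html>"""
title, text = extract(html)
print(title)
print(text)
```

Images and inputs need no special handling here because they contain no text nodes; the original removes them mainly to keep the tree small before `get_text`.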

```python
# Define the system prompt
system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)
```

```python
# Build the user prompt that asks for a summary of a website
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += (
        "\nThe contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt
```

```python
# Combine the prompts into the message format the Chat Completions API expects:
# a list of dicts, each with a "role" and a "content" key
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)},
    ]
```
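Because `user_prompt_for` and `messages_for` only read the `title` and `text` attributes, the prompt can be checked without fetching any page. A minimal sketch, assuming a `types.SimpleNamespace` stand-in for a `Website` object; `build_messages` and `SYSTEM_PROMPT` are hypothetical condensed versions of the cells above, and the page content is made up for illustration.

```python
from types import SimpleNamespace

SYSTEM_PROMPT = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)


def build_messages(website):
    """Return the system/user message list for the Chat Completions API (hypothetical helper)."""
    user_prompt = (
        f"You are looking at a website titled {website.title}"
        "\nThe contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
        + website.text
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


# A stand-in object exposing the two attributes the prompt builder reads
page = SimpleNamespace(title="Example Domain", text="This domain is for use in examples.")
for message in build_messages(page):
    print(message["role"], "->", message["content"][:60])
```

Relying on duck typing this way means any object with `title` and `text` can be summarized, which is convenient for testing prompts offline.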

```python
# Call the OpenAI API to summarize the web page at the given URL
def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=messages_for(website),
    )
    return response.choices[0].message.content
```
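`summarize` reads the reply from `response.choices[0].message.content`. To exercise that flow offline, the client can be passed in as a parameter. A sketch under stated assumptions: `summarize_with` and `FakeClient` are hypothetical, and the fake only imitates the attribute path `chat.completions.create(...).choices[0].message.content` used above. With a real `OpenAI()` client in its place, the same function would make an actual API call.

```python
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in for OpenAI(): mimics client.chat.completions.create(...)."""

    def __init__(self, reply):
        self._reply = reply
        self.calls = []
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages):
        # Record the request, then return an object shaped like a chat completion
        self.calls.append({"model": model, "messages": messages})
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def summarize_with(client, model, messages):
    """Send the messages and return the text of the first choice."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


client = FakeClient("# Summary\nA short summary.")
messages = [{"role": "system", "content": "Summarize."}, {"role": "user", "content": "Page text"}]
print(summarize_with(client, "gpt-4o-mini", messages))
```

Injecting the client keeps the network call at the edge of the program, so the prompt-building and response-parsing logic can be tested without an API key.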

```python
# Display the summary as rendered Markdown in the Jupyter output
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
```

```python
display_summary("https://huggingface.com")
```

# Hugging Face – The AI Community Building the Future

Hugging Face is a collaborative platform for the machine learning community, focusing on models, datasets, and applications. It provides a space where users can create, discover, and collaborate on machine learning projects.

## Trending Models
This week, some of the popular models include:
- **deepseek-ai/DeepSeek-R1** - 131k updates
- **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B** - 98.1k updates
- **hexgrad/Kokoro-82M** - 37.5k updates
- **deepseek-ai/DeepSeek-R1-Zero** - 6k updates
- **deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B** - 107k updates

## Spaces
Several applications are highlighted, such as:
- **Hunyuan3D-2.0**: Text-to-3D and Image-to-3D Generation
- **Kokoro TTS**: Now in 5 languages
- **IC Light V2** 
- **DeepSeek-R1 WebGPU**: A next-generation reasoning model
- **TRELLIS**: 3D generation from images

## Datasets
Highlighted datasets include:
- **fka/awesome-chatgpt-prompts** 
- **cais/hle**
- **HumanLLMs/Human-Like-DPO-Dataset**
- **yale-nlp/MMVU**

## Services Offered
Hugging Face offers paid compute and enterprise solutions starting at $0.60/hour for GPU and $20/user/month for enterprise. The platform supports collaboration on unlimited public models, datasets, and applications.

## Open Source Contributions
The site emphasizes its commitment to open-source tools, featuring popular libraries such as:
- **Transformers**: For state-of-the-art ML
- **Diffusers**: For image and audio generation
- **Safetensors**: For safely storing model weights

Hugging Face is a resource for both organizations and individuals looking to advance their capabilities in machine learning and AI development, reinforcing its role as a key player in the AI community.