### Create a brochure using a scraper

Create a product that builds a brochure for a company, to be used for prospective clients, investors and potential recruits.

We will be given a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import json
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from scraper import fetch_website_links, fetch_website_contents
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-5-nano'
openai = OpenAI()

API key looks good so far


In [4]:
links = fetch_website_links("https://github.com")
links

['#start-of-content',
 'https://github.com/events/universe/recap?utm_source=github-banner-recap&utm_medium=web&utm_campaign=universe25post',
 'https://github.com/events/universe/recap?utm_source=github-banner-recap&utm_medium=web&utm_campaign=universe25post',
 '/',
 '/login',
 'https://github.com/features/copilot',
 'https://github.com/features/spark',
 'https://github.com/features/models',
 'https://github.com/security/advanced-security',
 'https://github.com/features/actions',
 'https://github.com/features/codespaces',
 'https://github.com/features/issues',
 'https://github.com/features/code-review',
 'https://github.com/features/discussions',
 'https://github.com/features/code-search',
 'https://github.com/why-github',
 'https://docs.github.com',
 'https://skills.github.com',
 'https://github.blog',
 'https://github.com/marketplace',
 'https://github.com/mcp',
 'https://github.com/features',
 'https://github.com/enterprise',
 'https://github.com/team',
 'https://github.com/enterpris

## First step: Have GPT-5-nano figure out which links are relevant

### Use a call to gpt-5-nano to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
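
The relative-link rewriting described above does not strictly need the model: it can be done, or at least double-checked, deterministically. Below is a minimal sketch using the standard library's `urllib.parse.urljoin`. The helper name `absolutize_links` is hypothetical and not part of the `scraper` module; deciding which links are *relevant* is still left to the LLM.

```python
from urllib.parse import urljoin, urlparse


def absolutize_links(base_url, links):
    """Resolve relative links against base_url, dropping non-web links and duplicates.

    Hypothetical helper for illustration; not part of the notebook's scraper module.
    """
    seen = set()
    result = []
    for link in links:
        # Skip empty entries and in-page anchors such as "#start-of-content"
        if not link or link.startswith("#"):
            continue
        absolute = urljoin(base_url, link)
        # Keep only http(s) URLs (this drops mailto:, javascript:, etc.)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample = ["#start-of-content", "/", "/login", "https://github.com/features/copilot",
          "/login", "mailto:hello@example.com"]
print(absolutize_links("https://github.com", sample))
```

Running something like this before building the prompt would also shrink the link list the model has to read.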

In [3]:
link_system_prompt = """
You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [4]:
def get_links_user_prompt(url):
    user_prompt = f"""
Here is the list of links on the website {url} -
Please decide which of these are relevant web links for a brochure about the company, 
respond with the full https URL in JSON format.
Do not include Terms of Service, Privacy, email links.

Links (some might be relative links):

"""
    links = fetch_website_links(url)
    user_prompt += "\n".join(links)
    return user_prompt

In [5]:
print(get_links_user_prompt("https://github.com"))


Here is the list of links on the website https://github.com -
Please decide which of these are relevant web links for a brochure about the company, 
respond with the full https URL in JSON format.
Do not include Terms of Service, Privacy, email links.

Links (some might be relative links):

#start-of-content
https://github.com/events/universe/recap?utm_source=github-banner-recap&utm_medium=web&utm_campaign=universe25post
https://github.com/events/universe/recap?utm_source=github-banner-recap&utm_medium=web&utm_campaign=universe25post
/
/login
https://github.com/features/copilot
https://github.com/features/spark
https://github.com/features/models
https://github.com/mcp
https://github.com/features/actions
https://github.com/features/codespaces
https://github.com/features/issues
https://github.com/features/code-review
https://github.com/security/advanced-security
https://github.com/security/advanced-security/code-security
https://github.com/security/advanced-security/secret-protection
ht

In [8]:
def select_relevant_links(url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    links = json.loads(result)
    return links
    

In [7]:
select_relevant_links("https://github.com")

{'links': [{'type': 'about page', 'url': 'https://github.com/about'},
  {'type': 'why GitHub page', 'url': 'https://github.com/why-github'},
  {'type': 'team page', 'url': 'https://github.com/team'},
  {'type': 'enterprise page', 'url': 'https://github.com/enterprise'},
  {'type': 'marketplace', 'url': 'https://github.com/marketplace'},
  {'type': 'customer stories', 'url': 'https://github.com/customer-stories'},
  {'type': 'customer stories (enterprise)',
   'url': 'https://github.com/customer-stories?type=enterprise'},
  {'type': 'solutions page', 'url': 'https://github.com/solutions'},
  {'type': 'roadmap', 'url': 'https://github.com/github/roadmap'},
  {'type': 'blog', 'url': 'https://github.blog'},
  {'type': 'newsroom', 'url': 'https://github.com/newsroom'},
  {'type': 'partners', 'url': 'https://github.com/partners'},
  {'type': 'partner program', 'url': 'https://partner.github.com'},
  {'type': 'GitHub status', 'url': 'https://www.githubstatus.com'}]}
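
`json.loads` raises if the model ever returns malformed JSON, and nothing guarantees that the `links` key or the `url` fields are present. A defensive parsing step might look like the following sketch. `parse_links_response` is a hypothetical helper, and returning an empty list on failure is an assumption about what the caller wants.

```python
import json


def parse_links_response(raw):
    """Parse the model's JSON reply into a clean {"links": [...]} dict.

    Hypothetical helper: returns an empty list of links instead of raising
    when the reply does not have the expected shape.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    clean = []
    for item in links:
        # Keep only entries that carry a full https URL, as the prompt requests
        if isinstance(item, dict) and str(item.get("url", "")).startswith("https://"):
            clean.append({"type": item.get("type", "unknown"), "url": item["url"]})
    return {"links": clean}


reply = '{"links": [{"type": "about page", "url": "https://github.com/about"}, {"type": "bad", "url": "/relative"}]}'
print(parse_links_response(reply))
print(parse_links_response("not json"))
```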

In [9]:
def select_relevant_links(url):
    print(f"Selecting relevant links for {url} by calling {MODEL}")
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    links = json.loads(result)
    print(f"Found {len(links['links'])} relevant links")
    return links

In [10]:
select_relevant_links("https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 14 relevant links


{'links': [{'type': 'home page', 'url': 'https://huggingface.co/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'},
  {'type': 'models catalog', 'url': 'https://huggingface.co/models'},
  {'type': 'datasets catalog', 'url': 'https://huggingface.co/datasets'},
  {'type': 'spaces catalog', 'url': 'https://huggingface.co/spaces'},
  {'type': 'API endpoints', 'url': 'https://endpoints.huggingface.co'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'GitHub', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn', 'url': 'https://www.linkedin.com/company/huggingface/'},
  {'type': 'Discussion forum', 'url': 'https://discuss.huggingface.co'},
  {'type': 'Blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'Discord community', 'url': 'https://huggingface.co/join/discord'}]}

## Second step: make the brochure!

Assemble all the details into another prompt to GPT-5-nano.

In [12]:
def fetch_page_and_all_relevant_links(url):
    contents = fetch_website_contents(url)
    relevant_links = select_relevant_links(url)
    result = f"## Landing Page:\n\n{contents}\n## Relevant Links:\n"
    for link in relevant_links['links']:
        result += f"\n\n### Link: {link['type']}\n"
        result += fetch_website_contents(link["url"])
    return result
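
If fetching any one of the selected links fails, the loop above would abort the whole brochure. One way to guard against that is sketched below. To keep the example self-contained, the fetcher is passed in as a parameter and a fake one is used for the demonstration. `gather_pages` and `fake_fetch` are hypothetical names, and the assumption that `fetch_website_contents` raises on failure has not been checked against the `scraper` module.

```python
def gather_pages(landing_url, links, fetch):
    """Build the combined page text, skipping any link whose fetch fails.

    `fetch` is any callable taking a URL and returning text (in the notebook
    this would be fetch_website_contents). Hypothetical helper for illustration.
    """
    result = f"## Landing Page:\n\n{fetch(landing_url)}\n## Relevant Links:\n"
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as exc:  # broad on purpose: any failure skips the link
            print(f"Skipping {link['url']}: {exc}")
            continue
        result += f"\n\n### Link: {link['type']}\n{contents}"
    return result


def fake_fetch(url):
    # Stand-in for a real HTTP fetch so the sketch runs offline
    if "broken" in url:
        raise ValueError("could not fetch")
    return f"contents of {url}"


links = [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "careers page", "url": "https://example.com/broken"},
]
print(gather_pages("https://example.com", links, fake_fetch))
```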

In [13]:
print(fetch_page_and_all_relevant_links("https://huggingface.co"))

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 3 relevant links
## Landing Page:

Hugging Face – The AI community building the future.

Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
MiniMaxAI/MiniMax-M2
Updated
6 days ago
•
810k
•
1.05k
moonshotai/Kimi-Linear-48B-A3B-Instruct
Updated
4 days ago
•
19.4k
•
347
deepseek-ai/DeepSeek-OCR
Updated
1 day ago
•
2.25M
•
2.46k
briaai/FIBO
Updated
2 days ago
•
3.03k
•
221
dx8152/Qwen-Edit-2509-Multiple-angles
Updated
about 14 hours ago
•
218
Browse 1M+ models
Spaces
Running
on
CPU Upgrade
1.38k
1.38k
The Smol Training Playbook: The Secrets to Building World-Class LLMs
📝
Running
15.7k
15.7k
DeepSite v3
🐳
Generate any application by Vibe Coding
Runnin

In [18]:
brochure_system_prompt = """
You are an assistant that analyzes the contents of several relevant pages from a company website
and creates a short brochure about the company for prospective customers, investors and recruits.
Respond in markdown without code blocks.
Include details of company culture, customers and careers/jobs if you have the information.
"""

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# brochure_system_prompt = """
# You are an assistant that analyzes the contents of several relevant pages from a company website
# and creates a short, humorous, entertaining, witty brochure about the company for prospective customers, investors and recruits.
# Respond in markdown without code blocks.
# Include details of company culture, customers and careers/jobs if you have the information.
# """
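
Rather than commenting prompts in and out, the tone could be made a parameter. A small sketch follows; `make_brochure_system_prompt` and its `tone` argument are hypothetical and not part of the original notebook.

```python
def make_brochure_system_prompt(tone=""):
    """Return the brochure system prompt, optionally asking for a given tone."""
    style = f"short, {tone}" if tone else "short"
    return (
        "You are an assistant that analyzes the contents of several relevant pages "
        "from a company website\n"
        f"and creates a {style} brochure about the company for prospective customers, "
        "investors and recruits.\n"
        "Respond in markdown without code blocks.\n"
        "Include details of company culture, customers and careers/jobs "
        "if you have the information.\n"
    )


print(make_brochure_system_prompt("humorous, entertaining, witty"))
```

Calling it with no argument gives the wording of the default prompt above.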


In [14]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"""
You are looking at a company called: {company_name}
Here are the contents of its landing page and other relevant pages;
use this information to build a short brochure of the company in markdown without code blocks.\n\n
"""
    user_prompt += fetch_page_and_all_relevant_links(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
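
With a flat `[:5_000]` slice, a long landing page can use up the whole budget before any linked page is included. One alternative is to cap each section separately before capping the total. A minimal sketch, where `truncate_sections` and both limits are hypothetical, illustrative choices:

```python
def truncate_sections(sections, per_section=1_000, total=5_000):
    """Cap each (title, text) section, then cap the combined prompt.

    Hypothetical helper: the limits are illustrative, not tuned values.
    """
    parts = []
    for title, text in sections:
        parts.append(f"### {title}\n{text[:per_section]}")
    return "\n\n".join(parts)[:total]


sections = [
    ("Landing Page", "x" * 10_000),   # a very long page
    ("careers page", "We are hiring!"),
]
prompt = truncate_sections(sections)
print(len(prompt))
print("We are hiring!" in prompt)
```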

In [15]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 12 relevant links


'\nYou are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages;\nuse this information to build a short brochure of the company in markdown without code blocks.\n\n\n## Landing Page:\n\nHugging Face – The AI community building the future.\n\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nMiniMaxAI/MiniMax-M2\nUpdated\n6 days ago\n•\n810k\n•\n1.05k\nmoonshotai/Kimi-Linear-48B-A3B-Instruct\nUpdated\n4 days ago\n•\n19.4k\n•\n347\ndeepseek-ai/DeepSeek-OCR\nUpdated\n1 day ago\n•\n2.25M\n•\n2.46k\nbriaai/FIBO\nUpdated\n2 days ago\n•\n3.03k\n•\n221\ndx8152/Qwen-Edit-2509-Multiple-angles\nUpdated\nabout 14 hours ago\n•\n218\nBrowse 1M+ models\nSpaces\nRunning\no

In [16]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [19]:
create_brochure("HuggingFace", "https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 11 relevant links


# Hugging Face - The AI community building the future

Hugging Face is a collaborative platform where the machine learning community comes together to build, share, and deploy models, datasets, and applications. With a strong emphasis on openness and collaboration, we empower researchers, developers, and organizations to move faster in AI.

## Who we are
- The home of machine learning collaboration: a vibrant community shaping the future of AI.
- An open, community-driven platform offering access to models, datasets, spaces, and documentation.
- A hub for innovation across all modalities: text, image, video, audio, and even 3D.

## What we offer
- Open collaboration on unlimited public models, datasets, and applications.
- A robust open-source stack to accelerate development and experimentation.
- A space to showcase work and build your ML portfolio for visibility in the community.

## The platform at a glance
- Models: Browse 1M+ models created by the community.
  - Examples trending this week include MiniMaxAI/MiniMax-M2, moonshotai/Kimi-Linear-48B-A3B-Instruct, deep Seek-OCR, and more.
- Datasets: Access 250k+ datasets to train, evaluate, and benchmark.
- Spaces: Interactive applications and demos to deploy and share AI experiences.
  - Spaces run across CPU and other environments, with popular projects like The Smol Training Playbook and DeepSite v3.
- All modalities: Text, image, video, audio, and 3D support.
- Build your ML portfolio: Share work with the world and grow your professional profile.

## Why it’s unique
- The HF Open Source stack powers faster innovation and collaboration.
- A single platform to explore, build, and deploy across models, datasets, and apps.
- A thriving community with ongoing activity, updates, and shared learnings.

## For customers, teams and enterprises
- Accelerate ML with paid compute and enterprise solutions.
- Team & Enterprise: The most advanced platform to build AI with enterprise-grade features.
- Move faster with a single platform that scales with your organization’s needs.

## Culture and community
- A community-first philosophy: collaboration, openness, and shared progress are core.
- Active, ongoing contribution across datasets, models, documentation, and tooling.
- A culture of learning and growing together through open resources and shared projects.

## Careers and joining the team
- Hugging Face values talent who thrive in a collaborative, open-source environment.
- If you’re excited to contribute to the future of AI, check our site for opportunities and browse roles across teams and disciplines.

## Get involved
- Explore AI Apps and browse 1M+ models to find inspiration or open-source projects to contribute to.
- Browse 250k+ datasets to fuel your research and development.
- Explore Spaces to see real-world applications and demos built on the platform.
- Sign up to join the community, publish your work, and collaborate with others.

For more details and current openings, visit Hugging Face’s site and join the growing community of researchers, developers, and enterprises building AI together.

## Finally - a minor improvement

With a small adjustment, we can change this so that the results stream back from OpenAI,
with the familiar typewriter animation.

In [21]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        update_display(Markdown(response), display_id=display_handle.display_id)
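
The key detail in the loop is `delta.content or ''`: some chunks, typically the final one, carry `None` instead of text. The sketch below reproduces that accumulation offline using stand-in chunk objects. `fake_stream` and `accumulate` are hypothetical, and the chunks only mimic the attributes the loop reads, not the full OpenAI response type.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like streaming chunks (only .choices[0].delta.content)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def accumulate(stream):
    """Concatenate streamed text, treating missing content as an empty string."""
    response = ""
    for chunk in stream:
        if not chunk.choices:  # defensive: skip chunks without choices
            continue
        response += chunk.choices[0].delta.content or ""
    return response


print(accumulate(fake_stream(["# Hugging", " Face", None])))
```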

In [22]:
stream_brochure("HuggingFace", "https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 12 relevant links


# Hugging Face - The AI Community Building the Future

Hugging Face is the home of the machine learning community, a platform where people collaborate on models, datasets, and applications. We’re on a mission to democratize good machine learning, one commit at a time.

## What we offer

- Models, Datasets, Spaces, Community, Docs, Enterprise, Pricing
- Explore AI Apps and browse 1M+ models and 250k+ datasets
- Spaces to run and share interactive AI applications
- Open source stack designed to move faster and empower collaboration
- Support for all modalities: text, image, video, audio, and even 3D
- Build and share your ML portfolio with the world

## The platform in action

- Browse and contribute to a thriving ecosystem of models like MiniMax-M2, Kimi-Linear-48B, DeepSeek-OCR, FIBO, and more
- Run applications with Spaces such as “The Smol Training Playbook,” DeepSite v3, Wan2.2 Animate, and Dream-wan2-2-faster-Pro
- Access a rich collection of datasets, including NVIDIA’s PhysicalAI Autonomous Vehicles, awesome prompts, and many community-curated resources

## For customers and enterprises

- Enterprise-grade platform to scale AI with security and governance
- Access controls, single sign-on (SSO), and regional data management
- Audit logs for oversight and compliance
- Team pricing starting at $20 per user/month; flexible enterprise options available
- Dedicated support and enterprise features to accelerate ML initiatives

## For developers, researchers, and the community

- Host and collaborate on unlimited public models, datasets, and applications
- Leverage the HF Open Source stack to move faster and collaborate globally
- Create, discover, and showcase your ML projects; build a robust ML portfolio
- Explore all modalities and build end-to-end AI solutions with ease

## Culture and impact

- Mission-driven: democratize good machine learning, one commit at a time
- A vibrant, collaborative community with thousands of active contributors
- A culture that values openness, collaboration, and shared progress

## Careers and opportunities

- Hugging Face is actively hiring and maintains a page for Current Openings
- Join a growing global team of experts passionate about AI and open science
- If you’re excited by democratizing ML and contributing to a thriving ecosystem, Hugging Face welcomes you

## How to get started

- Sign up to join the community and start exploring Models, Datasets, and Spaces
- Explore paid Compute and Enterprise solutions if you’re coordinating a team or organization
- Visit the Enterprise Hub to learn about security, access controls, SSO, regions, and audit capabilities

## Press and contacts

- For press inquiries, Hugging Face provides a contact channel for the media team

## Quick takeaways

- The AI community building the future through collaboration, openness, and a powerful open-source stack
- A broad platform spanning models, datasets, spaces, and enterprise solutions
- A culture focused on democratizing ML and empowering people to contribute and build

Visit Hugging Face to explore models, datasets, spaces, and opportunities to collaborate or join the team.

In [22]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure("HuggingFace", "https://huggingface.co")

Selecting relevant links for https://huggingface.co by calling gpt-5-nano
Found 14 relevant links


# Welcome to Hugging Face – The AI Community Building the Future!

---

## Who Are We?  
Imagine a place where machine learning wizards, data sorcerers, and AI alchemists gather to share their spells - uh, models - datasets, and apps. That’s Hugging Face! We’re *the* platform where the AI community collaborates, creates, and sometimes even has a little fun while building the future.

Our motto? **"Keep it open. Keep it ethical. Keep it hugging."** 💛

---

## What’s Cooking in the AI Kitchen?

- **1 Million+ Models** - From image generators to language wizards, our treasure trove of open-source ML models grows faster than you can say "neural network."  
- **250,000+ Datasets** - Feeding AI brains with everything from chat prompts to persona profiles. Hungry for data? Dig in!  
- **400,000+ Applications & Spaces** - Launch apps, share your ML portfolio, or just show off cool demos that make your friends say, “Whoa, AI can do that?”  
- **Multimodal Madness** - Text, image, video, audio, even 3D...if AI had a Swiss Army knife, we’d be it.  

---

## Customers & Community  
Whether you’re a student trying to get your AI feet wet, a startup looking to scale your genius, or an enterprise aiming to deploy heavy-duty models in the real world, Hugging Face has your back.

With the fastest growing community of *machine learning enthusiasts* and the support of some seriously big names and organizations, here’s a place where:

- **Freelancers** can build a portfolio and get noticed.  
- **Researchers** can push boundaries openly and ethically.  
- **Businesses** can accelerate AI adoption with our paid Compute and Enterprise suites.  

Join 1.29k+ Spaces and thousands more running models that power everything from video generation to AI-powered image editing.

---

## Culture & Career – Geek Out with Us!  
We believe collaboration beats isolation every day. Our culture?

- Open source at heart ❤️  
- Ethical AI advocates  
- Casual tea-drinkers and serious problem solvers  
- Always learning, always sharing, always growing  

Want to build machine learning tools that millions will use? Hugging Face is where your skills meet endless possibilities. From ML engineers to community managers, our doors are wide open (virtual hugs included).

---

## Speed Up Your AI Journey  
No need to code in the dark alone or fight for GPU time - deploy models and apps with a few clicks on optimized inference endpoints, starting at just $0.60/hour for GPU!

Whether you want to host that killer new model or just tweak an existing one, we give you the tools and community support to **move faster, build smarter, and hug tighter**.

---

## Quick Hugging Face Facts  
- **Founded:** Around the corner from the future  
- **Colors:** Bright yellow (#FFD21E), orange (#FF9D00), and sleek gray (#6B7280) - because AI should be as vibrant as its ideas!  
- **Mascot:** Friendly face with a warm smile (because AIs could learn a thing or two about friendliness here)  

---

## Ready to Join the AI Hug Circle?  

Sign up, share your work, explore millions of models and datasets, and get your AI career (or project!) hugging new heights.

[Explore AI Apps](#) | [Browse 1M+ Models](#) | [Sign Up & Join The Fun](#)

---

*Hugging Face - where the future of AI isn’t just created; it’s hugged into existence.* 🤗✨