A full business solution
Now we will take our project from Day 1 to the next level.
BUSINESS CHALLENGE:
Create a product that builds a brochure for a company, aimed at prospective clients, investors and potential recruits.

We will be provided with a company name and its primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [2]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [3]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [4]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [6]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 'https://edwarddonner.com/2025/05/18/2025-ai-executive-briefing/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/04/21/the-

<h2>First step: Have GPT-4o-mini figure out which links are relevant</h2>
<h3>Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.</h3>

In [7]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [8]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [9]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [10]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/
https://edwarddonner.com/2025/05/28/connecting-my-courses-become-an-llm-expert-and-leader/
https://edwarddo

In [11]:
def get_links(url):
    website = Website(url)
    response=openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system","content": link_system_prompt},
            {"role":"user","content": get_links_user_prompt(website)}
        ],
        response_format={"type":"json_object"}
    )
    result=response.choices[0].message.content
    return json.loads(result)
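
Even with `response_format={"type":"json_object"}`, the reply is only guaranteed to be valid JSON, not to match the shape shown in the system prompt. Below is a minimal sketch of a defensive parser. The name `parse_links_response` is hypothetical (it is not part of this notebook or of the OpenAI library), and it assumes we would rather drop malformed entries than raise an error.

```python
import json

# Hypothetical helper (not part of the notebook): keep only well-formed link entries
def parse_links_response(raw):
    """Parse the model's JSON reply and return {"links": [...]} with clean entries only."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    clean = [
        {"type": item["type"], "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and isinstance(item.get("type"), str)
        and isinstance(item.get("url"), str)
        # The prompt asks for full URLs, so relative paths are treated as malformed here
        and item["url"].startswith(("http://", "https://"))
    ]
    return {"links": clean}

reply = '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"type": "careers page", "url": "/careers"}]}'
print(parse_links_response(reply))
```

In `get_links`, this could stand in for the final `json.loads(result)` call if you want the later steps to be able to rely on every entry having a `type` and a full `url`.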

In [12]:
huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/openai/gpt-oss-120b',
 '/openai/gpt-oss-20b',
 '/Qwen/Qwen-Image',
 '/tencent/Hunyuan-1.8B-Instruct',
 '/rednote-hilab/dots.ocr',
 '/models',
 '/spaces/Qwen/Qwen-Image',
 '/spaces/enzostvs/deepsite',
 '/spaces/black-forest-labs/FLUX.1-Krea-dev',
 '/spaces/Wan-AI/Wan-2.2-5B',
 '/spaces/Qwen/Qwen3-Coder-WebDev',
 '/spaces',
 '/datasets/fka/awesome-chatgpt-prompts',
 '/datasets/HuggingFaceH4/Multilingual-Thinking',
 '/datasets/nvidia/Nemotron-Post-Training-Dataset-v1',
 '/datasets/spatialverse/InteriorGS',
 '/datasets/spatialverse/InteriorAgent',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '/docs/transformers',
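
The list above is full of relative paths and repeats, and so far we are relying on the model to turn them into full URLs. One option is to tidy the list ourselves before it goes into the prompt. This is only a sketch: `normalize_links` is a hypothetical helper, not something defined elsewhere in this notebook, and it assumes that dropping non-http(s) links such as `mailto:` is what we want.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical helper (not part of the notebook): resolve relative links and drop repeats
def normalize_links(base_url, links):
    """Return absolute http(s) URLs in first-seen order, without duplicates."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link)
        # Keep only web links; this skips mailto:, tel:, javascript: and similar
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

sample = ["/", "/models", "/models", "/pricing#spaces",
          "mailto:hi@example.com", "https://github.com/huggingface"]
print(normalize_links("https://huggingface.co", sample))
```

A shorter, de-duplicated list also means fewer tokens in the links prompt.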


In [13]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'blog page', 'url': 'https://huggingface.co/blog'},
  {'type': 'company page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'community page', 'url': 'https://discuss.huggingface.co'},
  {'type': 'GitHub page', 'url': 'https://github.com/huggingface'},
  {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'},
  {'type': 'LinkedIn page',
   'url': 'https://www.linkedin.com/company/huggingface/'}]}

<h3>Second step: make the brochure!</h3>
<h4>Assemble all the details into another prompt to GPT-4o-mini</h4>

In [16]:
def get_all_details(url):
    result = "Landing Page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:",links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
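
The loop in `get_all_details` stops with an exception if any single page fails to load, and it fetches the same URL twice if the model returns it under two labels. Here is a rough sketch of a more forgiving version. `collect_pages` and `fake_fetch` are hypothetical names used for illustration; in the notebook, the `fetch` argument could be something like `lambda url: Website(url).get_contents()`.

```python
# Hypothetical helper (not part of the notebook): fetch each distinct URL once, skipping failures
def collect_pages(links, fetch):
    """Return (combined_text, failures) for a list of {"type": ..., "url": ...} entries."""
    sections, failed, seen = [], [], set()
    for link in links:
        url = link["url"]
        if url in seen:
            continue
        seen.add(url)
        try:
            sections.append(f"\n\n{link['type']}\n{fetch(url)}")
        except Exception as exc:  # deliberately broad: one bad page should not stop the brochure
            failed.append((url, str(exc)))
    return "".join(sections), failed

# Stand-in fetcher so the sketch runs without network access
def fake_fetch(url):
    if "broken" in url:
        raise ConnectionError("could not connect")
    return f"Webpage Title:\n{url}\n"

links = [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "company page", "url": "https://example.com/about"},  # same URL, second label
    {"type": "careers page", "url": "https://broken.example.com/jobs"},
]
details, failed = collect_pages(links, fake_fetch)
print(details)
print("Skipped:", failed)
```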

In [17]:
print(get_all_details("https://huggingface.co"))

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'Twitter page', 'url': 'https://twitter.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}
Landing Page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Community
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ 

In [20]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
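
Rather than commenting lines in and out, the tone could be passed in as a parameter. This is just one way to do it: `build_system_prompt` is a hypothetical helper, and the wording is copied from the prompt above.

```python
# Hypothetical helper (not part of the notebook): build the system prompt for a given tone
def build_system_prompt(tone=""):
    """Return the brochure system prompt; tone is free text such as 'humorous, entertaining, jokey'."""
    style = f"short {tone} brochure" if tone else "short brochure"
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates a {style} about the company for prospective customers, investors and recruits. "
        "Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )

print(build_system_prompt("humorous, entertaining, jokey"))
```

Calling `build_system_prompt()` with no argument gives the plain version.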

In [22]:
def get_brochure_user_prompt(company_name,url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5000 characters
    return user_prompt
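
Note that the 5,000 limit counts characters, not tokens, and a plain slice can cut off in the middle of a line. A small sketch of a gentler cut that backs up to the last newline is shown below. `truncate_at_line` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical helper (not part of the notebook): truncate on a line boundary where possible
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, preferring to end at a newline."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when there is no usable newline in range
    return text[:cut] if cut > 0 else text[:limit]

print(repr(truncate_at_line("Models\nDatasets\nSpaces", limit=18)))
```

Because the landing page comes first in the prompt, either kind of truncation tends to drop the later pages (careers, blog and so on) rather than the landing page.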

In [23]:
get_brochure_user_prompt("Hugging Face","https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company profile on LinkedIn', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community discussions', 'url': 'https://discuss.huggingface.co'}]}


'You are looking at a company called: Hugging Face\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding Page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nCommunity\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nopenai/gpt-oss-120b\nUpdated\nabout 18 hours ago\n•\n325k\n•\n3.04k\nopenai/gpt-oss-20b\nUpdated\nabout 18 hours ago\n•\n1.26M\n•\n2.6k\nQwen/Qwen-Image\nUpdated\n3 days ago\n•\n49.6k\n•\n1.35k\ntencent/Hunyuan-1.8B-Instruct\nUpdated\n3 days ago\n•\n2.66k\n•\n557\nrednote-hilab/dots.ocr\nUpdated\n2 days ago\n•\n12.1k\n•\n526\nBrowse 1M+ models\nSpaces\nRunning\non\nZero\n368\n368\nQwen Im

In [25]:
def create_brochure(company_name,url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role":"system","content":system_prompt},
            {"role":"user","content":get_brochure_user_prompt(company_name,url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [26]:
create_brochure("Hugging Face","https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'support forum', 'url': 'https://discuss.huggingface.co'}]}


# Hugging Face Company Brochure

## About Us
**Hugging Face** is the heart of the **AI community** that is passionately building the future of artificial intelligence. Our platform serves as a collaborative environment for machine learning enthusiasts, researchers, and enterprises to share and develop innovative models, datasets, and applications. 

### Our Mission
We aim to democratize AI and foster collaboration where everyone can participate in the advancement of machine learning technology. With a robust library of over **1 million models** and **250k datasets**, we create a dynamic space for contributors to innovate and accelerate their projects.

## Services
- **Models:** Access to a vast collection of state-of-the-art AI models.
- **Datasets:** A comprehensive repository to collaborate, share, and discover datasets tailored for any ML task.
- **Spaces:** A platform to run applications and models in a user-friendly setup.
- **Enterprise Solutions:** Tailored services for organizations that include enhanced security, priority support, and dedicated resources.

## Our Customers
Hugging Face collaborates with more than **50,000 organizations**, including major enterprises such as:
- **AI at Meta**
- **Amazon**
- **Google**
- **Microsoft**
- **Grammarly**

These partnerships not only bolster our community's trust but also enhance our technology offerings.

## Company Culture
At Hugging Face, we believe in a culture of **transparency**, **collaboration**, and **inclusivity**. Our community-driven approach encourages every member to contribute ideas and solutions, propelling innovation. We celebrate diversity and strive to create an environment where everyone feels welcome and valued.

## Careers at Hugging Face
Join us on our mission to build the future of AI! We are continually seeking passionate individuals to contribute to our growing team in various roles, including engineering, research, and community development. 

### Benefits of Working with Us:
- **Flexible work environment** that promotes work-life balance.
- Opportunities for growth and professional **development**.
- Engage with a vibrant community and work alongside some of the brightest minds in AI.

### Current Openings
Explore our careers page to find exciting opportunities to make an impact in the world of AI with Hugging Face!

## Join Our Community
Become a part of **Hugging Face** today! Whether you're looking for resources, need solutions for your enterprise, or want to collaborate with fellow enthusiasts, our platform has something for everyone.

- **Explore our Models:** [Browse Models](https://huggingface.co/models)
- **Get Started:** [Sign Up](https://huggingface.co/signup)
- **Join the Discussion:** [Community Forum](https://huggingface.co/forum)

---

Together, we can shape the future of AI!

<h3>Finally - a minor improvement</h3>
With a small adjustment, we can change this so that the results stream back from OpenAI, with the familiar typewriter animation.

In [29]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip code fences from a copy, so the word "markdown" in the brochure text itself is kept
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)
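
The accumulate-and-clean pattern can be tried out without calling the API. In this sketch, `render_stream` and `fake_chunks` are hypothetical stand-ins: the list of strings plays the role of the `delta.content` values, including a `None`. The sketch cleans a copy of the accumulated text each time, so a fence that arrives split across chunks is still removed, and the word "markdown" inside the brochure text is left alone.

```python
# Hypothetical helper (not part of the notebook): yield the cleaned text after each chunk
def render_stream(chunks):
    """Accumulate streamed pieces and yield the text to display so far."""
    raw = ""
    for piece in chunks:
        raw += piece or ""  # a chunk's content can be None
        # Clean a copy; the raw text is kept intact for the next iteration
        yield raw.replace("```markdown", "").replace("```", "")

# Stand-in for the content of streamed chunks, with the opening fence split across pieces
fake_chunks = ["``", "`mark", "down\n# Title\n", None, "Respond in markdown.", "\n```"]
for text in render_stream(fake_chunks):
    pass
print(text)
```

While a fence is only partly received, a fragment of it can show up briefly in the display. It disappears once the rest arrives.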

In [30]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'company page', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}]}


# Hugging Face Brochure

**Welcome to Hugging Face – The AI Community Building the Future!**

At Hugging Face, we are at the forefront of the artificial intelligence revolution, providing a collaborative platform where machine learning enthusiasts, researchers, and organizations come together to create, discover, and enhance AI models, datasets, and applications.

---

## Our Vision and Mission

We believe in making artificial intelligence accessible to everyone, fostering an open community where innovation thrives. Our mission is to empower individuals and organizations to build cutting-edge AI solutions while promoting transparency, collaboration, and responsible AI development.

---

## What We Offer

### 1. Extensive Model Repository
- **1M+ ML Models:** Collaborate and access an extensive library of machine learning models.
- **Trending Models:** Stay updated with the latest advancements; for instance, models like `openai/gpt-oss-120b` and `tencent/Hunyuan-1.8B-Instruct` are actively used by thousands.

### 2. Diverse Datasets
- **250k+ Datasets:** A vast array of datasets for varied machine learning tasks to facilitate rich learning experiences.
  
### 3. Interactive Spaces
- **AI Apps & Collaborations:** Explore 400k+ applications and engage in the community-driven projects that are reshaping AI.

### 4. Enterprise Solutions
- **Tailored for Businesses:** We offer comprehensive enterprise-grade solutions that ensure security and robust support for organizations eager to harness AI effectively.

---

## Who Uses Hugging Face?

Over **50,000 organizations** leverage our platform, including industry giants such as:
- **Google**
- **Amazon**
- **Microsoft**
- **Meta**
- **Intel**

Whether you’re a seasoned enterprise or a budding startup, Hugging Face provides the tools necessary to thrive in the AI landscape.

---

## Company Culture

At Hugging Face, our culture revolves around **collaboration, openness, and innovation**. We encourage continuous learning, welcoming insights from our community to improve our products and services. Our diverse workforce thrives in a supportive atmosphere where every voice matters, facilitating personal and professional growth.

### Join Our Community!
We actively seek passionate individuals to join our dynamic team. If you’re looking to make a significant impact in the AI domain while working alongside some of the brightest minds, explore our **[careers page](https://huggingface.co/jobs)** for opportunities.

---

## Why Choose Hugging Face?

- **Collaboration:** Engage with a global community of AI practitioners.
- **Cutting-Edge Technology:** Access to state-of-the-art ML models and tools.
- **Support and Resources:** Extensive documentation and community support to help you along your journey.
- **Scalable Solutions:** From hobbyists to enterprises, our scalable solutions adapt to your needs.

---

**Get involved today and help shape the future of AI with Hugging Face!**  
[Explore more](https://huggingface.co) | [Sign Up](https://huggingface.co/signup) | [Join Our Community](https://huggingface.co/community)

---

*Together, let's build the future of AI, one model at a time.*