## BUSINESS CHALLENGE:

Create a product that builds a brochure about a company, for use with prospective clients, investors, and potential recruits.

We will be given the company's name and its primary website.


```python
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display


# Some sites refuse requests that lack a browser-like User-Agent header
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        # Time out instead of hanging the notebook on an unresponsive server
        response = requests.get(url, headers=headers, timeout=10)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Remove elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        # Keep every non-empty href as written; these may be relative ("/about") or absolute
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
```
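The `Website` class keeps each `href` exactly as written in the page, so `links` can mix absolute URLs, relative paths such as `/about`, in-page anchors, and `mailto:` addresses. The link-selection prompt below asks the model to cope with relative links, but the list could also be cleaned before it is sent. A minimal sketch, assuming only `http`/`https` targets are worth keeping; `normalize_links` is a hypothetical helper, not part of the notebook:

```python
from urllib.parse import urljoin, urlparse, urldefrag


def normalize_links(base_url, hrefs):
    """Hypothetical helper: resolve relative hrefs and keep unique http(s) URLs in order."""
    seen = set()
    result = []
    for href in hrefs:
        # Resolve "/about" or "careers" against the page URL; absolute URLs pass through
        absolute = urljoin(base_url, href.strip())
        # Drop "#section" fragments so the same page is not listed twice
        absolute, _fragment = urldefrag(absolute)
        # Skip mailto:, tel:, javascript: and other non-web schemes
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


print(normalize_links(
    "https://example.com/",
    ["/about", "careers", "#top", "mailto:hi@example.com", "https://example.com/about#team"],
))
# → ['https://example.com/about', 'https://example.com/careers', 'https://example.com/']
```

In the notebook this could be applied as `normalize_links(website.url, website.links)` inside `get_links_user_prompt`, which would also shorten the prompt.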

```python
ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)
```

```
Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec
```

## First step: have the LLM figure out which links are relevant

### Call the LLM to read the links on a webpage and respond in structured JSON

```python
import json
import os
from dotenv import load_dotenv
from groq import Groq

# Load GROQ_API_KEY from a local .env file into the environment
load_dotenv()

link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
# The example must itself be valid JSON, or the model may copy the mistake
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, or email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt


def get_links(url):
    website = Website(url)
    client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
    response = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)},
        ],
        # JSON mode: ask the API for a reply that is a single JSON object
        response_format={"type": "json_object"},
    )
    result = response.choices[0].message.content
    return json.loads(result)


# huggingface = Website("https://huggingface.co")
# huggingface.links
get_links("https://huggingface.co")
```

```
{'links': [{'type': 'about page', 'url': 'https://huggingface.co/brand'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'company page', 'url': 'https://huggingface.co'}]}
```
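JSON mode makes the reply parseable, but nothing forces the model to follow the example schema or to avoid unhelpful entries; the run above, for instance, returned the landing page itself as a "company page". A defensive parser is a cheap safeguard. A minimal sketch, assuming the `{"links": [{"type": ..., "url": ...}]}` shape requested in `link_system_prompt`; `parse_links_response` is a hypothetical helper, not part of the notebook:

```python
import json
from urllib.parse import urljoin


def parse_links_response(raw_json, base_url):
    """Hypothetical helper: parse the model's JSON reply defensively.

    Assumes the {"links": [{"type": ..., "url": ...}]} shape requested in link_system_prompt.
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return {"links": []}
    cleaned = []
    seen = {base_url.rstrip("/")}  # the landing page is fetched separately anyway
    for item in links:
        # Skip entries that are not objects or lack a string URL
        if not isinstance(item, dict) or not isinstance(item.get("url"), str):
            continue
        url = urljoin(base_url, item["url"])  # resolve a relative URL if the model returns one
        if url.rstrip("/") in seen:
            continue
        seen.add(url.rstrip("/"))
        cleaned.append({"type": str(item.get("type", "page")), "url": url})
    return {"links": cleaned}


reply = '{"links": [{"type": "about page", "url": "/brand"}, {"type": "company page", "url": "https://huggingface.co"}, {"type": "oops"}]}'
print(parse_links_response(reply, "https://huggingface.co"))
# → {'links': [{'type': 'about page', 'url': 'https://huggingface.co/brand'}]}
```

In `get_links`, the final `return json.loads(result)` could then become `return parse_links_response(result, url)`.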

## Second step: make the brochure!

Assemble all the details into another prompt.

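```python
from urllib.parse import urljoin

def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        # The model is asked for full URLs; urljoin also resolves a relative one against the landing page
        result += Website(urljoin(url, link["url"])).get_contents()
    return result
```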
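`get_all_details` fetches every link the model suggests, so a single page that times out or refuses the connection raises an exception and aborts the whole brochure, and a URL the model repeats is fetched again (in the run below, the landing page comes back as an "about company" link). A minimal sketch of a more forgiving loop, assuming the caller passes in a fetch function; `collect_pages` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for `Website(u).get_contents()` so the example runs offline:

```python
def collect_pages(landing_url, links, fetch):
    """Hypothetical variant of get_all_details: skip duplicate URLs and pages that fail to load."""
    result = "Landing page:\n" + fetch(landing_url)
    fetched = {landing_url.rstrip("/")}
    for link in links:
        url = link["url"]
        key = url.rstrip("/")
        if key in fetched:
            continue  # already included, e.g. the model re-listed the landing page
        fetched.add(key)
        try:
            contents = fetch(url)
        except Exception as exc:  # e.g. connection errors or timeouts raised by requests
            print(f"Skipping {url}: {exc}")
            continue
        result += f"\n\n{link['type']}\n" + contents
    return result


def fake_fetch(url):
    """Hypothetical offline stand-in for Website(url).get_contents()."""
    if "broken" in url:
        raise ConnectionError("simulated failure")
    return f"<contents of {url}>\n"


links = [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "company page", "url": "https://example.com/"},
    {"type": "careers page", "url": "https://broken.example.com/jobs"},
]
print(collect_pages("https://example.com", links, fake_fetch))
```

In the notebook, the call would look like `collect_pages(url, get_links(url)["links"], lambda u: Website(u).get_contents())`. Passing `fetch` in keeps the loop testable without network access.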

```python
print(get_all_details("https://huggingface.co"))
```

```
Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'about company', 'url': 'https://huggingface.co'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}]}
Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
deepseek-ai/DeepSeek-Prover-V2-671B
Updated
6 days ago
•
2.03k
•
671
Qwen/Qwen3-235B-A22B
Updated
5 days ago
•
30.5k
•
692
nari-labs/Dia-1.6B
Updated
9 days ago
•
113k
•
1.85k
Qwen/Qwen3-30B-A3B
Updated
6 days ago
•
67.6k
•
452
JetBrains/Mellum-4b-base
Updated
6 days ago
•
654
•
229
Browse 1M+ models
Spaces
Running
5.96k
5.96k
Deep
```

```python
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
```
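The two prompts above differ only in the words describing the brochure, so the tone can be made a parameter instead of a commented-out copy. A minimal sketch; `make_system_prompt` and its `tone` argument are hypothetical, not part of the notebook:

```python
def make_system_prompt(tone="short"):
    """Hypothetical helper: build the brochure system prompt with a configurable tone phrase."""
    return (
        "You are an assistant that analyzes the contents of several relevant pages from a company website "
        f"and creates a {tone} brochure about the company for prospective customers, investors and recruits. "
        "Respond in markdown. "
        "Include details of company culture, customers and careers/jobs if you have the information."
    )


# The humorous variant from the commented-out prompt above
print(make_system_prompt("short humorous, entertaining, jokey"))
```

Building the prompt from separate string literals inside parentheses also avoids backslash continuations, where a missing trailing space silently glues two sentences together.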


```python
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000]  # Truncate if more than 5,000 characters
    return user_prompt
```
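Slicing at exactly 5,000 characters can stop in the middle of a word or line, and because the landing page comes first, later pages such as the careers page are the first to be cut. A minimal sketch of ending on the last complete line instead; `truncate_at_line` is a hypothetical helper, and 5,000 is simply the notebook's limit:

```python
def truncate_at_line(text, limit=5_000):
    """Hypothetical helper: cut text to at most `limit` characters, ending on a complete line."""
    if len(text) <= limit:
        return text
    # Index of the last newline that still fits inside the limit
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut if there is no newline within the limit
    return text[:cut] if cut > 0 else text[:limit]


print(truncate_at_line("alpha\nbeta\ngamma", limit=12))
```

Here `user_prompt = user_prompt[:5_000]` would become `user_prompt = truncate_at_line(user_prompt)`. Giving each page its own character budget would be another option if the careers page matters more than the landing page.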

```python
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")
```

```
Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}]}

'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\ndeepseek-ai/DeepSeek-Prover-V2-671B\nUpdated\n6 days ago\n•\n2.03k\n•\n671\nQwen/Qwen3-235B-A22B\nUpdated\n5 days ago\n•\n30.5k\n•\n692\nnari-labs/Dia-1.6B\nUpdated\n9 days ago\n•\n113k\n•\n1.85k\nQwen/Qwen3-30B-A3B\nUpdated\n6 days ago\n•\n67.6k\n•\n452\nJetBrains/Mellum-4b-base\nUpdated\n6 days ago\n•\n654\n•\n229\nBrowse 1M+ models\nSpaces\nRunning\n5.96k\n5.96k\nDeepSite\n🐳\nGenerate any a
```

```python
def create_brochure(company_name, url):
    client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
    response = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)},
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))
```

```python
create_brochure("HuggingFace", "https://huggingface.co")
```

```
Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co/brand'}]}
```


**Hugging Face Brochure**
===============

**The AI Community Building the Future**
---------------------------

Hugging Face is the platform where the machine learning community collaborates on models, datasets, and applications. Our mission is to create a future where AI is developed and used for the betterment of society.

**Explore Our Platform**
------------------------

* **Models**: Browse over 1 million models, discover new ones, and collaborate with others.
* **Datasets**: Access over 250,000 datasets and share your own with the community.
* **Applications**: Explore over 400,000 applications and build your own with our Spaces feature.

**Our Community**
----------------

* **More than 50,000 organizations** are using Hugging Face, including AI2, Amazon, Google, Intel.
* **Over 1 million users** are collaborating on models, datasets, and applications.

**Our Open-Source Stack**
---------------------------

* **Transformers**: State-of-the-art ML for PyTorch, TensorFlow, JAX.
* **Diffusers**: State-of-the-art diffusion models in PyTorch.
* **Safetensors**: Safe way to store/distribute neural network weights.

**Careers**
----------

Join our team of innovators and contributors shaping the future of AI. Check out our **jobs page** for available positions.

**Learn More**
-------------

* **About**: Learn more about our mission and values.
* **Resources**: Access our documentation, blog, forum, and service status.
* **Social**: Follow us on GitHub, Twitter, LinkedIn, and more.

**Get Started**
------------

Sign up for free and start exploring our platform today!

## Finally - a minor improvement

With a small adjustment, we can stream the results back from the LLM, with the familiar typewriter animation.

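```python
def stream_brochure(company_name, url):
    client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
    stream = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)},
        ],
        stream=True,
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        # Each chunk carries a small piece of text; the content may be None (e.g. on the last chunk)
        response += chunk.choices[0].delta.content or ''
        # Strip any code fence the model wraps around its markdown. Clean a copy of the full text so
        # a fence split across chunks is removed once complete, and the word "markdown" in the
        # brochure itself is left alone.
        shown = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(shown), display_id=display_handle.display_id)
```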
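The accumulation logic in `stream_brochure` can be exercised without a network call by feeding it fake chunks shaped like the ones the loop reads (`chunk.choices[0].delta.content`). A minimal sketch using `types.SimpleNamespace` as a stand-in for the Groq chunk objects; `fake_chunk` and `accumulate` are hypothetical names. It shows why the cleanup runs on the whole accumulated text: a fence split across two chunks is only recognizable once both halves have arrived, and a literal word "markdown" in the brochure survives.

```python
from types import SimpleNamespace


def fake_chunk(text):
    """Hypothetical stand-in shaped like a streamed chat-completion chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


def accumulate(stream):
    """Yield the cleaned text after each chunk, as the notebook display would show it."""
    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
        yield response.replace("```markdown", "").replace("```", "")


# A fence split across chunks, and a final chunk whose content is None
chunks = [fake_chunk(t) for t in ["``", "`markdown\n# Hugging", " Face\nA markdown-friendly page", "\n``", "`", None]]
for shown in accumulate(chunks):
    print(repr(shown))
```

With a generator like this, the display loop would reduce to `for shown in accumulate(stream): update_display(Markdown(shown), display_id=...)`.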

```python
stream_brochure("HuggingFace", "https://huggingface.co")
```

```
Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}]}
```


**Hugging Face Brochure**
=====================

**The AI Community Building the Future**
-------------------------------------

Hugging Face is the collaboration platform for the machine learning community, empowering the next generation of machine learning engineers, scientists, and end-users to learn, and share their work to build an open and ethical AI future together.

**Platform Overview**
----------------

### Models

* Browse over 1 million+ models
* Discover trending models, updated daily
* Explore AI apps and applications

### Datasets

* Browse over 250,000+ datasets
* Discover updated datasets, daily
* Explore datasets for various ML tasks

### Spaces

* Browse over 400,000+ applications
* Discover running applications, daily
* Explore AI applications and demos

**Community**
--------------

### Open-Source

* Transformers: 143,884 stars
* Diffusers: 28,858 stars
* Safetensors: 3,251 stars
* ...and many more

### Customers

* Over 50,000 organizations using Hugging Face
* Notable customers: Ai2, Amazon, Google, Intel, Microsoft, Grammarly, and more

**Culture**
-------------

### Our Mission

* Empower the next generation of machine learning engineers, scientists, and end-users
* Build an open and ethical AI future together

### Values

* Collaboration: Central place for sharing, exploring, discovering, and experimenting with open-source ML
* Innovation: Fast-growing community, cutting-edge science team, and innovative tools and libraries
* Openness: Building an open and ethical AI future together

**Careers**
--------------

### Join Our Team

* Explore job openings at Hugging Face
* Contribute to building the future of AI

**Get Started**
----------------

### Sign Up

* Create an account and start collaborating on models, datasets, and applications

### Learn More

* Explore our documentation, forum, and resources to learn more about Hugging Face ecosystem

**Contact Us**
-------------------

* [hello@huggingface.com
* GitHub: [@huggingface](https://twitter.com/huggingface)
* Twitter: [@huggingface](https://twitter.com/huggingface)
* LinkedIn: [Hugging Face](https://www.linkedin.com/company/huggingface/)
* Discord: [Hugging Face](https://discord.com/invite/huggingface)

**Stay up-to-date**
-------------------

* Subscribe to our blog for updates, news, and announcements
* Follow us on social media to stay informed about the latest developments