# Brochure for a company based on their Webpage

This notebook builds a brochure for a company from its name and its home page.

## Initial Setup

In [2]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI


In [3]:
load_dotenv(override=True)

api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print('API key looks good')
else:
    print('Incorrect API key')
    


API key looks good


In [4]:
MODEL = 'gpt-4o-mini'
openai = OpenAI()

In [5]:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=10)  # fail fast instead of hanging on a slow site
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

## Use GPT-4o-mini to figure out which links are relevant

Use a call to gpt-4o-mini to read the links on a webpage and respond in structured JSON.
The model decides which links are relevant and replaces relative links such as "/about" with the full URL.
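
Relative links can also be resolved deterministically, without relying on the model. The sketch below is one possible approach and is not part of the original notebook: `absolutize_links` is a hypothetical helper that uses the standard library's `urljoin` to expand relative links and drops non-web schemes such as `mailto:`.

```python
from urllib.parse import urljoin, urlparse


def absolutize_links(base_url, links):
    """Resolve relative links against base_url and keep only http(s) URLs."""
    resolved = []
    for link in links:
        full = urljoin(base_url, link)
        # Drop mailto:, javascript:, tel: and similar non-web schemes, and skip duplicates
        if urlparse(full).scheme in ("http", "https") and full not in resolved:
            resolved.append(full)
    return resolved


print(absolutize_links(
    "https://huggingface.co",
    ["/about", "mailto:hi@example.com", "https://github.com/huggingface", "/about"],
))
# ['https://huggingface.co/about', 'https://github.com/huggingface']
```

Running something like this on `website.links` before building the prompt would shorten the link list and leave the model with only the relevance decision.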

In [6]:
# System prompt for the gpt call to get relevant links

link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "full url"},
        {"type": "careers page", "url": "another full url"}
    ]
}
"""

In [7]:
# Function to get user prompt 

def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [9]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
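
`response_format={"type": "json_object"}` asks for syntactically valid JSON, but it does not guarantee the `links` / `type` / `url` shape requested in the system prompt. A minimal sketch of a defensive parser follows; `parse_links` is a hypothetical helper, not part of the original notebook, and it assumes malformed entries should be dropped silently.

```python
import json


def parse_links(raw):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    if not isinstance(data, dict):
        return {"links": []}
    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}
    valid = [
        link for link in links
        if isinstance(link, dict)
        and isinstance(link.get("type"), str)
        and isinstance(link.get("url"), str)
        and link["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}


reply = '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"type": "broken"}]}'
print(parse_links(reply))
# {'links': [{'type': 'about page', 'url': 'https://example.com/about'}]}
```

Returning the same `{"links": [...]}` shape means a function like this could stand in for the bare `json.loads(result)` call without changes to the code that consumes the result.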

## Make the brochure

In [24]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
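
As written, `get_all_details` fetches every selected link in turn, so a single unreachable page raises and aborts the whole brochure. The sketch below shows one way to skip failing pages instead. It is an illustration under assumptions: `collect_details` and its injected `fetch` callable are hypothetical names, with `fetch` standing in for `Website(url).get_contents()`.

```python
def collect_details(landing_url, links, fetch):
    """Build the combined page text, skipping any linked page whose fetch fails.

    `fetch` is a hypothetical callable that takes a URL and returns page text.
    """
    result = "Landing page:\n" + fetch(landing_url)
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as error:  # broad on purpose: any failure skips the page
            print(f"Skipping {link['url']}: {error}")
            continue
        result += f"\n\n{link['type']}\n{contents}"
    return result


def demo_fetch(url):
    """Stand-in fetcher for the example: fails for one URL, echoes the others."""
    if "broken" in url:
        raise ValueError("page unavailable")
    return f"contents of {url}"


print(collect_details(
    "https://example.com",
    [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "careers page", "url": "https://example.com/broken"},
    ],
    demo_fetch,
))
```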

In [18]:
# System prompt for generating brochure

system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [19]:
# Function to get user prompt

def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
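
The 5,000-character cap applies to the whole prompt, landing page first, so the cut can fall mid-word and later pages may be dropped entirely. A small sketch of a gentler cut is shown below; `truncate_at_line` is a hypothetical helper, not part of the original notebook, and it assumes that ending on a line boundary is preferable to a hard slice.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, preferring the last newline before the cut."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when there is no usable newline in range
    return text[:cut] if cut > 0 else text[:limit]


print(repr(truncate_at_line("abc\ndefgh", limit=6)))  # 'abc'
print(repr(truncate_at_line("abcdefgh", limit=4)))    # 'abcd'
```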

In [22]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [25]:
create_brochure("HuggingFace", "https://huggingface.co")

# Hugging Face Brochure

## Welcome to Hugging Face: The AI Community Building the Future

At **Hugging Face**, we are dedicated to creating a collaborative platform for the machine learning community. Our mission is to accelerate innovation by providing tools, models, and resources that empower developers, researchers, and organizations to create groundbreaking AI applications.

---

### What We Offer

- **Models**: Access over **400,000** cutting-edge machine learning models, ranging from natural language processing to computer vision.
- **Datasets**: Discover and share **100,000+ datasets** curated for various ML tasks to support your projects.
- **Spaces**: Explore **150,000+ applications** to find solutions that inspire and demonstrate the power of machine learning.
  
#### Unique Features:

- **Open Source Collaboration**: Leverage our open-source tools like Transformers and Diffusers to build sophisticated ML models with community support.
- **Compute Solutions**: Deploy optimized inference endpoints and exclusive enterprise solutions to accelerate ML workflows.

---

### Our Culture

At Hugging Face, we believe in fostering a culture of inclusivity, innovation, and collaboration. We encourage our team members to challenge the status quo and share their insights. Our open-source ethos ensures that everyone has a voice, and we value contributions from diverse backgrounds and experiences.

---

### Who We Serve

Hugging Face is proud to support **over 50,000 organizations**, including industry giants like:

- **Amazon Web Services**
- **Google**
- **Microsoft**
- **Intel**
- **Meta**

Our platform serves a diverse user base, from individual developers and researchers to large enterprises seeking enterprise-grade solutions.

---

### Careers at Hugging Face

Join us on our journey to build the future of AI! We are always on the lookout for talented individuals who are passionate about machine learning and technology. At Hugging Face, you'll have the opportunity to work on cutting-edge AI projects and collaborate with some of the brightest minds in the field.

If you are interested in being part of a forward-thinking team, check our [Careers Page](#) for current job openings.

---

### Join the AI Revolution

Whether you're an investor, a potential recruit, or a customer wanting to explore ML solutions, Hugging Face welcomes you. Join our vibrant community and help shape the future of artificial intelligence!

**Get Started: [Sign Up Here](#)**

---

For more information, visit our website or connect with us on social media:

- [GitHub](#)
- [Twitter](#)
- [LinkedIn](#)
- [Discord](#)

**Hugging Face**: The AI community building the future. Together, we can make a difference!

In [26]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Strip only the wrapping code fence for display; the word "markdown" in the body text is kept
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)
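
The streaming loop can be exercised without calling the API. The sketch below is an illustration only: `fake_stream`, `strip_fence` and `accumulate` are hypothetical helpers that imitate the `chunk.choices[0].delta.content` shape used above, including a `None` delta, and record what would be displayed after each chunk.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like streaming chunks: chunk.choices[0].delta.content."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def strip_fence(text):
    """Remove a wrapping ```markdown fence without touching the word elsewhere."""
    return text.replace("```markdown", "").replace("```", "")


def accumulate(stream):
    """Return the cleaned text after each chunk, as a display loop would show it."""
    response = ""
    frames = []
    for chunk in stream:
        # A delta's content can be None, so fall back to an empty string
        response += chunk.choices[0].delta.content or ""
        frames.append(strip_fence(response))
    return frames


frames = accumulate(fake_stream(["```markdown\n# Title", "\nBuilt with ", "markdown", None, "\n```"]))
print(frames[-1])
```

Keeping the raw `response` separate from the cleaned text means the fence is matched on the full accumulated string, so the word "markdown" inside the brochure body survives.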

In [27]:
stream_brochure("HuggingFace", "https://huggingface.co")


# Hugging Face Brochure

## Welcome to Hugging Face

### The AI Community Building the Future

At Hugging Face, we are redefining the landscape of Artificial Intelligence. Our platform serves as a collaborative hub for machine learning enthusiasts, researchers, and professionals to engage with state-of-the-art models, datasets, and applications.

---

### What We Offer

- **Models**: Explore and utilize over **400,000 models** ranging from NLP to computer vision and 3D generation.
- **Datasets**: Access and share **100,000+ datasets** designed for various ML tasks.
- **Spaces**: An interactive platform for showcasing AI demos and applications.
- **Enterprise Solutions**: Customizable offerings with advanced security features for organizations integrating AI into their operations.

---

### Our Customers

Hugging Face is a trusted partner to over **50,000 organizations**, including leading names like:

- **Amazon Web Services**
- **Google**
- **Microsoft**
- **Meta AI**
- **Intel**

We offer tailored solutions that support enterprises in leveraging AI for their specific needs.

---

### Company Culture

- **Community-Driven**: We prioritize collaboration and open-source principles. Our mission is to democratize AI.
- **Innovative Environment**: We're dedicated to continuous learning and pushing the boundaries of what's possible with AI technology. 
- **Diverse Workforce**: We value different perspectives and experiences, which helps us foster creativity and innovation in our projects.

---

### Careers at Hugging Face

Join a team that is shaping the future of AI! We offer:

- *Career Development*: Opportunities to grow and lead within the community.
- *Flexible Work Environment*: Embrace a healthy work-life balance.
- *Impactful Work*: Contribute to projects that have the potential to change industries and improve lives.

### Open Positions

We are always on the lookout for passionate individuals in the following areas:

- Machine Learning Engineering
- Data Science
- Software Development
- Community Management

Explore our [Jobs Page](https://huggingface.co/jobs) for current openings!

---

### Connect With Us

Join our thriving community on:

- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://www.linkedin.com/company/huggingface)
- [Discord](https://discord.com/invite/huggingface)

---

### Join Us

Experience the future of AI and become a part of the Hugging Face community! 

### Contact Us

For inquiries about our services or partnerships, please visit our [Contact Page](https://huggingface.co/contact).

---

**Hugging Face** - Building a future where AI is accessible to all.


Feel free to adjust any sections or add additional details as needed!