# Create a Brochure from a Complete Website

Build a program that generates a brochure for a company, suitable for prospective customers, investors, or recruits.



In [1]:
import requests
import re
from urllib.parse import urlparse
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from langchain_ollama.llms import OllamaLLM


In [2]:
# Same class as the file in the intro folder, with one extra feature
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "Company Name"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""

        # Collect every href from the page's <a href=""></a> tags using find_all
        links = [link.get('href') for link in soup.find_all('a')]

        # Keep only the links that start with `/` or `https`
        links = [link for link in links if link and (link.startswith('/') or link.startswith('https'))]

        # Resolve site-relative paths against the site root (scheme + host),
        # so they stay valid even when `url` itself points at a subpage
        parsed = urlparse(url)
        root = f"{parsed.scheme}://{parsed.netloc}"
        self.links = [root + link if link.startswith('/') else link for link in links]


    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
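The link handling inside `Website` can be tried in isolation, without any network access. The sketch below is a minimal illustration using only the standard library; `normalize_links` is a hypothetical helper name, not part of the original class, and it assumes that resolving relative paths with `urllib.parse.urljoin` and dropping repeated URLs is the behavior you want.

```python
from urllib.parse import urljoin

def normalize_links(base_url, hrefs):
    """Resolve relative hrefs against base_url and drop duplicates, keeping order."""
    absolute = []
    for href in hrefs:
        # Skip missing hrefs and anything that is not a site path or an https URL
        if not href or not (href.startswith('/') or href.startswith('https')):
            continue
        absolute.append(urljoin(base_url, href))
    # dict.fromkeys keeps the first occurrence of each URL, in order
    return list(dict.fromkeys(absolute))

example = normalize_links(
    "https://www.langchain.com",
    ["/about", None, "#top", "https://docs.smith.langchain.com/", "/about",
     "https://docs.smith.langchain.com/"],
)
print(example)
```

Removing duplicates matters here because the same link often appears in both the header and the footer of a page, as with `https://docs.smith.langchain.com/` in the output of the next cell.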

In [3]:
langchain = Website("https://www.langchain.com")
langchain.links

['https://www.langchain.com/',
 'https://www.langchain.com/langchain',
 'https://www.langchain.com/langsmith',
 'https://www.langchain.com/langgraph',
 'https://www.langchain.com/retrieval',
 'https://www.langchain.com/agents',
 'https://www.langchain.com/evaluation',
 'https://blog.langchain.dev/',
 'https://www.langchain.com/customers',
 'https://academy.langchain.com/',
 'https://www.langchain.com/community',
 'https://www.langchain.com/experts',
 'https://changelog.langchain.com/',
 'https://www.langchain.com/testing-guide-ebook',
 'https://www.langchain.com/stateofaiagents',
 'https://www.langchain.com/breakoutagents',
 'https://python.langchain.com/docs/introduction/',
 'https://docs.smith.langchain.com/',
 'https://langchain-ai.github.io/langgraph/tutorials/introduction/',
 'https://js.langchain.com/docs/introduction/',
 'https://docs.smith.langchain.com/',
 'https://langchain-ai.github.io/langgraphjs/tutorials/quickstart/',
 'https://www.langchain.com/about',
 'https://www.lang

### Selecting links with code

Filtering the links most relevant to the brochure using code alone is possible, but it is not trivial, and the results can vary with the structure of the website.

The simplest approach is a keyword filter: score each URL by how many common (or otherwise interesting) keywords appear in its path, then keep the highest-scoring links.
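To see how such a keyword score behaves before fetching anything, here is a small offline sketch. `keyword_score` and `KEYWORDS` are illustrative names chosen for this example, and the sample URLs are taken from the LangChain link list shown earlier. It assumes whole-word matching with `\b`, so `contact-sales` counts as a hit for `contact`, while a path such as `/teams` would not count for `team`.

```python
import re
from urllib.parse import urlparse

KEYWORDS = ['company', 'about', 'contact', 'support', 'team', 'careers']

def keyword_score(link, keywords=KEYWORDS):
    """Count how many keywords appear as whole words in the URL path."""
    path = urlparse(link).path
    return sum(1 for kw in keywords if re.search(r'\b' + re.escape(kw) + r'\b', path))

sample_links = [
    "https://www.langchain.com/about",
    "https://www.langchain.com/contact-sales",
    "https://www.langchain.com/langgraph",
    "https://www.linkedin.com/company/langchain/",
]
for link in sample_links:
    print(keyword_score(link), link)
```

Only the path is scored, so a keyword in the domain name (for example a host called `support.example.com`) does not contribute.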

In [4]:
def filter_relevant_links(url):
    link_list = Website(url).links
    # Define common keywords in the URL
    # print(f"from {len(link_list)} urls")
    common_keywords = ['company', 'about', 'contact', 'support', 'team', 'careers']
    # Calculate a relevance score based on the number of common keywords found in the URL path
    scores = {link: sum(1 for keyword in common_keywords if re.search(r'\b' + keyword + r'\b', urlparse(link).path)) for link in link_list}
    # print(scores)
    # Sort links by relevance score and keep only the top-ranked ones
    num_relevant_links = int(len(link_list) * 0.2)  # Keep 20% of the links found (links that scored 0 can fill the remainder)
    sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    filtered_links = [link for link, score in sorted_scores[:num_relevant_links]]
    # print(f"to {len(filtered_links)} urls")
    return filtered_links

In [5]:
print(filter_relevant_links('https://www.langchain.com'))

['https://blog.langchain.dev/llms-accelerate-adyens-support-team-through-smart-ticket-routing-and-support-agent-copilot/', 'https://www.langchain.com/about', 'https://www.langchain.com/careers', 'https://www.langchain.com/contact-sales', 'https://www.linkedin.com/company/langchain/', 'https://www.langchain.com/', 'https://www.langchain.com/langchain', 'https://www.langchain.com/langsmith', 'https://www.langchain.com/langgraph', 'https://www.langchain.com/retrieval', 'https://www.langchain.com/agents', 'https://www.langchain.com/evaluation', 'https://blog.langchain.dev/', 'https://www.langchain.com/customers', 'https://academy.langchain.com/', 'https://www.langchain.com/community', 'https://www.langchain.com/experts', 'https://changelog.langchain.com/']
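The result above includes links such as `/langgraph` and `/retrieval` that contain none of the keywords: once the few matching links are used up, the 20% slice is filled with links that scored 0. If that is not wanted, one option is to drop zero-score links after slicing. The sketch below assumes that stricter behavior; `select_links` is a hypothetical helper and the scores are made-up values for illustration.

```python
def select_links(scores, fraction=0.2):
    """Keep the top `fraction` of links, but never a link whose score is 0."""
    limit = int(len(scores) * fraction)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [link for link, score in ranked[:limit] if score > 0]

# Hypothetical keyword counts per URL
scores = {
    "https://www.langchain.com/": 0,
    "https://www.langchain.com/about": 1,
    "https://www.langchain.com/langgraph": 0,
    "https://www.langchain.com/careers": 1,
    "https://www.langchain.com/retrieval": 0,
}
print(select_links(scores, fraction=0.8))
```

The trade-off is that a site whose URLs use none of the keywords would yield an empty list, so the brochure would then rely on the landing page alone.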


## Creating the Brochure

In [6]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = filter_relevant_links(url)
    print("Found links:", links)
    for link in links:
        result += f"{link}\n"
        result += Website(link).get_contents()
    return result
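`get_all_details` downloads every selected link in turn, so a single unreachable page would stop the whole run. A possible way to make the loop more tolerant is sketched below. `collect_details` and `fake_fetch` are hypothetical names, and the fetcher is passed in as a parameter so the example runs without network access; in the notebook it could be something like `lambda u: Website(u).get_contents()`.

```python
def collect_details(landing_url, links, fetch):
    """Concatenate page contents, skipping pages whose fetch raises an error.

    `fetch` is any callable that takes a URL and returns the page text.
    """
    parts = ["Landing page:\n", fetch(landing_url)]
    skipped = []
    for link in links:
        try:
            contents = fetch(link)
        except Exception as error:  # a real fetcher would catch narrower errors
            skipped.append((link, str(error)))
            continue
        parts.append(f"{link}\n{contents}")
    return "".join(parts), skipped

# Stand-in fetcher so the sketch runs offline
def fake_fetch(url):
    if "broken" in url:
        raise ConnectionError("could not reach host")
    return f"Contents of {url}\n\n"

text, skipped = collect_details(
    "https://example.com",
    ["https://example.com/about", "https://example.com/broken"],
    fake_fetch,
)
print(text)
print("Skipped:", skipped)
```

Passing a `timeout` to `requests.get` inside the real fetcher is a common companion to this pattern, since by default a request can wait indefinitely for a slow server.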

In [7]:
print(get_all_details("https://www.langchain.com"))

Found links: ['https://blog.langchain.dev/llms-accelerate-adyens-support-team-through-smart-ticket-routing-and-support-agent-copilot/', 'https://www.langchain.com/about', 'https://www.langchain.com/careers', 'https://www.langchain.com/contact-sales', 'https://www.linkedin.com/company/langchain/', 'https://www.langchain.com/', 'https://www.langchain.com/langchain', 'https://www.langchain.com/langsmith', 'https://www.langchain.com/langgraph', 'https://www.langchain.com/retrieval', 'https://www.langchain.com/agents', 'https://www.langchain.com/evaluation', 'https://blog.langchain.dev/', 'https://www.langchain.com/customers', 'https://academy.langchain.com/', 'https://www.langchain.com/community', 'https://www.langchain.com/experts', 'https://changelog.langchain.com/']
Landing page:
Webpage Title:
LangChain
Webpage Contents:
Products
LangChain
LangSmith
LangGraph
Methods
Retrieval
Agents
Evaluation
Resources
Blog
Customer Stories
LangChain Academy
Community
Experts
Changelog
LLM Evaluation

In [8]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."


In [9]:
def get_brochure_user_prompt(website):
    user_prompt = f"You are looking at a company called: {website.title}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(website.url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
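Slicing with `[:5_000]` can cut the prompt in the middle of a word or line. Since the page text is joined with newlines, one alternative is to cut at the last complete line that fits. This is only a sketch under that assumption; `truncate_at_line` is a hypothetical helper, not something the notebook defines.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, ending on a complete line when possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if text[limit] == "\n":
        return cut  # the cut already falls on a line boundary
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut if the first `limit` characters contain no newline
    return cut if last_newline == -1 else cut[:last_newline]

sample = "Products\nLangChain\nLangSmith\nLangGraph"
print(repr(truncate_at_line(sample, limit=22)))
```

A character limit is still a rough proxy for the model's context window, which is measured in tokens, so the right value depends on the model in use.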

In [10]:
get_brochure_user_prompt(langchain)

Found links: ['https://blog.langchain.dev/llms-accelerate-adyens-support-team-through-smart-ticket-routing-and-support-agent-copilot/', 'https://www.langchain.com/about', 'https://www.langchain.com/careers', 'https://www.langchain.com/contact-sales', 'https://www.linkedin.com/company/langchain/', 'https://www.langchain.com/', 'https://www.langchain.com/langchain', 'https://www.langchain.com/langsmith', 'https://www.langchain.com/langgraph', 'https://www.langchain.com/retrieval', 'https://www.langchain.com/agents', 'https://www.langchain.com/evaluation', 'https://blog.langchain.dev/', 'https://www.langchain.com/customers', 'https://academy.langchain.com/', 'https://www.langchain.com/community', 'https://www.langchain.com/experts', 'https://changelog.langchain.com/']


"You are looking at a company called: LangChain\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nLangChain\nWebpage Contents:\nProducts\nLangChain\nLangSmith\nLangGraph\nMethods\nRetrieval\nAgents\nEvaluation\nResources\nBlog\nCustomer Stories\nLangChain Academy\nCommunity\nExperts\nChangelog\nLLM Evaluations Guide\nState of AI Agents\nBreakout Agent Stories\nDocs\nPython\nLangChain\nLangSmith\nLangGraph\nJavaScript\nLangChain\nLangSmith\nLangGraph\nCompany\nAbout\nCareers\nPricing\nLangSmith\nLangGraph Platform\nGet a demo\nSign up\nProducts\nLangChain\nLangSmith\nLangGraph\nMethods\nRetrieval\nAgents\nEvaluation\nResources\nBlog\nCustomer Stories\nLangChain Academy\nCommunity\nExperts\nChangelog\nLLM Evaluations Guide\nState of AI Agents\nBreakout Agent Stories\nDocs\nPython\nLangChain\nLangSmith\nLangGraph\nJavaScript\nLangChain\nLangSmith\nLangGraph\nCompan

In [11]:
def messages_for_LLM(system_prompt, website):
    return [
        {"role": "system", "content": system_prompt},  # system prompt
        {"role": "user", "content": get_brochure_user_prompt(website)}  # user input built from the website data
    ]
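Every call to `messages_for_LLM` rebuilds the user prompt, which downloads the landing page and all selected links again; this is why the `Found links:` line is printed by several cells in this notebook. If the pages do not change between calls, the downloads could be memoized by URL. The sketch below assumes that is acceptable and uses `functools.lru_cache`; `cached_page_text`, `build_messages`, and `fetch_count` are illustrative names, and the download itself is replaced by a placeholder string.

```python
from functools import lru_cache

fetch_count = 0  # counts how many simulated downloads actually happen

@lru_cache(maxsize=32)
def cached_page_text(url):
    """Return the text for `url`, downloading it only on the first call."""
    global fetch_count
    fetch_count += 1
    # Placeholder for the real download, e.g. Website(url).get_contents()
    return f"Webpage Contents for {url}\n"

def build_messages(system_prompt, url):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": cached_page_text(url)},
    ]

first = build_messages("You write brochures.", "https://www.langchain.com")
second = build_messages("You write brochures.", "https://www.langchain.com")
print(fetch_count)
```

Calling `cached_page_text.cache_clear()` would force a fresh download when the site is known to have changed.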

In [12]:
messages_for_LLM(system_prompt, langchain)

Found links: ['https://blog.langchain.dev/llms-accelerate-adyens-support-team-through-smart-ticket-routing-and-support-agent-copilot/', 'https://www.langchain.com/about', 'https://www.langchain.com/careers', 'https://www.langchain.com/contact-sales', 'https://www.linkedin.com/company/langchain/', 'https://www.langchain.com/', 'https://www.langchain.com/langchain', 'https://www.langchain.com/langsmith', 'https://www.langchain.com/langgraph', 'https://www.langchain.com/retrieval', 'https://www.langchain.com/agents', 'https://www.langchain.com/evaluation', 'https://blog.langchain.dev/', 'https://www.langchain.com/customers', 'https://academy.langchain.com/', 'https://www.langchain.com/community', 'https://www.langchain.com/experts', 'https://changelog.langchain.com/']


[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of several relevant pages from a company website and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.Include details of company culture, customers and careers/jobs if you have the information.'},
 {'role': 'user',
  'content': "You are looking at a company called: LangChain\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nLangChain\nWebpage Contents:\nProducts\nLangChain\nLangSmith\nLangGraph\nMethods\nRetrieval\nAgents\nEvaluation\nResources\nBlog\nCustomer Stories\nLangChain Academy\nCommunity\nExperts\nChangelog\nLLM Evaluations Guide\nState of AI Agents\nBreakout Agent Stories\nDocs\nPython\nLangChain\nLangSmith\nLangGraph\nJavaScript\nLangChain\nLangSmith\nLangGraph\nCompany\nAbout\nCareers\nPricing\nLangSmith\

In [13]:
def create_brochure(model, url):
    website = Website(url)
    brochure_content = model.invoke(messages_for_LLM(system_prompt, website))
    display(Markdown(brochure_content))

In [14]:
# Instantiate the LLM
model = OllamaLLM(model="qwen2.5-coder:0.5b")

create_brochure(model, "https://www.langchain.com")

Found links: ['https://blog.langchain.dev/llms-accelerate-adyens-support-team-through-smart-ticket-routing-and-support-agent-copilot/', 'https://www.langchain.com/about', 'https://www.langchain.com/careers', 'https://www.langchain.com/contact-sales', 'https://www.linkedin.com/company/langchain/', 'https://www.langchain.com/', 'https://www.langchain.com/langchain', 'https://www.langchain.com/langsmith', 'https://www.langchain.com/langgraph', 'https://www.langchain.com/retrieval', 'https://www.langchain.com/agents', 'https://www.langchain.com/evaluation', 'https://blog.langchain.dev/', 'https://www.langchain.com/customers', 'https://academy.langchain.com/', 'https://www.langchain.com/community', 'https://www.langchain.com/experts', 'https://changelog.langchain.com/']


---

# LangChain: The Leading AI Platform for Developers

## Overview

LangChain is a comprehensive suite of software tools designed for building with Language Models (LLMs). It offers developers an array of services across different applications, from basic text generation to complex reasoning and machine learning. With LangChain's platform, you can create AI-powered applications that are both efficient and secure.

---

## Services

### Products

#### General
- **LangSmith:** The orchestration framework for controllable agentic workflows.
- **LangGraph:** The API-driven user experience framework featuring human-in-the-loop, multi-agent collaboration, conversation history, long-term memory, and time-travel.
- **LLM Evaluations Guide:** Comprehensive documentation on how to use LangChain's evaluation tools.

#### AI
- **Agents:** AI algorithms that can reason and generate human-like text.
- **Retrieval:** Tools for extracting relevant information from large amounts of data.

#### Evaluation
- **Evaluation Tools:** Metrics and reports to measure the performance of AI applications.

#### Resources
- **Docs:** Comprehensive documentation on LangChain's tools and best practices.
- **Community:** Join our community forums, join our Slack channel, or contact us for support.

---

## Customer Stories

**Companies Built with LangChain**

1. **Elastic AI Assistant (ELA)**: A leading solution for building highly secure AI applications.
2. **Adaptive Chatbot**: A platform that enables users to interact with the chatbot without explicit consent, enhancing user privacy and trust.
3. **Customer Experience Improvement Tools (CEITT)**: A suite of tools to improve the accuracy and performance of existing LLMs.
4. **Healthcare AI**: A platform for building intelligent medical applications using LLMs.

---

## Community

**LangChain** is a community-driven project, fostering collaboration between developers, researchers, and industry experts. Follow us on GitHub for updates and discussions, as well as our official social media channels for news and support.

---

## Expertise and Trends

- **Open-source**: LangChain is open-source, with a strong focus on innovation and community engagement.
- **LLM Frameworks**: LangChain uses several LLM frameworks, such as LangChain's itself, Langsmith's platform, and LangGraph, making it versatile for different types of applications.
- **AI in GenAI**: LangChain has played a significant role in driving the development of genAI applications, especially those with high security requirements.

---

## Legal Considerations

**Legal Compliance**: Ensure that your company complies with relevant laws and regulations related to AI and data protection.
- **Data Privacy**: Protect user data by implementing appropriate encryption practices and ensuring compliance with GDPR and CCPA regulations.
- **Security**: Regularly update and patch LangChain, as well as LangSmith's platform, to address security vulnerabilities.

---

## Partnerships

**Industry Partnerships**: Partner with AI experts, technology companies, and other key players to enhance the capabilities of LangChain.
- **AI Accelerators**: Collaborate with AI accelerators, such as IBM Watson and Microsoft Azure, to leverage their expertise in advanced AI models.

---

## Conclusion

LangChain is a powerful platform for developers looking to build efficient, secure, and reliable AI applications. By leveraging its robust suite of products and services, you can achieve rapid development, scalability, and better performance. Join us on the road to creating your own custom LLM solutions that drive innovation and efficiency in today's digital landscape.