# Using Llama 3.2 to build a website summarizer that runs on a local machine

In [1]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

In [2]:
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2:1b"

In [9]:
# Downloads the Llama 3.2 model (1B parameters) from the Ollama library to the local machine.
!ollama pull llama3.2:1b

pulling manifest
pulling 74701a8c35f6... 100% 1.3 GB
pulling 966de95ca8a6... 100% 1.4 KB
pulling fcc5a6bec9da... 100% 7.7 KB
pulling a70ff7e570d9... 100% 6.0 KB
pulling 4f659a1e86d7... 100%  485 B
verifying sha256 digest
wr
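Before calling the model, it can help to confirm that the Ollama server is reachable and that the pull succeeded. A minimal sketch using only the standard library, assuming Ollama's `GET /api/tags` endpoint (which lists locally available models) on the default port 11434; `local_models` and `model_available` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # assumed default Ollama address


def local_models(base_url=OLLAMA_BASE, timeout=2.0):
    """Return the names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (connection refused, 404, timeout);
        # ValueError covers a body that is not valid JSON.
        return None
    return [m.get("name", "") for m in data.get("models", [])]


def model_available(name, base_url=OLLAMA_BASE):
    """True if `name` (e.g. "llama3.2:1b") appears in the server's model list."""
    models = local_models(base_url)
    return models is not None and name in models
```

For example, `model_available("llama3.2:1b")` should return `True` once the pull above has finished and `ollama serve` is running. Note that a model pulled without an explicit tag is listed with a tag such as `:latest`, so compare the full name.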

In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)
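The `Website` class drops `<script>`, `<style>`, `<img>` and `<input>` elements and keeps the remaining text, one chunk per line. To try that step in isolation, without a network request or BeautifulSoup, here is a rough standard-library approximation built on `html.parser.HTMLParser`. `TextExtractor` and `extract_text` are hypothetical names, and this is a swapped-in technique, not what BeautifulSoup does internally.

```python
from html.parser import HTMLParser

# Tags whose contents are code, not readable text. <img> and <input> are void
# elements with no text content, so they contribute nothing here anyway.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect stripped text nodes, ignoring anything inside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.in_title = False
        self.title = None
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        if self.in_title:
            self.title = text
        else:
            self.chunks.append(text)


def extract_text(html):
    """Return (title, text), joining chunks with newlines like get_text(separator="\\n", strip=True)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title or "No title found", "\n".join(parser.chunks)
```

For instance, `extract_text("<title>Demo</title><h1>Hello</h1><script>var x=1;</script><p> World </p>")` gives `("Demo", "Hello\nWorld")`.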

In [4]:
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [5]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
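`user_prompt_for` appends the entire page text. Busy pages such as news front pages can produce far more text than a 1B-parameter model handles well, and Ollama may cut off input that exceeds the model's configured context window. One hedged option is to cap the page text before building the prompt. The sketch below uses a character budget as a crude stand-in for tokens; `MAX_CHARS` and `truncate_for_prompt` are hypothetical, and the budget is an assumed value to tune.

```python
MAX_CHARS = 8_000  # assumed budget; tune to the model's context window


def truncate_for_prompt(text, max_chars=MAX_CHARS):
    """Trim text to at most max_chars characters, cutting at a line break when possible."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_newline = cut.rfind("\n")
    # Prefer ending on a whole line, unless that would throw away most of the budget
    if last_newline > max_chars // 2:
        cut = cut[:last_newline]
    return cut + "\n[... page text truncated ...]"
```

To use it, replace `user_prompt += website.text` with `user_prompt += truncate_for_prompt(website.text)`.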

In [6]:
# Builds the messages list: the system prompt plus a user prompt containing the page
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

#### Calling Llama 3.2 using the requests library

In [8]:
# Builds the JSON request body for Ollama's /api/chat endpoint (non-streaming)
def payload_for(website):
    return {
        "model": MODEL,
        "messages": messages_for(website),
        "stream": False
    }
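Besides `model`, `messages` and `stream`, Ollama's `/api/chat` endpoint accepts an optional `options` object of model parameters, such as `temperature` and `num_ctx` (context window size in tokens). A sketch of an extended payload builder, assuming those two options; `payload_with_options` is a hypothetical helper, and the default values are placeholders rather than recommendations. It takes a messages list directly so the payload can be inspected without fetching a page.

```python
import json


def payload_with_options(messages, model="llama3.2:1b", temperature=0.2, num_ctx=4096):
    """Build an /api/chat request body with explicit sampling and context settings."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {
            "temperature": temperature,  # lower values give more deterministic summaries
            "num_ctx": num_ctx,          # context window in tokens (assumed value)
        },
    }


demo = payload_with_options([{"role": "user", "content": "Summarize: hello"}])
print(json.dumps(demo, indent=2))
```

Passing `payload_with_options(messages_for(website))` to `requests.post` in place of `payload_for(website)` would apply these settings.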

In [10]:
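# And now: calling Llama with a plain HTTP POST via requests

def summarize(url):
    website = Website(url)
    response = requests.post(OLLAMA_API, json=payload_for(website), headers=HEADERS)
    response.raise_for_status()  # surface HTTP errors, e.g. a model that was never pulled
    return response.json()['message']['content']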
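With `"stream": False` the endpoint returns a single JSON object. If `"stream"` is set to `True` instead, Ollama sends newline-delimited JSON: each line is an object whose `message.content` holds the next fragment of the reply, and the final line has `"done": true`; errors arrive as an object with an `"error"` key. A minimal parser for that stream, written against an iterable of lines so it can be tried without a server; `collect_stream` is a hypothetical name.

```python
import json


def collect_stream(lines):
    """Join the message fragments from Ollama's newline-delimited JSON chat stream."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        if not raw.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        if "error" in chunk:
            raise RuntimeError(f"Ollama error: {chunk['error']}")
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

With requests, the lines would come from something like `requests.post(OLLAMA_API, json={**payload_for(website), "stream": True}, stream=True).iter_lines()`, which yields bytes; hence the decode step.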

#### Calling Llama 3.2 using the ollama package

In [13]:
# Calling llama using ollama package

import ollama
def summarize(url):
    website = Website(url)
    response = ollama.chat(model=MODEL, messages=messages_for(website))
    return response['message']['content']

#### Calling Llama 3.2 using the OpenAI-compatible API

In [18]:
# Calling Llama through Ollama's OpenAI-compatible endpoint with the openai client

from openai import OpenAI
# Ollama ignores the API key, but the OpenAI client requires one to be set
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

def summarize(url):
    website = Website(url)
    response = ollama_via_openai.chat.completions.create(
        model=MODEL,
        messages=messages_for(website)
    )
    return response.choices[0].message.content
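The OpenAI client only needs a `base_url` pointing at Ollama's `/v1` endpoint. Under the hood the call is still a plain HTTP POST, this time to `/v1/chat/completions`, and the reply follows OpenAI's response shape (`choices[0].message.content`). A standard-library sketch of roughly the same request, assuming the default local address; `build_request`, `content_from_completion` and `openai_style_chat` are hypothetical names, and the real client sends additional headers not shown here.

```python
import json
import urllib.request

OPENAI_COMPAT_URL = "http://localhost:11434/v1/chat/completions"  # assumed default address


def build_request(messages, model="llama3.2:1b", url=OPENAI_COMPAT_URL):
    """Create a POST request similar to the one the OpenAI client sends to Ollama."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": "Bearer ollama"},
        method="POST",
    )


def content_from_completion(data):
    """Pull the reply text out of an OpenAI-style chat completion response."""
    return data["choices"][0]["message"]["content"]


def openai_style_chat(messages, model="llama3.2:1b", timeout=300):
    """Send the request and return the assistant's reply (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(messages, model), timeout=timeout) as resp:
        return content_from_completion(json.load(resp))
```

Comparing this with `payload_for` shows that the two endpoints take the same `model`/`messages` body and differ mainly in URL and response shape.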

In [14]:
# Calls summarize() and renders the returned Markdown in the notebook
def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [15]:
display_summary('https://finance.yahoo.com/')

The article discusses the current market trends and outlook, including:

1. **NVIDIA (NVDA) stock**: The company's recent earnings report showed a strong performance, with a 63% increase in quarterly revenue.
2. **Alibaba Group Holding Limited (BABA)**: The Chinese conglomerate's stock was trading at $140.11, up 0.88%, driven by strong demand for its cloud computing and e-commerce platforms.
3. **Dow Jones Industrial Average (DJIA) index**: The overall market sentiment remains cautious, with investors waiting for the Federal Reserve to announce its monetary policy decisions.

However, there are also some concerns:

1. **Tariffs and trade tensions**: Trade wars and tariffs between countries have raised uncertainty about global economic growth.
2. **Global inflation**: Inflation has been rising in various parts of the world, posing challenges for central banks and policymakers.
3. **Cryptocurrency market volatility**: Cryptocurrencies have experienced significant price fluctuations in recent days, affecting investors who hold cryptocurrencies.

To navigate these complexities, it's essential to:

1. **Monitor economic indicators**: Keep an eye on key indicators such as inflation rates, employment numbers, and GDP growth.
2. **Stay informed about trade policies**: Be aware of the ongoing trade tensions between countries and how they might impact your investments.
3. **Diversify your portfolio**: Consider spreading your investments across various asset classes to minimize risks.

Overall, while there are some concerns in the market, the overall trend remains positive, with growth stocks like NVIDIA (NVDA) and Alibaba Group Holding Limited (BABA) continuing to perform well.

In [12]:
display_summary('https://reddit.com')

**Reddit - Dive into Anything**

This subreddit is a platform where users can share and discuss various topics, including news, entertainment, sports, and more. The site has a vast array of content, including articles, discussions, and live streams.

**News and Announcements**

* A Canadian team won the Four Nations hockey tournament, with Connor McDavid scoring the winning goal.
* TruGreen, an official lawn care treatment provider for the PGA TOUR, is promoting its services through a special offer.
* Severance, an Apple TV+ series, has received positive reviews for its unique storyline and performances.

**Popular Communities**

* r/AskMen: A community focused on male relationships and advice.
* r/NBA2k: A subreddit dedicated to basketball video games, with discussions and polls.
* r/MadeMeSmile: A community where users share things that brighten up their day or make them smile.

In [19]:
display_summary('https://www.scrapethissite.com/pages/')

### Learn Web Scraping | Scrape This Site
#### A Public Sandbox for Learning Web Scraping

*   **Sandbox**: A resource where users can practice web scraping by browsing a public database of NHL team stats since 1990.
*   **Lessons**: Provides hands-on experience with web scraping and data extraction from popular websites like Countries of the World, Hockey Teams, Oscar Winning Films, and Turtles All the Way Down. Each lesson aims to cover specific techniques or concepts related to web scraping, such as navigating different website interface components and handling common network errors.
*   **FAQ**: Offers general questions and answers about web scraping, navigation, and other related topics.
*   **Login**: A feature for users who want to access the site's advanced features without exposing their login credentials.