## Text summarization with Llama 3 (Ollama) and Beautiful Soup (web scraping)

In [1]:
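# Prerequisites (run in a terminal; the Ollama server must be running locally):
# > pip install ollama requests beautifulsoup4
# > ollama pull llama3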

In [2]:
# Import libraries

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [3]:
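# A class to represent a scraped web page.

class Website:
    """
    A utility class to represent a website that we have scraped
    """
    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given URL using the BeautifulSoup library
        """
        self.url = url
        # A timeout avoids hanging forever on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string is None when <title> is empty or has more than one child
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body is None:
            # Some responses have no <body> to extract text from
            self.text = ""
            return
        # Drop tags that carry no readable text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)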
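To see what the cleaning steps in `Website.__init__` keep and discard without making a network request, the same BeautifulSoup calls can be run on a small inline HTML string. A minimal sketch: `SAMPLE_HTML` and `extract_title_and_text` are hypothetical names made up for this illustration.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page standing in for a real download (hypothetical sample)
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Welcome</h1>
    <script>console.log("tracking");</script>
    <style>p { color: red; }</style>
    <p>First paragraph.</p>
    <img src="logo.png" alt="Logo">
    <input type="text" placeholder="Search">
    <p>Second paragraph.</p>
  </body>
</html>
"""


def extract_title_and_text(html: str) -> tuple[str, str]:
    """Apply the same cleaning steps as Website.__init__ to an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title and soup.title.string else "No title found"
    if soup.body is None:
        return title, ""
    # Calling a tag is shorthand for find_all(); decompose() deletes each match
    for irrelevant in soup.body(["script", "style", "img", "input"]):
        irrelevant.decompose()
    # strip=True trims each text fragment and drops whitespace-only ones
    return title, soup.body.get_text(separator="\n", strip=True)


title, text = extract_title_and_text(SAMPLE_HTML)
print(title)
print(text)
```

Only the heading and the two paragraphs survive; the script and style contents are gone, and the image and input contribute no text.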

In [4]:
ed = Website("https://www.cdc.gov/")
print(ed.title)
print(ed.text)

Centers for Disease Control and Prevention | CDC
Skip to site content
Skip to search
An official website of the United States government.
Here's how you know
Español
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
Secure .gov websites use HTTPS
A
lock
(
) or
https://
means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Español
Search
Close
Health Topics
Outbreaks
About CDC
Health Topics
Health Topics
Adult Vaccinations
Alzheimer’s Disease
COVID-19
Diabetes
Flu
H5 Bird Flu
Handwashing
Hantavirus
Healthy Weight
High Blood Pressure
Lyme Disease
Measles
Overdose Prevention
Preventing Dengue
Quit Smoking
Respiratory Syncytial Virus Infection (RSV)
Strep Throat
Travelers' Health
Strep Throat
A-Z Index
outbreaks
Outbreaks
Supplement Shakes
Listeria
Outbreak
Geckos
Salmonella
Outbreak
Marburg Outbreak in Rwanda
Marburg Virus Disease
Measles
2025 Outbreaks
Cinnamon Applesa

### Two types of prompts

**1. System prompt**: tells the model what task it is performing and what tone it should use.

**2. User prompt**: the conversation starter that the model should reply to.

In [5]:
# Define system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary (within 50 words), ignoring text that might be navigation related. \
Respond in markdown."
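The backslash at the end of each line inside the string literal continues the literal without inserting a newline, so `system_prompt` is a single line of text. The sketch below compares it with adjacent string literals in parentheses, an alternative style that Python joins at compile time; the variable names are illustrative only.

```python
# Backslash style: the backslash-newline inside the literal is removed,
# so the pieces are joined with no line break between them.
with_backslash = "You are an assistant that analyzes the contents of a website \
and provides a short summary (within 50 words), ignoring text that might be navigation related. \
Respond in markdown."

# Parenthesized style: adjacent string literals are concatenated,
# and the indentation of each line is not part of the string.
with_parens = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary (within 50 words), ignoring text that might be navigation related. "
    "Respond in markdown."
)

print(with_backslash == with_parens)  # True
```

With the backslash style, any leading spaces on a continuation line become part of the string, which is why those lines start at column 0 in the cell above.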

In [6]:
# Build the user prompt for summarizing a website

def user_prompt_for(website):
    # End the title with a newline so it does not run into the instructions
    user_prompt = f"You are looking at a website titled {website.title}\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary (within 50 words) of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [7]:
# Build the message list in the chat format used by the Ollama API

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
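`messages_for` only reads the `title` and `text` attributes of its argument, so the chat payload can be inspected offline by passing any object with those two attributes. A minimal sketch, assuming a `SimpleNamespace` stand-in for `Website`; the `demo_` names and shortened prompt strings are hypothetical, chosen so that running the cell does not overwrite the notebook's own `system_prompt`, `user_prompt_for`, or `messages_for`.

```python
from types import SimpleNamespace

# Shortened placeholder prompt; NOT the notebook's exact wording
demo_system_prompt = "You are an assistant that summarizes websites in markdown."


def demo_user_prompt_for(website):
    # Title on its own line, then the instruction, then the scraped page text
    return (
        f"You are looking at a website titled {website.title}\n"
        "Please provide a short summary of this website in markdown.\n\n"
        + website.text
    )


def demo_messages_for(website):
    # Same shape as messages_for: one system message, then one user message
    return [
        {"role": "system", "content": demo_system_prompt},
        {"role": "user", "content": demo_user_prompt_for(website)},
    ]


# Stand-in for a scraped Website: only .title and .text are read
stub = SimpleNamespace(title="Example Page", text="Line one\nLine two")
demo_messages = demo_messages_for(stub)
for message in demo_messages:
    print(f"[{message['role']}]\n{message['content']}\n")
```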

In [8]:
# Scrape the page, send the messages to the local model via Ollama, and return the reply text
MODEL = "llama3"
def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']
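`ollama.chat` waits for the complete reply by default. Passing `stream=True` makes it return an iterator of partial responses, each carrying a text fragment in `chunk['message']['content']`. A sketch of a streaming variant of the model call in `summarize`; `stream_reply` and its `chat` parameter are hypothetical, and the `chat` hook exists only so the function can be exercised with a stand-in instead of a running Ollama server.

```python
from typing import Callable, Iterator, Optional


def stream_reply(messages: list, model: str = "llama3",
                 chat: Optional[Callable] = None) -> Iterator[str]:
    """Yield the model's reply fragment by fragment instead of all at once."""
    if chat is None:
        # Imported here so the function can also be driven by a stand-in `chat`
        import ollama
        chat = ollama.chat
    # With stream=True, ollama.chat returns an iterator of partial responses
    for chunk in chat(model=model, messages=messages, stream=True):
        yield chunk["message"]["content"]
```

In the notebook, `for piece in stream_reply(messages_for(Website(url))): print(piece, end="", flush=True)` would print the summary as it is generated rather than after the model finishes.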

In [9]:
summarize("https://www.cdc.gov/")

"Here is a summary of the website in markdown:\n\n**Summary**\n\nThe Centers for Disease Control and Prevention (CDC) is the nation's leading science-based, data-driven, service organization that protects the public's health. The website provides information on various health topics such as adult vaccinations, Alzheimer's disease, COVID-19, diabetes, flu, H5 Bird Flu, handwashing, and more.\n\n**News**\n\n* CDC Statement on Measles Outbreak (Feb 27)\n* CDC Reports Nearly 24% Decline in U.S. Drug Overdose Deaths (Feb 25)\n* Listeria outbreak linked to supplement shakes distributed in long-term care facilities (Jan 10)\n\n**Journals**\n\n* MMWR: a weekly epidemiological digest\n* EID: a monthly peer-reviewed journal on infectious diseases\n* PCD: a peer-reviewed journal on chronic diseases\n\n**About CDC**\n\n* Mission and Org Charts\n* Leadership\n* Pressroom\n* Organization\n* Funding and Grants\n* Careers at CDC"

In [10]:
# A function for displaying content in Markdown format.

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
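`display(Markdown(...))` renders once, after the whole summary exists. To render text that arrives in pieces (for example, the chunks from `ollama.chat(..., stream=True)`), an `IPython.display.DisplayHandle` can show one output area and re-render it with `update()`. A sketch under that assumption; `display_streamed_markdown` is a hypothetical helper that accepts any iterable of strings.

```python
from IPython.display import DisplayHandle, Markdown


def display_streamed_markdown(pieces) -> str:
    """Render an iterable of text pieces as one Markdown output that updates in place."""
    handle = DisplayHandle()        # one output area, identified by a generated display_id
    handle.display(Markdown(""))    # create the (initially empty) output area
    text = ""
    for piece in pieces:
        text += piece
        handle.update(Markdown(text))  # re-render everything received so far
    return text
```

Outside a notebook, IPython falls back to printing the objects instead of rendering them, so the helper still runs and returns the accumulated text.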

In [11]:
display_summary("https://www.cdc.gov/")

Here is a summary of the website in markdown:

**Centers for Disease Control and Prevention (CDC)**
=====================



### Featured Topics

* Prevent Norovirus
* Good Nutrition Starts Early
* Respiratory Illnesses
* First Aid for Seizures
* Fatigue and Work
* Health Benefits of Sleep
* Building Healthy Habits


### News

* CDC Statement on Measles Outbreak (February 27)
* CDC Reports Nearly 24% Decline in U.S. Drug Overdose Deaths (February 25)
* CDC warns of Listeria outbreak linked to supplement shakes distributed in long-term care facilities (January 10)

And more news articles...

### About CDC

* Mission and Org Charts
* Leadership
* Pressroom
* Organization
* Lab Safety

This website provides information on various health topics, outbreaks, and news related to the Centers for Disease Control and Prevention (CDC).

In [12]:
display_summary("https://www.cdc.gov/diabetes/prevention-type-2/index.html")

Here is a summary of the website content within 50 words:
**Preventing Type 2 Diabetes**: Learn about prediabetes, its risk factors, and how to reverse it with lifestyle changes. Take a 1-minute risk test to find out if you're at risk. The CDC-recognized National Diabetes Prevention Program (National DPP) offers a lifestyle change program to prevent or delay type 2 diabetes.

In [13]:
display_summary("https://www.cdc.gov/diabetes/prevention-type-2/truth-about-prediabetes.html")

Here is a summary of the website within 50 words, ignoring navigation-related text:
**The Surprising Truth About Prediabetes**

Over 1 in 3 US adults has prediabetes. It's a major risk factor for type 2 diabetes and can increase the risk of heart disease and stroke. Lifestyle changes can stop or slow the development of type 2 diabetes. Talk to your doctor about getting your blood sugar tested.

#### Acknowledgement: This exercise is from the Udemy course "LLM Engineering".