Upgrade the website summarizer project to use an open-source model running locally via Ollama rather than OpenAI.

**Benefits:**
1. No API charges, because the model is open source and runs on your own hardware
2. Data doesn't leave your box

**Disadvantages:**
1. Significantly less powerful than a frontier model

## Recap on installation of Ollama

Simply visit [ollama.com](https://ollama.com) and install!

Once installation is complete, the Ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, open a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`.  
Then try [http://localhost:11434/](http://localhost:11434/) again.
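
The same check can be done from Python instead of the browser. The following is a minimal sketch using only the standard library; the helper name `ollama_is_running` and its `timeout` parameter are illustrative assumptions, not part of Ollama or of this project.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434/", timeout=2):
    """Return True if the page at base_url contains 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: treat the server as not running
        return False
    return "Ollama is running" in body


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, start the server with `ollama serve` and run the cell again.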

In [17]:
#imports

import requests
import ollama
from bs4 import BeautifulSoup
from IPython.display import Markdown, display

In [19]:
MODEL="llama3.2"

In [23]:
class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title:str
    text: str

    def __init__(self,url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url=url
        response=requests.get(url)
        soup=BeautifulSoup(response.content,'html.parser')
        self.title=soup.title.string if soup.title else "No title found"
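        # Some pages have no <body>; fall back to empty text instead of raising AttributeError
        if soup.body:
            for irrelevant in soup.body(["script","style","img","input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n",strip=True)
        else:
            self.text = ""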

In [24]:
# ed=Website("https://www.scirp.org/journal/paperinformation?paperid=31320")
ed=Website("https://www.hostinger.in/tutorials/what-is-ollama")
print(ed.title)
print(ed.text)

What is Ollama? Introduction to the AI model management tool
WordPress
VPS
Website Development
eCommerce
Website Errors
How to Make a Website
search
VPS
VPS for Web Devs
Nov 05, 2024
Ariffud M.
6min                      Read
What is Ollama? Understanding how it works, main features and models
Copy link
Copied!
Ollama is an open-source tool that runs large language models (LLMs) directly on a local machine. This makes it particularly appealing to AI developers, researchers, and businesses concerned with data control and privacy.
By running models locally, you maintain full data ownership and avoid the potential security risks associated with cloud storage. Offline AI tools like Ollama also help reduce latency and reliance on external servers, making them faster and more reliable.
This article will explore Ollama’s key features, supported models, and practical use cases. By the end, you’ll be able to determine if this LLM tool suits your AI-based projects and needs.
Download ChatGPT chea

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o, and the Llama model we run here, have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [25]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

# Adjacent string literals are joined, so no stray indentation ends up inside the prompt
system_prompt = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in markdown."
)

In [26]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    # Start on a new line so the title does not run into the next sentence
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

## Messages

The API from Ollama expects to receive messages in the same format as OpenAI:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

In [27]:
# See how this function creates exactly the format above
def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
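
Before sending a message list to the model, it can help to confirm it has the shape shown above. This is a small sketch of such a check; `validate_messages` and `ALLOWED_ROLES` are hypothetical helpers written for illustration, and they cover only the plain-text case used in this notebook.

```python
# Roles assumed for this sketch; the notebook itself only uses "system" and "user"
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages):
    """Raise ValueError if any message lacks a valid role or string content."""
    for index, message in enumerate(messages):
        if "role" not in message or "content" not in message:
            raise ValueError(f"Message {index} needs 'role' and 'content' keys")
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"Message {index} has unknown role {message['role']!r}")
        if not isinstance(message["content"], str):
            raise ValueError(f"Message {index} content must be a string")
    return True


example = [
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"},
]
print(validate_messages(example))
```

In the notebook you could call `validate_messages(messages_for(ed))` to check the real prompt before summarizing.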

## Time to bring it together - now with Ollama instead of OpenAI 

In [29]:
# And now: call the Ollama API. You will get very familiar with this!
def summarize(url):
    website=Website(url)

    response=ollama.chat(
        model=MODEL,  # the constant defined above; lowercase `model` was undefined
        messages=messages_for(website)
    )
    return response['message']['content']
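
The `ollama` package talks to the local server over HTTP, so the same chat request can also be sent to the server's `/api/chat` endpoint directly. The sketch below uses only the standard library; the helper names `build_chat_request` and `chat_via_rest` and the `timeout` value are assumptions made for illustration, and the server must be running for `chat_via_rest` to succeed.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model, messages):
    """Build (but do not send) a POST request for Ollama's chat endpoint."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": False,  # ask for one complete JSON reply instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_via_rest(model, messages, timeout=120):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, messages)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["message"]["content"]
```

With the server running, `chat_via_rest("llama3.2", messages_for(ed))` should give a result comparable to `ollama.chat`, which is handy when you cannot install the `ollama` package.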

In [30]:
summarize("https://www.hostinger.in/tutorials/what-is-ollama")

"Ollama is an open-source, locally hosted AI solution that allows users to run large language models (LLMs) on their own machines, providing a more private and cost-effective alternative to cloud-based AI services.\n\n**Benefits of using Ollama:**\n\n1. **Enhanced privacy and data security**: Ollama keeps sensitive data on local machines, reducing the risk of exposure through third-party cloud providers.\n2. **No reliance on cloud services**: Businesses maintain complete control over their infrastructure without relying on external cloud providers.\n3. **Customization flexibility**: Ollama lets developers and researchers tweak models according to specific project requirements.\n4. **Offline access**: Running AI models locally means you can work without internet access.\n5. **Cost savings**: By eliminating the need for cloud infrastructure, users avoid recurring costs related to cloud storage, data transfer, and usage fees.\n\n**Who is Ollama suitable for?**\n\n1. Developers and researc

In [31]:
# A function to display this nicely in the Jupyter output, using markdown
def display_summary(url):
    summary=summarize(url)
    display(Markdown(summary))

In [None]:
display_summary("https://en.wikipedia.org/wiki/Large_language_model")

# Let's try more websites

Note that this will only work on websites that can be scraped using this simplistic approach.

Websites that are rendered with JavaScript, like React apps, won't show up. See the community-contributions folder for a Selenium implementation that gets around this. You'll need to read up on installing Selenium (ask ChatGPT!)

Also, websites protected with CloudFront (and similar) may give 403 errors - many thanks Andy J for pointing this out.
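
One way to make such failures visible is to check for HTTP errors when fetching the page. The sketch below uses the standard library rather than `requests`; the `fetch_html` helper, its `opener` parameter, and the `BROWSER_HEADERS` value are all illustrative assumptions. Sending a browser-like `User-Agent` sometimes avoids a 403, but it is not guaranteed to work.

```python
import urllib.error
import urllib.request

# A generic browser-like User-Agent; an assumption for this sketch, not a required value
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def fetch_html(url, opener=urllib.request.urlopen):
    """Return the page HTML as text, or None if the server answers with an HTTP error."""
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    try:
        with opener(request, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # For example 403 Forbidden from a protected site
        print(f"Could not fetch {url}: HTTP {err.code}")
        return None
```

A `None` result tells you the page was blocked or missing, so you can skip the summary instead of passing an error page to the model.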

But many websites will work just fine!

In [None]:
display_summary("https://www.aajtak.in/")

In [None]:
display_summary("https://anthropic.com")