* The goal of this Ollama application is to keep data on my own machine while checking website content and summarizing it for me.

In [32]:
# imports

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

In [33]:
# Constants
MODEL = "llama3.2"

In [34]:
# A class to represent a Webpage

class Website:
    """
    A utility class to represent a Website that we have scraped
    """
    url: str
    title: str
    text: str

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
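        # A timeout stops the notebook from hanging on an unresponsive server
        response = requests.get(url, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        # soup.title.string can be None when <title> is empty or contains nested tags
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body is None:
            # Some responses have no <body>; fall back to empty text
            self.text = ""
            return
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)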

In [35]:
# Let's try one out

ed = Website("https://edwarddonner.com")
print(ed.title)
print(ed.text)

Home - Edward Donner
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of press coverage.
Connec

In [36]:
# Define our system prompt - you can experiment with this later, changing the last sentence to 'Respond in markdown in Spanish.'

system_prompt = "You are a data scientist who analyzes the contents of a website \
and provides a short summary, from a data science perspective, of which models, \
model structure and data were used for any given research, and provides actionable solutions where applicable. \
Your overall aim is to create something better than what the author suggested, \
ignoring text that might be navigation related. \
Respond in markdown."

In [37]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    # End the first sentence with a period and newline so it does not run into the next one
    user_prompt = f"You are looking at a website titled {website.title}.\n"
    user_prompt += "The contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [38]:
# Build the messages list (one system message, one user message) in the format the chat API expects

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
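
The message structure can be checked without fetching a page or calling a model. The following is a minimal sketch, not the notebook's own code: `FakePage` and `build_messages` are hypothetical names introduced here, and the prompt wording is shortened for illustration. It assumes only that the page object exposes `title` and `text` attributes, as `Website` does.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in exposing the same title/text attributes as Website."""
    title: str
    text: str


def build_messages(system_prompt, page):
    """Return a two-item chat message list: system instructions, then the user request."""
    user_prompt = f"You are looking at a website titled {page.title}.\n"
    user_prompt += "Please provide a short summary of this website in markdown.\n\n"
    user_prompt += page.text
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Respond in markdown.", FakePage("Demo", "Hello world"))
for message in messages:
    # Print each role with the size of its content to inspect the structure
    print(message["role"], "->", len(message["content"]), "characters")
```

Inspecting the list this way makes it easy to spot formatting slips, such as two sentences running together, before any text is sent to the model.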

In [39]:
# And now: call the Ollama function instead of OpenAI

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']
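
Long pages can exceed the model's context window, in which case part of the prompt may not be seen by the model. One possible guard is to cap the page text before building the messages. This is a minimal sketch under stated assumptions: `truncate_text` and `MAX_CHARS` are hypothetical names, not part of the notebook or of Ollama, and a character budget is only a rough proxy for a token limit.

```python
MAX_CHARS = 8000  # assumed budget; tune for the model and context size in use


def truncate_text(text, max_chars=MAX_CHARS):
    """Cut text to at most max_chars, ending on a line boundary when one exists."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_newline = cut.rfind("\n")
    if last_newline > 0:
        # Drop the partial final line so the text does not end mid-sentence
        cut = cut[:last_newline]
    return cut


sample = "line1\nline2\nline3"
print(truncate_text(sample, 8))
```

In the notebook this could be applied to `website.text` before it is appended to the user prompt, at the cost of ignoring whatever falls beyond the budget.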

In [40]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [41]:
display_summary("https://medium.com/@nedwinvivek/a-practical-guide-to-tackling-clustering-problems-in-machine-learning-6cf39dff2c36")

**Summary of the Website**

The website provides a practical guide to tackling clustering problems in machine learning. The author, Edwin Vivek, presents a step-by-step approach to effectively apply clustering techniques.

**Key Points**

* Clustering is an unsupervised learning technique used to uncover hidden patterns in data.
* The key success criteria for clustering models include:
	+ Is the model interpretable?
	+ Does the clustering output provide value to the business?
	+ Has the process revealed new insights or uncovered patterns in the data that were previously unknown?
* The article uses the popular "Mall Customer Segmentation" dataset from Kaggle to explore how clustering techniques can be applied to segment customers based on their shopping behavior.

**Methodology**

1. **Understanding the Domain and Problem Context**: Frame the problem correctly, define the objective clearly, and pin down the unit of analysis.
2. **Data Preparation**: Handle missing values, skewness, outliers, feature scaling/encoding, and selecting the right features.
3. **Choosing the Right Clustering Algorithm**: Visualize the data using dimensionality reduction tools like PCA, t-SNE, or UMAP to get a feel for the distribution and shape of clusters.
4. **Evaluating Clusters**: Use visualization, evaluation metrics (e.g., Elbow Method, Silhouette Score, Davies-Bouldin Index), and business interpretability to decide what works best for the dataset.
5. **Interpreting Clusters**: Conduct statistical analysis, visual exploration, cluster size balance, feature-level insights, and create labels based on traits.

**Results**

The author performs clustering using K-Means with 3 clusters initially but finds that it doesn't provide meaningful results. After re-running the model with different numbers of clusters (5 and 7), they observe:

* Silhouette scores range from 0.41 to 0.42, indicating moderate separation.
* Some clusters have poor cluster separation or high variability.
* The author identifies potential churn risk in Cluster 1.

**Actionable Insights**

Based on the analysis, the author recommends:

* Targeted marketing strategies to minimize customer churn and reduce overall churn rates.
* Upselling campaigns for high-spending customers.
* Executing targeted promotions.

**Conclusion**

Clustering is an art and science that requires careful preprocessing, the right analytical tools, and domain knowledge. The article provides a practical guide to tackling clustering problems in machine learning, highlighting the importance of interpretation and actionable strategies.