# Unsupervised Learning: Types of Unsupervised Learning

Unsupervised learning is all about discovering hidden structure in unlabeled data. In this notebook, we'll explore several different clustering approaches and see how each one handles different shapes and patterns in data.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering, BisectingKMeans
from sklearn.datasets import make_blobs, make_moons, make_circles
from sklearn.metrics import silhouette_score, rand_score
from scipy.cluster.hierarchy import dendrogram, linkage
import pandas as pd
import seaborn as sns

## What is Unsupervised Learning?

Unsupervised learning deals with data that has **no labels**, meaning the algorithm must discover structure on its own. Instead of learning by example (like supervised learning), unsupervised methods identify patterns, groupings, or relationships based only on the data itself.

### Supervised vs Unsupervised: A Quick Recap
- **Supervised Learning:** You have inputs *and* outputs — like a study guide with an answer key. The model learns to map features to known labels.
- **Unsupervised Learning:** You only have inputs — like taking notes without knowing what the exam questions will be. The model looks for structure or grouping on its own.

### Why "Unsupervised"?
There’s no teacher, no correct answers provided ahead of time. The algorithm must uncover meaningful clusters or patterns entirely from the data structure.

### Common Real-World Uses
- Spotify: Grouping similar songs into playlists
- Retail: Identifying customer segments for marketing
- Biology: Grouping gene expression profiles
- News: Clustering articles by topic
- Gaming: Detecting different play styles

### Intuition Example
Imagine sorting students into teams for an event without any categories given. You might group by behavior, interests, or who hangs out together — that's clustering!

## Centroid-Based Clustering

Centroid-based clustering groups data points according to their similarity to a central representative called a **centroid**. This centroid acts as the "center of mass" for the cluster.

### The Big Idea
Suppose you're organizing a pizza party. You want to place pizza stations (centroids) in a way that minimizes how far students must walk. Students naturally go to the station closest to them. Over time, stations would shift until each one is in the ideal location.

Clustering works similarly:
- Pick the number of clusters (number of pizza stations)
- Assign each data point to its nearest centroid
- Adjust centroids based on cluster members
- Repeat until stable

### Key Goals
- **Minimize distances within clusters** (points in the same cluster should be close)
- **Maximize distances between clusters** (clusters should be well-separated)

Because each step only needs the distances from points to a handful of centroids, centroid-based clustering is both intuitive and computationally efficient.
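The first goal can be made concrete as a single number: the within-cluster sum of squared distances. The sketch below computes it by hand and compares it with the `inertia_` attribute that scikit-learn's `KMeans` exposes. The variable names (`X_wcss`, `km_wcss`) are illustrative only, and the dataset is a synthetic one chosen for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data, assumed here purely for illustration
X_wcss, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=42)
km_wcss = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_wcss)

# Offset of every point from the centroid of the cluster it was assigned to
diffs = X_wcss - km_wcss.cluster_centers_[km_wcss.labels_]

# Within-cluster sum of squared distances (smaller means tighter clusters)
wcss = float(np.sum(diffs ** 2))

print(f"Manual within-cluster sum of squares: {wcss:.3f}")
print(f"KMeans.inertia_:                      {km_wcss.inertia_:.3f}")
```

The two numbers should agree up to floating-point error. Note that this quantity only measures cohesion; separation between clusters is not part of it.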

### Visualization of the Clustering Process

To get an intuition for how centroid-based methods work, let's walk through the process visually using K-Means as an example.

In [None]:
np.random.seed(42)
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=42)

fig, axes = plt.subplots(2, 2, figsize=(14, 12))
axes = axes.ravel()

axes[0].scatter(X_demo[:, 0], X_demo[:, 1], c='gray', s=80, alpha=0.6, edgecolors='black')
axes[0].set_title('Step 1: Raw Data (No Labels!)')

initial_centroids = X_demo[np.random.choice(len(X_demo), 3, replace=False)]
axes[1].scatter(X_demo[:, 0], X_demo[:, 1], c='gray', s=80, alpha=0.6, edgecolors='black')
axes[1].scatter(initial_centroids[:, 0], initial_centroids[:, 1], c='red', s=400, marker='*', edgecolors='black')
axes[1].set_title('Step 2: Random Initial Centroids')

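# Assign each point to its nearest *initial* centroid (no centroid update yet).
# Fitting KMeans with max_iter=1 would also move the centroids before labeling.
dists_step3 = np.linalg.norm(X_demo[:, None, :] - initial_centroids[None, :, :], axis=2)
labels_step3 = dists_step3.argmin(axis=1)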
axes[2].scatter(X_demo[:, 0], X_demo[:, 1], c=labels_step3, s=80, cmap='viridis', alpha=0.6, edgecolors='black')
axes[2].set_title('Step 3: Assign Points to Nearest Centroid')

kmeans_final = KMeans(n_clusters=3, random_state=42)
labels_final = kmeans_final.fit_predict(X_demo)
axes[3].scatter(X_demo[:, 0], X_demo[:, 1], c=labels_final, s=80, cmap='viridis', alpha=0.6, edgecolors='black')
axes[3].scatter(kmeans_final.cluster_centers_[:, 0], kmeans_final.cluster_centers_[:, 1], c='red', s=400, marker='*', edgecolors='black')
axes[3].set_title('Step 4: Updated Centroids (Converged)')

plt.tight_layout()
plt.show()

## K-Means Clustering

K-Means is one of the most widely used clustering algorithms because it's fast, intuitive, and works well when clusters are roughly spherical.

### What Makes K-Means Unique?
K-Means uses the **mean** (average position) of points in a cluster as the centroid.

### The K-Means Algorithm
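1. Pick the number of clusters, k
2. Place k initial centroids (randomly, or with a smarter scheme such as k-means++, which is scikit-learn's default)
3. Assign each point to its nearest centroid
4. Move each centroid to the mean of the points assigned to it
5. Repeat steps 3 and 4 until the centroids stop moving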
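The steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper names `nearest_centroid` and `kmeans_sketch` are made up for this example, initialization is simple random sampling of data points, and the toy data is synthetic.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Return, for each row of X, the index of the closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_sketch(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: choose k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 3: assign every point to its nearest centroid
        labels = nearest_centroid(X, centroids)
        # Step 4: move each centroid to the mean of its members
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return nearest_centroid(X, centroids), centroids

# Toy data: three loose groups around hypothetical centers
rng = np.random.default_rng(0)
X_sketch = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(30, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])

labels_sketch, centroids_sketch = kmeans_sketch(X_sketch, k=3)
print("Cluster sizes:", np.bincount(labels_sketch, minlength=3))
print("Centroids:\n", centroids_sketch.round(2))
```

Because the starting centroids are random, different seeds can converge to different solutions, which is why `KMeans` offers the `n_init` parameter to run several initializations and keep the best one.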

### Example: Clustering Students by Study Habits

In [None]:
np.random.seed(42)
n_students = 200
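# Generate three synthetic groups of students
X_students, true_labels = make_blobs(n_samples=n_students, centers=3, cluster_std=1.0, random_state=42)
# Rescale each feature to a different illustrative range
X_students[:, 0] = X_students[:, 0] * 5 + 20
X_students[:, 1] = X_students[:, 1] * 10 + 75
# Fit K-Means with 10 random restarts and keep the best run
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(X_students)
centers = kmeans.cluster_centers_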

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(X_students[:, 0], X_students[:, 1], c='gray', s=80, alpha=0.6, edgecolors='black')
plt.title('Before K-Means')

plt.subplot(1, 2, 2)
plt.scatter(X_students[:, 0], X_students[:, 1], c=labels, cmap='viridis', s=80, alpha=0.6, edgecolors='black')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=500, marker='*', edgecolors='black')
plt.title('After K-Means')

plt.show()

### Q1: Understanding K-Means
a. What happens if you increase k from 3 to 4?
b. What does a cluster center (mean) represent?
c. Name a scenario where K-Means might perform poorly.

### A:
YOUR ANSWER HERE

## Density-Based Clustering: DBSCAN

DBSCAN groups points by density rather than distance to centroids.

### Core, Border, Noise Points
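- **Core:** has at least `min_samples` points (counting itself) within distance `eps`
- **Border:** lies within `eps` of a core point but has too few neighbors to be a core point itself
- **Noise:** neither core nor border; scikit-learn labels these points `-1`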
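The three point types can be recovered from a fitted scikit-learn `DBSCAN` model. The sketch below uses `core_sample_indices_` for core points and the label `-1` for noise; anything else is a border point. The dataset and the `eps` / `min_samples` values are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Slightly noisier moons so that all three point types are likely to appear
X_pts, _ = make_moons(n_samples=300, noise=0.08, random_state=0)
db = DBSCAN(eps=0.15, min_samples=5).fit(X_pts)

# Core points are listed explicitly by the fitted model
is_core = np.zeros(len(X_pts), dtype=bool)
is_core[db.core_sample_indices_] = True

# Noise points carry the label -1
is_noise = db.labels_ == -1

# Border points belong to a cluster but are not core points
is_border = ~is_core & ~is_noise

print(f"Core:   {is_core.sum()}")
print(f"Border: {is_border.sum()}")
print(f"Noise:  {is_noise.sum()}")
```

Every point falls into exactly one of the three groups, so the counts always add up to the number of samples.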

In [None]:
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
kmeans_moons = KMeans(n_clusters=2, random_state=42)
labels_kmeans = kmeans_moons.fit_predict(X_moons)

dbscan = DBSCAN(eps=0.2, min_samples=5)
labels_dbscan = dbscan.fit_predict(X_moons)

fig, axes = plt.subplots(1, 3, figsize=(16, 5))
axes[0].scatter(X_moons[:, 0], X_moons[:, 1], c='gray', s=80, edgecolors='black')
axes[0].set_title('Original Data')

axes[1].scatter(X_moons[:, 0], X_moons[:, 1], c=labels_kmeans, cmap='viridis', s=80, edgecolors='black')
axes[1].set_title('K-Means (Incorrect)')

axes[2].scatter(X_moons[:, 0], X_moons[:, 1], c=labels_dbscan, cmap='viridis', s=80, edgecolors='black')
axes[2].set_title('DBSCAN (Correct)')

plt.show()

### Q2: Understanding DBSCAN
a. Why does DBSCAN work better than K-Means on moon shapes?
b. What happens if eps is too small or too large?
c. Give a real-world dataset where DBSCAN excels.

### A:
YOUR ANSWER HERE

## Connectivity-Based Clustering (Hierarchical Clustering)

Hierarchical clustering builds a nested hierarchy of clusters, usually drawn as a **tree-like structure** called a dendrogram.

### Approaches
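- **Agglomerative:** bottom-up merging; every point starts as its own cluster, and the two closest clusters are merged repeatedly
- **Divisive:** top-down splitting; all points start in one cluster, which is split repeatedly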
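As a rough sketch of what the tree looks like in code, SciPy's `linkage` builds it bottom-up and returns one row per merge, `dendrogram` lays it out, and `fcluster` cuts it into flat clusters. The toy data and the choice of `method="average"` are assumptions for this example; `no_plot=True` is used so the layout is computed without drawing.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Toy data: three small groups around hypothetical centers
rng = np.random.default_rng(0)
X_tree = np.vstack([
    rng.normal(loc=center, scale=0.4, size=(10, 2))
    for center in ([0, 0], [4, 0], [2, 4])
])

# Each row of Z records one merge:
# [cluster index, cluster index, merge distance, size of the new cluster]
Z = linkage(X_tree, method="average")

# Compute the dendrogram layout without plotting it
tree = dendrogram(Z, no_plot=True)

# Cut the tree so that at most 3 flat clusters remain (labels start at 1)
labels_cut = fcluster(Z, t=3, criterion="maxclust")

print("Number of merges:", Z.shape[0])
print("Last merge joins", int(Z[-1, 3]), "points at distance", round(Z[-1, 2], 2))
print("Leaf order in the dendrogram:", tree["leaves"][:10], "...")
print("Flat cluster sizes:", np.bincount(labels_cut)[1:])
```

Dropping `no_plot=True` inside a notebook draws the dendrogram with Matplotlib; the height of each join corresponds to the merge distance in the third column of `Z`.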

## Agglomerative Clustering

### Linkage Criteria
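- **Single linkage:** smallest distance between any two points, one from each cluster
- **Complete linkage:** largest distance between any two points, one from each cluster
- **Average linkage:** average of all pairwise distances between the two clusters
- **Ward’s method:** merges the pair of clusters that gives the smallest increase in within-cluster variance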

Average linkage gives a balanced view of cluster similarity by looking at *all* pairwise distances rather than extremes.
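A tiny worked example may help make the criteria concrete. The two clusters below are made up for illustration, with two points each on a line. The Ward quantity shown is the increase in total within-cluster sum of squares caused by merging, which is one common way to express Ward's criterion.

```python
import numpy as np

# Two hypothetical clusters with two points each
cluster_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cluster_b = np.array([[3.0, 0.0], [5.0, 0.0]])

# All pairwise distances between a point in A and a point in B
pairwise = np.linalg.norm(cluster_a[:, None, :] - cluster_b[None, :, :], axis=2)

single = pairwise.min()     # closest pair:  (1, 0) to (3, 0)
complete = pairwise.max()   # farthest pair: (0, 0) to (5, 0)
average = pairwise.mean()   # mean of the four distances 3, 5, 2, 4

# Ward: increase in within-cluster sum of squares if A and B were merged
n_a, n_b = len(cluster_a), len(cluster_b)
gap = cluster_a.mean(axis=0) - cluster_b.mean(axis=0)
ward_cost = n_a * n_b / (n_a + n_b) * np.sum(gap ** 2)

print(f"Single linkage:   {single}")     # 2.0
print(f"Complete linkage: {complete}")   # 5.0
print(f"Average linkage:  {average}")    # 3.5
print(f"Ward merge cost:  {ward_cost}")  # 12.25
```

Single and complete linkage each depend on one extreme pair, while the average uses all four distances, which is why it tends to be less sensitive to a single outlying point.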

## Divisive Clustering (Bisecting K-Means)

## Evaluation of Clustering

### Silhouette Score
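Measures how well each point fits its assigned cluster by comparing cohesion with separation. For a point with mean distance `a` to the other points in its own cluster and mean distance `b` to the points of the nearest other cluster, the score is `(b - a) / max(a, b)`. Scores range from -1 to 1, and values near 1 indicate tight, well-separated clusters. No true labels are needed.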
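A hand calculation on four points shows where the number comes from. The helper `silhouette_by_hand` is written for this example only; its result is compared against scikit-learn's `silhouette_score`, which averages the per-point scores.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Four points on a line, split into two clusters (toy data)
X_toy = np.array([[0.0], [1.0], [4.0], [5.0]])
labels_toy = np.array([0, 0, 1, 1])

def silhouette_by_hand(X, labels):
    """Per-point silhouette using Euclidean distance."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # do not count the point itself
        if not same.any():
            continue  # a single-point cluster gets a score of 0
        # a: mean distance to the other points in the same cluster
        a = D[i, same].mean()
        # b: mean distance to the points of the nearest other cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

per_point = silhouette_by_hand(X_toy, labels_toy)
print("Per-point scores:", per_point.round(4))
print("Mean (by hand):  ", round(per_point.mean(), 4))
print("silhouette_score:", round(silhouette_score(X_toy, labels_toy), 4))
```

For the point at 0, `a = 1` and `b = 4.5`, giving `(4.5 - 1) / 4.5 ≈ 0.778`; the points nearer the other cluster score slightly lower.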

### Rand Index
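Compares a clustering to known labels. It is the fraction of point pairs on which the two labelings agree: both put the pair in the same cluster, or both put it in different clusters. It ranges from 0 to 1 and does not depend on how the cluster IDs are numbered.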
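The pair-counting idea is short enough to write out. The helper `rand_index_by_hand` and the label vectors are illustrative; the result is checked against `rand_score` from scikit-learn.

```python
from itertools import combinations

from sklearn.metrics import rand_score

def rand_index_by_hand(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

true_toy = [0, 0, 1, 1]
pred_toy = [0, 0, 0, 1]   # one point placed in the wrong group
swapped = [1, 1, 0, 0]    # same grouping as true_toy, different IDs

print("By hand:   ", rand_index_by_hand(true_toy, pred_toy))  # 0.5
print("rand_score:", rand_score(true_toy, pred_toy))          # 0.5
print("Renumbered:", rand_score(true_toy, swapped))           # 1.0
```

Of the six pairs, three are treated the same way by both labelings, hence 0.5. The plain Rand index is not corrected for chance agreement; scikit-learn also provides `adjusted_rand_score` for that purpose.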

In [None]:
np.random.seed(42)
X_eval, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=42)
scores = []

for k in range(2, 7):
    kmeans_eval = KMeans(n_clusters=k, random_state=42)
    labels = kmeans_eval.fit_predict(X_eval)
    scores.append(silhouette_score(X_eval, labels))

plt.plot(range(2, 7), scores, marker='o')
plt.title('Silhouette Score vs K')
plt.xlabel('k')
plt.ylabel('Score')
plt.show()

## Comparison of Methods

| Algorithm | Strengths | Weaknesses | Best Use |
|----------|-----------|-------------|----------|
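| K-Means | Fast, simple to implement | Assumes roughly spherical clusters; k must be chosen in advance | Compact, well-separated clusters |
| DBSCAN | Finds arbitrary shapes, flags noise points | Sensitive to eps and min_samples | Irregular or noisy data |
| Bisecting K-Means | Produces a hierarchy, efficient | Still centroid-based, so it shares K-Means' shape limits | When a hierarchy is needed |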

In [None]:
X_compare, _ = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=42)

labels_k = KMeans(n_clusters=2, random_state=42).fit_predict(X_compare)
labels_d = DBSCAN(eps=0.2).fit_predict(X_compare)
labels_b = BisectingKMeans(n_clusters=2, random_state=42).fit_predict(X_compare)

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].scatter(X_compare[:, 0], X_compare[:, 1], c=labels_k)
axes[0].set_title('K-Means')

axes[1].scatter(X_compare[:, 0], X_compare[:, 1], c=labels_d)
axes[1].set_title('DBSCAN')

axes[2].scatter(X_compare[:, 0], X_compare[:, 1], c=labels_b)
axes[2].set_title('Bisecting K-Means')

plt.show()

## Discussion
1. Which algorithm handles the concentric circles well?
2. Which algorithms struggle?
3. Why does DBSCAN succeed?
4. How does noise affect each algorithm?