# Unsupervised Learning: Types of Unsupervised Learning

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering, BisectingKMeans
from sklearn.datasets import make_blobs, make_moons, make_circles
from sklearn.metrics import silhouette_score, rand_score
from scipy.cluster.hierarchy import dendrogram, linkage
import pandas as pd
import seaborn as sns

## What is Unsupervised Learning?

**Quick recap of supervised vs unsupervised:**
- **Supervised Learning**: You have labeled data (like studying for a test with an answer key)
  - Example: Classifying emails as spam or not spam when you already know which is which
  
- **Unsupervised Learning**: You have data but NO labels (like organizing your closet without instructions)
  - Example: Grouping customers by shopping habits when you don't know the groups ahead of time

### Why is it called "Unsupervised"?
Because there's no "teacher" telling the algorithm what the right answer is! The algorithm must find patterns on its own.

### Common Applications:
Unsupervised learning is most often used for **clustering**: grouping similar items together during data exploration. Examples:
- Spotify: Grouping songs into playlists by similarity
- Amazon: Finding customer segments for targeted marketing
- Biology: Grouping organisms by genetic similarity
- News: Organizing articles into topic categories
- Gaming: Grouping players by playstyle

### Fun Example:
Imagine you're organizing a school party and need to group students into teams. You don't know who should be with whom, but you want:
- Friends to be together
- People with similar interests together
- Balanced teams

That's clustering!

## Centroid-Based Clustering

Also known as **partitioning methods**.

### The Big Idea:
Imagine you're planning a pizza party and placing pizza stations (centroids) around the school so that:
- Everyone can reach the closest station
- The stations aren't too close to each other

### Common Structure (All Centroid Methods):
1. Initialize k centroids randomly
2. Assign each point to the nearest centroid
3. Update centroids based on current group members
4. Repeat until centroids stop moving

### Goals:
- **Minimize within-cluster distance** (group members close together)
- **Maximize between-cluster distance** (groups separated)
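
Both goals can be turned into numbers. The sketch below is only an illustration, assuming synthetic blob data and a K-Means fit (the names `X_goal` and `km_goal` are made up for this example). It computes the within-cluster sum of squared distances, which scikit-learn also reports as `inertia_`, and the smallest distance between any two centroids.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: 3 well-separated blobs (an assumption for this sketch)
X_goal, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=42)
km_goal = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_goal)

# Within-cluster distance: squared distance from each point to its own centroid, summed
own_centroid = km_goal.cluster_centers_[km_goal.labels_]
within = np.sum((X_goal - own_centroid) ** 2)

# Between-cluster distance: smallest distance between any pair of centroids
min_between = pdist(km_goal.cluster_centers_).min()

print(f"Within-cluster sum of squares: {within:.2f} (inertia_: {km_goal.inertia_:.2f})")
print(f"Smallest centroid-to-centroid distance: {min_between:.2f}")
```

A good clustering keeps the first number small and the second one large.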

### Visualization of the Clustering Process

In [None]:
# Create simple dataset
np.random.seed(42)
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6)

fig, axes = plt.subplots(2, 2, figsize=(14, 12))
axes = axes.ravel()

# Step 1: plot the raw, unlabeled data
axes[0].scatter(X_demo[:, 0], X_demo[:, 1], c='gray', s=80, edgecolors='black', alpha=0.6)
axes[0].set_title('Step 1: Raw Data')

# Step 2: pick 3 random data points as the initial centroids
initial_centroids = X_demo[np.random.choice(len(X_demo), 3, replace=False)]
axes[1].scatter(X_demo[:, 0], X_demo[:, 1], c='gray', s=80, edgecolors='black', alpha=0.6)
axes[1].scatter(initial_centroids[:, 0], initial_centroids[:, 1], c='red', marker='*', s=300, edgecolors='black')
axes[1].set_title('Step 2: Random Centroids')

# Step 3: run a single K-Means iteration from those centroids to show the first assignment
kmeans_demo = KMeans(n_clusters=3, init=initial_centroids, n_init=1, max_iter=1)
labels_step3 = kmeans_demo.fit_predict(X_demo)
axes[2].scatter(X_demo[:, 0], X_demo[:, 1], c=labels_step3, cmap='viridis', s=80, edgecolors='black')
axes[2].set_title('Step 3: Assign Points')

# Step 4: run K-Means to convergence and mark the final centroids
kmeans_final = KMeans(n_clusters=3, random_state=42)
labels_final = kmeans_final.fit_predict(X_demo)
axes[3].scatter(X_demo[:, 0], X_demo[:, 1], c=labels_final, cmap='viridis', s=80, edgecolors='black')
axes[3].scatter(kmeans_final.cluster_centers_[:, 0], kmeans_final.cluster_centers_[:, 1], c='red', marker='*', s=300)
axes[3].set_title('Step 4: Final Centroids')

plt.tight_layout()
plt.show()

## K-Means Clustering

### What Makes K-Means Special?
It uses the **mean** of each cluster to define its center.

### The Algorithm:
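1. Choose the number of clusters, k
2. Place k initial centroids at random (scikit-learn's default `k-means++` initialization picks starting points that are spread apart)
3. Assign each point to its nearest centroid
4. Update each centroid to the mean of the points assigned to it
5. Repeat steps 3 and 4 until the assignments stop changing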
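
The steps above can be sketched in a few lines of NumPy. This is a bare-bones illustration, not scikit-learn's implementation: `kmeans_sketch` is a hypothetical helper, it uses plain random initialization, and it leaves an empty cluster's centroid where it was.

```python
import numpy as np
from sklearn.datasets import make_blobs

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Bare-bones K-Means for illustration (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: distance from every point to every centroid, then take the nearest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Step 1: choose k (3 here, matching the synthetic data)
X_sketch, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.8, random_state=42)
labels_sketch, centroids_sketch = kmeans_sketch(X_sketch, k=3)
print(centroids_sketch)
```

Changing `seed` changes the starting centroids and can change the final result, which is why library implementations usually run several initializations and keep the best one.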

### Pros:
- Fast and scalable
- Easy to understand

### Cons:
- Must choose k
- Sensitive to outliers
- Assumes roughly spherical (round), similarly sized clusters
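
The sensitivity to outliers comes directly from using the mean. A tiny worked example with made-up 1-D numbers shows how one extreme value drags the mean, and therefore a K-Means centroid, away from the bulk of the group. The median is shown only for contrast.

```python
import numpy as np

# Five illustrative 1-D points that form a tight group around 10
cluster = np.array([9.0, 10.0, 10.0, 11.0, 10.0])
print(cluster.mean())            # 10.0

# Add a single extreme value: the mean shifts a long way toward it
with_outlier = np.append(cluster, 100.0)
print(with_outlier.mean())       # 25.0
print(np.median(with_outlier))   # 10.0
```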

### Example: Clustering Students by Study Habits

In [None]:
# Create student study data
np.random.seed(42)
X_students, true_labels = make_blobs(n_samples=200, centers=3, cluster_std=1.0)

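# Stretch and shift each feature so the values look like student data
# (a simple linear rescaling, not standardization)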
X_students[:, 0] = X_students[:, 0] * 5 + 20
X_students[:, 1] = X_students[:, 1] * 10 + 75

kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X_students)
centers = kmeans.cluster_centers_

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.scatter(X_students[:, 0], X_students[:, 1], c='gray')
plt.title('Before Clustering')

plt.subplot(1, 2, 2)
plt.scatter(X_students[:, 0], X_students[:, 1], c=labels, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='*', s=300)
plt.title('After K-Means')
plt.show()

### Q1: Understanding K-Means
a. What happens if you change k from 3 to 4?
b. What does the mean represent?
c. When might K-Means fail?

## Density-Based Clustering: DBSCAN

### Key Concepts:
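- **Core points**: Points with at least `min_samples` points (counting themselves) within distance `eps`
- **Border points**: Points within `eps` of a core point that do not have enough neighbors to be core points themselves
- **Noise**: Points that are neither core nor border; scikit-learn labels them `-1`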

### Parameters:
- **eps**: Radius of the neighborhood around each point
- **min_samples**: Minimum number of points inside that radius for a point to count as a core point
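
To see the three point types in practice, the sketch below fits DBSCAN on synthetic moon data and splits the points into core, border, and noise. The data and the `eps` and `min_samples` values are assumptions chosen for illustration. `core_sample_indices_` and `labels_` are attributes of a fitted scikit-learn `DBSCAN` object.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X_db, _ = make_moons(n_samples=300, noise=0.08, random_state=42)
db = DBSCAN(eps=0.15, min_samples=5).fit(X_db)

# Core points: the indices scikit-learn stores in core_sample_indices_
is_core = np.zeros(len(X_db), dtype=bool)
is_core[db.core_sample_indices_] = True

# Noise: points that DBSCAN labels as -1
is_noise = db.labels_ == -1

# Border points: assigned to a cluster but not core themselves
is_border = ~is_core & ~is_noise

print(f"core: {is_core.sum()}, border: {is_border.sum()}, noise: {is_noise.sum()}")
```

Try raising `eps` or lowering `min_samples` and watch the noise count shrink.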

### Advantages:
- No need to choose k
- Finds arbitrary shapes
- Identifies noise

### Disadvantages:
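- Results are sensitive to the choice of eps and min_samples
- Struggles when clusters have very different densities, because a single eps applies everywhere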

In [None]:
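# Two interleaving half-moons: a shape K-Means tends to split incorrectly
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# Fixed random_state values make the plots reproducible
labels_kmeans = KMeans(n_clusters=2, random_state=42).fit_predict(X_moons)
labels_dbscan = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_moons)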

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].scatter(X_moons[:, 0], X_moons[:, 1], c='gray')
axes[0].set_title('Original Data')

axes[1].scatter(X_moons[:, 0], X_moons[:, 1], c=labels_kmeans)
axes[1].set_title('K-Means')

axes[2].scatter(X_moons[:, 0], X_moons[:, 1], c=labels_dbscan)
axes[2].set_title('DBSCAN')

plt.show()

## Connectivity-Based Clustering (Hierarchical Clustering)

### Big Idea:
Builds nested groups, like a family tree: small clusters sit inside larger ones.

### Two types:
- **Agglomerative** (bottom-up)
- **Divisive** (top-down)

### What is a Dendrogram?
A tree showing how clusters merge as distance increases.
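
A dendrogram is built from a table of merges. As a minimal sketch, assuming six made-up 1-D points and SciPy's `linkage` with single linkage, the code below prints that table and the leaf order of the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Six illustrative 1-D points: two tight pairs and two stragglers
points = np.array([[1.0], [1.2], [5.0], [5.1], [9.0], [20.0]])

# Each row of Z records one merge:
# [cluster_a, cluster_b, merge_distance, size_of_new_cluster]
Z = linkage(points, method='single')
print(Z)

# no_plot=True returns the tree layout without drawing it
tree = dendrogram(Z, no_plot=True)
print(tree['ivl'])  # leaf order along the x-axis
```

Reading `Z` from top to bottom shows the merge distance growing. Dropping `no_plot=True` draws the tree with matplotlib.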

## Agglomerative Clustering

### Algorithm:
1. Start with each point as its own cluster
2. Merge the two closest clusters
3. Repeat step 2 until a single cluster remains
4. Choose the number of clusters by cutting the tree at a chosen height

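### Linkage methods:
The linkage method defines what "closest" means for two clusters:
- Single linkage: distance between the closest pair of points, one from each cluster
- Complete linkage: distance between the farthest pair of points, one from each cluster
- Average linkage: average distance over all pairs of points, one from each cluster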
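
The choice of linkage can change the result a lot. The sketch below, assuming synthetic moon data chosen only for illustration, runs scikit-learn's `AgglomerativeClustering` with each of the three linkage methods and prints the resulting cluster sizes.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons

X_agg, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

results = {}
for method in ['single', 'complete', 'average']:
    # n_clusters=2 corresponds to cutting the tree where two clusters remain
    model = AgglomerativeClustering(n_clusters=2, linkage=method)
    results[method] = model.fit_predict(X_agg)
    print(method, np.bincount(results[method]))
```

Plotting `X_agg` colored by each entry in `results` is a quick way to compare how the methods carve up the same data.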

## Divisive Clustering (Top-Down)
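- Start with all points in one big cluster
- Repeatedly split a cluster in two, for example with Bisecting K-Means
- Useful when the data has a natural top-down hierarchy (broad groups that split into subgroups)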
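
The top-down idea can be sketched by hand: keep splitting one cluster in two with ordinary K-Means until the target number of clusters is reached. This is an illustration only. The rule "always split the largest cluster" is an assumption made here, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X_div, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.7, random_state=42)

# Start with every point in one big cluster (label 0)
labels_div = np.zeros(len(X_div), dtype=int)

target_clusters = 4
for new_label in range(1, target_clusters):
    # Pick the largest current cluster to split (one possible rule)
    biggest = np.bincount(labels_div).argmax()
    members = np.where(labels_div == biggest)[0]
    # Split it in two with ordinary K-Means
    halves = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_div[members])
    # Points in the second half get a brand-new label
    labels_div[members[halves == 1]] = new_label
    print(f"After split {new_label}: cluster sizes = {np.bincount(labels_div)}")
```

scikit-learn's `BisectingKMeans` packages the same idea. Its `bisecting_strategy` parameter chooses which cluster to split next: `'biggest_inertia'` (the default) or `'largest_cluster'`.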

## Evaluation of Clustering Methods

### Silhouette Score
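- Measures how close each point is to its own cluster compared with the nearest other cluster
- Ranges from -1 to 1; higher is better (values near 0 mean overlapping clusters, negative values suggest points assigned to the wrong cluster)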
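
For a single point, the silhouette value is `(b - a) / max(a, b)`, where `a` is the mean distance to the other points in its own cluster and `b` is the mean distance to the points in the nearest other cluster. The sketch below works this out by hand for one point in a made-up 1-D dataset and checks it against scikit-learn's `silhouette_samples`.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Two tiny illustrative clusters on a line
X_sil = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
labels_sil = np.array([0, 0, 0, 1, 1, 1])

i = 0  # score the first point by hand
dists = np.abs(X_sil[:, 0] - X_sil[i, 0])

same = labels_sil == labels_sil[i]
same[i] = False                      # exclude the point itself
a = dists[same].mean()               # mean distance within its own cluster
b = dists[labels_sil == 1].mean()    # mean distance to the only other cluster
s = (b - a) / max(a, b)

print(round(s, 4))
print(round(silhouette_samples(X_sil, labels_sil)[i], 4))
```

Both lines should print the same value. `silhouette_score` is the average of these per-point values over the whole dataset.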

### Rand Index
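- Compares two clusterings of the same data by checking, for every pair of points, whether both clusterings agree on putting the pair together or apart
- Ranges from 0 to 1; 1 = perfect match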
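
A short example with made-up label lists shows two properties of the Rand index: the label names themselves do not matter, and the score is the fraction of point pairs on which the two clusterings agree. `rand_index_by_hand` is a hypothetical helper written only to make the pair counting visible.

```python
from itertools import combinations
from sklearn.metrics import rand_score

labels_a = [0, 0, 1, 1]
labels_b = [1, 1, 0, 0]   # same grouping, different label names
labels_c = [0, 0, 1, 2]   # splits the second group in two

print(rand_score(labels_a, labels_b))  # 1.0
print(rand_score(labels_a, labels_c))

def rand_index_by_hand(u, v):
    """Fraction of point pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)

print(rand_index_by_hand(labels_a, labels_c))
```

With four points there are six pairs. `labels_a` and `labels_c` disagree only on the pair formed by the last two points, so both of the last two lines give 5/6.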

In [None]:
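# Try several values of k and record the silhouette score for each
X_eval, _ = make_blobs(n_samples=300, centers=3, random_state=42)
scores = []
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42).fit_predict(X_eval)
    scores.append(silhouette_score(X_eval, labels))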

plt.plot(range(2,7), scores, marker='o')
plt.title('Silhouette Score by K')
plt.xlabel('K')
plt.ylabel('Score')
plt.show()

## Comparison of Methods

| Algorithm | Strengths | Weaknesses | Best Use |
|----------|-----------|------------|----------|
| K-Means | Fast, simple | Must choose k; assumes spherical clusters; sensitive to outliers | Compact, round, similarly sized clusters |
| DBSCAN | Finds irregular shapes, detects noise | Sensitive to eps and min_samples; struggles with varying densities | Arbitrary shapes, noisy data |
| Bisecting K-Means | Hierarchical, efficient | Must choose k; assumes spherical clusters | When a top-down hierarchy matters |

In [None]:
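# Two concentric circles: the clusters are rings, not blobs
X_compare, _ = make_circles(n_samples=400, factor=0.5, noise=0.05, random_state=42)

# Fixed random_state values make the comparison reproducible
labels_k = KMeans(n_clusters=2, random_state=42).fit_predict(X_compare)
labels_d = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_compare)
labels_b = BisectingKMeans(n_clusters=2, random_state=42).fit_predict(X_compare)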

fig, axes = plt.subplots(1, 3, figsize=(15,5))
axes[0].scatter(X_compare[:,0], X_compare[:,1], c=labels_k)
axes[0].set_title('K-Means')

axes[1].scatter(X_compare[:,0], X_compare[:,1], c=labels_d)
axes[1].set_title('DBSCAN')

axes[2].scatter(X_compare[:,0], X_compare[:,1], c=labels_b)
axes[2].set_title('Bisecting K-Means')

plt.show()

## Discussion
1. Which algorithm separated the two concentric circles best?
2. Which algorithms struggled, and why?
3. What happens if you increase the `noise` parameter in `make_circles`?
4. Which algorithm seems most flexible?