# **Theoretical Questions**

### **1. What is unsupervised learning in the context of machine learning?**

  - Unsupervised learning involves training models on data **without labeled outputs**. The goal is to discover **patterns**, **groupings**, or **structure** in the data (e.g., clustering, dimensionality reduction).

---

### **2. How does the K-Means clustering algorithm work?**

  - K-Means works as follows:

    1. Choose `k` initial cluster centroids randomly.
    2. Assign each data point to the nearest centroid.
    3. Update centroids as the mean of all assigned points.
    4. Repeat steps 2 and 3 until convergence (no significant change in centroids).
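
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the function name `kmeans`, its parameters (`n_iter`, `tol`, `seed`), and the toy data are all made up for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Minimal K-Means sketch (hypothetical helper): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids barely move.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
print("labels:", labels)
print("centroids:\n", centroids)
```

On data this cleanly separated, the first three points end up in one cluster and the last three in the other. Which cluster receives id 0 depends on the random initialization.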

---

### **3. Explain the concept of a dendrogram in hierarchical clustering.**

  - A dendrogram is a **tree-like diagram** that shows the **merging or splitting** of clusters in **hierarchical clustering**. The height of each merge represents the **distance** or **dissimilarity** between clusters.

---

### **4. What is the main difference between K-Means and Hierarchical Clustering?**

  * **K-Means**: Requires predefining `k`; uses centroid-based partitioning.
  * **Hierarchical**: Builds a tree (dendrogram); does not require predefined `k`.

---

### **5. What are the advantages of DBSCAN over K-Means?**

  * Can find **arbitrarily shaped** clusters.
  * Doesn’t require specifying the number of clusters in advance.
  * Can **identify outliers** (noise).

---

### **6. When would you use Silhouette Score in clustering?**

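  - Use it to **evaluate clustering quality** when no ground-truth labels are available. It measures how similar each point is to its own cluster compared with the nearest other cluster, on a scale from -1 to 1, and is commonly used to choose the **number of clusters**.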
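
As a rough sketch of what the score measures, the per-sample silhouette is `s = (b - a) / max(a, b)`, where `a` is the mean distance to the other members of the point's own cluster and `b` is the smallest mean distance to the members of any other cluster. The helper name `silhouette_of_point` and the toy data are assumptions for illustration. The sketch assumes at least two clusters are present.

```python
import numpy as np

def silhouette_of_point(X, labels, i):
    """Silhouette s(i) = (b - a) / max(a, b) for the sample at index i."""
    dists = np.linalg.norm(X - X[i], axis=1)
    own = labels == labels[i]
    if own.sum() == 1:
        return 0.0  # common convention for a cluster with a single member
    # a: mean distance to the other members of the point's own cluster
    # (the point's zero distance to itself is excluded via the "- 1").
    a = dists[own].sum() / (own.sum() - 1)
    # b: smallest mean distance to the members of any other cluster.
    b = min(dists[labels == c].mean() for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)

# Four points on a line forming two obvious groups.
X = np.array([[0.0], [1.0], [10.0], [11.0]])

good_labels = np.array([0, 0, 1, 1])
s_good = silhouette_of_point(X, good_labels, 0)

# Same data, but point 0 is deliberately put in the far cluster.
bad_labels = np.array([1, 0, 1, 1])
s_bad = silhouette_of_point(X, bad_labels, 0)

print("well-assigned point:", s_good)
print("badly-assigned point:", s_bad)
```

The well-assigned point scores close to 1, while the deliberately misassigned point scores below 0. `sklearn.metrics.silhouette_score` averages this quantity over all samples.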

---

### **7. What are the limitations of Hierarchical Clustering?**

  * **Computationally expensive** for large datasets.
  * **Irreversible**: once merged/split, can't be undone.
  * Sensitive to **noise** and **outliers**.

---

### **8. Why is feature scaling important in clustering algorithms like K-Means?**

  - Because K-Means uses **distance metrics** (like Euclidean), and features with large scales can dominate clustering results. Scaling ensures **fair influence**.
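
A small numeric sketch makes the effect visible. The data below is hypothetical (column 0 as an income in dollars, column 1 as an age in years), and the standardization is written out by hand to mirror what `StandardScaler` does.

```python
import numpy as np

# Hypothetical data: column 0 is income in dollars, column 1 is age in years.
X = np.array([[30_000.0, 25.0],
              [30_500.0, 60.0],   # similar income, very different age
              [60_000.0, 26.0]])  # very different income, similar age

def distances_from_first(A):
    """Euclidean distance from row 0 to every other row."""
    return np.linalg.norm(A[1:] - A[0], axis=1)

raw = distances_from_first(X)

# Standardize each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = distances_from_first(X_scaled)

print("raw distances:   ", raw)
print("scaled distances:", scaled)
```

Before scaling, the income column decides almost everything: the second row looks far closer to the first than the third row does, even though its age differs by 35 years. After scaling, both differences carry comparable weight.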

---

### **9. How does DBSCAN identify noise points?**

  - DBSCAN labels a point as **noise** if it's **not a core point** and is **not within the neighborhood** of any core point (based on `eps` and `min_samples`).
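
The rule can be sketched directly from the definitions. This is an illustration only (it labels point types but does not build clusters), and the helper name `classify_points` and the toy coordinates are assumptions.

```python
import numpy as np

def classify_points(X, eps, min_samples):
    """Label each point 'core', 'border', or 'noise' under DBSCAN's definitions."""
    # Pairwise Euclidean distances. A point counts as its own neighbor,
    # which is how scikit-learn interprets min_samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps
    is_core = neighbors.sum(axis=1) >= min_samples
    # A non-core point is a border point if some core point lies within eps of it.
    near_core = (neighbors & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "noise"))

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],  # dense square
              [0.3, 0.1],                                      # at the edge of the square
              [5.0, 5.0]])                                     # isolated point
kinds = classify_points(X, eps=0.25, min_samples=4)
print(kinds)
```

Only the isolated point is noise: it is not a core point and no core point lies within `eps` of it. The edge point has too few neighbors to be a core point but is reachable from one, so it is a border point and joins that cluster.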

---

### **10. Define inertia in the context of K-Means.**

  - Inertia is the **sum of squared distances** of samples to their closest cluster center. Lower inertia indicates more compact clusters.
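
As a quick check of the definition, inertia can be computed by hand. The helper name `inertia` and the toy points are assumptions for the example; a fitted `KMeans` model exposes the same quantity as `inertia_`.

```python
import numpy as np

def inertia(X, centroids):
    """Sum of squared distances from each sample to its closest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])

# Two centroids, each in the middle of one pair: every point is 1 unit away.
tight = inertia(X, np.array([[1.0, 0.0], [11.0, 0.0]]))  # 1 + 1 + 1 + 1 = 4

# One centroid in the middle of everything: points are 6, 4, 4, and 6 units away.
loose = inertia(X, np.array([[6.0, 0.0]]))               # 36 + 16 + 16 + 36 = 104

print(tight, loose)
```

Inertia never increases when more centroids are added, which is why it cannot be used on its own to pick `k`.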

---

### **11. What is the elbow method in K-Means clustering?**

  - A technique to determine optimal `k` by plotting **inertia vs. k**. The “elbow” point is where inertia reduction slows, indicating a good `k`.

---

### **12. Describe the concept of "density" in DBSCAN.**

  - In DBSCAN, density refers to the **number of points** within a given radius (`eps`). High density areas form clusters; low density areas are noise.

---

### **13. Can hierarchical clustering be used on categorical data?**

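  - Yes, provided an **appropriate distance measure** is used (e.g., Hamming distance). Some linkage methods, notably Ward, assume Euclidean distances on numerical data, so single, complete, or average linkage is the usual choice for categorical features.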
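
A minimal sketch of the Hamming distance on categorical records, assuming the categories have already been integer-coded (the records below are hypothetical):

```python
import numpy as np

# Hypothetical records with three categorical attributes, integer-coded.
records = np.array([
    [0, 1, 2],
    [0, 1, 0],
    [1, 0, 0],
])

# Hamming distance: fraction of attributes on which two records differ.
hamming = (records[:, None, :] != records[None, :, :]).mean(axis=2)
print(hamming)
```

A matrix like this can then be handed to a hierarchical clustering routine that accepts precomputed distances, paired with single, complete, or average linkage.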

---

### **14. What does a negative Silhouette Score indicate?**

  - It indicates that the point is **likely assigned to the wrong cluster**: on average it is **closer to a neighboring cluster** than to its own.

---

### **15. Explain the term "linkage criteria" in hierarchical clustering.**

  - Linkage criteria determine how distances between clusters are calculated. Common types: **single**, **complete**, **average**, and **ward**.
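
To make the four criteria concrete, here is a small sketch that computes each one for two hand-made clusters. The points are assumptions for illustration. The Ward value shown is the merge cost (the increase in within-cluster sum of squares); libraries may report a rescaled or square-rooted version of it.

```python
import numpy as np

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])

# All pairwise distances between a point in A and a point in B.
D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

single = D.min()     # closest pair
complete = D.max()   # farthest pair
average = D.mean()   # mean over all pairs

# Ward: how much the within-cluster sum of squares grows if A and B merge.
n_a, n_b = len(A), len(B)
ward_cost = (n_a * n_b / (n_a + n_b)) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)

print(single, complete, average, ward_cost)
```

At each step, agglomerative clustering merges the pair of clusters with the smallest value under the chosen criterion, so the choice changes which clusters are considered "closest".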

---

### **16. Why might K-Means clustering perform poorly on data with varying cluster sizes or densities?**

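  - Because it minimizes within-cluster squared Euclidean distance, K-Means implicitly favors **roughly spherical clusters of similar size and spread**. When sizes or densities vary, the boundary between two centroids can cut through a large or sparse cluster, so points are assigned to the wrong cluster.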

---

### **17. What are the core parameters in DBSCAN, and how do they influence clustering?**

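  * **`eps`**: the neighborhood radius around each point. Larger values merge nearby clusters; smaller values leave more points as noise.
  * **`min_samples`**: the minimum number of points within `eps` (including the point itself) for a point to count as a core point. Higher values require denser regions.

  Together they determine **cluster formation** and **noise identification**.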

---

### **18. How does K-Means++ improve upon standard K-Means initialization?**

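  - K-Means++ picks the first centroid at random and each subsequent centroid with probability proportional to its **squared distance from the nearest centroid already chosen**. The initial centroids therefore tend to be far apart, which usually leads to **faster convergence** and **better clustering** than purely random initialization.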
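
A minimal sketch of the seeding step, assuming squared-distance-weighted sampling. The function name `kmeans_pp_init` and its parameters are made up for the example, and the sketch assumes the data contains at least `k` distinct points.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """Pick k initial centroids from X using K-Means++-style seeding."""
    rng = np.random.default_rng(seed)
    # First centroid: a data point chosen uniformly at random.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        chosen = np.array(centroids)
        # Squared distance from every point to its nearest chosen centroid.
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Sample the next centroid with probability proportional to d2,
        # so far-away points are much more likely to be picked.
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

# Toy data: two tight groups far apart.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [10.0, 10.0], [10.2, 9.9], [9.8, 10.1]])
init = kmeans_pp_init(X, k=2)
print(init)
```

Already-chosen points have zero probability of being picked again, and points near an existing centroid have very little. In scikit-learn this behavior corresponds to `KMeans(init='k-means++')`.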

---

### **19. What is agglomerative clustering?**

  - A bottom-up hierarchical clustering method where **each point starts as its own cluster**, and clusters are **merged iteratively**.

---

### **20. What makes Silhouette Score a better metric than just inertia for model evaluation?**

  - Silhouette Score considers **both cohesion and separation**, while inertia only considers **within-cluster distance**. Hence, it's a **more holistic** measure.


# **Practical Questions**


In [None]:
# 21. Generate synthetic data with 4 centers using make_blobs and apply K-Means clustering. Visualize using a scatter plot.

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Step 1: Generate synthetic data with 4 centers
X, y_true = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=42)

# Step 2: Apply K-Means clustering
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# Step 3: Visualize the clusters
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', s=200, alpha=0.75, marker='X', label='Centroids')
plt.title('K-Means Clustering on Synthetic Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# 22. Load the Iris dataset and use Agglomerative Clustering to group the data into 3 clusters. Display the first 10 predicted labels.

from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data  # Features

# Step 2: Apply Agglomerative Clustering with 3 clusters
agg_clustering = AgglomerativeClustering(n_clusters=3)
labels = agg_clustering.fit_predict(X)

# Step 3: Display the first 10 predicted labels
print("First 10 predicted labels:", labels[:10])


In [None]:
# 23. Generate synthetic data using make_moons and apply DBSCAN. Highlight outliers in the plot.

import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Step 1: Generate synthetic moon-shaped data
X, _ = make_moons(n_samples=300, noise=0.1, random_state=42)

# Step 2: Apply DBSCAN clustering
dbscan = DBSCAN(eps=0.2, min_samples=5)
labels = dbscan.fit_predict(X)

# Step 3: Plot results
plt.figure(figsize=(8, 6))

# Label -1 marks noise (outliers); all other labels are cluster ids
noise = labels == -1

# Plot clustered points colored by label, then overlay the outliers in red
plt.scatter(X[~noise, 0], X[~noise, 1], c=labels[~noise], cmap='viridis', edgecolor='k', s=50)
plt.scatter(X[noise, 0], X[noise, 1], color='red', marker='x', s=70, label='Outliers (noise)')
plt.legend()

plt.title("DBSCAN Clustering with Outliers Highlighted")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()



In [None]:
# 24. Load the Wine dataset and apply K-Means clustering after standardizing the features. Print the size of each cluster.

from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import numpy as np

# Step 1: Load the Wine dataset
wine = load_wine()
X = wine.data

# Step 2: Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Apply K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Step 4: Print the size of each cluster
unique, counts = np.unique(labels, return_counts=True)
cluster_sizes = dict(zip(unique, counts))

print("Cluster sizes:")
for cluster_id, size in cluster_sizes.items():
    print(f"Cluster {cluster_id}: {size} samples")


In [None]:
# 25. Use make_circles to generate synthetic data and cluster it using DBSCAN. Plot the result.

import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.cluster import DBSCAN

# Step 1: Generate circular synthetic data
X, _ = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0)

# Step 2: Apply DBSCAN clustering
dbscan = DBSCAN(eps=0.2, min_samples=5)
labels = dbscan.fit_predict(X)

# Step 3: Plot the result
plt.figure(figsize=(8, 6))

# Assign unique colors to each cluster, and black to noise
unique_labels = set(labels)
for label in unique_labels:
    label_mask = (labels == label)
    # max(..., 1) guards against division by zero when at most one cluster is found
    color = 'k' if label == -1 else plt.cm.Set1(label / max(max(unique_labels), 1))
    plt.scatter(X[label_mask, 0], X[label_mask, 1],
                label=f'Cluster {label}' if label != -1 else 'Noise',
                s=50, edgecolor='k', c=[color])

plt.title("DBSCAN Clustering on make_circles Data")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid(True)
plt.show()


In [None]:
# 26. Load the Breast Cancer dataset, apply MinMaxScaler, and use K-Means with 2 clusters. Output the cluster centroids.

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import pandas as pd

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data

# Step 2: Apply MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Apply K-Means with 2 clusters
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X_scaled)

# Step 4: Output cluster centroids
centroids = kmeans.cluster_centers_
centroids_df = pd.DataFrame(centroids, columns=data.feature_names)

print("Cluster Centroids (scaled features):")
print(centroids_df)


In [None]:
# 27. Generate synthetic data using make_blobs with varying cluster standard deviations and cluster with DBSCAN.

from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=500, centers=3, cluster_std=[0.2, 1.0, 2.5], random_state=42)

db = DBSCAN(eps=0.5, min_samples=5)
labels = db.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', edgecolor='k')
plt.title("DBSCAN with Varying Cluster Standard Deviations")
plt.show()


In [None]:
# 28. Load the Digits dataset, reduce it to 2D using PCA, and visualize clusters from K-Means.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

digits = load_digits()
X = digits.data

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

kmeans = KMeans(n_clusters=10, random_state=42)
labels = kmeans.fit_predict(X_pca)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='tab10', s=50)
plt.title("K-Means Clusters on Digits Dataset (PCA)")
plt.show()


In [None]:
# 29. Create synthetic data using make_blobs and evaluate silhouette scores for k = 2 to 5. Display as a bar chart.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.6, random_state=0)

scores = []
ks = range(2, 6)
for k in ks:
    kmeans = KMeans(n_clusters=k, random_state=42)
    labels = kmeans.fit_predict(X)
    scores.append(silhouette_score(X, labels))

plt.bar(ks, scores, color='skyblue')
plt.title("Silhouette Scores for K = 2 to 5")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Silhouette Score")
plt.show()


In [None]:
# 30. Load the Iris dataset and use hierarchical clustering to group data. Plot a dendrogram with average linkage.

from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data

linkage_matrix = linkage(X, method='average')

plt.figure(figsize=(10, 5))
dendrogram(linkage_matrix, labels=iris.target)
plt.title("Hierarchical Clustering Dendrogram (Average Linkage)")
plt.xlabel("Sample (labeled by true class)")
plt.ylabel("Distance")
plt.show()


In [None]:
# 31. Generate synthetic data with overlapping clusters using make_blobs, then apply K-Means and visualize with decision boundaries.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=42)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
labels = kmeans.predict(X)

# Plot decision boundaries
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.2)
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', edgecolor='k')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X')
plt.title("K-Means with Decision Boundaries")
plt.show()


In [None]:
# 32. Load the Digits dataset and apply DBSCAN after reducing dimensions with t-SNE. Visualize the results.

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

digits = load_digits()
X = digits.data

X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)
db = DBSCAN(eps=3, min_samples=5)
labels = db.fit_predict(X_tsne)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=labels, cmap='tab10', s=40)
plt.title("DBSCAN on Digits (t-SNE Reduced)")
plt.show()


In [None]:
# 33. Generate synthetic data using make_blobs and apply Agglomerative Clustering with complete linkage. Plot the result.

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

agg = AgglomerativeClustering(n_clusters=4, linkage='complete')
labels = agg.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='tab10', edgecolor='k')
plt.title("Agglomerative Clustering with Complete Linkage")
plt.show()


In [None]:
# 34. Load the Breast Cancer dataset and compare inertia values for K = 2 to 6 using K-Means. Show results in a line plot.

from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X = load_breast_cancer().data
X_scaled = StandardScaler().fit_transform(X)

inertias = []
ks = range(2, 7)
for k in ks:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X_scaled)
    inertias.append(kmeans.inertia_)

plt.plot(ks, inertias, marker='o')
plt.title("Inertia for K = 2 to 6 (Breast Cancer Dataset)")
plt.xlabel("Number of Clusters")
plt.ylabel("Inertia")
plt.grid(True)
plt.show()


In [None]:
# 35. Generate synthetic concentric circles using make_circles and cluster using Agglomerative Clustering with single linkage.

from sklearn.datasets import make_circles
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

X, _ = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

agg = AgglomerativeClustering(n_clusters=2, linkage='single')
labels = agg.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='coolwarm', edgecolor='k')
plt.title("Agglomerative Clustering with Single Linkage on Circles")
plt.show()


In [None]:
# 36. Use the Wine dataset, apply DBSCAN after scaling the data, and count the number of clusters (excluding noise)

from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

X = load_wine().data
X_scaled = StandardScaler().fit_transform(X)

db = DBSCAN(eps=1.5, min_samples=5)
labels = db.fit_predict(X_scaled)  # cluster the standardized features, not the raw data

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Number of clusters (excluding noise):", n_clusters)


In [None]:
# 37. Generate synthetic data with make_blobs and apply KMeans. Then plot the cluster centers on top of the data points.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', edgecolor='k')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', s=200, marker='X', label='Centroids')
plt.legend()
plt.title("KMeans Clustering with Cluster Centers")
plt.show()


In [None]:
# 38. Load the Iris dataset, cluster with DBSCAN, and print how many samples were identified as noise.

from sklearn.datasets import load_iris
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.6, min_samples=5)
labels = db.fit_predict(X_scaled)

n_noise = list(labels).count(-1)
print("Number of noise samples:", n_noise)


In [None]:
# 39. Generate synthetic non-linearly separable data using make_moons, apply K-Means, and visualize the clustering result.

from sklearn.datasets import make_moons
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, _ = make_moons(n_samples=300, noise=0.1, random_state=0)
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='Accent', edgecolor='k')
plt.title("K-Means on make_moons (Non-linear Data)")
plt.show()


In [None]:
# 40. Load the Digits dataset, apply PCA to reduce to 3 components, then use KMeans and visualize with a 3D scatter plot.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

X = load_digits().data
pca = PCA(n_components=3)
X_3d = pca.fit_transform(X)

kmeans = KMeans(n_clusters=10, random_state=42)
labels = kmeans.fit_predict(X_3d)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
sc = ax.scatter(X_3d[:, 0], X_3d[:, 1], X_3d[:, 2], c=labels, cmap='tab10', s=40)
ax.set_title("3D PCA + KMeans on Digits Dataset")
plt.show()


In [None]:
# 41. Generate synthetic blobs with 5 centers and apply KMeans. Then use silhouette_score to evaluate the clustering.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=5, cluster_std=0.6, random_state=42)
kmeans = KMeans(n_clusters=5, random_state=42)
labels = kmeans.fit_predict(X)

score = silhouette_score(X, labels)
print("Silhouette Score for 5-cluster KMeans:", score)


In [None]:
# 42. Load the Breast Cancer dataset, reduce dimensionality using PCA, and apply Agglomerative Clustering. Visualize in 2D.

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

X = load_breast_cancer().data
X_pca = PCA(n_components=2).fit_transform(X)

agg = AgglomerativeClustering(n_clusters=2)
labels = agg.fit_predict(X_pca)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='coolwarm', edgecolor='k')
plt.title("Agglomerative Clustering on Breast Cancer (PCA Reduced)")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.show()


In [None]:
# 43. Generate noisy circular data using make_circles and visualize clustering results from KMeans and DBSCAN side-by-side.

from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, DBSCAN
import matplotlib.pyplot as plt

X, _ = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)

kmeans = KMeans(n_clusters=2, random_state=42).fit(X)
dbscan = DBSCAN(eps=0.2, min_samples=5).fit(X)

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.title("KMeans Clustering")
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='Accent', edgecolor='k')

plt.subplot(1, 2, 2)
plt.title("DBSCAN Clustering")
plt.scatter(X[:, 0], X[:, 1], c=dbscan.labels_, cmap='Accent', edgecolor='k')

plt.tight_layout()
plt.show()


In [None]:
# 44. Load the Iris dataset and plot the Silhouette Coefficient for each sample after KMeans clustering.

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples
import matplotlib.pyplot as plt
import numpy as np

X = load_iris().data
kmeans = KMeans(n_clusters=3, random_state=42).fit(X)
labels = kmeans.labels_
sil_samples = silhouette_samples(X, labels)

plt.bar(range(len(sil_samples)), sil_samples, color='skyblue')
plt.title("Silhouette Coefficient for Each Sample (Iris Dataset)")
plt.xlabel("Sample Index")
plt.ylabel("Silhouette Coefficient")
plt.show()


In [None]:
# 45. Generate synthetic data using make_blobs and apply Agglomerative Clustering with 'average' linkage. Visualize clusters.

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=400, centers=4, random_state=42)

agg = AgglomerativeClustering(n_clusters=4, linkage='average')
labels = agg.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='Set1', edgecolor='k')
plt.title("Agglomerative Clustering (Average Linkage)")
plt.show()


In [None]:
# 46. Load the Wine dataset, apply KMeans, and visualize the cluster assignments in a seaborn pairplot (first 4 features).

from sklearn.datasets import load_wine
from sklearn.cluster import KMeans
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.suptitle and plt.show below

data = load_wine()
X = pd.DataFrame(data.data[:, :4], columns=data.feature_names[:4])

kmeans = KMeans(n_clusters=3, random_state=42)
X['Cluster'] = kmeans.fit_predict(data.data)

sns.pairplot(X, hue='Cluster', palette='Set1', corner=True)
plt.suptitle("KMeans Clustering (First 4 Features of Wine Dataset)", y=1.02)
plt.show()


In [None]:
# 47. Generate noisy blobs using make_blobs and use DBSCAN to identify both clusters and noise points. Print the count.

from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN

X, _ = make_blobs(n_samples=400, centers=3, cluster_std=1.2, random_state=42)

db = DBSCAN(eps=0.9, min_samples=5)
labels = db.fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)

print("Number of clusters found:", n_clusters)
print("Number of noise points:", n_noise)


In [None]:
# 48. Load the Digits dataset, reduce dimensions using t-SNE, then apply Agglomerative Clustering and plot the clusters.

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

X = load_digits().data
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)

agg = AgglomerativeClustering(n_clusters=10)
labels = agg.fit_predict(X_tsne)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=labels, cmap='tab10', s=40, edgecolor='k')
plt.title("Agglomerative Clustering on Digits (t-SNE Reduced)")
plt.show()
