<a href="https://colab.research.google.com/github/yoseforaz0990/ML-templates/blob/main/clustering/k_means_clustering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

| Step                                              | Explanation                                                                                                                     |
|---------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|
| 1. Importing necessary libraries                 | We import NumPy, Matplotlib's `pyplot` module, and the `KMeans` class from scikit-learn. |
| 2. The Elbow Method for choosing the number of clusters | We fit K-Means for k = 1 to 10 and record each model's Within-Cluster Sum of Squares (WCSS), the sum of squared distances from every point to its assigned centroid. WCSS generally keeps falling as k grows (it reaches 0 when every point is its own cluster), so minimizing it is pointless. Instead we plot WCSS against k and look for the "elbow", the point after which adding more clusters no longer reduces WCSS substantially. |
| 3. Training the K-Means model                    | Using the number of clusters read off the elbow plot (5 in this notebook), we create a `KMeans` instance and call `fit_predict`, which trains the model and returns each point's cluster label. Centroids are initialized with `'k-means++'`, which spreads the initial centroids apart. This typically converges faster, and to a lower WCSS, than purely random initialization. |
| 4. Visualizing the clusters                       | Finally, we draw a 2D scatter plot of the first two features. Each cluster gets its own color and the centroids are marked as large yellow points, showing how the data points have been grouped. |

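For reference, the quantity the Elbow Method plots is $\mathrm{WCSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$. Here $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is its centroid. scikit-learn exposes this value as `KMeans.inertia_`.

The sketch below recomputes it by hand and checks it against `inertia_`. It assumes synthetic data from `make_blobs` as a stand-in for the notebook's `X`. The helper name `wcss_by_hand` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2D data standing in for the notebook's X (assumption for illustration)
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=0)

model = KMeans(n_clusters=4, init='k-means++', n_init=10, random_state=42).fit(X_demo)

def wcss_by_hand(X, labels, centers):
    """Sum over all points of the squared Euclidean distance to their assigned centroid."""
    diffs = X - centers[labels]  # each row minus the centroid of the cluster it belongs to
    return float(np.sum(diffs ** 2))

manual = wcss_by_hand(X_demo, model.labels_, model.cluster_centers_)
print(manual, model.inertia_)
```

The two printed numbers should agree up to floating-point rounding.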

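Step 3 relies on `init='k-means++'`. To illustrate what that seeding does, here is a minimal NumPy sketch of the basic k-means++ procedure:

1. Pick the first centroid uniformly at random from the data.
2. Pick each later centroid with probability proportional to its squared distance from the nearest centroid chosen so far.

The function name `kmeans_plusplus_init` and the random test data are assumptions for illustration. scikit-learn's own implementation uses a greedy variant that evaluates several candidates per step, so its picks will differ.

```python
import numpy as np

def kmeans_plusplus_init(X, k, rng):
    """Basic k-means++ seeding; assumes X has at least k distinct rows."""
    n_samples = X.shape[0]
    # First centroid: a data point chosen uniformly at random
    centers = [X[rng.integers(n_samples)]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen centroid
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        if d2.sum() == 0:
            raise ValueError("fewer than k distinct points in X")
        # Next centroid: sampled with probability proportional to that squared distance
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(n_samples, p=probs)])
    return np.array(centers)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))  # hypothetical test data
init_centers = kmeans_plusplus_init(X_demo, k=5, rng=rng)
print(init_centers.shape)  # (5, 2)
```

A point that is already a centroid has distance 0, so it can never be picked twice. Far-away points are favored, which is what spreads the initial centroids out.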
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Assumes the data is already loaded into X with two numeric features per sample
# (e.g. annual income and spending score). Converting to a NumPy array makes the
# boolean-mask indexing in the plots below work even if X is a pandas DataFrame.
X = np.asarray(X)

# Elbow Method: fit K-Means for k = 1..10 and record each model's WCSS (inertia_).
# n_init=10 (10 seedings, best kept) is set explicitly because its default changed
# across scikit-learn versions.
k_values = range(1, 11)
wcss = []
for k in k_values:
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # sum of squared distances of points to their nearest centroid

# Plot WCSS against k and look for the "elbow" where the decrease levels off
plt.plot(k_values, wcss, marker='o')
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')  # Within-cluster sum of squares
plt.show()

# Train the final model with the k read off the elbow plot (5 here; adjust for your data)
kmeans = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
y_kmeans = kmeans.fit_predict(X)  # cluster label (0-4) for every sample

# Visualize each cluster in its own color, then overlay the centroids
colors = ['red', 'blue', 'green', 'cyan', 'magenta']
for cluster_id, color in enumerate(colors):
    mask = y_kmeans == cluster_id
    plt.scatter(X[mask, 0], X[mask, 1], s=100, c=color, label=f'Cluster {cluster_id + 1}')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
```
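Reading the elbow off the plot is a judgment call. One simple geometric heuristic, offered here as an option rather than the notebook's method, works in three steps:

1. Normalize both axes.
2. Draw a straight line from the first to the last point of the WCSS curve.
3. Pick the k whose point lies farthest from that line.

The helper `elbow_by_max_distance` is hypothetical, and the WCSS values in the example are made-up numbers, not results from any dataset.

```python
import numpy as np

def elbow_by_max_distance(k_values, wcss):
    """Return the k whose (k, WCSS) point is farthest from the line joining the
    first and last points of the curve. Assumes WCSS is not constant."""
    k = np.asarray(list(k_values), dtype=float)
    w = np.asarray(wcss, dtype=float)
    # Scale both axes to [0, 1] so the large WCSS values do not dominate distances
    k_n = (k - k.min()) / (k.max() - k.min())
    w_n = (w - w.min()) / (w.max() - w.min())
    p1 = np.array([k_n[0], w_n[0]])
    p2 = np.array([k_n[-1], w_n[-1]])
    direction = (p2 - p1) / np.linalg.norm(p2 - p1)
    # Perpendicular distance of each point from the line (magnitude of a 2D cross product)
    rel = np.column_stack([k_n, w_n]) - p1
    distances = np.abs(rel[:, 0] * direction[1] - rel[:, 1] * direction[0])
    return int(k[np.argmax(distances)])

# Made-up WCSS curve for k = 1..10, purely to exercise the function
example_wcss = [1000, 600, 300, 150, 130, 115, 105, 98, 92, 88]
print(elbow_by_max_distance(range(1, 11), example_wcss))  # 4
```

With the notebook's own variables, `elbow_by_max_distance(range(1, 11), wcss)` gives a candidate k. It is best treated as a starting point to check against the plot, not a replacement for looking at it.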

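Calling `fit` or `fit_predict` refines the initial centroids iteratively. The sketch below shows the classic Lloyd iteration, which scikit-learn's default `algorithm='lloyd'` performs in optimized form. It repeats two steps until the centroids stop changing:

1. Assign every point to its nearest centroid.
2. Move each centroid to the mean of its assigned points.

`lloyd_kmeans` is a hypothetical helper written for illustration. Unlike scikit-learn, it keeps a centroid in place when its cluster ends up empty instead of relocating it. The comparison at the end assumes synthetic `make_blobs` data and starts both implementations from the same hand-picked centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def lloyd_kmeans(X, init_centers, max_iter=300):
    """Alternate assignment and update steps until the centroids stop changing."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid in this sketch)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.array_equal(new_centers, centers):  # converged: no centroid moved
            break
        centers = new_centers
    # Final assignment so the labels correspond to the returned centroids
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    inertia = float(((X - centers[labels]) ** 2).sum())
    return centers, labels, inertia

# Illustrative data; initialize with one point taken from each true blob
X_demo, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
init = np.array([X_demo[y_true == j][0] for j in range(3)])
centers, labels, inertia = lloyd_kmeans(X_demo, init)

# scikit-learn from the same starting centroids (n_init=1, tol=0 for strict convergence)
sk = KMeans(n_clusters=3, init=init, n_init=1, tol=0).fit(X_demo)
print(np.allclose(centers, sk.cluster_centers_), inertia, sk.inertia_)
```

Each step can only lower WCSS or leave it unchanged, which is why the iteration settles. Because it only finds a local optimum, the result depends on the starting centroids. That dependence is the reason for careful seeding and multiple runs via `n_init`.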