Find file History
Latest commit 3b89d57 Dec 24, 2018
Type Name Latest commit message Commit time
Failed to load latest commit information. Add K-Means. Dec 21, 2018 Fixing Pylint errors. Dec 23, 2018 Fixing Pylint errors. Dec 24, 2018

K-Means Algorithm

Jupyter Demos

▶️ Demo | K-means Algorithm - split Iris flowers into clusters based on petal_length and petal_width


K-means clustering aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

The result of a cluster analysis shown below as the coloring of the squares into three clusters.



Given a training set of observations:

Training set


Where each observation is a d-dimensional real vector, k-means clustering aims to partition the m observations into K (≤ m) clusters:


... so as to minimize the within-cluster sum of squares (i.e. variance).

Below you may find an example of 4 random cluster centroids initialization and further clusters convergence:


Picture Source

Another illustration of k-means convergence:


Cost Function (Distortion)

c-i - index of cluster (1, 2, ..., K) to which example x(i) is currently assigned.

mu-k - cluster centroid k (mu-k-2) and k.

mu-c-i - cluster centroid of a cluster to which the example x(i) has been assigned.

For example:

Cluster example

In this case optimization objective will look like the following:

Cost Function


The Algorithm

Randomly initialize K cluster centroids (randomly pick K training examples and set K cluster centroids to that examples).