K-means clustering aims to partition m observations into K clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
The result of a cluster analysis is shown below as the coloring of the squares into three clusters.
Given a training set of observations $x^{(1)}, x^{(2)}, \dots, x^{(m)}$, where each observation is a $d$-dimensional real vector, k-means clustering aims to partition the m observations into $K \leq m$ clusters $S = \{S_1, S_2, \dots, S_K\}$
... so as to minimize the within-cluster sum of squares (i.e., the variance).
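Written out, the within-cluster sum-of-squares objective (with $\mu_i$ denoting the mean of the points in $S_i$) is:

```latex
\underset{S}{\arg\min} \; \sum_{i=1}^{K} \sum_{x \in S_i} \left\lVert x - \mu_i \right\rVert^2
```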
Below is an example of initializing 4 random cluster centroids and the subsequent convergence of the clusters:
Another illustration of k-means convergence:
Cost Function (Distortion)
In this case the optimization objective looks like the following:
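Using $c^{(i)}$ for the index of the cluster to which example $x^{(i)}$ is currently assigned, and $\mu_k$ for the centroid of cluster $k$, the distortion cost function is:

```latex
J\left(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K\right)
  = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2
```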
Randomly initialize K cluster centroids (i.e., randomly pick K training examples and set the K cluster centroids to those examples).
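This initialization step can be sketched in NumPy as follows; the function name `init_centroids` and the toy data are illustrative, not part of any particular library:

```python
import numpy as np

def init_centroids(X, K, seed=None):
    """Pick K distinct training examples at random as the initial centroids."""
    rng = np.random.default_rng(seed)
    # Sample K distinct row indices, then take those rows of X as centroids.
    indices = rng.choice(X.shape[0], size=K, replace=False)
    return X[indices]

# Example: 6 two-dimensional observations, K = 3 centroids.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
centroids = init_centroids(X, K=3, seed=0)
print(centroids.shape)  # (3, 2)
```

Sampling without replacement (`replace=False`) guarantees the K initial centroids are distinct training examples, which avoids starting with two identical centroids.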