# Clustering

Clustering is a type of unsupervised learning that groups similar data points together. 
Unsupervised learning refers to the process of training a model on data without labeled responses (i.e. you do not know the $y$ for the input $x$).

Because it needs no labels, clustering is often used for exploratory data analysis (EDA for short),
which is the process of analyzing datasets to summarize their main statistical characteristics
and pave the way for further modeling.

Today we will cover two clustering algorithms:
- K-Means
- Gaussian Mixture Models (GMMs)

## 1. K-Means

In a nutshell, K-means is a clustering algorithm that partitions data into $k$ clusters, 
where each data point is assigned to the cluster with the nearest mean.
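
To make the "nearest mean" rule concrete, here is a minimal NumPy sketch of the assignment step alone. The function name `assign_to_nearest_mean`, the toy points, and the two means are made up for illustration; this is not the full K-means algorithm, which also has to update the means.

```python
import numpy as np


def assign_to_nearest_mean(points, means):
    """Return, for each point, the index of its closest mean (Euclidean distance)."""
    # diffs[i, j] is the vector from means[j] to points[i].
    diffs = points[:, None, :] - means[None, :, :]
    # sq_dists[i, j] is the squared distance between points[i] and means[j].
    sq_dists = (diffs ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)


# Hypothetical toy data: two well-separated groups in 2-D.
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
means = np.array([[0.0, 0.5], [5.5, 5.0]])

labels = assign_to_nearest_mean(points, means)
print(labels)  # [0 0 1 1]
```

Comparing squared distances is enough here, because the square root does not change which mean is closest.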

The [formal definition](https://en.wikipedia.org/wiki/K-means_clustering) goes:

_Given a set of observations $(x_1, x_2, ..., x_n)$, $k$-means clustering aims to partition the $n$ observations into $k$ ($\leq n$) sets $\mathbf{S} = \{S_1, S_2, ..., S_k\}$ so as to minimize the within-cluster sum of squares (WCSS) (i.e. variance)._

$$
\arg \min_{\mathbf{S}} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \|\mathbf{x} - \boldsymbol{\mu}_i\|^2
$$

_where $\boldsymbol{\mu}_i$ is the mean (also called centroid) of points in $S_i$, i.e._

$$\boldsymbol{\mu}_i = \frac{1}{|S_i|} \sum_{\mathbf{x} \in S_i} \mathbf{x},$$

*$|S_i|$ is the size of $S_i$, and $\| \cdot \|$ is the usual $L^2$ norm.*
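
As a sanity check on the objective, the sketch below evaluates the WCSS of a given partition with NumPy. The helper name `wcss`, the toy points, and both labelings are assumptions chosen for illustration; the code only scores a partition and does not search for the minimizing one.

```python
import numpy as np


def wcss(points, labels, k):
    """Within-cluster sum of squares for a partition given as integer labels 0..k-1."""
    total = 0.0
    for i in range(k):
        members = points[labels == i]      # the set S_i
        if len(members) == 0:
            continue                       # an empty cluster contributes nothing
        centroid = members.mean(axis=0)    # mu_i, the mean of S_i
        total += ((members - centroid) ** 2).sum()
    return total


# Hypothetical toy data: two well-separated groups in 2-D.
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])

good = np.array([0, 0, 1, 1])  # groups nearby points together
bad = np.array([0, 1, 0, 1])   # mixes the two groups

print(wcss(points, good, k=2))  # 1.0
print(wcss(points, bad, k=2))   # 51.0
```

The partition that keeps nearby points together has the much smaller WCSS, which is what the $\arg \min$ over $\mathbf{S}$ selects for.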

`scikit-learn` provides K-means out of the box through `sklearn.cluster.KMeans`.
Let us build a short K-means clustering demo on our Iris dataset.