# Unsupervised Learning 

The defining characteristic of this class of learning models is that they learn purely from a set of observed data $\{x_i\}$, with no labels.

That said, different models pursue different goals, such as:
1. Generating plausible new samples that resemble the dataset.
2. Manipulating, denoising, interpolating, or compressing instances.

In this chapter we'll discuss the taxonomy of unsupervised learning models, the desired properties of these models, and how we measure their performance.

The sub-chapters of this file present specific generative models:
1. Generative adversarial networks (GANs)
2. Normalizing Flows
3. Variational Autoencoders (VAEs)
4. Diffusion Models

These Generative Models can be sub-categorised:

| Generative Type | Data | Goal | Examples |
|-----------------|------|------|----------|
| **Explicit Models** | $\{\mathbf{x}_i\}$, i.i.d. samples from some **unknown** distribution $Pr(\mathbf{x})$ | Estimate a probability function $q(\mathbf{x}) \approx Pr(\mathbf{x})$ | VAEs, Normalizing Flows |
| **Implicit Models** | $\{\mathbf{x}_i\}$, i.i.d. samples from some **unknown** distribution $Pr(\mathbf{x})$ | Generate new samples $\mathbf{x}^* \sim Pr(\mathbf{x})$ | GANs, Diffusion Models |

## Taxonomy of Unsupervised Learning Models

Unsupervised models typically define a mapping between the data examples $\mathbf{x}$ and a set of unseen *latent* variables $\mathbf{z}$.

The latent variables $\mathbf{z}$ capture underlying structure in the dataset and usually have a lower dimension than the original data, so they can be thought of as a **compressed version** of $\mathbf{x}$.

This mapping can run in either direction:

1. Some models map from data to latent variables, $\mathbf{x} \to \mathbf{z}$. For example, $k$-means assigns each example a cluster index $z \in \{1, 2, \dots, K\}$.
2. Other models map from the latent variables $\mathbf{z}$ to data $\mathbf{x}$. These are the generative models.
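The two directions can be illustrated with a minimal NumPy sketch. It is not a full model: the centroids are fixed by hand rather than learned, the "generator" is just a random linear map, and the names `assign_clusters` and `generate` are hypothetical. Note also that the code uses 0-based cluster indices, whereas the text counts from 1.

```python
import numpy as np

def assign_clusters(x, centroids):
    """Direction x -> z: map each example to the index of its nearest centroid."""
    # Squared Euclidean distance from every point to every centroid, shape (N, K)
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def generate(z, weights, bias):
    """Direction z -> x: toy generator mapping latents to data space, x = z W + b."""
    return z @ weights + bias

rng = np.random.default_rng(0)

# x -> z: four 2-D points, two hand-picked centroids (assumed, not learned)
x = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.9, 5.2]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
z_discrete = assign_clusters(x, centroids)

# z -> x: 2-D latents mapped to 5-D "data" (latent dimension < data dimension)
z = rng.standard_normal((3, 2))
W = rng.standard_normal((2, 5))
b = np.zeros(5)
x_new = generate(z, W, b)

print(z_discrete)   # cluster index per example
print(x_new.shape)  # (3, 5)
```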

## Desired Properties of Generative Models

|**Property** | **Explanation** |
|-------------|-----------------|
| **Efficient Sampling** | Generating samples from the model should be computationally **inexpensive** and should make use of GPUs |
| **High Quality Sampling** | The samples should be **indistinguishable** from the real data on which the model was trained |
| **Coverage** | The model should be able to produce samples from the whole training distribution |
| **Well-behaved Latent Space** | Every latent variable $\mathbf{z}$ corresponds to a plausible data example $\mathbf{x}$ |
| **Disentangled Latent Space** | Manipulating a single dimension of $\mathbf{z}$ should correspond to changing some interpretable property of $\mathbf{x}^*$ |
| **Efficient Likelihood Computation** | If the model is probabilistic, then calculating the probability of new examples should be **accurate** and **efficient** |

The table below summarizes which of these properties hold for the models discussed in this chapter.

| Model | Efficient Sampling | Sample Quality | Coverage | Well-behaved Latent Space | Disentangled Latent Space | Efficient Likelihood |
|------|-----------|-----------------|----------|---------------------------|--------------------------|-----------------------|
| GANs |  Yes | Yes | No | Yes | Unclear | Not Probabilistic | 
| VAEs | Yes | No | Unclear | Yes | Unclear | No | 
| Flows | Yes | No | Unclear | Yes | Unclear | Yes | 
| Diffusion | No | Yes | Unclear | No | No | No |

## Quantifying Performance 

### Test Likelihood

**Why we don't measure on training data**

Measuring the likelihood of the training data is uninformative, because a model could assign a very high probability to each training point and a very low probability everywhere else. Such a model would of course reproduce the data it received, but this gives no good indication of the model's capabilities.

The test likelihood captures how well the model generalizes from the training data, and also its coverage: a model that assigns low probability to regions containing held-out examples is penalized.
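The gap between training and test likelihood can be seen in a small sketch. As a stand-in for a generative model, it assumes a 1-D Gaussian kernel density estimator (not one of the models in this chapter), whose bandwidth controls how strongly it memorizes the training points. The function name `kde_mean_log_likelihood` and the bandwidth values are illustrative choices.

```python
import numpy as np

def kde_mean_log_likelihood(train, query, bandwidth):
    """Mean log-density of `query` under a Gaussian KDE centred on `train` (1-D)."""
    diff = query[:, None] - train[None, :]  # shape (Q, N)
    log_kernel = -0.5 * (diff / bandwidth) ** 2 - np.log(bandwidth * np.sqrt(2 * np.pi))
    # log of the average kernel value, computed with a stable log-sum-exp
    m = log_kernel.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) - np.log(train.size)
    return log_p.mean()

rng = np.random.default_rng(0)
train = rng.standard_normal(200)  # training set
test = rng.standard_normal(200)   # held-out set from the same distribution

narrow = 0.001  # tiny bandwidth: probability mass piled onto the training points
smooth = 0.3    # wider bandwidth: mass spread between the training points

train_narrow = kde_mean_log_likelihood(train, train, narrow)
test_narrow = kde_mean_log_likelihood(train, test, narrow)
train_smooth = kde_mean_log_likelihood(train, train, smooth)
test_smooth = kde_mean_log_likelihood(train, test, smooth)

print("narrow: train", train_narrow, "test", test_narrow)
print("smooth: train", train_smooth, "test", test_smooth)
```

The narrow model wins on training likelihood yet loses badly on test likelihood, which is why only the held-out number is a useful measure.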

This metric isn't always applicable:
- GANs are not probabilistic, so they assign no likelihood to data.
- VAEs and diffusion models cannot compute the likelihood efficiently.

### Inception Score

This score is specialised for images, and ideally for generative models trained on the ImageNet database.

We calculate the score using a pre-trained classification model (such as the Inception model). Each generated image is passed to the classifier, and two criteria are checked:
- Each image should be assigned to a single class with high probability (sample quality).
- Across the $I$ generated images, the predicted classes should be distributed uniformly over all classes (coverage).

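$$IS = \exp\left[\frac{1}{I} \sum_{i=1}^{I} D_{KL}\left[Pr(y | \mathbf{x}_i^*) \,\|\, Pr(y) \right]\right]$$

Here $D_{KL}$ is the Kullback-Leibler divergence, which measures how far one distribution is from another, and $Pr(y) = \frac{1}{I}\sum_{i=1}^{I} Pr(y | \mathbf{x}_i^*)$ is the class distribution averaged over all generated images.

The score therefore averages, over the generated images, the divergence between the class distribution predicted for each image and the overall class distribution. It is high when each image is classified confidently and the images together cover many classes.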
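The formula can be sketched in a few lines of NumPy. The sketch assumes the class probabilities $Pr(y | \mathbf{x}_i^*)$ have already been obtained from a pre-trained classifier and stacked into an array `probs` of shape `(I, C)`. The function name `inception_score`, the `eps` smoothing term, and the toy probability tables are illustrative assumptions, not outputs of a real network.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs[i] is Pr(y | x_i*) for generated image i; returns the Inception score."""
    marginal = probs.mean(axis=0, keepdims=True)  # Pr(y), averaged over images
    # KL divergence between each image's class distribution and the marginal
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse: each image lands firmly in a different class
confident_diverse = np.eye(4)
# Confident but collapsed: every image lands in the same class
collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))
# Uninformative: the classifier is maximally unsure about every image
uncertain = np.full((4, 4), 0.25)

is_diverse = inception_score(confident_diverse)  # approximately 4 (number of classes)
is_collapsed = inception_score(collapsed)        # approximately 1
is_uncertain = inception_score(uncertain)        # approximately 1
print(is_diverse, is_collapsed, is_uncertain)
```

The score is bounded between 1 and the number of classes, and it only reaches the upper end when both the quality and coverage criteria are met.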

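### Fréchet Inception Distance
The Inception score only looks at generated images and never compares them with real ones. The Fréchet Inception Distance (FID) instead computes a distance between the distribution of generated samples and the distribution of real examples. Because neither distribution is easy to characterize, both are approximated by multivariate Gaussians, and the Fréchet distance between the two Gaussians is reported. Unlike the $D_{KL}$ divergence, for which $D_{KL}(A || B) \ne D_{KL}(B || A)$, this distance is symmetric.

Note: this measure isn't computed on the original data, but rather on the activations of the deepest layer of the Inception network.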
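A minimal sketch of the Fréchet distance between two Gaussians fitted to feature vectors is given below. It uses the standard closed form $\lVert \boldsymbol\mu_r - \boldsymbol\mu_g \rVert^2 + \mathrm{Tr}\left(\boldsymbol\Sigma_r + \boldsymbol\Sigma_g - 2(\boldsymbol\Sigma_r \boldsymbol\Sigma_g)^{1/2}\right)$. The feature arrays here are random stand-ins for Inception activations, and the function name `frechet_distance` is hypothetical. In practice the features would come from a pre-trained network.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (N, D) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative in theory;
    # clipping guards against tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = ((mu_r - mu_g) ** 2).sum()
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
feats_a = rng.standard_normal((500, 8))        # stand-in for "real" features
feats_b = rng.standard_normal((500, 8)) + 3.0  # stand-in for shifted "generated" features

d_same = frechet_distance(feats_a, feats_a)  # identical sets: distance near zero
d_ab = frechet_distance(feats_a, feats_b)
d_ba = frechet_distance(feats_b, feats_a)    # same value as d_ab: the distance is symmetric
print(d_same, d_ab, d_ba)
```

Lower values indicate that the generated features are statistically closer to the real ones.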