# Image Compression

In this challenge we will use **images** as data. We will use **K-means** for image compression, by reducing the colors in an image to a small set of representative ones.

## Images with Python

This section should give you some understanding of how we deal with images in Python.

### Black & white images

An image can be described as a matrix where each element is a pixel.  

The scale of colors is between 0 (black) and 255 (white).  

Thus, each element in the matrix will have a value between 0 and 255.

When dealing with matrices, `numpy` is always a good choice!

Execute the cell below to see a black and white image created only with `numpy`.

<img src="https://miro.medium.com/max/1386/1*bV7S0zACdidh11ikjYpLpQ.png">

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [None]:
# Let's create a vector of 100 elements with values between 0 and 255
image_bw = np.random.uniform(low=0, high=255, size=100)

# Reshape it into a square matrix: we now have an image of 10x10 pixels
image_bw = image_bw.reshape((10, 10))

# Finally display the generated image
plt.imshow(image_bw, cmap='gray');

### Colored images

Every color is a linear combination of the three primary colors: Red, Green and Blue.  

A colored image is just the overlay of three layers, one for each primary color.  

So for each pixel, we have three values (Red, Green, Blue).

<div>
<img src="files/color_channels_rgb.jpg" width="50%" align='center' source="https://www.missnumerique.com/blog/le-canal-alpha-en-video-quest-ce-que-cest/" />
</div>

Again, execute the cells below to see a colored image created only with `numpy`.

In [None]:
# Let's create a vector of 100*3 elements with values between 0 and 1
image_c = np.random.uniform(low=0., high=1., size=100*3)

# Reshape it into a square image of 10x10 pixels with 3 color channels
image_c = image_c.reshape((10, 10, 3))

# Finally display the generated image
plt.imshow(image_c);

In [None]:
# To get some intuition, let's plot each color layer
fig, axs = plt.subplots(1, 3, figsize=(8, 6))
colors = {0:'Reds', 1:'Greens', 2:'Blues'}

for i in colors:
    axs[i].imshow(image_c[:, :, i], cmap=colors[i])
    axs[i].set_title(colors[i])

## Images for ML

### Setup

Install the `scikit-image` library (imported as `skimage`); it gives us access to classic images. It's better to install libraries from a terminal than from inside a notebook.

In [None]:
from skimage import data
img = data.astronaut()
plt.imshow(img);

❓ **>>>** What is the shape of this image?

In [None]:
# Code here!


### Number of colors

To make our image ready for compression, or ML in general, we need to represent it as **observations** and **features**.  

Each **pixel** is an **observation**  
Each **color value (Red, Green, Blue)** is a **feature**  

In our image, we have **262,144 observations** (512 * 512 pixels) and **3 features** (RGB)

❓ **>>>** Reshape the image:  
- From its current size of `width * height * 3`
- To a matrix of size `N * 3` where `N = width * height`  

Assign the reshaped image to `X`.

**Hint**: the array's ```.reshape()``` method is a natural fit.
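
To get a feel for this kind of reshape without giving away the answer, here is a minimal sketch on a made-up toy image (the names `toy_img` and `toy_X` are hypothetical and not part of the challenge):

```python
import numpy as np

# Hypothetical toy image: 2 rows x 4 columns x 3 channels
toy_img = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)

# -1 lets numpy infer the number of rows (here 2 * 4 = 8 pixels)
toy_X = toy_img.reshape(-1, 3)

print(toy_X.shape)  # (8, 3)
```

Each row of `toy_X` is one pixel, and the three columns are its Red, Green and Blue values.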

In [None]:
# Code here!


A colored image may contain up to 16 million potential colors.  

Indeed, for each pixel we have **3** channels, and for each channel we have **256** possible values (from 0 to 255).
Hence a potential maximum of **16,777,216** *($256^3$)* colors in a colored image.

Since our image has only 262,144 pixels (512 * 512), it uses at most 262,144 colors, as each pixel contains a single color defined by its three channel values.
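
The arithmetic above can be checked in a couple of lines. This is only a sanity check of the numbers quoted in the text; the variable names are illustrative:

```python
# Number of distinct colors representable with 3 channels of 256 levels each
n_levels = 256
n_channels = 3
max_colors = n_levels ** n_channels

# A 512 x 512 image cannot show more colors than it has pixels
n_pixels = 512 * 512
upper_bound = min(max_colors, n_pixels)

print(max_colors)   # 16777216
print(n_pixels)     # 262144
print(upper_bound)  # 262144
```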

Let's count!

❓ **>>>** Find the number of unique colors in this picture.
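
One possible approach, sketched here on a hypothetical toy array rather than on the astronaut, is `np.unique` with `axis=0`, which treats each row (each pixel) as a single item:

```python
import numpy as np

# Hypothetical toy "image" already flattened to one pixel per row
toy_pixels = np.array([
    [255, 0, 0],
    [0, 255, 0],
    [255, 0, 0],
    [0, 0, 255],
    [0, 255, 0],
], dtype=np.uint8)

# axis=0 deduplicates whole rows instead of individual values
unique_colors = np.unique(toy_pixels, axis=0)

print(len(unique_colors))  # 3
```

Without `axis=0`, `np.unique` would count distinct channel values, not distinct colors.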

In [None]:
# Code here!


Let's break down what we have: 
- 262,144 observations, each observation is a pixel.
- 3 features (Red, Green & Blue values) for each observation.
- An unknown number of clusters!

**Let's see what we can do to reduce the number of colors**

## Compression with K-means

We want to reduce the **113,382** colors to **K** colors.  

Using a `KMeans` algorithm over the pixels, we can assign each pixel to one of **K** clusters!  

The center of each cluster is going to be the average color of the pixels that belong to it. 

We can then use this "mean cluster color" as the RGB values for each pixel in the cluster.  
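
As a small illustration of the "center = average color" idea, here is a sketch on six made-up pixels (the data and the names `toy_pixels` and `toy_kmeans` are assumptions for the example, and `n_init=10` and `random_state=0` are arbitrary choices):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: three reddish and three bluish pixels
toy_pixels = np.array([
    [250, 10, 10], [240, 20, 20], [245, 15, 15],
    [10, 10, 250], [20, 20, 240], [15, 15, 245],
], dtype=float)

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_pixels)

# Compare the fitted center with the plain average of the cluster's members
first_label = toy_kmeans.labels_[0]
members = toy_pixels[toy_kmeans.labels_ == first_label]

print(members.mean(axis=0))
print(toy_kmeans.cluster_centers_[first_label])
```

On data this well separated, the two printed vectors should agree.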

Our objective is to use only **32** colors!

❓ **>>>** Fit a K-means with ```n_clusters=32``` and ```random_state=42``` on your ML-ready image ```X```, and assign it to ```kmeans```.

In [None]:
# Code here!


Check the `labels_` of your `kmeans`, then check their shape and number of unique values. What does all this information mean?

In [None]:
# Code here!


Let's break it down:

- Each label is a cluster.
- There is one label assigned to each observation.
- There are a total of 32 different labels, one for each cluster.
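
The same three observations can be read off any label array. Here is a sketch with a hypothetical hand-written `toy_labels` standing in for `kmeans.labels_`:

```python
import numpy as np

# Hypothetical labels for 6 observations and 3 clusters
toy_labels = np.array([2, 0, 1, 2, 2, 0])

# One label per observation, and one distinct value per cluster
values, counts = np.unique(toy_labels, return_counts=True)

print(toy_labels.shape)  # (6,)
print(values)            # [0 1 2]
print(counts)            # [2 1 3]
```

`counts` also tells you how many pixels fall into each cluster, which hints at how dominant each color is.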

❓ **>>>** Let's take a look at the `cluster_centers_` of your `KMeans`, especially its shape and first element.

Each cluster center is a vector of RGB values; it represents the mean color of the cluster.

❓ **>>>** Create an array `X_compressed`, which stores, for each pixel of the picture, the mean color of its corresponding cluster.

For example, if the first pixel in X (```X[0]```) has been assigned to cluster ```8``` in ```kmeans.labels_```, then its original value ```[154, 147, 151]``` should be replaced by the center of its cluster, ```[151.23777276, 141.39909707, 140.88924003]```, which we can read from ```kmeans.cluster_centers_```.

You can achieve this by using only ```kmeans.cluster_centers_``` and ```kmeans.labels_```. There are many different ways to achieve this.

**Hint**: Make sure the shape of ```X_compressed``` is (262144, 3).
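
One of the many possible ways is numpy's integer-array indexing. This sketch uses hypothetical `toy_centers` and `toy_labels` in place of the fitted attributes:

```python
import numpy as np

# Hypothetical centers for 2 clusters and labels for 5 pixels
toy_centers = np.array([[200.5, 10.0, 10.0],
                        [10.0, 10.0, 220.25]])
toy_labels = np.array([0, 1, 1, 0, 1])

# Indexing with an array of labels picks one center row per pixel
toy_compressed = toy_centers[toy_labels]

print(toy_compressed.shape)  # (5, 3)
```

A loop or a list comprehension would give the same result, only more slowly on 262,144 pixels.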

In [None]:
# Code here!


On the 0 to 255 scale, RGB values must be integers.  

Our colors in `X_compressed` are in `float64`.  

❓ **>>>** Convert `X_compressed` to `uint8`, the unsigned 8-bit integer type, whose values lie between 0 and 255. No need to think here, it's just syntax.
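
One detail worth knowing: `astype` truncates the decimal part rather than rounding. The sketch below reuses the example center quoted earlier; `toy_float` is a hypothetical name:

```python
import numpy as np

# The example cluster center from the text, as a 1 x 3 float array
toy_float = np.array([[151.23777276, 141.39909707, 140.88924003]])

# Plain cast: the fractional part is dropped
toy_truncated = toy_float.astype(np.uint8)

# Optional variant: round first, then cast
toy_rounded = np.round(toy_float).astype(np.uint8)

print(toy_truncated)  # [[151 141 140]]
print(toy_rounded)    # [[151 141 141]]
```

Either variant is fine for this challenge, since a difference of one unit per channel is not visible.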

In [None]:
# Code here!


❓ **>>>** Verify that the number of unique colors used is indeed 32.

In [None]:
# Code here!



## Plot compressed image

Our `X_compressed` has shape (262144, 3), so we can't display it as an image yet.

❓ **>>>** Reshape your compressed image, assign it to `img_compressed`, and plot it side by side with your original image.  

**Hint**: You need to reshape your flattened `X_compressed` back into the dimensions of an image; your original image has exactly those dimensions.
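
The round trip can be sketched on a hypothetical toy image: flattening to one pixel per row and reshaping back with the original `.shape` restores the exact same array.

```python
import numpy as np

# Hypothetical toy image: 2 rows x 4 columns x 3 channels
toy_img = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)

# Flatten to (n_pixels, 3), then restore using the original shape
toy_flat = toy_img.reshape(-1, 3)
toy_restored = toy_flat.reshape(toy_img.shape)

print(np.array_equal(toy_img, toy_restored))  # True
```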

In [None]:
# Code here!


### Plotting the 2 images side by side

In [None]:
# Plot the original and the compressed image.
fig, ax = plt.subplots(1, 2, figsize = (7, 7))
ax[0].imshow(img)
ax[0].set_title('Original Image')
ax[0].axis('off')

ax[1].imshow(img_compressed)
ax[1].set_title('Compressed Image')
ax[1].axis('off')

plt.tight_layout()

Some colors are lost, but you can easily recognize the original image.

## Bonus: the elbow method

We can try to use the elbow method to find a good trade-off between the number of colors kept and the color information lost.  

Try to plot the `inertia` for `n_clusters` in the list [5, 10, 20, 30, 50, 70, 100], for instance.  

**Watch out**, it might take several minutes.
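
To see the shape of an elbow curve quickly, here is a sketch on small synthetic data instead of the full image. The three blobs, the list `ks`, and the `n_init` and `random_state` values are all assumptions made for the example:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 3 tight blobs of 50 "pixels" each in RGB space
rng = np.random.default_rng(0)
blob_centers = np.array([[50, 50, 50], [128, 128, 128], [200, 200, 200]])
toy_X = np.vstack([c + rng.normal(scale=5.0, size=(50, 3)) for c in blob_centers])

# Fit one model per candidate K and record its inertia
ks = [1, 2, 3, 4, 5, 6]
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(toy_X)
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

On this toy data the inertia should drop sharply up to `k=3` and flatten afterwards, which is the "elbow". Passing `ks` and `inertias` to `plt.plot` gives the corresponding curve.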

In [None]:
# Code here!


Run the cell below to visualize your compressed image using different numbers of clusters.  
Training takes a while: expect to wait a few minutes.

In [None]:
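from sklearn.cluster import KMeans

img_shape = img.shape  # (height, width, channels) of the original image

wcss = []  # within-cluster sum of squares (inertia) for each number of clusters
compressed_astronaut = []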
for i in [1, 2, 5, 10, 20, 30, 50, 70, 100]:
    print('working with ' + str(i) + ' clusters...')
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(X)
    
    # compute inertia
    wcss.append(kmeans.inertia_)
    
    # compute compressed astronaut
    labels = kmeans.labels_
    centers = kmeans.cluster_centers_
    X_compressed = centers[labels]
    X_compressed = X_compressed.astype('uint8')
    img_compressed = X_compressed.reshape(img_shape)
    compressed_astronaut.append(img_compressed)
    
    # printing
    print("-"*120)
    print(f"If we compress our 113382 unique colors into {i} clusters...")
    print(f"The Kmeans inertia is equal to {kmeans.inertia_}")
    
    # Plotting
    fig, ax = plt.subplots(1, 2, figsize = (7, 7))
    ax[0].imshow(img)
    ax[0].set_title('Original Image')
    ax[0].axis('off')
    
    ax[1].imshow(img_compressed)
    ax[1].set_title('Compressed Image')
    ax[1].axis('off')

    plt.tight_layout()
    plt.show()