# Autoencoder

In this work we will build simple, then more complex, autoencoders on the MNIST dataset.

An autoencoder is a neural network trained to copy its input to its output. It has two parts:


- An encoder function $h_{\theta_{e}} : \mathcal{X} \rightarrow \mathcal{Z}$ that maps the inputs $x$ to a lower-dimensional space.
- A decoder function $g_{\theta_{d}} : \mathcal{Z} \rightarrow \mathcal{X}$ that reconstructs the input in the original space from its low-dimensional representation.

In general, autoencoders aim to solve:

$$\underset{\theta_{e},\theta_{d}}{\text{min}} \ \underset{x \sim \mathbb{P}_{r}}{\mathbb{E}}[L(x,g_{\theta_{d}},h_{\theta_{e}})]$$

<img src="imgs/autoencoder.png" alt="Drawing" style="width: 500px;"/>
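
To make the objective concrete, here is a minimal NumPy sketch. It assumes that the loss $L$ is the squared error between $x$ and $g_{\theta_{d}}(h_{\theta_{e}}(x))$ and that both maps are linear; these choices, the weight names `W_e` and `W_d`, and the random data are illustrative only and are not the models built later. The expectation is approximated by an average over a batch.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W_e):
    # h: X -> Z, here a plain linear map (illustrative assumption)
    return x @ W_e

def decoder(z, W_d):
    # g: Z -> X, also linear in this sketch
    return z @ W_d

def reconstruction_loss(x, W_e, W_d):
    # Batch average of L(x, g(h(x))) with L = squared error
    x_hat = decoder(encoder(x, W_e), W_d)
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

x = rng.random((5, 784))                        # 5 fake flattened "images"
W_e = rng.normal(scale=0.01, size=(784, 32))    # encoder weights
W_d = rng.normal(scale=0.01, size=(32, 784))    # decoder weights
loss = reconstruction_loss(x, W_e, W_d)
```

Training an autoencoder amounts to adjusting the encoder and decoder parameters so that this quantity decreases.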



In [None]:
from keras.layers import Input, Dense
from keras.models import Model
import matplotlib.pyplot as plt
import matplotlib.colors as mcol
from matplotlib import cm
import networkx as nx  # needed by graph_colors below


def graph_colors(nx_graph):
    """Return one RGBA color per node, chosen from its 'attr' attribute (a digit label in 0-9)."""
    cnorm = mcol.Normalize(vmin=0, vmax=9)
    cpick = cm.ScalarMappable(norm=cnorm, cmap='Set1')
    cpick.set_array([])
    # Map each node to the color associated with its label
    val_map = {}
    for k, v in nx.get_node_attributes(nx_graph, 'attr').items():
        val_map[k] = cpick.to_rgba(v)
    # Collect the colors in the order of the graph's nodes
    colors = []
    for node in nx_graph.nodes():
        colors.append(val_map[node])
    return colors

Load the MNIST dataset using the following command:

In [None]:
from keras.datasets import mnist
import numpy as np
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)

##### Question 1. Write a function that builds a simple autoencoder 

The autoencoder must have a simple Dense layer with relu activation. The number of nodes of the dense layer is a parameter of the function.

The function must return the entire autoencoder model as well as the encoder and the decoder.
You will need the following classes:
- [Input](https://keras.io/layers/core/)
- [Dense](https://keras.io/layers/core/)
- [Model](https://keras.io/models/model/)
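
One possible shape for such a function is sketched below. The function name, the sigmoid output layer, the `adam` optimizer and the binary cross-entropy loss are assumptions made for illustration; the exercise itself only prescribes the Dense layer with relu activation.

```python
from keras.layers import Input, Dense
from keras.models import Model

def build_simple_autoencoder(encoding_dim=32, input_dim=784):
    # Encoder: a single Dense layer with relu activation
    input_img = Input(shape=(input_dim,))
    encoded = Dense(encoding_dim, activation='relu')(input_img)

    # Decoder layer (assumption: sigmoid, since pixels are scaled to [0, 1])
    decoder_layer = Dense(input_dim, activation='sigmoid')
    decoded = decoder_layer(encoded)

    autoencoder = Model(input_img, decoded)
    encoder = Model(input_img, encoded)

    # Standalone decoder that reuses the trained output layer
    encoded_input = Input(shape=(encoding_dim,))
    decoder = Model(encoded_input, decoder_layer(encoded_input))

    # Optimizer and loss are illustrative choices
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
    return autoencoder, encoder, decoder
```

Because the decoder model shares its layer with the autoencoder, it does not need to be trained separately.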

##### Question 2. Build the autoencoder with an embedding size of 32 and print the number of parameters of the model. What do they relate to?
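
As a sanity check on the printed number, the count can be worked out by hand. The sketch below assumes the model consists of exactly one Dense encoder layer and one Dense decoder layer, each with a bias term; the helper `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    # One weight per (input, output) pair, plus one bias per output unit
    return n_in * n_out + n_out

input_dim, encoding_dim = 784, 32
encoder_params = dense_params(input_dim, encoding_dim)   # 784*32 + 32  = 25120
decoder_params = dense_params(encoding_dim, input_dim)   # 32*784 + 784 = 25872
total_params = encoder_params + decoder_params           # 50992
```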


##### Question 3. Fit the autoencoder using 32 epochs with a batch size of 256
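
The point to keep in mind when fitting is that the input is also the target. A minimal sketch, assuming a compiled Keras model and the `x_train` / `x_test` arrays loaded above (the helper name `fit_autoencoder` is hypothetical):

```python
def fit_autoencoder(autoencoder, x_train, x_test, epochs=32, batch_size=256):
    # The network learns to reproduce x, so x is passed as input and as target
    history = autoencoder.fit(
        x_train, x_train,
        epochs=epochs,
        batch_size=batch_size,
        shuffle=True,
        validation_data=(x_test, x_test),  # test set used to monitor the loss
    )
    return history
```

The returned history object stores the per-epoch losses used in the next question.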

##### Question 4. Using the history object returned when fitting the autoencoder, write a function that plots the learning curves with respect to the epochs on the train and test sets. What can you say about these learning curves? Also give the final loss on the test set.
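
A possible plotting helper is sketched below. It assumes a dictionary shaped like the `history` attribute of the object returned by `fit`, with a `'loss'` list and, when validation data was given, a `'val_loss'` list; the function name is hypothetical.

```python
import matplotlib.pyplot as plt

def plot_learning_curves(history_dict):
    # history_dict: {'loss': [...], 'val_loss': [...]}, one value per epoch
    epochs = range(1, len(history_dict['loss']) + 1)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(epochs, history_dict['loss'], label='train loss')
    if 'val_loss' in history_dict:
        ax.plot(epochs, history_dict['val_loss'], label='test loss')
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss')
    ax.legend()
    return fig, ax
```

With a fitted model this would be called as `plot_learning_curves(history.history)`, and the final test loss would be `history.history['val_loss'][-1]`.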

##### Question 5. Write a function that plots a fixed number of original images from the test set alongside their reconstructions
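
One way to lay this out is a two-row grid, originals on top and reconstructions below. The sketch assumes both arrays hold flattened 28x28 images, one per row; the function name is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_reconstructions(originals, reconstructions, n=10):
    # Row 0: original images, row 1: their reconstructions
    fig, axes = plt.subplots(2, n, figsize=(2 * n, 4), squeeze=False)
    for i in range(n):
        for row, images in enumerate((originals, reconstructions)):
            ax = axes[row, i]
            ax.imshow(np.reshape(images[i], (28, 28)), cmap='gray')
            ax.axis('off')
    return fig, axes
```

With a fitted model, the second argument would be the output of `autoencoder.predict` on the same test images.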

### Nearest neighbours graphs
The goal of this part is to visualize the neighbours graph in the embedding. It corresponds to the graph of the k-nearest neighbours of the points in the embedding, using the Euclidean distance.

The function that computes the neighbours graph is documented here: [kneighbors_graph](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.kneighbors_graph.html)
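
To see what `kneighbors_graph` returns, here is a toy example on four hand-picked 1D points (the data is made up for illustration). Each row of the resulting adjacency matrix marks the nearest neighbours of one point.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Four points on a line: two tight pairs, far from each other
X = np.array([[0.0], [0.1], [5.0], [5.1]])

# Row i has a 1 in the column of the single nearest neighbour of X[i]
A = kneighbors_graph(X, n_neighbors=1, mode='connectivity', include_self=False)
adjacency = A.toarray()
# adjacency:
# [[0. 1. 0. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 0. 1. 0.]]
```

With `include_self=True`, each point counts as its own first neighbour, so asking for `p` neighbours links a point to itself and to its `p - 1` closest other points.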

In [None]:
from sklearn.neighbors import kneighbors_graph
import networkx as nx

In [None]:
def plot_nearest_neighbour_graph(encoder, x_test, y_test, ntest=100, p=3):
    # Embed the first ntest test images and keep their labels
    X = encoder.predict(x_test[:ntest])
    y = y_test[:ntest]
    # Link each point to its p nearest neighbours (itself included) in the embedding
    A = kneighbors_graph(X, p, mode='connectivity', include_self=True)
    G = nx.from_numpy_array(A.toarray())
    # Store the digit label on each node; it drives both node labels and colors
    nx.set_node_attributes(G, dict(zip(range(ntest), y)), 'attr')
    fig, ax = plt.subplots(figsize=(10, 10))
    pos = nx.layout.kamada_kawai_layout(G)
    nx.draw(G, pos=pos
            ,with_labels=True
            ,labels=nx.get_node_attributes(G, 'attr')
            ,node_color=graph_colors(G))
    plt.tight_layout()
    plt.title('Nearest Neighbours Graph', fontsize=15)
    plt.show()

In [None]:
plot_nearest_neighbour_graph(encoder,x_test,y_test,ntest=100,p=3)

We can also plot a 2D MDS of the embedding space: 

In [None]:
from sklearn.manifold import MDS
from matplotlib.colors import ListedColormap

ntest = 500
mds = MDS(n_components=2)
# Embed the first ntest test images, then project the embedding to 2D
X = encoder.predict(x_test[:ntest])
X_dim2 = mds.fit_transform(X)
# One color per digit class (0-9)
colors = ['red','green','blue','purple','darkblue','yellow','black','pink','orange','grey']
label = y_test[:ntest]
plt.figure(figsize=(15, 10))
plt.scatter(X_dim2[:, 0]
            ,X_dim2[:, 1]
            ,c=label, cmap=ListedColormap(colors)
            ,s=100)
plt.colorbar()
plt.title('MDS on embedding space')
plt.show()

### Reduce the dimension of the embedding

##### Question 6. Rerun the previous example using an embedding dimension of 16

## Adding sparsity


##### Question 7. Add sparsity over the weights

In this part we will add sparsity over the weights of the embedding layer. Write a function that builds such an autoencoder, using an l1 regularization with a configurable regularization parameter and the same autoencoder architecture as before.

You will use the [regularizers](https://keras.io/regularizers/) module.
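
A minimal sketch of how the regularizer can be attached is shown below. The function name, the default value of `l1_factor`, the sigmoid output layer and the `adam` optimizer are assumptions made for illustration.

```python
from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

def build_sparse_autoencoder(encoding_dim=32, l1_factor=1e-5, input_dim=784):
    input_img = Input(shape=(input_dim,))
    # kernel_regularizer adds l1_factor * sum(|W|) to the loss,
    # which pushes the weights of the embedding layer toward zero
    encoded = Dense(encoding_dim, activation='relu',
                    kernel_regularizer=regularizers.l1(l1_factor))(input_img)
    decoded = Dense(input_dim, activation='sigmoid')(encoded)

    autoencoder = Model(input_img, decoded)
    encoder = Model(input_img, encoded)
    # Optimizer and loss are illustrative choices
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
    return autoencoder, encoder
```

Passing the same regularizer as `activity_regularizer` instead would penalize the layer's outputs rather than its weights, which is a different kind of sparsity from the one asked for here.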

# Deep autoencoder

##### Question 8. Use the following deep autoencoder to rerun the previous example. What can you say about the quality of the autoencoding ?

In [None]:
def build_deep_autoencoder(encoding_dim=32):
    # Encoder: 784 -> 128 -> 64 -> encoding_dim
    input_img = Input(shape=(784,))
    encoded = Dense(128, activation='relu')(input_img)
    encoded = Dense(64, activation='relu')(encoded)
    encoded = Dense(encoding_dim, activation='relu', name="embedding_layer")(encoded)

    encoder = Model(input_img, encoded)

    # Decoder: encoding_dim -> 64 -> 128 -> 784, mirroring the encoder
    decoded = Dense(64, activation='relu')(encoded)
    decoded = Dense(128, activation='relu')(decoded)
    decoded = Dense(784, activation='sigmoid')(decoded)

    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

    return autoencoder, encoder

# Convolutional autoencoder

##### Question 9. Use the following convolutional autoencoder to rerun the previous example. What can you say about the quality of the autoencoding?

In [None]:
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model


def build_conv_autoencoder():
    input_img = Input(shape=(28, 28, 1))

    # Encoder: two conv + pooling stages, 28x28 -> 14x14 -> 7x7
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)

    # at this point the representation is (7, 7, 32)
    encoder = Model(input_img, encoded)

    # Decoder: two conv + upsampling stages, 7x7 -> 14x14 -> 28x28
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

    return autoencoder, encoder
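
Two practical points follow from the shapes in `build_conv_autoencoder`. The model expects images of shape (28, 28, 1) rather than the flattened 784-vectors used earlier, and the encoder outputs feature maps of shape (7, 7, 32) per image. Since `kneighbors_graph` and `MDS` expect a 2D array of shape (n_samples, n_features), the feature maps need to be flattened first. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def flatten_embeddings(feature_maps):
    # (n_samples, 7, 7, 32) -> (n_samples, 7 * 7 * 32) = (n_samples, 1568)
    return np.reshape(feature_maps, (len(feature_maps), -1))

# Stand-in for the encoder output on 4 images
fake_maps = np.zeros((4, 7, 7, 32))
flat = flatten_embeddings(fake_maps)
```

Note that this embedding has 1568 dimensions, more than the 784 input pixels, so it is not a compressed representation in the sense of the dense models.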

# Application to denoising

In this part we will add some noise to the original data to see how the auto-encoding process denoises our data.

In [None]:
(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format

noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape) 
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape) 

x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

##### Question 10. Denoise using the convolutional autoencoder
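
For denoising, the model is fitted with the noisy images as inputs and the clean images as targets. A minimal sketch, assuming a compiled Keras model and the arrays defined above; the helper name and the default `epochs` and `batch_size` values are hypothetical.

```python
def fit_denoiser(autoencoder, x_train_noisy, x_train, x_test_noisy, x_test,
                 epochs=10, batch_size=128):
    # Inputs are the corrupted images, targets are the clean ones
    return autoencoder.fit(
        x_train_noisy, x_train,
        epochs=epochs,
        batch_size=batch_size,
        shuffle=True,
        validation_data=(x_test_noisy, x_test),
    )
```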

In [None]:
decoded_imgs = autoencoder.predict(x_test_noisy)  # denoise the corrupted test images
n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display noisy input
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()