# Section 4 - Autoencoders

### An autoencoder is an unsupervised model trained to reconstruct its own input.
In the process, the hidden layers learn an encoding, that is, a different representation of the data.
The usual loss function is the squared error, but other error functions can be used as well, such as cross-entropy, provided the inputs and outputs lie between 0 and 1.
The output range is obtained by applying the sigmoid function at the output layer.
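
The two loss functions mentioned above can be compared on toy data. The following is a minimal NumPy sketch, not the notebook's own implementation: the helper names `squared_error` and `cross_entropy`, the `eps` guard, and the random arrays are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    # Squashes any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def squared_error(X, X_hat):
    # Sum of squared differences over all samples and features
    return np.sum((X - X_hat) ** 2)

def cross_entropy(X, X_hat, eps=1e-10):
    # Binary cross-entropy summed over all samples and features.
    # eps (an assumed small constant) guards against log(0).
    return -np.sum(X * np.log(X_hat + eps) + (1 - X) * np.log(1 - X_hat + eps))

rng = np.random.default_rng(0)
X = rng.random((5, 4))                      # toy inputs, already scaled to [0, 1]
X_hat = sigmoid(rng.normal(size=(5, 4)))    # toy "reconstruction" in (0, 1)

loss_se = squared_error(X, X_hat)
loss_ce = cross_entropy(X, X_hat)
```

Cross-entropy only makes sense here because both `X` and `X_hat` stay within the unit interval; for unbounded data the squared error is the safer choice.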

### Shared Weights
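Shared (tied) weights are a slight modification of the usual approach, in which each layer has its own set of weights: here the decoder reuses the transpose of the encoder's weight matrix.
In this scenario, with one hidden layer, the network can be described by the following equations, where s is the activation function: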
\begin{equation*}
Z = s(XW + b_{h})
\end{equation*}

\begin{equation*}
\check{X} = s(ZW^{T} + b_{o})
\end{equation*}

Weight sharing can be considered a form of regularization, since it reduces the number of parameters.
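
The forward pass with a single shared weight matrix can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the hidden size `M = 300`, the random initialization, and the function name `forward` are not taken from the notebook.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

D, M = 784, 300   # D matches the flattened 28x28 images; M is an assumed hidden size
rng = np.random.default_rng(0)

W = rng.normal(size=(D, M)) / np.sqrt(D)   # one weight matrix used by encoder and decoder
b_h = np.zeros(M)                          # hidden-layer bias
b_o = np.zeros(D)                          # output-layer bias

def forward(X):
    Z = sigmoid(X.dot(W) + b_h)            # encode: (N, D) -> (N, M)
    X_hat = sigmoid(Z.dot(W.T) + b_o)      # decode with the transposed encoder weights
    return Z, X_hat

X = rng.random((10, D))                    # toy batch standing in for real images
Z, X_hat = forward(X)

# Parameter counts with and without weight sharing
n_tied = W.size + b_h.size + b_o.size
n_untied = 2 * W.size + b_h.size + b_o.size
```

With these sizes the tied network has 236,284 parameters, against 471,484 for a network with a separate decoder matrix, so sharing roughly halves the parameter count.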

### Objective Function - Squared error
\begin{equation*}
J = \|X - \check{X}\|^2_{F} = \|X - s(s(XW)W^{T})\|^2_{F}
\end{equation*}

With the bias terms omitted, this resembles PCA's objective function, which is given as:
\begin{equation*}
J = \|X - XQQ^{T}\|^2_{F}
\end{equation*}

For this reason, autoencoders can be thought of as a nonlinear version of PCA.
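
To make the PCA objective concrete, the sketch below evaluates it on synthetic data, taking Q from the top right singular vectors of the centered data matrix. The data, the choice `k = 3`, and the random comparison matrix `Q_rand` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X = X - X.mean(axis=0)                     # PCA assumes centered data

k = 3                                      # assumed number of components to keep
# Rows of Vt are the principal directions, ordered by decreasing singular value
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Q = Vt[:k].T                               # shape (10, k), orthonormal columns

# PCA objective: squared Frobenius norm of the reconstruction error
J_pca = np.linalg.norm(X - X.dot(Q).dot(Q.T), ord='fro') ** 2

# The same objective for an arbitrary orthonormal Q, for comparison
Q_rand, _ = np.linalg.qr(rng.normal(size=(10, k)))
J_rand = np.linalg.norm(X - X.dot(Q_rand).dot(Q_rand.T), ord='fro') ** 2
```

Here `J_pca` equals the sum of the squared discarded singular values and is never larger than `J_rand`. An autoencoder with tied weights minimizes the same kind of reconstruction error, but with the nonlinearity s applied at each layer and W learned by gradient descent.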


## TensorFlow implementation

In [1]:
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

In [2]:
# Load the data
# The data is stored in the folder for Section 2
# and is re-used here.
def getKaggleMNIST():
    # Column 0 holds the labels
    # Columns 1-784 hold the pixel data, with values 0 .. 255
    # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
    train = pd.read_csv('./data/Section 2/train.csv').to_numpy().astype(np.float32)
    train = shuffle(train)

    # Hold out the last 1000 shuffled rows as a test set; scale pixels to [0, 1]
    Xtrain = train[:-1000, 1:] / 255
    Ytrain = train[:-1000, 0].astype(np.int32)

    Xtest  = train[-1000:, 1:] / 255
    Ytest  = train[-1000:, 0].astype(np.int32)
    return Xtrain, Ytrain, Xtest, Ytest

Xtrain, Ytrain, Xtest, Ytest = getKaggleMNIST()

In [3]:
Xtrain.shape

(41000, 784)

In [4]:
Xtest.shape

(1000, 784)