# <span style="color:brown"> Variational Autoencoder (VAE) </span>

### <span style="color:blue"> by Victor I. Afolabi </span>

Autoencoders are a type of neural network that can be used to learn efficient codings of input data. Given some inputs, the network first applies a series of transformations that map the input data into a lower dimensional space. This part of the network is called the ***encoder***.

Then, the network uses the encoded data to try to recreate the inputs. This part of the network is the ***decoder***. Using the encoder, we can compress data of the type the network was trained on. However, autoencoders are rarely used for this purpose, because hand-crafted algorithms (such as JPEG compression) are usually more efficient.

Instead, autoencoders have repeatedly been applied to de-noising tasks. The encoder receives images that have been corrupted with noise, and the network learns to reconstruct the original images.
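
As a rough illustration of the de-noising setup, the sketch below builds one (noisy input, clean target) training pair. The helper name `add_gaussian_noise`, the noise level of `0.2`, and the five-value stand-in "image" are assumptions made for this example only; they are not part of the notebook.

```python
import random

def add_gaussian_noise(pixels, stddev=0.2, seed=None):
    """Return a noisy copy of `pixels` (floats in [0, 1]), clipped back to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, stddev))) for p in pixels]

clean = [0.0, 0.25, 0.5, 0.75, 1.0]   # stand-in for a flattened image
noisy = add_gaussian_noise(clean, seed=0)

# A de-noising autoencoder is trained on (noisy input, clean target) pairs.
training_pair = (noisy, clean)
```

With the `X` and `y` placeholders defined later in this notebook, this would correspond to feeding the noisy batch as `X` and the clean batch as `y`. The training loop here feeds the same clean batch to both.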

One variant of the *autoencoder* is the **variational autoencoder**. With a variational autoencoder, it’s possible not only to compress data but also to generate new objects of the type the autoencoder has seen before.

In [None]:
import sys
import os
import datetime as dt

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

%matplotlib inline

## Load in dataset

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('datasets/MNIST', one_hot=True)

## Hyperparameters

In [None]:
# Input
image_size = 28
image_channel = 1
image_size_flat = image_size * image_size * image_channel
image_shape = [image_size, image_size, image_channel]

# Network
keep_prob = 0.8
n_latent = 8
decoder_units = int(32 * image_channel / 2)

# Training
learning_rate = 1e-3
batch_size = 24
iterations = 10000
log_step = 100
viz_step = 500

## Model's placeholders

In [None]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=[None, image_size_flat], name='X_placeholder')
y = tf.placeholder(tf.float32, shape=[None, image_size_flat], name='y_placeholder')

### Helpers

In [None]:
def leakyReLU(X, alpha=0.3):
    return tf.maximum(X, tf.multiply(X, alpha))

def conv2d(X, filters=64, kernel_size=4, strides=2, padding='SAME', activation=tf.nn.relu, dropout=True):
    layer = tf.layers.conv2d(inputs=X, filters=filters, kernel_size=kernel_size, 
                            strides=strides, padding=padding, activation=activation)
    if dropout:
        layer = tf.nn.dropout(layer, keep_prob=keep_prob)
    return layer

def conv2d_transpose(X, filters=64, kernel_size=4, strides=2, padding='SAME', activation=tf.nn.relu, dropout=True):
    layer = tf.layers.conv2d_transpose(inputs=X, filters=filters, kernel_size=kernel_size, 
                                       strides=strides, padding=padding, activation=activation)
    if dropout:
        layer = tf.nn.dropout(layer, keep_prob=keep_prob)
    return layer

def dense(X, units, activation=leakyReLU):
    return tf.layers.dense(inputs=X, units=units, activation=activation)

## The Encoder

Because our inputs are images, we apply a few convolutional layers to them. The noteworthy part is that the encoder produces two vectors, since it is meant to parameterize a Gaussian distribution over the latent space:

* A vector of means
* A vector of log standard deviations (named `stddev` in the code, which exponentiates it before use)

You will see later how we *“force”* the encoder to produce values that follow a standard normal distribution. The values fed to the decoder are the z-values, sampled as `z = mean + noise * exp(stddev)`. We will need the mean and the (log) standard deviation again later, when computing the losses.
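
The sampling step can be sketched in plain Python. The encoder cell below computes `mean + noise * exp(stddev)`, so its `stddev` tensor is treated here as a log standard deviation; the function name `sample_z` and the example numbers are assumptions chosen for illustration.

```python
import math
import random

def sample_z(mean, log_stddev, rng=random):
    """Reparameterized sample: z = mean + exp(log_stddev) * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_stddev)]

rng = random.Random(0)
mean = [1.0, -2.0]
log_stddev = [math.log(0.5), math.log(2.0)]   # standard deviations 0.5 and 2.0

samples = [sample_z(mean, log_stddev, rng) for _ in range(20000)]

# Empirical mean and standard deviation per latent dimension.
stats = []
for d in range(len(mean)):
    column = [s[d] for s in samples]
    avg = sum(column) / len(column)
    std = math.sqrt(sum((v - avg) ** 2 for v in column) / len(column))
    stats.append((avg, std))
    print('dim {}: mean ~ {:.2f}, stddev ~ {:.2f}'.format(d, avg, std))
```

Drawing the noise separately and then shifting and scaling it keeps the randomness outside the learned parameters, so gradients can flow through `mean` and `stddev` during training.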

In [None]:
def encoder(X):
    with tf.variable_scope('encoder', reuse=None):
        X = tf.reshape(X, shape=[-1, image_size, image_size, image_channel])
        X = conv2d(X, activation=leakyReLU, dropout=True)
        X = conv2d(X, activation=leakyReLU, dropout=True)
        X = conv2d(X, strides=1, activation=leakyReLU, dropout=True)
        X = tf.contrib.layers.flatten(X)
        mean = tf.layers.dense(X, units=n_latent)
        # Despite its name, this is the log of the standard deviation (0.5 * log-variance).
        stddev = 0.5 * tf.layers.dense(X, units=n_latent)
        noise = tf.random_normal(tf.stack([tf.shape(X)[0], n_latent]))
        z = mean + tf.multiply(noise, tf.exp(stddev))
        return z, mean, stddev

## The Decoder

The decoder does not care whether its input values were sampled from the specific distribution we defined. It simply tries to reconstruct the input images. To this end, we use a series of *transpose convolutions*.
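
To see how the transpose convolutions change the spatial size, the following sketch tracks only the height/width arithmetic for `'SAME'` padding. The helper names are hypothetical; the starting size of 16 and the strides (2, 1, 1) are read off the decoder cell below, and 64 is the default `filters` value of the `conv2d_transpose` helper.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a convolution with 'SAME' padding."""
    return math.ceil(size / stride)

def conv_transpose_same_out(size, stride):
    """Spatial output size of a transpose convolution with 'SAME' padding."""
    return size * stride

# Decoder: a 16x16 feature map passed through strides 2, 1, 1.
size = 16
decoder_sizes = [size]
for stride in (2, 1, 1):
    size = conv_transpose_same_out(size, stride)
    decoder_sizes.append(size)

filters = 64
flattened_units = size * size * filters   # width of the tensor after flatten
print(decoder_sizes, flattened_units)
```

A strided convolution shrinks the feature map (`conv_same_out`), while a strided transpose convolution enlarges it by the same factor, which is why the decoder mirrors the encoder.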

In [None]:
def decoder(z):
    with tf.variable_scope('decoder', reuse=None):
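        X = dense(z, units=decoder_units, activation=leakyReLU)
        # Feed the previous layer's output (not z) into the second layer, and size it
        # so that it holds shape * shape * image_channel values for the reshape below.
        shape = decoder_units
        X = dense(X, units=shape * shape * image_channel, activation=leakyReLU)
        reshape_dim = [-1, shape, shape, image_channel]
        X = tf.reshape(X, reshape_dim)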
        X = conv2d_transpose(X, dropout=True)
        X = conv2d_transpose(X, strides=1, dropout=True)
        X = conv2d_transpose(X, strides=1, dropout=False)
        X = tf.contrib.layers.flatten(X)
        X = dense(X, units=image_size_flat, activation=tf.nn.sigmoid)
        img = tf.reshape(X, [-1, *image_shape])
        return img

In [None]:
z, mean, stddev = encoder(X)
img = decoder(z)

In [None]:
img_reshape = tf.reshape(img, [-1, image_size_flat])

img_loss = tf.reduce_sum(tf.squared_difference(img_reshape, y), axis=1)
latent_loss = -0.5 * tf.reduce_sum(1.0 + 2.0 * stddev - tf.square(mean) - tf.exp(2.0*stddev), axis=1)
loss = tf.reduce_mean(img_loss + latent_loss)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_step = optimizer.minimize(loss)
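
The `latent_loss` line in the cell above has the form of the closed-form KL divergence between a diagonal Gaussian and the standard normal, with the notebook's `stddev` tensor playing the role of the log standard deviation. Below is a minimal plain-Python sketch of the same expression for a single example; the function name `kl_to_standard_normal` is an assumption for illustration.

```python
import math

def kl_to_standard_normal(mean, log_stddev):
    """KL( N(mean, exp(log_stddev)^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + 2.0 * s - m ** 2 - math.exp(2.0 * s)
        for m, s in zip(mean, log_stddev)
    )

kl_standard = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])  # already N(0, 1)
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])   # mean moved away from 0
kl_wide = kl_to_standard_normal([0.0, 0.0], [1.0, 0.0])      # variance larger than 1
```

The penalty vanishes only when every latent dimension has mean 0 and standard deviation 1, and it grows as the encoder's output drifts away from that. This is the term that pushes the latent codes toward a standard normal distribution.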

## TensorFlow's `Session`

In [None]:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

### TensorBoard

In [None]:
tensorboard_path = 'tensorboard/'
save_path = 'models/'
logdir = os.path.join(tensorboard_path, 'log')
pretrained = os.path.join(save_path, 'model.ckpt')

saver = tf.train.Saver()
writer = tf.summary.FileWriter(logdir=logdir, graph=sess.graph)

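# Make sure the checkpoint folder exists, then restore the latest checkpoint if there is one.
if not tf.gfile.Exists(save_path):
    tf.gfile.MakeDirs(save_path)
checkpoint = tf.train.latest_checkpoint(save_path)
if checkpoint is not None:
    saver.restore(sess=sess, save_path=checkpoint)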

## Training

In [None]:
train_start = dt.datetime.now()
for i in range(iterations):
    batch = data.train.next_batch(batch_size=batch_size)[0]
    feed_dict = {X: batch, y: batch}
    sess.run(train_step, feed_dict=feed_dict)
    if i % log_step == 0:
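        _loss, _img_loss, _latent_loss, _mean, _stddev = sess.run(
            [loss, img_loss, latent_loss, mean, stddev], feed_dict=feed_dict)
        # All fetched values except the total loss are arrays, so average them for logging.
        sys.stdout.write('\rLoss = {:.2f}\timg_loss = {:.2f}\tlatent_loss = {:.2f}\tmean = {:.2f}\tstddev = {:.2f}\tTime taken = {}'.format(
            _loss, np.mean(_img_loss), np.mean(_latent_loss), np.mean(_mean), np.mean(_stddev),
            dt.datetime.now() - train_start
        ))
    if i % viz_step == 0:
        _reconstruct = sess.run(img, feed_dict=feed_dict)
        # Show the original (left) and its reconstruction (right); assumes a single channel.
        fig, (ax_in, ax_out) = plt.subplots(1, 2)
        ax_in.imshow(np.reshape(batch[0], image_shape[:2]), cmap='Greys')
        ax_out.imshow(np.reshape(_reconstruct[0], image_shape[:2]), cmap='Greys')
        plt.show()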
        print()
