## Auto-Encoder

In [1]:
import numpy as np
import sklearn.preprocessing as prep
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

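We use Xavier (Glorot) initialization, which draws the weights from a uniform distribution with mean $0$ and variance $\frac{2}{n_{in}+n_{out}}$, i.e. $U\left(-\sqrt{\frac{6}{n_{in}+n_{out}}}, \sqrt{\frac{6}{n_{in}+n_{out}}}\right)$. This keeps the scale of the signal roughly constant from layer to layer, so the initial weights are neither too small nor too large.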

In [13]:
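def xavier_init(inputs, outputs, constant=1):
    # Xavier/Glorot uniform: U(-a, a) with a = constant * sqrt(6 / (inputs + outputs)),
    # which has variance 2 / (inputs + outputs) when constant=1
    low = -constant * np.sqrt(6.0 / (inputs + outputs))
    high = constant * np.sqrt(6.0 / (inputs + outputs))
    return tf.random_uniform((inputs, outputs), minval=low, maxval=high, dtype=tf.float32)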
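
As a quick sanity check of the variance formula, the NumPy sketch below mirrors `xavier_init` (the name `xavier_uniform_np` is ours, not part of any library) and compares the empirical variance of a 784×200 weight matrix with $\frac{2}{n_{in}+n_{out}}$.

```python
import numpy as np

def xavier_uniform_np(n_in, n_out, constant=1.0, rng=None):
    # Hypothetical NumPy counterpart of xavier_init: U(-a, a), a = constant * sqrt(6 / (n_in + n_out))
    rng = np.random.default_rng() if rng is None else rng
    limit = constant * np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

n_in, n_out = 784, 200
w = xavier_uniform_np(n_in, n_out, rng=np.random.default_rng(0))
# Variance of U(-a, a) is a**2 / 3 = 2 / (n_in + n_out)
target = 2.0 / (n_in + n_out)
print(f"empirical var = {w.var():.6f}, target = {target:.6f}")
```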

Build a class for the denoising auto-encoder. Its constructor takes:<br>
1. n_inputs: number of input nodes<br>
2. n_hidden: number of hidden nodes<br>
3. transfer: activation function of the hidden layer (default: softplus)<br>
4. optimizer: optimizer used to minimize the loss (default: Adam)<br>
5. scale: standard deviation of the Gaussian noise added to the input, default=0.1<br>
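
Written out, the model these parameters define is roughly as follows (notation ours: $s$ is `scale`, $f$ is `transfer`, and, as `tf.random_normal((n_inputs, ))` does in the class, a single noise vector $\varepsilon \in \mathbb{R}^{n_{in}}$ is drawn per run and added to every input row $x_i$ of the batch):

$$
\begin{aligned}
\tilde{x}_i &= x_i + s\,\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I_{n_{in}}) \\
h_i &= f\left(\tilde{x}_i W_1 + b_1\right) \\
\hat{x}_i &= h_i W_2 + b_2 \\
L &= \frac{1}{2} \sum_{i} \left\lVert \hat{x}_i - x_i \right\rVert^2
\end{aligned}
$$

The loss compares the reconstruction with the clean input $x_i$ rather than the noisy $\tilde{x}_i$, which is what makes the auto-encoder a denoising one.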

In [20]:
class NoiseAutoencoder(object):
    
    def __init__(self, n_inputs, n_hidden, transfer=tf.nn.softplus, optimizer=None, scale=0.1):
        # Create the default optimizer here, not in the signature, so instances do not share one object
        if optimizer is None:
            optimizer = tf.train.AdamOptimizer()
        self.n_inputs = n_inputs
        self.n_hidden = n_hidden
        self.transfer = transfer
        self.scale = tf.placeholder(tf.float32)  # noise scale, fed at run time
        self.training_scale = scale
        network_weights = self._initialize_weights()
        self.weights = network_weights
        
        # Network Structure: add Gaussian noise scaled by the placeholder (one noise vector of
        # shape (n_inputs,) is broadcast over every row of the batch), then encode
        self.x = tf.placeholder(tf.float32, [None, self.n_inputs])
        self.hidden = self.transfer(tf.add(tf.matmul(self.x + self.scale * tf.random_normal((n_inputs, )), self.weights['w1']), self.weights['b1']))
        # Linear decoder back to the input space
        self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2'])
        
        # Loss Function & Optimizer (Squared Error against the clean input)
        self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0))
        self.optimizer = optimizer.minimize(self.cost)
        
        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)
        
    def _initialize_weights(self):
        all_weights = dict()
        all_weights['w1'] = tf.Variable(xavier_init(self.n_inputs, self.n_hidden))
        all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype=tf.float32))
        all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_inputs], dtype=tf.float32))
        all_weights['b2'] = tf.Variable(tf.zeros([self.n_inputs], dtype=tf.float32))
        return all_weights
    
    def partial_fit(self, X):
        # Train on one batch of data and return the current cost
        cost, opt = self.sess.run((self.cost, self.optimizer), feed_dict={self.x: X, self.scale: self.training_scale})
        return cost
    
    def calc_total_cost(self, X):
        # Compute the cost without training, to evaluate performance once training is done
        return self.sess.run(self.cost, feed_dict={self.x: X, self.scale: self.training_scale})
    
    def transform(self, X):
        # Return the hidden-layer output, i.e. the high-order features of the data
        # (named transform so it is not shadowed by the self.transfer attribute)
        return self.sess.run(self.hidden, feed_dict={self.x: X, self.scale: self.training_scale})
    
    def generate(self, hidden=None):
        # Decode high-order features back into the input space
        if hidden is None:
            hidden = np.random.normal(size=(1, self.n_hidden))
        return self.sess.run(self.reconstruction, feed_dict={self.hidden: hidden})
    
    def reconstruct(self, X):
        # Encode the input and reconstruct it (transform + generate)
        # (named reconstruct so it is not shadowed by the self.reconstruction tensor)
        return self.sess.run(self.reconstruction, feed_dict={self.x: X, self.scale: self.training_scale})
    
    def getWeights(self):
        return self.sess.run(self.weights['w1'])
    
    def getBiases(self):
        return self.sess.run(self.weights['b1'])
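
For readers without a TensorFlow 1.x environment, here is a NumPy sketch of the same forward pass and cost (gradients and the optimizer are left out). The function name `dae_forward` is hypothetical, and a random batch stands in for standardized MNIST images. Because the decoder weights start at zero, as in `_initialize_weights`, the initial reconstruction is all zeros and the cost is simply $\frac{1}{2}\sum x^2$.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z))
    return np.logaddexp(0.0, z)

def dae_forward(x, w1, b1, w2, b2, scale, rng):
    # One noise vector shared by all rows, like tf.random_normal((n_inputs,)) in the class
    noise = rng.standard_normal(x.shape[1])
    hidden = softplus((x + scale * noise) @ w1 + b1)
    reconstruction = hidden @ w2 + b2
    # Squared error against the clean input, summed over the batch and all features
    cost = 0.5 * np.sum((reconstruction - x) ** 2)
    return hidden, reconstruction, cost

rng = np.random.default_rng(0)
n_inputs, n_hidden, batch = 784, 200, 4
limit = np.sqrt(6.0 / (n_inputs + n_hidden))
w1 = rng.uniform(-limit, limit, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
w2 = np.zeros((n_hidden, n_inputs))  # decoder starts at zero, as in _initialize_weights
b2 = np.zeros(n_inputs)
x = rng.standard_normal((batch, n_inputs))  # stand-in for a standardized MNIST batch

hidden, recon, cost = dae_forward(x, w1, b1, w2, b2, scale=0.01, rng=rng)
print(hidden.shape, recon.shape, cost)
```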

So far, the denoising auto-encoder has been defined: its network structure, weight initializer, and member functions. Next, it is tested on the MNIST data set.

In [23]:
mnist = input_data.read_data_sets('MNIST_data',one_hot = True)

def standard_scale(X_train, X_test):
    # Standardize each feature to zero mean and unit variance; the scaler is fit on the training set only
    preprocessor = prep.StandardScaler().fit(X_train)
    X_train = preprocessor.transform(X_train)
    X_test = preprocessor.transform(X_test)
    return X_train, X_test

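def get_random_block_from_data(data, batch_size):
    # Take a random contiguous block of batch_size rows; the +1 lets the last full block be chosen too
    start_index = np.random.randint(0, len(data) - batch_size + 1)
    return data[start_index:(start_index + batch_size)]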

X_train, X_test = standard_scale(mnist.train.images, mnist.test.images)

n_samples = int(mnist.train.num_examples)
training_epochs = 20
batch_size = 128
display_step = 1

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
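
`get_random_block_from_data` picks an independent random block at every step, so within one "epoch" some samples are drawn several times and others not at all. A common alternative, sketched below under our own naming (`iterate_minibatches` is hypothetical, not from the original), shuffles once per epoch and walks through the permutation so each sample is used at most once.

```python
import numpy as np

def iterate_minibatches(data, batch_size, rng):
    # Shuffle once, then yield consecutive full batches of the permutation;
    # a final partial batch (fewer than batch_size rows) is dropped
    order = rng.permutation(len(data))
    for start in range(0, len(data) - batch_size + 1, batch_size):
        yield data[order[start:start + batch_size]]

rng = np.random.default_rng(0)
data = np.arange(10).reshape(10, 1)
batches = list(iterate_minibatches(data, batch_size=3, rng=rng))
print(len(batches))  # 3 full batches; one sample is left out this epoch
```

In the training loop, `for batch_xs in iterate_minibatches(X_train, batch_size, rng):` would replace the inner `for i in range(total_batch)` loop.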


In [24]:
autoencoder = NoiseAutoencoder(n_inputs=784, # input nodes
                               n_hidden=200, # hidden layer nodes
                               transfer=tf.nn.softplus, # hidden layer activated function
                               optimizer=tf.train.AdamOptimizer(learning_rate=0.001), # optimizer
                               scale=0.01) # noise scale

for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(n_samples/batch_size)
    for i in range(total_batch):
        batch_xs = get_random_block_from_data(X_train, batch_size)
        cost = autoencoder.partial_fit(batch_xs)
        avg_cost += cost/n_samples * batch_size
        
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch + 1),"cost=", "{:.9f}".format(avg_cost))

Epoch: 0001 cost= 19880.573754545
Epoch: 0002 cost= 13105.512050000
Epoch: 0003 cost= 11897.797705682
Epoch: 0004 cost= 10509.712414773
Epoch: 0005 cost= 8963.064736364
Epoch: 0006 cost= 10011.154247727
Epoch: 0007 cost= 8993.481925568
Epoch: 0008 cost= 9455.465035795
Epoch: 0009 cost= 8521.460876705
Epoch: 0010 cost= 8446.229340341
Epoch: 0011 cost= 7817.854427841
Epoch: 0012 cost= 8639.855586364
Epoch: 0013 cost= 7929.964800568
Epoch: 0014 cost= 8223.703711364
Epoch: 0015 cost= 9248.268669318
Epoch: 0016 cost= 8118.609974432
Epoch: 0017 cost= 8307.674653977
Epoch: 0018 cost= 8056.903177841
Epoch: 0019 cost= 8096.820605682
Epoch: 0020 cost= 7574.670132955


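Over the last ten epochs the average cost fluctuates between roughly 7,500 and 9,300, and in our runs training for more epochs did not reduce it much further. It may be lowered by tuning the batch_size, the optimizer and its learning rate, the number of hidden nodes, the number of hidden layers, and so on.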
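
Because the per-epoch cost is noisy (each step samples a random block), a moving average makes the plateau easier to judge. This sketch reuses the costs printed in the training log (rounded to two decimals); the 5-epoch window is an arbitrary choice.

```python
import numpy as np

# Per-epoch costs copied from the training log (rounded)
costs = np.array([19880.57, 13105.51, 11897.80, 10509.71, 8963.06,
                  10011.15, 8993.48, 9455.47, 8521.46, 8446.23,
                  7817.85, 8639.86, 7929.96, 8223.70, 9248.27,
                  8118.61, 8307.67, 8056.90, 8096.82, 7574.67])

window = 5
# Mean over each run of `window` consecutive epochs smooths the epoch-to-epoch noise
smoothed = np.convolve(costs, np.ones(window) / window, mode="valid")
for end_epoch, value in enumerate(smoothed, start=window):
    print(f"epochs {end_epoch - window + 1:2d}-{end_epoch:2d}: {value:9.1f}")
```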

In [26]:
print("Total cost: "+ str(autoencoder.calc_total_cost(X_test)))

Total cost: 704016.5
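
The test figure is not directly comparable to the training log: `calc_total_cost(X_test)` sums the error over all 10,000 MNIST test images in one run, whereas the printed `avg_cost` is approximately an average over batches of 128 images. Dividing each by the number of images it covers puts them on a similar per-image scale (a rough comparison, since the training value is a noisy last-epoch estimate).

```python
# Last training epoch: avg_cost ~ average cost per batch of batch_size images
final_train_cost_per_batch = 7574.670132955
batch_size = 128
# Test evaluation: error summed over the whole MNIST test set
total_test_cost = 704016.5
n_test = 10000

train_per_image = final_train_cost_per_batch / batch_size
test_per_image = total_test_cost / n_test
print(f"train ~ {train_per_image:.1f} per image, test ~ {test_per_image:.1f} per image")
```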


Having implemented it, we can see that the auto-encoder closely resembles a neural network with a single hidden layer. The differences are that the input is standardized and corrupted with Gaussian noise, and that the network is trained to reproduce its input.

Its output is not a label but a reconstruction of the input. The auto-encoder is an unsupervised learning method: rather than clustering the data, it extracts the most significant, frequently occurring higher-order features and reconstructs the data from them.

This points to an important idea: use unsupervised learning to extract useful features and initialize the weights to a good starting point, then train on labelled data for the supervised task. This second, supervised stage is known as "fine-tuning".
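
A minimal NumPy sketch of fine-tuning, under our own assumptions: a new softmax output layer is stacked on a softplus encoder initialized from pre-trained weights, and both layers are trained with plain gradient descent on cross-entropy. In the notebook, `w1_pre`/`b1_pre` would be the arrays returned by `autoencoder.getWeights()` and `autoencoder.getBiases()` (with inputs standardized the same way); here random Xavier-style stand-ins and a synthetic two-class problem keep the example self-contained. The helper `fine_tune`, the learning rate and the step count are illustrative, not from the original.

```python
import numpy as np

def softplus(z):
    # log(1 + exp(z)), computed stably
    return np.logaddexp(0.0, z)

def sigmoid(z):
    # Derivative of softplus, written via tanh for numerical stability
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fine_tune(x, y_onehot, w1, b1, n_classes, lr=0.1, steps=200, seed=0):
    """Hypothetical helper: put a new softmax layer on a pre-trained softplus
    encoder (w1, b1) and train all weights with gradient descent."""
    rng = np.random.default_rng(seed)
    w1, b1 = w1.copy(), b1.copy()  # start from the pre-trained encoder weights
    w2 = 0.01 * rng.standard_normal((w1.shape[1], n_classes))  # new output layer
    b2 = np.zeros(n_classes)
    n = x.shape[0]
    losses = []
    for _ in range(steps):
        # Forward pass: encoder (as in the auto-encoder) followed by a softmax classifier
        z1 = x @ w1 + b1
        h = softplus(z1)
        p = softmax(h @ w2 + b2)
        losses.append(-np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1)))
        # Backward pass for the mean cross-entropy loss
        d_logits = (p - y_onehot) / n
        d_w2 = h.T @ d_logits
        d_b2 = d_logits.sum(axis=0)
        d_z1 = (d_logits @ w2.T) * sigmoid(z1)
        d_w1 = x.T @ d_z1
        d_b1 = d_z1.sum(axis=0)
        # Fine-tuning: update the pre-trained encoder as well as the new layer
        w1 -= lr * d_w1
        b1 -= lr * d_b1
        w2 -= lr * d_w2
        b2 -= lr * d_b2
    return (w1, b1, w2, b2), losses

# Synthetic stand-in data: 200 samples, 20 features, label = sign of the first feature
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 20))
y = np.eye(2)[(x[:, 0] > 0).astype(int)]
limit = np.sqrt(6.0 / (20 + 10))
w1_pre = rng.uniform(-limit, limit, size=(20, 10))  # stand-in for autoencoder.getWeights()
b1_pre = np.zeros(10)                              # stand-in for autoencoder.getBiases()

params, losses = fine_tune(x, y, w1_pre, b1_pre, n_classes=2)
print(f"cross-entropy: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Freezing `w1`/`b1` (skipping their updates) would turn this into training a classifier on fixed auto-encoder features; updating them is what distinguishes fine-tuning.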