### AI2017S project tutorial: Simple Neural Network learning MNIST
+ Original source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/mnist_softmax.py
+ Modified and supplemented by Seongho Son, Seongjun Choi

#### Requirements
+ python
+ jupyter notebook
+ tensorflow
+ (If possible) GPU


## Import libraries 
- TensorFlow
- NumPy
- input_data (library for loading the MNIST dataset)

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

## Load MNIST dataset (download it if it does not exist)

+ image (handwritten digit): 28 * 28 = 784 pixels
+ corresponding label (one of 0, 1, 2, ..., 9) in 'one-hot vector' format
+ train / validation / test instances: 55000 / 5000 / 10000
+ training images: 'mnist.train.images'
+ training labels: 'mnist.train.labels'

In [2]:
mnist = input_data.read_data_sets('./MNIST_data', one_hot = True)
print(mnist.train.images.shape)
print(mnist.validation.images.shape)
print(mnist.test.images.shape)

Extracting ./MNIST_data/train-images-idx3-ubyte.gz
Extracting ./MNIST_data/train-labels-idx1-ubyte.gz
Extracting ./MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ./MNIST_data/t10k-labels-idx1-ubyte.gz
(55000, 784)
(5000, 784)
(10000, 784)
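
To make the 'one-hot vector' label format concrete, here is a minimal NumPy sketch. The helper name `to_one_hot` is hypothetical and is not part of `input_data`; it only illustrates what each row of `mnist.train.labels` looks like.

```python
import numpy as np

def to_one_hot(digits, num_classes=10):
    """Encode integer class labels as one-hot row vectors (illustrative helper)."""
    digits = np.asarray(digits)
    one_hot = np.zeros((digits.size, num_classes), dtype=np.float32)
    one_hot[np.arange(digits.size), digits] = 1.0  # set a single 1 per row
    return one_hot

labels = to_one_hot([3, 0, 9])
print(labels[0])                  # the digit 3 has its 1 at index 3
print(np.argmax(labels, axis=1))  # argmax recovers the original digits
```

This is also why the model later compares `tf.argmax(y, 1)` with `tf.argmax(y_, 1)`: the arg-max of a one-hot row is the digit itself.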


## (Reset model), Define session

+ __tf.reset_default_graph()__ resets your model definition, so that you don't have to restart the kernel.
+ If you do not limit the fraction of GPU memory that TensorFlow may use, it claims __all the remaining GPU memory__ by default, which prevents other programs from using the GPU at the same time.
+ You can control the portion of GPU memory used by this code by setting __tf.GPUOptions__ as shown below.
       assigning 0.2 to *per_process_gpu_memory_fraction* means that 
       20% of the whole GPU memory is assigned to this code

In [3]:
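# Close any session left over from a previous run, then reset the graph.
try:
    sess.close()
except Exception:  # e.g. NameError on the first run, when sess does not exist yet
    pass
tf.reset_default_graph()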

# Use only 20% of the whole GPU memory for this code
sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.2)))

## Define model
+ model parameters
+ placeholders to contain input / target data
+ training algorithm used for learning
+ *__correct_prediction__* is a 1-dimensional vector of length *__None__* (the batch size) whose elements indicate whether the prediction for each item was correct.
+ *__accuracy__* is a *__float32__* value: the average of all elements in *__correct_prediction__*. That is what *__reduce__* and *__mean__* mean in *__tf.reduce_mean__*.

In [4]:
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.zeros([784, 50]))
b1 = tf.Variable(tf.zeros([50]))
h = tf.sigmoid(tf.matmul(x, W1) + b1)

W2 = tf.Variable(tf.zeros([50, 10]))
b2 = tf.Variable(tf.zeros([10]))
y = tf.matmul(h, W2) + b2

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
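
The loss and accuracy nodes above can be mirrored in plain NumPy to see what they compute. This is a sketch on made-up logits and labels, not the TensorFlow implementation, and the function names are chosen for illustration only.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def batch_accuracy(logits, labels):
    """Fraction of rows whose arg-max prediction matches the arg-max label."""
    correct_prediction = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
    return correct_prediction.astype(np.float32).mean()

# Made-up batch of 3 examples and 3 classes
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 2.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])

loss = softmax_cross_entropy(logits, labels).mean()
acc = batch_accuracy(logits, labels)
print(loss, acc)  # acc is 2/3: the third example is misclassified
```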


## Dropout

+ use a placeholder for the __probability of keeping each unit's output__ during training
        assign a value between 0 and 1.0 during training,
        and assign 1.0 during validation or test
+ use *__tf.nn.dropout__* to apply dropout on that layer

In [5]:
keep_prob = tf.placeholder(tf.float32)

W1_DO = tf.Variable(tf.zeros([784, 50]))
b1_DO = tf.Variable(tf.zeros([50]))
h_DO = tf.sigmoid(tf.matmul(x, W1_DO) + b1_DO)
h_drop = tf.nn.dropout(h_DO, keep_prob)

W2_DO = tf.Variable(tf.zeros([50, 10]))
b2_DO = tf.Variable(tf.zeros([10]))
y_DO = tf.matmul(h_drop, W2_DO) + b2_DO
# Dropout is applied to the hidden layer only; the logits y_DO are used as-is.

cross_entropy_DO = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_DO))
train_step_DO = tf.train.GradientDescentOptimizer(0.3).minimize(cross_entropy_DO)

correct_prediction_DO = tf.equal(tf.argmax(y_DO, 1), tf.argmax(y_, 1))
accuracy_DO = tf.reduce_mean(tf.cast(correct_prediction_DO, tf.float32))
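
The effect of `keep_prob` can be sketched in NumPy. Like `tf.nn.dropout`, the sketch scales the surviving activations by `1 / keep_prob` so that their expected value is unchanged. The function name, the array sizes, and the random seed are assumptions made for illustration.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob and rescale the rest."""
    mask = rng.random_sample(activations.shape) < keep_prob  # True = keep the unit
    return activations * mask / keep_prob

rng = np.random.RandomState(0)
h = np.ones((1000, 50))

h_train = dropout(h, 0.5, rng)  # training: roughly half the units are zeroed
h_eval = dropout(h, 1.0, rng)   # validation/test: every unit is kept

print(h_train.mean())             # stays close to 1.0 thanks to the rescaling
print(np.array_equal(h_eval, h))  # keep_prob = 1.0 leaves the input untouched
```

This is why the training loop feeds `keep_prob: 0.5` to the training step but `keep_prob: 1` whenever accuracy is evaluated.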

## Batch Normalization

+ obtain the mean and variance of the current (mini-)batch by using __tf.nn.moments__
        this mean and variance are used for normalization
+ Assign two additional vectors to each layer for learning: 'scale' and 'beta'
+ 'epsilon' is added to the variance to __prevent division by zero__ when a variance is zero
+ *__tf.nn.batch_normalization__* applies batch normalization; its result is then the input to the layer's activation function

In [6]:
epsilon = 1e-3

W1_BN = tf.Variable(tf.zeros([784, 50]))
z1_BN = tf.matmul(x, W1_BN)
bmean1, bvar1 = tf.nn.moments(z1_BN, [0])
scale1 = tf.Variable(tf.ones([50]))
beta1 = tf.Variable(tf.zeros([50]))
BN1 = tf.nn.batch_normalization(z1_BN, bmean1, bvar1, beta1, scale1, epsilon)
h_BN = tf.nn.sigmoid(BN1)

W2_BN = tf.Variable(tf.zeros([50, 10]))
z2_BN = tf.matmul(h_BN, W2_BN)
bmean2, bvar2 = tf.nn.moments(z2_BN, [0])
scale2 = tf.Variable(tf.ones([10]))
beta2 = tf.Variable(tf.zeros([10]))
BN2 = tf.nn.batch_normalization(z2_BN, bmean2, bvar2, beta2, scale2, epsilon)
y_BN = BN2

cross_entropy_BN = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_BN))
train_step_BN = tf.train.GradientDescentOptimizer(0.3).minimize(cross_entropy_BN)

correct_prediction_BN = tf.equal(tf.argmax(y_BN, 1), tf.argmax(y_, 1))
accuracy_BN = tf.reduce_mean(tf.cast(correct_prediction_BN, tf.float32))
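
The normalization step can be written out in a few lines of NumPy. This sketch follows the same formula as the cell above (batch mean and variance over axis 0, then scale and shift); the function name and the made-up data are assumptions, and it is not TensorFlow's implementation.

```python
import numpy as np

def batch_norm(z, scale, beta, epsilon=1e-3):
    """Normalize each column of z over the batch axis, then scale and shift."""
    mean = z.mean(axis=0)   # per-feature batch mean
    var = z.var(axis=0)     # per-feature batch variance
    z_hat = (z - mean) / np.sqrt(var + epsilon)
    return scale * z_hat + beta

rng = np.random.RandomState(0)
z = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # made-up pre-activations

out = batch_norm(z, scale=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 (slightly below, because of epsilon)

# A constant column has zero variance; epsilon keeps the division finite
const = batch_norm(np.full((100, 1), 7.0), scale=np.ones(1), beta=np.zeros(1))
print(np.isfinite(const).all())
```

Note that the cell above always normalizes with the statistics of whatever batch is fed in, including the validation and test sets.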



## Other objective Functions

+ Mean Squared Error
+ L2 regularization (which needs a regularization coefficient)

In [7]:
L2_lambda = 0.05

mean_sq_err = tf.reduce_mean(tf.square(y - y_))
L2_norm = cross_entropy + \
          L2_lambda * (tf.nn.l2_loss(W1) + tf.nn.l2_loss(b1) + tf.nn.l2_loss(W2) + tf.nn.l2_loss(b2))
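
As a small worked example of the two objectives, the NumPy sketch below evaluates them on made-up numbers. `tf.nn.l2_loss(t)` computes half the sum of squared elements of `t`, which the hypothetical helper `l2_loss` reproduces; the arrays are illustrative stand-ins, not values from the network.

```python
import numpy as np

def l2_loss(t):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return np.sum(np.square(t)) / 2.0

# Made-up values standing in for network outputs, targets, and a weight matrix
y = np.array([[0.8, 0.2], [0.4, 0.6]])
y_ = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[1.0, -2.0], [0.5, 0.0]])

mean_sq_err = np.mean(np.square(y - y_))  # about 0.1 for these numbers
L2_lambda = 0.05
penalty = L2_lambda * l2_loss(W)          # added on top of the data loss

print(mean_sq_err)
print(penalty)
```

In the cell above, `mean_sq_err` and `L2_norm` are only defined; to train with one of them, it would have to be passed to an optimizer's `minimize` in place of `cross_entropy`.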

## Other optimizers
+ __tf.train.MomentumOptimizer__ (requires *momentum coefficient*)
+ __tf.train.AdamOptimizer__
+ Check the tensorflow website for further details, since many arguments can be given and are set to default when not specified

In [8]:
learning_rate = 0.3
momentum = 0.3

train_step_momentum = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(cross_entropy)
train_step_adam = tf.train.AdamOptimizer().minimize(cross_entropy)
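
To show what the momentum coefficient does, here is a plain-Python sketch of the update rule documented for `tf.train.MomentumOptimizer` (without Nesterov momentum): the gradient is accumulated into a running term, and the variable moves along that term. The function name and the toy objective are assumptions for illustration.

```python
def momentum_step(w, velocity, grad, learning_rate, momentum):
    """One momentum update: accumulate the gradient, then move along the accumulator."""
    velocity = momentum * velocity + grad
    w = w - learning_rate * velocity
    return w, velocity

# Minimize f(w) = w ** 2, whose gradient is 2 * w, starting from w = 1.0
w, velocity = 1.0, 0.0
for _ in range(50):
    w, velocity = momentum_step(w, velocity, grad=2.0 * w,
                                learning_rate=0.3, momentum=0.3)

print(abs(w) < 1e-6)  # w has moved very close to the minimum at 0
```

With `momentum=0` the rule reduces to the plain gradient descent used by `GradientDescentOptimizer`.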

## Initialize variables, Training 
+ Inside the inner for-loop, fetch a mini-batch of 100 images and labels and train the network on it
+ A __training epoch__ is completed once the network has been trained on the whole training set (55000 images / 100 per batch = 550 iterations)
+ Continue training until *__validation accuracy starts to decrease__*, which signals __overfitting__
+ __Look carefully at the values assigned for keep_prob! (Dropout)__

In [9]:
num_epochs = 1000
list_vacc = list()
list_vacc_BN = list()
list_vacc_DO = list()

tf.global_variables_initializer().run()

for i in range(num_epochs): # training epoch setting
    list_tacc = list()
    list_tacc_BN = list()
    list_tacc_DO = list()
    
    for _ in range(550): # fetch & train 100 images at a time, which requires 550 iterations to complete an epoch
        batch_xs, batch_ys = mnist.train.next_batch(100) # minibatch size setting
        sess.run(train_step, feed_dict = {x: batch_xs, y_: batch_ys})
        sess.run(train_step_DO, feed_dict = {x: batch_xs, y_: batch_ys, keep_prob: 0.5})
        sess.run(train_step_BN, feed_dict = {x: batch_xs, y_: batch_ys})
        
        list_tacc.append(accuracy.eval(feed_dict={x: batch_xs, y_: batch_ys}))
        list_tacc_DO.append(accuracy_DO.eval(feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 1}))
        list_tacc_BN.append(accuracy_BN.eval(feed_dict={x: batch_xs, y_: batch_ys}))
        
    vacc = accuracy.eval(feed_dict={x: mnist.validation.images, y_: mnist.validation.labels})
    vacc_DO = accuracy_DO.eval(feed_dict={x: mnist.validation.images, y_: mnist.validation.labels, keep_prob: 1})
    vacc_BN = accuracy_BN.eval(feed_dict={x: mnist.validation.images, y_: mnist.validation.labels})
    list_vacc.append(vacc)
    list_vacc_DO.append(vacc_DO)
    list_vacc_BN.append(vacc_BN)
    
    print("epoch {}: tr_NN {:.3f}, val_NN {:.3f}, tr_DO {:.3f}, val_DO {:.3f}, tr_BN {:.3f}, val_BN {:.3f}".format((i + 1), \
                                 np.mean(list_tacc), vacc, np.mean(list_tacc_DO), vacc_DO, np.mean(list_tacc_BN), vacc_BN))
    if i > 10 and np.mean(list_vacc[-10:-5]) > np.mean(list_vacc[-5:]):
        print("Seems like it starts to overfit, aborting the training")
        break
    
    

epoch 1: tr_NN 0.236, val_NN 0.309, tr_DO 0.714, val_DO 0.878, tr_BN 0.545, val_BN 0.638
epoch 2: tr_NN 0.422, val_NN 0.499, tr_DO 0.899, val_DO 0.910, tr_BN 0.664, val_BN 0.660
epoch 3: tr_NN 0.535, val_NN 0.580, tr_DO 0.915, val_DO 0.922, tr_BN 0.676, val_BN 0.673
epoch 4: tr_NN 0.587, val_NN 0.605, tr_DO 0.924, val_DO 0.927, tr_BN 0.680, val_BN 0.676
epoch 5: tr_NN 0.604, val_NN 0.610, tr_DO 0.929, val_DO 0.934, tr_BN 0.682, val_BN 0.684
epoch 6: tr_NN 0.610, val_NN 0.617, tr_DO 0.933, val_DO 0.939, tr_BN 0.683, val_BN 0.687
epoch 7: tr_NN 0.617, val_NN 0.619, tr_DO 0.937, val_DO 0.943, tr_BN 0.684, val_BN 0.693
epoch 8: tr_NN 0.622, val_NN 0.626, tr_DO 0.940, val_DO 0.944, tr_BN 0.682, val_BN 0.695
epoch 9: tr_NN 0.625, val_NN 0.626, tr_DO 0.942, val_DO 0.945, tr_BN 0.681, val_BN 0.688
epoch 10: tr_NN 0.630, val_NN 0.633, tr_DO 0.944, val_DO 0.948, tr_BN 0.679, val_BN 0.692
epoch 11: tr_NN 0.632, val_NN 0.630
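
The stopping rule in the loop compares the mean validation accuracy of the last five epochs with the mean of the five epochs before them, and it looks only at the plain network's `list_vacc`. The plain-Python sketch below isolates that idea in a hypothetical helper, `should_stop`; the accuracy lists are made up for illustration.

```python
def should_stop(val_accuracies, window=5):
    """Return True when the mean of the last `window` validation accuracies
    is lower than the mean of the `window` values before them."""
    if len(val_accuracies) < 2 * window:
        return False  # not enough history yet
    previous = val_accuracies[-2 * window:-window]
    recent = val_accuracies[-window:]
    return sum(previous) / window > sum(recent) / window

improving = [0.60, 0.65, 0.70, 0.74, 0.77, 0.80, 0.82, 0.84, 0.85, 0.86]
degrading = [0.80, 0.84, 0.86, 0.87, 0.88, 0.86, 0.85, 0.84, 0.83, 0.82]

print(should_stop(improving))  # still improving, so keep training
print(should_stop(degrading))  # recent mean dropped, so stop
```

Averaging over a window makes the rule less sensitive to a single noisy epoch than comparing only the last two values.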

## Testing
+ Check the test accuracy of this model

In [10]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
print(sess.run(accuracy_DO, feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1}))
print(sess.run(accuracy_BN, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.6515
0.9535
0.6932
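
One likely contributor to the low accuracy of the plain network is the all-zeros initialization of `W1` and `W2`: when every hidden unit starts from identical weights, each receives an identical gradient, so the units never become different from one another. Dropout masks units at random, which may break that symmetry and could explain why the dropout network scores much higher. The NumPy sketch below checks the symmetry claim on a tiny made-up network (sizes, data, and step count are assumptions), not on MNIST itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, W2, x, y_, learning_rate=0.3, steps=20):
    """Plain gradient descent on a 1-hidden-layer sigmoid/softmax network (no biases)."""
    for _ in range(steps):
        h = sigmoid(x @ W1)
        logits = h @ W2
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs = exp / exp.sum(axis=1, keepdims=True)
        d_logits = (probs - y_) / len(x)         # gradient of mean cross-entropy
        d_W2 = h.T @ d_logits
        d_z = (d_logits @ W2.T) * h * (1.0 - h)  # back through the sigmoid
        d_W1 = x.T @ d_z
        W1 = W1 - learning_rate * d_W1
        W2 = W2 - learning_rate * d_W2
    return W1, W2

rng = np.random.RandomState(0)
x = rng.rand(32, 6)                         # made-up inputs
y_ = np.eye(3)[rng.randint(0, 3, size=32)]  # made-up one-hot targets

W1_zero, _ = train(np.zeros((6, 4)), np.zeros((4, 3)), x, y_)
W1_rand, _ = train(rng.normal(scale=0.1, size=(6, 4)),
                   rng.normal(scale=0.1, size=(4, 3)), x, y_)

# With zero init every hidden unit (column of W1) ends up identical
print(np.allclose(W1_zero, W1_zero[:, :1]))
# With small random init the hidden units stay distinct
print(np.allclose(W1_rand, W1_rand[:, :1]))
```

If this is the cause, initializing the weight matrices with small random values (for example `tf.truncated_normal` with a small `stddev`) instead of `tf.zeros` would be the usual remedy; the numbers printed above were obtained with the zero initialization.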


## Closing Session

+ You must close the session once you are done with it.
+ Alternatively, use a `with` block, which closes the session for you:

        with tf.Session() as sess:
            tf.global_variables_initializer().run()
            ... your code here using sess...
            
        # You don't need to close sess. It's already closed here.

In [11]:
sess.close()