# Deep Learning

1. Build a DNN with __five hidden layers__ of __100 neurons__ each, __He initialization__, and the __ELU__ activation function.
2. Using __Adam optimization__ and __early stopping__, try training it on MNIST but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You will need a __softmax output layer__ with __five neurons__, and as always make sure to _save checkpoints_ at regular intervals and save the final model so you can reuse it later.
3. Tune the hyperparameters using __cross-validation__ and see what precision you can achieve.
4. Now try adding __Batch Normalization__ and compare the learning curves: is it converging faster than before? Does it produce a better model?
5. Is the model overfitting the training set? Try adding dropout to every layer and try again. Does it help?
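
Exercise 5 mentions dropout. As a framework-independent illustration, and not TensorFlow's own implementation, here is a sketch of inverted dropout. During training it zeroes each activation with probability `rate` and rescales the surviving activations by `1 / (1 - rate)`, so the expected activation is unchanged. At inference it leaves its input unchanged. The function name `dropout` and the `rate` value are assumptions for this sketch.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero units with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # identity at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True for kept units
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
h = np.ones((1000, 100))  # hypothetical hidden-layer output
h_train = dropout(h, rate=0.5, training=True, rng=rng)
h_infer = dropout(h, rate=0.5, training=False, rng=rng)
print(h_train.mean())  # close to 1.0 because kept units are scaled by 2
```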

In [11]:
import tensorflow as tf
import numpy as np

In [12]:
n_inputs = 28*28 # MNIST data set
n_hidden1 = 100
n_hidden2 = 100
n_hidden3 = 100
n_hidden4 = 100
n_hidden5 = 100
n_outputs = 5  # digits 0 to 4 only (digits 5 to 9 are kept for transfer learning)

In [13]:
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None,), name='y')  # integer class labels

### Neuron layer

In [14]:
from tensorflow.contrib.layers import fully_connected, variance_scaling_initializer, batch_norm

### Batch Normalization

In [15]:
is_training = tf.placeholder(tf.bool, name='is_training')

bn_params = {
    'is_training': is_training,
    'decay': 0.999,
    'updates_collections': None,
    'scale': True
}
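
The `decay` entry in `bn_params` sets the exponential moving averages of the batch mean and variance. Batch normalization uses these averages at inference time (`is_training=False`), while training uses the statistics of the current mini-batch. With `'updates_collections': None`, the averages are updated in place during each training step. The following simplified NumPy sketch shows that behaviour. It is not TensorFlow's code: `epsilon=1e-3` and the initial moving statistics (mean 0, variance 1) are assumptions.

```python
import numpy as np

class BatchNormSketch:
    """Simplified batch norm for a (batch, features) input; not TensorFlow's code."""

    def __init__(self, n_features, decay=0.999, epsilon=1e-3):
        self.gamma = np.ones(n_features)   # scale (learned when 'scale' is True)
        self.beta = np.zeros(n_features)   # offset (learned)
        self.moving_mean = np.zeros(n_features)  # assumed initial values
        self.moving_var = np.ones(n_features)
        self.decay = decay
        self.epsilon = epsilon

    def __call__(self, x, is_training):
        if is_training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Update the running statistics in place (like updates_collections=None)
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
bn = BatchNormSketch(n_features=3)
batch = rng.normal(loc=5.0, scale=2.0, size=(64, 3))
out = bn(batch, is_training=True)        # normalized with batch statistics
out_inf = bn(batch, is_training=False)   # normalized with moving averages
print(out.mean(axis=0), bn.moving_mean)
```

With `decay=0.999`, the moving averages move only 0.1% toward each new batch. As a result, inference-mode outputs stay poorly normalized until many training steps have run.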

In [16]:
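def neuron_layer(X, n_neurons, name, activation=None):
    # He initialization (variance scaling in fan-in mode, the default)
    he_init = variance_scaling_initializer()
    # Batch normalization is applied to the weighted sums before the activation;
    # fully_connected adds no separate bias term when a normalizer_fn is given
    layer = fully_connected(X, n_neurons, weights_initializer=he_init,
                            normalizer_fn=batch_norm, normalizer_params=bn_params,
                            activation_fn=activation, scope=name)
    return layer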
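
With its default arguments (`factor=2.0`, `mode='FAN_IN'`), `variance_scaling_initializer()` implements He initialization: weights are drawn with variance proportional to 2 / fan_in. TensorFlow's exact sampling (for example, a truncated normal) differs from this NumPy sketch, which uses a plain normal distribution with standard deviation sqrt(2 / fan_in). Treat it as an illustration of the idea, not a reproduction of the initializer.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """He initialization sketch: weights ~ N(0, 2 / fan_in)."""
    stddev = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, stddev, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W1 = he_normal(28 * 28, 100, rng)  # same shape as the first hidden layer's weights
print(W1.shape, W1.std(), np.sqrt(2.0 / 784))
```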

### Neural network

In [17]:
with tf.name_scope('dnn'):
    hidden1 = neuron_layer(X, n_hidden1, 'hidden1', activation=tf.nn.elu)
    hidden2 = neuron_layer(hidden1, n_hidden2, 'hidden2', activation=tf.nn.elu)
    hidden3 = neuron_layer(hidden2, n_hidden3, 'hidden3', activation=tf.nn.elu)
    hidden4 = neuron_layer(hidden3, n_hidden4, 'hidden4', activation=tf.nn.elu)
    hidden5 = neuron_layer(hidden4, n_hidden5, 'hidden5', activation=tf.nn.elu)
    logits = neuron_layer(hidden5, n_outputs, 'logits')
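
Every hidden layer uses `tf.nn.elu`, while the output layer keeps `activation=None`, so `logits` are the unnormalized scores the loss expects (the softmax is applied inside the loss). `tf.nn.elu` uses alpha = 1: the output is z for z > 0 and exp(z) - 1 otherwise. A NumPy sketch:

```python
import numpy as np

def elu(z, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(z) - 1) for the rest."""
    # np.minimum keeps expm1 from overflowing on large positive inputs
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0)))

z = np.array([-3.0, -1.0, 0.0, 2.0])
print(elu(z))
```

Unlike ReLU, ELU has a nonzero gradient for negative inputs and saturates smoothly toward -alpha.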

### Loss function 

In [18]:
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name='loss')
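
`sparse_softmax_cross_entropy_with_logits` takes integer class labels (hence the `int64` placeholder for `y`) and raw logits. It computes the softmax and the negative log-likelihood of the correct class in one numerically stable step. The following NumPy sketch performs the same computation:

```python
import numpy as np

def sparse_softmax_xentropy(logits, labels):
    """Per-example cross-entropy from raw logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 1.0, 0.1, 0.0, -1.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 3])
xentropy = sparse_softmax_xentropy(logits, labels)
loss = xentropy.mean()  # plays the role of tf.reduce_mean(xentropy)
print(xentropy, loss)
```

The second row is uniform over five classes, so its loss is log(5).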

### Optimization

In [19]:
learning_rate = 0.01

with tf.name_scope('train'):
    optimizer = tf.train.AdamOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
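
`AdamOptimizer` keeps exponentially decaying averages of each parameter's gradient and squared gradient, then scales each step by their ratio. Below is a sketch of the textbook update rule on a one-dimensional quadratic. It assumes the usual defaults beta1 = 0.9, beta2 = 0.999, and epsilon = 1e-8, together with the notebook's `learning_rate = 0.01`. TensorFlow folds the bias correction into the learning rate, so epsilon has a slightly different effect there.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias corrections for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = (x - 3)^2 starting from x = 0
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.01)
print(x)
```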

### Performance measure

In [20]:
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
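
`tf.nn.in_top_k(logits, y, 1)` is True wherever the target class has the highest logit. Except for ties, which `in_top_k` counts as hits, this matches comparing the arg-max with the label. A NumPy sketch of the resulting accuracy:

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of rows whose highest logit is at the label index (ties not special-cased)."""
    correct = np.argmax(logits, axis=1) == labels
    return correct.astype(np.float32).mean()

logits = np.array([[0.1, 2.0, 0.3, 0.0, 0.0],
                   [1.5, 0.2, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 3.0, 0.1]])
labels = np.array([1, 2, 3])
print(top1_accuracy(logits, labels))  # 2 of 3 rows are correct
```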

### Execution

In [21]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

In [22]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data/')

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [23]:
X_train1 = mnist.train.images[mnist.train.labels < 5]
y_train1 = mnist.train.labels[mnist.train.labels < 5]
X_valid1 = mnist.validation.images[mnist.validation.labels < 5]
y_valid1 = mnist.validation.labels[mnist.validation.labels < 5]
X_test1 = mnist.test.images[mnist.test.labels < 5]
y_test1 = mnist.test.labels[mnist.test.labels < 5]
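
Exercise 3 asks for hyperparameter tuning with cross-validation. One way to do this is to split the indices of `X_train1` into k folds, then train and evaluate once per fold for each hyperparameter setting. The helper `kfold_indices` below is hypothetical; scikit-learn's `KFold` provides the same splitting ready-made.

```python
import numpy as np

def kfold_indices(n_samples, k, rng):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, valid_idx

rng = np.random.default_rng(0)
splits = list(kfold_indices(n_samples=10, k=3, rng=rng))
for train_idx, valid_idx in splits:
    print(len(train_idx), len(valid_idx))
# Each hyperparameter setting would be trained on train_idx and scored on
# valid_idx, and the k validation scores averaged to compare settings.
```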

In [32]:
n_epochs = 50
batch_size = 20
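
The training loop shuffles the indices once per epoch and splits them into `len(X_train1) // batch_size` chunks with `np.array_split`. Every example is therefore used exactly once per epoch, and each chunk holds `batch_size` examples or slightly more. Here is the same logic as a reusable generator; the name `shuffle_batch` is an assumption.

```python
import numpy as np

def shuffle_batch(X, y, batch_size, rng):
    """Yield shuffled mini-batches covering every example exactly once."""
    rnd_idx = rng.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        yield X[batch_idx], y[batch_idx]

rng = np.random.default_rng(0)
X_demo = np.arange(50).reshape(25, 2)  # 25 toy examples with 2 features each
y_demo = np.arange(25)
batches = list(shuffle_batch(X_demo, y_demo, batch_size=4, rng=rng))
print([len(yb) for _, yb in batches])  # 6 batches: one of 5 examples, five of 4
```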

In [33]:
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        rnd_index = np.random.permutation(len(X_train1))
        for rnd_indices in np.array_split(rnd_index, len(X_train1) // batch_size):
            X_batch, y_batch = X_train1[rnd_indices], y_train1[rnd_indices]
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch, is_training: True})
        # Accuracy on the last mini-batch only (20 samples), hence coarse values like 0.9 or 1.0
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch, is_training: False})
        acc_valid = accuracy.eval(feed_dict={X: X_valid1, y: y_valid1, is_training: False})
        print('Epoch: ', epoch, 'training accuracy: ', acc_train, 'validation accuracy: ', acc_valid)
        
    save_path = saver.save(sess, '../models/chapt11_model1.ckpt')

Epoch:  0 training accuracy:  1.0 validation accuracy:  0.9816263
Epoch:  1 training accuracy:  1.0 validation accuracy:  0.97888976
Epoch:  2 training accuracy:  0.9 validation accuracy:  0.96989834
Epoch:  3 training accuracy:  0.9 validation accuracy:  0.98749024
Epoch:  4 training accuracy:  1.0 validation accuracy:  0.9851446
Epoch:  5 training accuracy:  1.0 validation accuracy:  0.9714621
Epoch:  6 training accuracy:  1.0 validation accuracy:  0.9784988
Epoch:  7 training accuracy:  1.0 validation accuracy:  0.9921814
Epoch:  8 training accuracy:  1.0 validation accuracy:  0.9730258
Epoch:  9 training accuracy:  1.0 validation accuracy:  0.99022675
Epoch:  10 training accuracy:  1.0 validation accuracy:  0.98397183
Epoch:  11 training accuracy:  1.0 validation accuracy:  0.98749024
Epoch:  12 training accuracy:  1.0 validation accuracy:  0.99022675
Epoch:  13 training accuracy:  1.0 validation accuracy:  0.9898358
Epoch:  14 training accuracy:  1.0 validation accuracy:  0.987490
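
The loop above trains for a fixed `n_epochs` and saves only once at the end, whereas exercise 2 asks for early stopping and periodic checkpoints. The sketch below shows the bookkeeping without TensorFlow: track the best validation score and stop after `patience` epochs without improvement. The class name and `patience=5` are assumptions. In the notebook's loop, `saver.save(...)` would be called whenever `is_best` is True, and the final model restored from that checkpoint. Applied to the validation accuracies printed for epochs 0 to 13, this rule stops at epoch 12 and keeps the epoch 7 model.

```python
class EarlyStopping:
    """Signal a stop once the validation score has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_score = float("-inf")
        self.epochs_without_progress = 0

    def update(self, score):
        """Record one epoch's validation score and return (is_best, should_stop)."""
        if score > self.best_score:
            self.best_score = score
            self.epochs_without_progress = 0
            return True, False  # new best: this is where a checkpoint would be saved
        self.epochs_without_progress += 1
        return False, self.epochs_without_progress >= self.patience


# Validation accuracies for epochs 0 to 13, copied from the output above
valid_scores = [0.9816263, 0.97888976, 0.96989834, 0.98749024, 0.9851446,
                0.9714621, 0.9784988, 0.9921814, 0.9730258, 0.99022675,
                0.98397183, 0.98749024, 0.99022675, 0.9898358]

stopper = EarlyStopping(patience=5)
best_epoch, stop_epoch = None, None
for epoch, score in enumerate(valid_scores):
    is_best, should_stop = stopper.update(score)
    if is_best:
        best_epoch = epoch
    if should_stop:
        stop_epoch = epoch
        break
print("best epoch:", best_epoch, "stopped at epoch:", stop_epoch)
```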

In [37]:
with tf.Session() as sess:
    saver.restore(sess, '../models/chapt11_model1.ckpt')
    
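    # These ten test images include digits 5 to 9, which this model cannot predict (it only outputs 0 to 4)
    X_new = mnist.test.images[10:20]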
    Z = logits.eval(feed_dict={X: X_new, is_training: False})
    y_pred = np.argmax(Z, axis=1)

INFO:tensorflow:Restoring parameters from ../models/chapt11_model1.ckpt


In [38]:
y_pred

array([0, 0, 4, 0, 1, 3, 4, 3, 3, 4])

In [39]:
mnist.test.labels[10:20]

array([0, 6, 9, 0, 1, 5, 9, 7, 3, 4], dtype=uint8)
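
The model has only five output classes, so it cannot be right on test images of digits 5 to 9. Among these ten samples, only positions whose label is below 5 are meaningful. The snippet below restricts the comparison to those positions, using the two arrays printed above. A proper estimate would instead evaluate `accuracy` on `X_test1` and `y_test1`.

```python
import numpy as np

y_pred = np.array([0, 0, 4, 0, 1, 3, 4, 3, 3, 4])  # predictions printed above
y_true = np.array([0, 6, 9, 0, 1, 5, 9, 7, 3, 4])  # labels printed above
in_range = y_true < 5                              # digits the model was trained on
print(in_range.sum(), (y_pred[in_range] == y_true[in_range]).mean())
```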