# Artificial Neural Networks

## Training a DNN Using Plain TensorFlow

The first step is the construction phase, building the TensorFlow graph. The second step is the execution phase, where we actually run the graph to train the model.

### Construction Phase

In [0]:
import tensorflow as tf
import numpy as np

n_inputs = 28 * 28 # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

Next, we can use placeholder nodes to represent the training data and targets. We know that `X` will be a 2D tensor, with instances along the first dimension and features along the second dimension, and we know that the number of features is going to be 28 × 28 = 784, but we don't know yet how many instances each training batch will contain. So the shape of `X` is `(None, n_inputs)`. Similarly, we know that `y` will be a 1D tensor with one entry per instance, but again we don't know the size of the training batch at this point, so the shape is `(None)`.

### Placeholders

In [0]:
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None), name='y')

The placeholder `X` will act as the input layer; during the execution phase, it will be replaced with one training batch at a time.
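
To illustrate what the unspecified batch dimension allows, here is a small NumPy-only sketch. The helper `is_valid_batch` is hypothetical (it is not part of TensorFlow); it only mimics the intended constraints: any number of instances, a fixed number of features, and one label per instance.

```python
import numpy as np

n_inputs = 28 * 28  # same feature count as in the notebook


def is_valid_batch(X_batch, y_batch):
    """Hypothetical helper: X must be (batch, n_inputs), y must hold one label per instance."""
    X_batch = np.asarray(X_batch)
    y_batch = np.asarray(y_batch)
    return (
        X_batch.ndim == 2                        # instances x features
        and X_batch.shape[1] == n_inputs         # the feature dimension is fixed
        and y_batch.ndim == 1                    # one entry per instance
        and y_batch.shape[0] == X_batch.shape[0]
    )


small = is_valid_batch(np.zeros((5, n_inputs)), np.zeros(5, dtype=np.int64))
large = is_valid_batch(np.zeros((50, n_inputs)), np.zeros(50, dtype=np.int64))
wrong = is_valid_batch(np.zeros((5, 100)), np.zeros(5, dtype=np.int64))
print(small, large, wrong)
```

Batches of 5 and of 50 instances both pass, while a batch with the wrong number of features is rejected.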

### Neuron Layer

In [0]:
def neuron_layer(X, n_neurons, name, activation=None):
    # Create a name scope using the name of the layer.
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        # Initialize the weights from a truncated normal distribution; this
        # standard deviation helps the algorithm converge much faster
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)

        # W is a 2D tensor containing all the connection weights between
        # each input and each neuron
        W = tf.Variable(init, name='weights')
        
        # one bias parameter per neuron
        b = tf.Variable(tf.zeros([n_neurons]), name='biases')

        # Create a subgraph to compute z = X*W + b. 
        # This vectorized implementation will efficiently compute the 
        # weighted sums of the inputs in the batch in just one shot.
        z = tf.matmul(X, W) + b
        
        # Apply the activation function, if one was requested
        if activation == 'relu':
            return tf.nn.relu(z)
        else:
            return z

Now that we have a function to create a neuron layer, we can use it to build the __deep neural network__. The first hidden layer takes `X` as its input, the second takes the output of the first hidden layer, and the output layer takes the output of the second hidden layer.
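
To make the computation inside each layer concrete, the following is a rough NumPy-only sketch of the same forward pass. The names `truncated_normal` and `numpy_layer` are illustrative stand-ins, not TensorFlow functions. It also redraws the weights on every call, whereas a `tf.Variable` keeps them in the graph between runs.

```python
import numpy as np

rng = np.random.default_rng(42)


def truncated_normal(shape, stddev, rng):
    """Draw from N(0, stddev^2), redrawing any value beyond 2 standard deviations."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values


def numpy_layer(X, n_neurons, activation=None):
    """NumPy stand-in for neuron_layer(): z = X.W + b, optionally followed by ReLU."""
    n_inputs = X.shape[1]
    stddev = 2 / np.sqrt(n_inputs)
    W = truncated_normal((n_inputs, n_neurons), stddev, rng)
    b = np.zeros(n_neurons)
    z = X @ W + b
    return np.maximum(z, 0) if activation == 'relu' else z


X_batch = rng.random((5, 28 * 28))           # 5 fake "images" with values in [0, 1)
hidden1 = numpy_layer(X_batch, 300, activation='relu')
hidden2 = numpy_layer(hidden1, 100, activation='relu')
logits = numpy_layer(hidden2, 10)
print(hidden1.shape, hidden2.shape, logits.shape)  # (5, 300) (5, 100) (5, 10)
```

Each layer maps a `(batch, n_inputs)` array to a `(batch, n_neurons)` array, which is why the output of one layer can be fed directly into the next.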

### Neural Network

In [0]:
# Use a name scope for clarity
with tf.name_scope('dnn'):
    hidden1 = neuron_layer(X, n_hidden1, 'hidden_1', activation='relu')
    hidden2 = neuron_layer(hidden1, n_hidden2, 'hidden_2', activation='relu')
    logits = neuron_layer(hidden2, n_outputs, 'outputs')

Instead of our own `neuron_layer()` function, we can use TensorFlow's `fully_connected()` function, which creates a fully connected layer, where all the inputs are connected to all the neurons in the layer. It takes care of creating the weights and biases variables, with the proper initialization strategy, and it uses the ReLU activation function by default (we can change this using the `activation_fn` argument).

In [0]:
from tensorflow.contrib.layers import fully_connected

with tf.name_scope('dnn'):
    hidden1 = fully_connected(X, n_hidden1, scope='hidden1')
    hidden2 = fully_connected(hidden1, n_hidden2, scope='hidden2')
    logits = fully_connected(hidden2, n_outputs, scope='outputs', activation_fn=None)

The `tensorflow.contrib` package contains many useful functions, but it is a place for experimental code that has not yet graduated to be part of the main TensorFlow API.

### Loss Function

In [0]:
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name='loss')
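
The loss cell above takes integer class labels and raw logits (before any softmax). As a rough NumPy sketch of that computation, assuming the usual definition of softmax cross entropy, the hypothetical function `sparse_softmax_cross_entropy` below returns one value per instance, and the mean over the batch plays the role of `loss`:

```python
import numpy as np


def sparse_softmax_cross_entropy(labels, logits):
    """Per-instance cross entropy from integer labels and raw (unscaled) logits."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the target class for every instance.
    return -log_probs[np.arange(len(labels)), labels]


logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])           # made-up logits for 2 instances, 3 classes
labels = np.array([0, 2])
xentropy = sparse_softmax_cross_entropy(labels, logits)
loss = xentropy.mean()
print(xentropy, loss)
```

The second instance has identical logits for all three classes, so its cross entropy is log(3) whichever class is the target.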

We have the neural network model, we have the cost function, and now we need to define a `GradientDescentOptimizer` that will tweak the model parameters to minimize the cost function.

### Optimizer

In [0]:
learning_rate = 0.01

with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
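
Conceptually, each run of `training_op` moves the parameters a small step against the gradient of the loss. The sketch below shows that update rule on a toy quadratic loss rather than on the network. `loss_fn` and `grad_fn` are made up for illustration, and the gradient is written by hand instead of being derived automatically as TensorFlow does.

```python
import numpy as np

learning_rate = 0.01
target = np.array([3.0, -1.0])


def loss_fn(theta):
    """Toy loss with its minimum at theta = [3, -1]."""
    return np.sum((theta - target) ** 2)


def grad_fn(theta):
    """Analytic gradient of loss_fn."""
    return 2 * (theta - target)


theta = np.zeros(2)
initial_loss = loss_fn(theta)
for step in range(1000):
    # One gradient descent step: move against the gradient, scaled by the learning rate.
    theta = theta - learning_rate * grad_fn(theta)
final_loss = loss_fn(theta)
print(initial_loss, final_loss, theta)
```

With this learning rate the distance to the minimum shrinks by a factor of 0.98 per step, so after 1,000 steps `theta` is very close to `[3, -1]`.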

The last important step in the construction phase is to specify how to evaluate the model. First, for each instance, determine if the neural network's prediction is correct by checking whether or not the highest logit corresponds to the target class. For this we can use the `in_top_k()` function. 

### The Performance Measure

In [0]:
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
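
For k = 1, the check above amounts to asking whether the target class has the highest logit. A minimal NumPy sketch with made-up logits is shown below. It is an approximation: `np.argmax` picks the first index on ties, which may differ from how `in_top_k()` treats tied values.

```python
import numpy as np

logits = np.array([[0.1, 2.5, 0.3],
                   [1.2, 0.4, 0.9],
                   [0.2, 0.1, 3.0],
                   [2.0, 2.2, 0.5]])           # made-up logits for 4 instances
y = np.array([1, 0, 2, 0])

# For k = 1, "in top k" reduces to "the target class has the highest logit".
correct = np.argmax(logits, axis=1) == y
# Casting booleans to floats and averaging gives the fraction of correct predictions.
accuracy = correct.astype(np.float32).mean()
print(correct, accuracy)  # accuracy is 0.75: the last instance is misclassified
```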

In [0]:
# A node initializing all variables
init = tf.global_variables_initializer()

# A Saver to save our trained model parameters to disk
saver = tf.train.Saver()

### Execution Phase

In [10]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data/')

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

In [0]:
n_epochs = 400
batch_size = 50
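
With these settings, each epoch is split into mini-batches of 50 instances. The generator below is a hypothetical NumPy sketch of that idea (shuffle once per epoch, then slice). It is not the implementation of `mnist.train.next_batch()`, and it runs on a small fake dataset rather than on MNIST.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Hypothetical helper: yield shuffled (X_batch, y_batch) pairs covering one epoch."""
    indices = rng.permutation(len(X))
    n_batches = len(X) // batch_size           # any leftover instances are skipped
    for i in range(n_batches):
        batch_idx = indices[i * batch_size:(i + 1) * batch_size]
        yield X[batch_idx], y[batch_idx]


rng = np.random.default_rng(0)
X_train = rng.random((230, 4))                 # small fake dataset, not MNIST
y_train = rng.integers(0, 10, size=230)

batches = list(iterate_minibatches(X_train, y_train, batch_size=50, rng=rng))
print(len(batches), batches[0][0].shape, batches[0][1].shape)  # 4 (50, 4) (50,)
```

The integer division mirrors the `num_examples // batch_size` expression used in the training loop.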

In [12]:
%%time
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # Note: the train accuracy is measured on the last mini-batch only
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images,
                                             y: mnist.test.labels})
        print('Epoch ', epoch, 'train accuracy ', acc_train, 'test accuracy ', acc_test)
        
    save_path = saver.save(sess, './my_model_final.ckpt')

Epoch  0 train accuracy  0.88 test accuracy  0.9047
Epoch  1 train accuracy  0.98 test accuracy  0.9228
Epoch  2 train accuracy  0.96 test accuracy  0.929
Epoch  3 train accuracy  0.92 test accuracy  0.9371
Epoch  4 train accuracy  0.92 test accuracy  0.9424
Epoch  5 train accuracy  1.0 test accuracy  0.9474
Epoch  6 train accuracy  0.98 test accuracy  0.9498
Epoch  7 train accuracy  0.94 test accuracy  0.9513
Epoch  8 train accuracy  0.96 test accuracy  0.954
Epoch  9 train accuracy  0.96 test accuracy  0.9571
Epoch  10 train accuracy  0.96 test accuracy  0.9604
Epoch  11 train accuracy  0.98 test accuracy  0.9618
Epoch  12 train accuracy  0.98 test accuracy  0.9624
Epoch  13 train accuracy  1.0 test accuracy  0.9653
Epoch  14 train accuracy  0.94 test accuracy  0.9646
Epoch  15 train accuracy  0.98 test accuracy  0.9651
Epoch  16 train accuracy  0.94 test accuracy  0.9667
Epoch  17 train accuracy  1.0 test accuracy  0.9682
Epoch  18 train accuracy  0.98 test accuracy  0.9688
...

Epoch  61 train accuracy  1.0 test accuracy  0.98
Epoch  62 train accuracy  1.0 test accuracy  0.9797
Epoch  63 train accuracy  1.0 test accuracy  0.9798
Epoch  64 train accuracy  1.0 test accuracy  0.9795
Epoch  65 train accuracy  1.0 test accuracy  0.9801
Epoch  66 train accuracy  0.98 test accuracy  0.9796
Epoch  67 train accuracy  1.0 test accuracy  0.9806
Epoch  68 train accuracy  1.0 test accuracy  0.9793
Epoch  69 train accuracy  0.98 test accuracy  0.9804
Epoch  70 train accuracy  1.0 test accuracy  0.9811
Epoch  71 train accuracy  1.0 test accuracy  0.9802
Epoch  72 train accuracy  1.0 test accuracy  0.9799
Epoch  73 train accuracy  1.0 test accuracy  0.9803
Epoch  74 train accuracy  1.0 test accuracy  0.9805
Epoch  75 train accuracy  1.0 test accuracy  0.9803
Epoch  76 train accuracy  1.0 test accuracy  0.9796
Epoch  77 train accuracy  1.0 test accuracy  0.9802
Epoch  78 train accuracy  1.0 test accuracy  0.9802
Epoch  79 train accuracy  1.0 test accuracy  0.9804
...

Epoch  122 train accuracy  1.0 test accuracy  0.981
Epoch  123 train accuracy  1.0 test accuracy  0.9807
Epoch  124 train accuracy  1.0 test accuracy  0.9808
Epoch  125 train accuracy  1.0 test accuracy  0.9805
Epoch  126 train accuracy  1.0 test accuracy  0.9808
Epoch  127 train accuracy  1.0 test accuracy  0.9806
Epoch  128 train accuracy  1.0 test accuracy  0.981
Epoch  129 train accuracy  1.0 test accuracy  0.9812
Epoch  130 train accuracy  1.0 test accuracy  0.9813
Epoch  131 train accuracy  1.0 test accuracy  0.9805
Epoch  132 train accuracy  1.0 test accuracy  0.981
Epoch  133 train accuracy  1.0 test accuracy  0.9808
Epoch  134 train accuracy  1.0 test accuracy  0.9811
Epoch  135 train accuracy  1.0 test accuracy  0.9807
Epoch  136 train accuracy  1.0 test accuracy  0.9812
Epoch  137 train accuracy  1.0 test accuracy  0.9806
Epoch  138 train accuracy  1.0 test accuracy  0.981
Epoch  139 train accuracy  1.0 test accuracy  0.9802
Epoch  140 train accuracy  1.0 test accuracy  0.98

Epoch  183 train accuracy  1.0 test accuracy  0.9809
Epoch  184 train accuracy  1.0 test accuracy  0.981
Epoch  185 train accuracy  1.0 test accuracy  0.9811
Epoch  186 train accuracy  1.0 test accuracy  0.981
Epoch  187 train accuracy  1.0 test accuracy  0.981
Epoch  188 train accuracy  1.0 test accuracy  0.9811
Epoch  189 train accuracy  1.0 test accuracy  0.9812
Epoch  190 train accuracy  1.0 test accuracy  0.9812
Epoch  191 train accuracy  1.0 test accuracy  0.9811
Epoch  192 train accuracy  1.0 test accuracy  0.9814
Epoch  193 train accuracy  1.0 test accuracy  0.9811
Epoch  194 train accuracy  1.0 test accuracy  0.9812
Epoch  195 train accuracy  1.0 test accuracy  0.9811
Epoch  196 train accuracy  1.0 test accuracy  0.9813
Epoch  197 train accuracy  1.0 test accuracy  0.9813
Epoch  198 train accuracy  1.0 test accuracy  0.9811
Epoch  199 train accuracy  1.0 test accuracy  0.9816
Epoch  200 train accuracy  1.0 test accuracy  0.981
Epoch  201 train accuracy  1.0 test accuracy  0.98

Epoch  243 train accuracy  1.0 test accuracy  0.9809
Epoch  244 train accuracy  1.0 test accuracy  0.9814
Epoch  245 train accuracy  1.0 test accuracy  0.981
Epoch  246 train accuracy  1.0 test accuracy  0.9812
Epoch  247 train accuracy  1.0 test accuracy  0.9815
Epoch  248 train accuracy  1.0 test accuracy  0.9814
Epoch  249 train accuracy  1.0 test accuracy  0.9813
Epoch  250 train accuracy  1.0 test accuracy  0.9815
Epoch  251 train accuracy  1.0 test accuracy  0.9813
Epoch  252 train accuracy  1.0 test accuracy  0.9815
Epoch  253 train accuracy  1.0 test accuracy  0.9815
Epoch  254 train accuracy  1.0 test accuracy  0.9811
Epoch  255 train accuracy  1.0 test accuracy  0.9814
Epoch  256 train accuracy  1.0 test accuracy  0.9812
Epoch  257 train accuracy  1.0 test accuracy  0.9814
Epoch  258 train accuracy  1.0 test accuracy  0.9814
Epoch  259 train accuracy  1.0 test accuracy  0.9814
Epoch  260 train accuracy  1.0 test accuracy  0.9814
...

Epoch  303 train accuracy  1.0 test accuracy  0.9813
Epoch  304 train accuracy  1.0 test accuracy  0.9814
Epoch  305 train accuracy  1.0 test accuracy  0.9812
Epoch  306 train accuracy  1.0 test accuracy  0.9813
Epoch  307 train accuracy  1.0 test accuracy  0.9813
Epoch  308 train accuracy  1.0 test accuracy  0.9814
Epoch  309 train accuracy  1.0 test accuracy  0.9815
Epoch  310 train accuracy  1.0 test accuracy  0.9817
Epoch  311 train accuracy  1.0 test accuracy  0.9813
Epoch  312 train accuracy  1.0 test accuracy  0.9815
Epoch  313 train accuracy  1.0 test accuracy  0.9812
Epoch  314 train accuracy  1.0 test accuracy  0.9813
Epoch  315 train accuracy  1.0 test accuracy  0.9815
Epoch  316 train accuracy  1.0 test accuracy  0.9814
Epoch  317 train accuracy  1.0 test accuracy  0.9813
Epoch  318 train accuracy  1.0 test accuracy  0.9813
Epoch  319 train accuracy  1.0 test accuracy  0.9814
Epoch  320 train accuracy  1.0 test accuracy  0.9813
...

Epoch  363 train accuracy  1.0 test accuracy  0.9813
Epoch  364 train accuracy  1.0 test accuracy  0.9813
Epoch  365 train accuracy  1.0 test accuracy  0.9812
Epoch  366 train accuracy  1.0 test accuracy  0.9813
Epoch  367 train accuracy  1.0 test accuracy  0.9816
Epoch  368 train accuracy  1.0 test accuracy  0.9814
Epoch  369 train accuracy  1.0 test accuracy  0.9813
Epoch  370 train accuracy  1.0 test accuracy  0.9815
Epoch  371 train accuracy  1.0 test accuracy  0.9815
Epoch  372 train accuracy  1.0 test accuracy  0.9814
Epoch  373 train accuracy  1.0 test accuracy  0.9815
Epoch  374 train accuracy  1.0 test accuracy  0.9815
Epoch  375 train accuracy  1.0 test accuracy  0.9815
Epoch  376 train accuracy  1.0 test accuracy  0.9812
Epoch  377 train accuracy  1.0 test accuracy  0.9813
Epoch  378 train accuracy  1.0 test accuracy  0.9814
Epoch  379 train accuracy  1.0 test accuracy  0.9811
Epoch  380 train accuracy  1.0 test accuracy  0.9814
...

Testing on a MacBook Pro 13 (Early 2015) with a 2.7 GHz Intel Core i5:
```
CPU times: user 37min 3s, sys: 2min 46s, total: 39min 49s
Wall time: 17min 57s
```
Testing on a DigitalOcean Droplet with 1 CPU:
```
CPU times: user 23min 10s, sys: 55.7 s, total: 24min 5s
Wall time: 24min 7s
```

Testing on Google Colab with GPU (?):
```
CPU times: user 14min 47s, sys: 4min 10s, total: 18min 57s
Wall time: 14min 30s
```

Testing on Google Colab without GPU: 
```
CPU times: user 30min 40s, sys: 2min 13s, total: 32min 53s
Wall time: 20min 16s
```

### Using the Neural Network

Now that the neural network is trained, we can use it to make predictions. To do that, we can reuse the same construction phase, but change the execution phase.

In [13]:
with tf.Session() as sess:
    # Load the model parameters from disk
    saver.restore(sess, './my_model_final.ckpt')
    
    # Load new images
    X_new_scaled = mnist.test.images[:20]
    
    # Evaluate the logits node
    Z = logits.eval(feed_dict={X: X_new_scaled})
    
    # Pick the class that has the highest logit value
    y_pred = np.argmax(Z, axis=1)

INFO:tensorflow:Restoring parameters from ./my_model_final.ckpt


In [14]:
y_pred

array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4])

In [15]:
mnist.test.labels[:20]

array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4],
      dtype=uint8)
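
The predictions above only need the largest logit per instance. If estimated class probabilities are wanted as well, one option is to apply the softmax function to the logits. The sketch below does this in NumPy on made-up logits (not the notebook's output) and shows that the predicted class is unchanged.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax; subtracting the max keeps the exponentials from overflowing."""
    exp = np.exp(Z - Z.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


Z = np.array([[1.0, 4.0, 0.5],                 # made-up logits for 2 instances, 3 classes
              [3.0, 0.2, 2.9]])
proba = softmax(Z)
y_pred = np.argmax(Z, axis=1)
print(proba.round(3), y_pred)
```

Because softmax preserves the ordering of the logits, `np.argmax(proba, axis=1)` gives the same classes as `np.argmax(Z, axis=1)`.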