# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST, and the task is handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

# TensorFlow Datasets (tfds) includes a data provider for MNIST
import tensorflow_datasets as tfds

## Data

This is where we load and preprocess our data.

In [2]:
# Datasets will be stored in /Users/tyrone/tensorflow_datasets (downloads first time, then fetches locally after)

# Load data
# with_info=True returns tuple containing information about the version, features, number of samples
# as_supervised=True will load the dataset in a 2-tuple structure (input, target) 
# alternatively, as_supervised=False, would return a dictionary
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /Users/tyrone/tensorflow_datasets/mnist/3.0.1...

Dataset mnist downloaded and prepared to /Users/tyrone/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.


In [3]:
# Extract train / test (of type tensorflow.python.data.ops.dataset_ops.PrefetchDataset)
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

In [17]:
# Defining the number of validation samples as a % of the train samples
# this is also where we make use of mnist_info (we don't have to count the observations)
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples

# let's cast this number to an integer, as a float may cause an error along the way
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

In [21]:
# Store the number of test samples in a dedicated variable (instead of using the mnist_info one)
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

In [26]:
# We scale data to [0, 1] for numerical stability
# since the possible values for the inputs are 0 to 255 (256 different shades of grey) we divide each element by 255
    
def scale(image, label):
    # we make sure the value is a float
    image = tf.cast(image, tf.float32)
    image /= 255.

    return image, label
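To see what `scale` does to individual pixel values, here is a minimal pure-Python sketch that needs no TensorFlow. The helper name `scale_pixels` and the sample values are hypothetical and chosen only for illustration.

```python
def scale_pixels(pixels):
    """Map integer pixel intensities in [0, 255] to floats in [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]

# hypothetical sample values: black, mid-grey, white
sample = [0, 128, 255]
scaled = scale_pixels(sample)
print(scaled)
```

Black (0) maps to 0.0, white (255) maps to 1.0, and every other shade of grey lands in between.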

In [27]:
# the method .map() allows us to apply a custom transformation to a given dataset
# we have already decided that we will get the validation data from mnist_train, so we scale the two together
scaled_train_and_validation_data = mnist_train.map(scale)

Tensor("args_0:0", shape=(28, 28, 1), dtype=uint8)


In [29]:
# Scale the test data in the same way (we batch it later, together with the other datasets)
# there is no need to shuffle it, because we won't be training on the test data
# when we batch it, there will be a single batch, equal to the size of the test data
test_data = mnist_test.map(scale)

Tensor("args_0:0", shape=(28, 28, 1), dtype=uint8)


In [30]:
# Shuffle the Train data

BUFFER_SIZE = 10000
# this BUFFER_SIZE parameter is here for cases when we're dealing with enormous datasets
# then we can't shuffle the whole dataset in one go because we can't fit it all in memory
# so instead TF only stores BUFFER_SIZE samples in memory at a time and shuffles them
# if BUFFER_SIZE=1 => no shuffling will actually happen
# if BUFFER_SIZE >= num samples => shuffling is uniform
# BUFFER_SIZE in between - a computational optimization to approximate uniform shuffling

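# Shuffle method uses buffer_size to shuffle
# by default .shuffle() reshuffles on every pass over the data, which would let validation samples leak into training
# reshuffle_each_iteration=False keeps the order fixed, so the .take() / .skip() split below stays disjoint
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)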
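The comments on `BUFFER_SIZE` describe how a shuffle buffer behaves. The following pure-Python sketch mimics that idea so we can check the two edge cases. It is a simplified illustration, not TensorFlow's implementation, and the function name `buffered_shuffle` is hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random element from the buffer
        buffer[idx] = item               # and replace it with the incoming item
    rng.shuffle(buffer)                  # input exhausted: drain what is left
    yield from buffer

no_shuffle = list(buffered_shuffle(range(10), buffer_size=1))
partial = list(buffered_shuffle(range(100), buffer_size=10, seed=0))
print(no_shuffle)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(partial[:5])
```

With a buffer of 1 the order is unchanged. With a buffer of 10, the first emitted element can only come from the first 10 inputs, which is why a small buffer only approximates a uniform shuffle.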

In [32]:
# Extract the validation data with .take() (the first num_validation_samples samples)
# and the train data with .skip() (everything after those samples)
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
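The take / skip split can be sketched with `itertools.islice` on a plain list. The helpers `take` and `skip` and the 20-element stand-in dataset are hypothetical. Note that the two parts are disjoint only if the underlying order is the same every time it is traversed.

```python
from itertools import islice

def take(items, n):
    """Return the first n items."""
    return list(islice(items, n))

def skip(items, n):
    """Return everything after the first n items."""
    return list(islice(items, n, None))

samples = list(range(20))                     # hypothetical stand-in for 20 shuffled samples
num_validation = int(0.1 * len(samples))      # 10% of the samples, as in the notebook

validation = take(samples, num_validation)
train = skip(samples, num_validation)
print(validation)   # [0, 1]
print(len(train))   # 18
```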

In [35]:
# We split the train data into batches; the batch size is a hyperparameter (greater than 1 and smaller than num_train_samples)
# This is very helpful when we train, as we are able to iterate over the different batches
BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)

# We put the whole validation set in a single batch (there is no backpropagation on it and we want an exact evaluation)
validation_data = validation_data.batch(num_validation_samples)

# Likewise the test data goes in a single batch, since the model still expects batched input
test_data = test_data.batch(num_test_samples)


# takes the next batch (it is the only batch)
# because as_supervised=True, we've got a 2-tuple structure
validation_inputs, validation_targets = next(iter(validation_data))
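A quick arithmetic check of the batching: MNIST has 60,000 training images, 10% of which (6,000) go to validation, which leaves 54,000 for training. The sketch below uses a hypothetical `batch` helper on plain lists to show how the number of batches follows from the batch size. It matches the 540 steps per epoch that Keras reports during training.

```python
import math

def batch(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

num_train = 60000 - 6000     # samples left after the validation split
batch_size = 100
steps_per_epoch = math.ceil(num_train / batch_size)
print(steps_per_epoch)       # 540

# the last batch may be smaller when the sizes do not divide evenly
small_batches = list(batch(list(range(5)), 2))
print(small_batches)         # [[0, 1], [2, 3], [4]]
```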

## Model

### Outline the model
When thinking about a deep learning algorithm, we mostly imagine building the model. So, let's do it :)

In [46]:
input_size = 784
output_size = 10
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50
activation_function = 'relu'
    
# define what the model will look like
model = tf.keras.Sequential([
    
    # the first layer (the input layer)
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3 and we flatten it into a (784, ) vector
    # this allows us to actually create a feed forward neural network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    # tf.keras.layers.Dense is implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation=activation_function), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation=activation_function), # 2nd hidden layer
    
    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
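The comment on `tf.keras.layers.Dense` gives the formula output = activation(dot(input, weight) + bias). Here is a minimal pure-Python sketch of that formula on a tiny layer, followed by a parameter count for the layer sizes used above. The functions `relu`, `dense` and `dense_param_count`, and the toy weights, are hypothetical illustrations, not Keras internals.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases, activation):
    """output = activation(dot(inputs, weights) + biases).

    weights holds one row per input and one column per unit.
    """
    pre_activation = [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]
    return activation(pre_activation)

# hypothetical tiny layer: 3 inputs -> 2 units
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0],
           [0.0, 1.0],
           [1.0, -1.0]]
biases = [0.5, 0.0]
hidden = dense(inputs, weights, biases, relu)
print(hidden)   # [4.0, 0.0]

def dense_param_count(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 50, 50, 10]   # flattened input, two hidden layers, output
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)   # [39250, 2550, 510] 42310
```

The second unit has a negative pre-activation (-2.0), so ReLU clips it to 0.0. The parameter count shows that almost all of the weights sit in the first hidden layer, because it is connected to all 784 input pixels.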

### Choose the optimizer and the loss function

In [47]:
# we define the optimizer we'd like to use, the loss function, and the metrics we are interested in obtaining at each iteration
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
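To make the loss more concrete: with integer targets, sparse categorical cross-entropy for one sample is the negative log of the probability that the softmax output assigns to the true class. The sketch below is a hand-rolled illustration with hypothetical logits. The Keras implementation additionally averages over the batch and guards against log(0).

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probabilities, target_index):
    """Loss for one sample: -log(probability of the true class)."""
    return -math.log(probabilities[target_index])

logits = [2.0, 1.0, 0.1]          # hypothetical output-layer scores for 3 classes
probs = softmax(logits)
loss_right = sparse_categorical_crossentropy(probs, 0)   # true class has the highest probability
loss_wrong = sparse_categorical_crossentropy(probs, 2)   # true class has the lowest probability
print(probs)
print(loss_right, loss_wrong)
```

The loss is small when the model puts most of the probability on the correct digit and grows as that probability shrinks.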

### Training
This is where we train the model we have built.

In [48]:
# determine the maximum number of epochs
NUM_EPOCHS = 5

# we fit the model, specifying the training data, the total number of epochs
# and the validation data we just created ourselves in the format: (inputs,targets)
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 3s - loss: 0.4115 - accuracy: 0.8824 - val_loss: 0.2199 - val_accuracy: 0.9360 - 3s/epoch - 5ms/step
Epoch 2/5
540/540 - 1s - loss: 0.1882 - accuracy: 0.9442 - val_loss: 0.1633 - val_accuracy: 0.9505 - 1s/epoch - 2ms/step
Epoch 3/5
540/540 - 1s - loss: 0.1408 - accuracy: 0.9578 - val_loss: 0.1324 - val_accuracy: 0.9610 - 1s/epoch - 2ms/step
Epoch 4/5
540/540 - 1s - loss: 0.1127 - accuracy: 0.9654 - val_loss: 0.1109 - val_accuracy: 0.9677 - 1s/epoch - 2ms/step
Epoch 5/5
540/540 - 1s - loss: 0.0959 - accuracy: 0.9712 - val_loss: 0.1042 - val_accuracy: 0.9665 - 1s/epoch - 2ms/step


<keras.callbacks.History at 0x7fc49ac54e20>

## Test the model

As we discussed in the lectures, after training on the training data and validating on the validation data, we test the final prediction power of our model by running it on the test dataset that the algorithm has NEVER seen before.

It is very important to realize that fiddling with the hyperparameters overfits the validation dataset. 

Testing is the absolute final step. You should not test before you are completely done with adjusting your model.

If you adjust your model after testing, you will start overfitting the test dataset, which will defeat its purpose.

In [None]:
test_loss, test_accuracy = model.evaluate(test_data)

In [None]:
# We can apply some nice formatting if we want to
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))
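The accuracy metric itself is simple: a prediction counts as correct when the class with the highest predicted probability equals the target. A minimal sketch with hypothetical predictions for 4 samples over 3 classes (the helper names `argmax` and `accuracy` are ours, not part of Keras):

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predicted_probabilities, targets):
    """Fraction of samples whose most probable class matches the target."""
    correct = sum(
        1 for probs, target in zip(predicted_probabilities, targets)
        if argmax(probs) == target
    )
    return correct / len(targets)

# hypothetical model outputs and true labels
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
targets = [0, 1, 2, 1]
acc = accuracy(predictions, targets)
print('Accuracy: {0:.2f}%'.format(acc * 100.))   # Accuracy: 75.00%
```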