# Deep Neural Network for MNIST Classification

The dataset is MNIST, a standard benchmark for handwritten digit recognition. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [8]:
import numpy as np
import tensorflow as tf

# TensorFlow includes a data provider for MNIST that we'll use.
# It comes with the tensorflow-datasets module; if you haven't installed it yet, do so with
# pip install tensorflow-datasets
# or
# conda install tensorflow-datasets

import tensorflow_datasets as tfds

# these datasets will be stored in C:\Users\*USERNAME*\tensorflow_datasets\...
# the first time you download a dataset, it is stored in the respective folder
# every subsequent time, the local copy on your computer is loaded automatically

## Data

That's where we load and preprocess our data.

In [9]:
# tfds.load loads a dataset (downloading it first if this is the first time you use it)
# in our case, we are interested in MNIST; the name of the dataset is the only mandatory argument
# there are other arguments we can specify, which we may find useful
# mnist_dataset = tfds.load(name='mnist', as_supervised=True)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# with_info=True makes tfds.load return a tuple (dataset, info)
# the info object describes the version, features and number of samples
# we store it in mnist_info and use it a bit below

# as_supervised=True loads each example as a 2-tuple (input, target)
# alternatively, as_supervised=False would return each example as a dictionary
# we prefer to have our inputs and targets separated



In [10]:
## Prepare train, validation and test data

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# the test set has its own split, so its size must be read from 'test' (not 'train')
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

# Scale the pixel values from [0, 255] to [0, 1]
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)

# Shuffle and batch the data
BUFFER_SIZE = 10000
# this BUFFER_SIZE parameter is here for cases when we're dealing with enormous datasets
# then we can't shuffle the whole dataset in one go because we can't fit it all in memory
# so instead TF only stores BUFFER_SIZE samples in memory at a time and shuffles them
# if BUFFER_SIZE=1 => no shuffling will actually happen
# if BUFFER_SIZE >= num samples => shuffling is uniform
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

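# our validation data is 10% of the training set, a size we've already calculated
# we use the .take() method to take that many samples
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
# similarly, the train_data is everything else, so we skip as many samples as there are in the validation dataset
# note: shuffle() reshuffles on every iteration by default (reshuffle_each_iteration=True),
# so the samples skipped here can differ between epochs;
# pass reshuffle_each_iteration=False to shuffle() if the split must stay fixed
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)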

BATCH_SIZE = 100
# we can also take advantage of the occasion to batch the train data
# this would be very helpful when we train, as we would be able to iterate over the different batches
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

# take the next batch (it is the only batch)
# because as_supervised=True, we've got a 2-tuple structure
validation_inputs, validation_targets = next(iter(validation_data))
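
To make the `BUFFER_SIZE` comments above concrete, here is a minimal pure-Python sketch of a fixed-size shuffle buffer. It only illustrates the idea described in the comments and is not TensorFlow's actual implementation; the function name `buffered_shuffle` and its `seed` argument are made up for this example, and it assumes `buffer_size >= 1`.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in the order produced by a fixed-size shuffle buffer (illustrative only)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its place.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in the buffer in random order.
    rng.shuffle(buffer)
    yield from buffer

no_shuffle = list(buffered_shuffle(range(10), buffer_size=1))
partial_shuffle = list(buffered_shuffle(range(20), buffer_size=3))
full_shuffle = list(buffered_shuffle(range(20), buffer_size=20))

print(no_shuffle)       # a buffer of 1 keeps the original order
print(partial_shuffle)  # elements only move a limited distance forward
print(full_shuffle)     # the whole list is shuffled at once
```

With a buffer of size 1 the order is unchanged, and with a small buffer an element at output position `k` can only come from the first `k + buffer_size` inputs, which is why a buffer at least as large as the dataset is needed for a uniform shuffle.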

## Model

In [20]:
input_size = 784
output_size = 10
hidden_layer_size = 75

model = tf.keras.Sequential([
                # the first layer (the input layer)
                # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
                # since we don't know CNNs yet, we don't know how to feed such input into our net, so we must flatten the images
                # the convenient 'Flatten' layer takes our 28x28x1 tensor and unrolls it into a
                # (28*28*1,) = (784,) vector; for a batch of images, the output shape is (None, 784)
                # this allows us to actually create a feed forward neural network
                tf.keras.layers.Flatten(input_shape=(28,28,1)),
                
                # two hidden layers of hidden_layer_size units each, with ReLU activations
                tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                # the output layer: softmax turns the 10 outputs into class probabilities
                tf.keras.layers.Dense(output_size, activation='softmax')
        ])
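
As a rough check on the architecture above, the following sketch uses NumPy to mimic what flattening does to a single image and counts the trainable parameters of the three dense layers. The helper `dense_params` and the all-zeros stand-in image are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

# A stand-in for one scaled MNIST image: height x width x channels.
image = np.zeros((28, 28, 1), dtype=np.float32)

# Flattening keeps every value and only changes the shape: (28, 28, 1) -> (784,).
flat = image.reshape(-1)
print(flat.shape)  # (784,)

def dense_params(n_in, n_out):
    """A fully connected layer has n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out

# Layer widths of the model above: input, two hidden layers, output.
layer_sizes = [784, 75, 75, 10]
params = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(params)       # [58875, 5700, 760]
print(sum(params))  # 65335
```

Almost all of the parameters sit in the first dense layer, because it connects all 784 pixels to every hidden unit.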

## Optimizer and Loss function

In [21]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
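
The loss name `'sparse_categorical_crossentropy'` means the targets are integer class indices rather than one-hot vectors. The sketch below shows, for a single sample and in plain Python, how a softmax output and such a target combine into a loss value. It is a simplified illustration with made-up function names, not the Keras implementation (which, among other things, works on batches).

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum does not change the result but avoids overflow.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(target, probs):
    """Loss for one sample; target is an integer class index (0-9 for MNIST)."""
    return -math.log(probs[target])

# Ten raw scores, with class 3 scored highest.
logits = [0.0] * 10
logits[3] = 2.0
probs = softmax(logits)

loss_if_digit_is_3 = sparse_categorical_crossentropy(3, probs)
loss_if_digit_is_7 = sparse_categorical_crossentropy(7, probs)

print(round(sum(probs), 6))  # 1.0
print(loss_if_digit_is_3 < loss_if_digit_is_7)  # True
```

The loss is small when the network assigns a high probability to the correct digit and grows as that probability shrinks; a network that guesses uniformly over 10 classes has a loss of ln(10), about 2.30.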

## Training

In [22]:
NUM_EPOCHS = 5
# we fit the model, specifying the
# training data
# the total number of epochs
# and the validation data we just created ourselves in the format: (inputs,targets)
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), validation_steps=1, verbose=2)

Epoch 1/5
540/540 - 11s - loss: 0.3576 - accuracy: 0.8978 - val_loss: 0.1842 - val_accuracy: 0.9483
Epoch 2/5
540/540 - 10s - loss: 0.1468 - accuracy: 0.9570 - val_loss: 0.1299 - val_accuracy: 0.9660
Epoch 3/5
540/540 - 10s - loss: 0.1066 - accuracy: 0.9683 - val_loss: 0.1103 - val_accuracy: 0.9708
Epoch 4/5
540/540 - 10s - loss: 0.0833 - accuracy: 0.9748 - val_loss: 0.0950 - val_accuracy: 0.9727
Epoch 5/5
540/540 - 10s - loss: 0.0688 - accuracy: 0.9794 - val_loss: 0.0800 - val_accuracy: 0.9787


<tensorflow.python.keras.callbacks.History at 0x20e58fa2a48>
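
The `540/540` in the log above is the number of batches per epoch, and it follows from the split sizes. The sketch below reproduces the arithmetic, assuming the standard MNIST training split of 60,000 images (the notebook reads this number from `mnist_info` rather than hard-coding it).

```python
import math

num_train_examples = 60000  # assumed size of the MNIST 'train' split
validation_fraction = 0.1
batch_size = 100

# 10% of the training split is held out for validation.
num_validation = int(validation_fraction * num_train_examples)
# Everything else is used for training.
num_train = num_train_examples - num_validation
# Each epoch processes the training data in batches of batch_size.
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_validation)   # 6000
print(num_train)        # 54000
print(steps_per_epoch)  # 540
```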

## Test the Model

In [23]:
test_loss, test_accuracy = model.evaluate(test_data)

      1/Unknown - 2s 2s/step - loss: 0.0930 - accuracy: 0.9709

In [24]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.09. Test accuracy: 97.09%
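
The accuracy reported above is the fraction of images whose most probable class matches the target. A minimal sketch of that computation on a toy example follows; the three-class probabilities and the `accuracy` helper are invented for illustration, whereas the real model outputs 10 probabilities per image.

```python
def accuracy(prob_rows, targets):
    """Fraction of samples whose highest-probability class equals the target."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in prob_rows]
    correct = sum(1 for p, t in zip(predicted, targets) if p == t)
    return correct / len(targets)

# Toy softmax outputs for four samples and three classes.
probs = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
    [0.2, 0.5, 0.3],  # predicts class 1
]
targets = [1, 0, 1, 1]

toy_accuracy = accuracy(probs, targets)  # 3 of the 4 predictions are correct
print('Accuracy: {0:.2f}%'.format(toy_accuracy * 100.))  # Accuracy: 75.00%
```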
