# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and is used for handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

TensorFlow includes a data provider for MNIST.

Ensure that the tensorflow-datasets module is installed in the current environment.

The first time the dataset is used, it is downloaded and stored in the respective folder (C:\Users...\tensorflow_datasets); later uses load it from there.

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

## Data Preprocessing

To get the MNIST dataset, use tfds.load:

In [192]:
# tfds.load loads a dataset (downloading it first if this is the first time you use it)
# pass 'mnist' as the name argument
# mnist_dataset = tfds.load(name='mnist', as_supervised=True)

# with_info=True additionally returns an object with information about the version, features and number of samples
# we store it in mnist_info and use it below


# as_supervised=True loads the dataset in a 2-tuple structure (input, target)
# this structure is kept for the whole dataset (both train and test data)
# alternatively, as_supervised=False would return a dictionary
# we prefer to have inputs and targets separated

mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

In [193]:
# Each image is a tensor of shape (28, 28, 1), i.e. a 28 x 28 matrix with a single channel (rank 3)
# The images are stored as uint8 and the labels as int64
print(mnist_dataset)

{'train': <PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>, 'test': <PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>}


In [194]:
# Calling this shows the information relating to the dataset
# Note that the dataset has two splits ('test', 'train'), each listed with its number of samples

mnist_info

tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='C:\\Users\\QS\\tensorflow_datasets\\mnist\\3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)

In [195]:
# once the dataset is loaded, extract the training and testing datasets by their split names
# by default, the dataset comes with training and testing splits, but no validation split

mnist_train = mnist_dataset['train']
mnist_test = mnist_dataset['test']

# to create a validation set, take samples from the training dataset, since it is much larger (60,000)
# define the number of validation samples as a % of the training samples
# read the split sizes from mnist_info instead of counting the observations
# num_examples is the total number of images that 'train' or 'test' contains

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples

# cast the number of validation samples to an integer, since a number of samples cannot be a float

num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# the number of training samples is not needed yet; the split itself happens after scaling and shuffling


# similarly, store the number of test samples in a variable instead of reading mnist_info each time

num_test_samples = mnist_info.splits['test'].num_examples

# cast the number of test samples to an integer, as above

num_test_samples = tf.cast(num_test_samples, tf.int64)
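
The split sizes can also be computed with plain Python integers, which avoids the float-to-integer cast altogether. This is only a minimal sketch: the 60,000 is hard-coded here as an assumption (in the notebook it comes from mnist_info), and the names validation_percent and num_train_samples are illustrative.

```python
# Illustrative sketch: split sizes with integer arithmetic (no float cast needed).
# num_train_total is hard-coded here; in the notebook it comes from
# mnist_info.splits['train'].num_examples.
num_train_total = 60000
validation_percent = 10  # 10% of the training split goes to validation

num_validation_samples = num_train_total * validation_percent // 100
num_train_samples = num_train_total - num_validation_samples

print(num_validation_samples, num_train_samples)
```

Integer division keeps the result an exact int, so it can be passed directly wherever a sample count is expected.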

In [196]:
# confirm that the validation sample is 10% of the train dataset (60000 -> 6000)
# notice that the tensor rank is 0, meaning it contains only a scalar, which in this case is 6,000
print('',num_validation_samples) 

# number of test samples
# same as above, the tensor rank is 0
print('\n', num_test_samples)

# remember that these scalars (6000 and 10000) are only counts; we still need to apply them to their respective datasets (train or test)

 tf.Tensor(6000, shape=(), dtype=int64)

 tf.Tensor(10000, shape=(), dtype=int64)


In [197]:
# scale the data so that the inputs are more numerically stable
# at the moment, each pixel is stored as a value from 0 to 255 (0 represents purely black and 255 purely white)
# we want the inputs to be between 0 and 1

def scale(image, label):
    # cast the image to float, since the pixels are stored as uint8
    image = tf.cast(image, tf.float32)
    # since the possible values for the inputs are 0 to 255 (256 different shades of grey)
    # if we divide each element by 255, we would get the desired result -> all elements will be between 0 and 1
    # the dot after 255 signifies that it is a float rather than an integer
    image /= 255.

    return image, label
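
To see what the scaling does to the numbers, here is a NumPy analogue of the function above. It is a sketch on random stand-in pixels, not on real MNIST data; scale_np and fake_image are hypothetical names used only for this illustration.

```python
import numpy as np

# A fake 28x28x1 "image" with pixel values in the uint8 range.
# (Random data standing in for an MNIST digit; not taken from the dataset.)
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8)
fake_image[0, 0, 0] = 0      # make sure both extremes are present
fake_image[0, 1, 0] = 255

def scale_np(image):
    """NumPy analogue of scale(): cast to float32, then divide by 255."""
    return image.astype(np.float32) / 255.

scaled = scale_np(fake_image)
print(scaled.dtype, scaled.min(), scaled.max())
```

The shape is unchanged; only the data type and the value range differ after scaling.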

In [198]:
# the method .map() allows a custom transformation to be applied to a given dataset
# scale the training data (from which the validation data will be taken) from mnist_train
scaled_train_and_validation_data = mnist_train.map(scale)

# next, scale the test data
# scale it so it has the same magnitude as the train and validation data
# it is batched later, as a single batch equal to the size of the test data
test_data = mnist_test.map(scale)

In [199]:
# shuffling
# once we have scaled our data, we shuffle it so that each batch gives a less biased estimate of the true gradient
# the buffer size should satisfy 1 < n <= number of samples in the dataset being shuffled (here 60,000)
# n = 1 means no shuffling; n >= number of samples shuffles the whole dataset uniformly at once

BUFFER_SIZE = 10000 # hyper-parameter

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

# next, take the validation set from the shuffled training samples
# this validation data is therefore both scaled and shuffled
# validation_data should contain num_validation_samples (6,000) samples
# note: shuffle() reshuffles on every iteration by default, so the take/skip split below is not fixed between passes;
# passing reshuffle_each_iteration=False to shuffle() would keep the split fixed
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

# the training data is everything else, so we skip the first 6,000 images that were reserved for the validation set
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
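
The role of the buffer size, and of taking and skipping afterwards, can be illustrated in plain Python. This is a simplified model of the idea behind a shuffle buffer, not TensorFlow's implementation; buffered_shuffle is a hypothetical helper, and itertools.islice stands in for take and skip.

```python
import random
from itertools import islice

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # fill the buffer first
            continue
        # buffer is full: emit a random element and put the new item in its place
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # input exhausted: drain what is left in random order
    rng.shuffle(buffer)
    yield from buffer

data = list(range(20))
no_shuffle = list(buffered_shuffle(data, buffer_size=1, seed=0))   # order unchanged
shuffled = list(buffered_shuffle(data, buffer_size=5, seed=0))     # locally shuffled

# split one fixed shuffled order into validation and training parts
validation = list(islice(shuffled, 6))        # like .take(6)
train = list(islice(shuffled, 6, None))       # like .skip(6)
print(validation, train)
```

With a buffer of size 1 nothing is shuffled, and with a small buffer an element can only move a limited distance from its original position. The two parts are disjoint here because both are cut from the same shuffled order.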



In [200]:
# notice that num_validation_samples contains only a scalar
print('',num_validation_samples)

# validation_data, taken from the shuffled and scaled data, is a dataset whose elements are (image, label) pairs; it holds the 6,000 validation images
print('\n',validation_data)

# likewise, train_data is a dataset of (image, label) pairs
print('\n',train_data)

 tf.Tensor(6000, shape=(), dtype=int64)

 <TakeDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>

 <SkipDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>


In [201]:
# batching
# we train with mini-batch gradient descent, so the training data must be split into batches
# the batch size should satisfy 1 < n < number of training samples (54,000)

BATCH_SIZE = 200

# batch train data
train_data = train_data.batch(BATCH_SIZE)

# during validation there is no backpropagation (no update rule, hence no optimization), only forward propagation
# therefore the validation data does not need to be split into batches
# however, the model expects the validation data in batch form too, so we use batch size = number of samples -> a single batch
validation_data = validation_data.batch(num_validation_samples)

# the same applies to the test data
test_data = test_data.batch(num_test_samples)
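
The number of batches per epoch follows directly from these sizes. The sketch below uses only index ranges rather than real data; iterate_batches is a hypothetical helper, and 54,000 is the 60,000 training images minus the 6,000 reserved for validation.

```python
import math

def iterate_batches(num_samples, batch_size):
    """Yield (start, stop) index ranges, one per batch; the last may be smaller."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)

num_train_samples = 54000   # 60,000 training images minus 6,000 for validation
BATCH_SIZE = 200

steps_per_epoch = math.ceil(num_train_samples / BATCH_SIZE)
train_batches = list(iterate_batches(num_train_samples, BATCH_SIZE))
validation_batches = list(iterate_batches(6000, 6000))   # a single batch

print(steps_per_epoch, len(train_batches), len(validation_batches))
```

The 270 steps per epoch match the 270/270 shown in the training log, and the validation data forms exactly one batch.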

In [202]:
# printing the batched dataset gives "BatchDataset", showing that it is split into batches
print('',train_data)

# the same goes for the validation data, which contains only a single batch
print('\n',validation_data)

# the test data is also a BatchDataset with a single batch
print('\n',test_data)

 <BatchDataset element_spec=(TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>

 <BatchDataset element_spec=(TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>

 <BatchDataset element_spec=(TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>


In [203]:
# take the next batch of validation_data (it is the only batch)
# because as_supervised=True, we've got a 2-tuple structure

validation_inputs, validation_targets = next(iter(validation_data))

# iter(validation_data) creates an iterator over the 'validation_data' object.
# An iterator hands out its elements one at a time. Imagine it holds the values 1,12,-4,9.
# The first call to next() returns 1; calling next() again returns 12, as 1 has already been consumed
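
The behaviour of iter() and next() can be tried on an ordinary Python list holding the example values from the comment above. This is a plain-Python illustration, not a tf.data dataset.

```python
values = [1, 12, -4, 9]
iterator = iter(values)     # create an iterator over the list

first = next(iterator)      # the first element
second = next(iterator)     # the next one, since the first is already consumed
remaining = list(iterator)  # whatever has not been consumed yet

print(first, second, remaining)
```

The list itself is not modified; only the iterator keeps track of how far it has advanced.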

## Model

### Outline the model

When thinking about a deep learning algorithm, think first about building the model: the layers, their sizes and their activation functions.

In [204]:
input_size = 784 # input size (28 x 28 x 1 pixels)
output_size = 10 # output size (one per digit)
hidden_layer_size = 200 # hidden layer size

# define what the model will look like
model = tf.keras.Sequential([

    # the first layer (the input layer)
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
    # we have not covered CNNs (convolutional neural networks) yet, so we cannot feed such a tensor into our net directly; we must flatten the 28x28x1 images
    # there is a convenient layer 'Flatten' that simply takes our 28x28x1 tensor and orders it into a
    # (28x28x1,) = (784,) vector
    # this allows us to create a feed forward neural network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer

    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    # each hidden layer is specified as a new element of the list passed to keras.Sequential
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'), # 2nd hidden layer

    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
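
To make the shapes concrete, the same architecture can be sketched as a forward pass in NumPy. The weights here are random and untrained, so the outputs are meaningless as predictions; the sketch only shows how a batch of 28x28x1 images becomes 10 probabilities per image, and how many parameters the layers hold. The names forward, softmax and layer_sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 200, 200, 10]   # input, two hidden layers, output

# Randomly initialised weights and zero biases (untrained, for shape illustration only).
weights = [rng.normal(0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(images):
    """images: array of shape (batch, 28, 28, 1) with values in [0, 1]."""
    x = images.reshape(len(images), -1)                 # Flatten -> (batch, 784)
    x = np.tanh(x @ weights[0] + biases[0])             # 1st hidden layer
    x = np.tanh(x @ weights[1] + biases[1])             # 2nd hidden layer
    return softmax(x @ weights[2] + biases[2])          # output layer -> (batch, 10)

batch = rng.random((5, 28, 28, 1))
probabilities = forward(batch)
num_parameters = sum(w.size + b.size for w, b in zip(weights, biases))
print(probabilities.shape, num_parameters)
```

Each row of the output sums to 1, which is what the softmax activation guarantees. The parameter count is 784x200+200 for the first hidden layer, 200x200+200 for the second and 200x10+10 for the output layer.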

### Choose the optimizer and the loss function

In [205]:
# optionally, store the learning rate in a variable; if no learning rate is passed, Adam uses its default of 0.001
n = 0.001

# create the Adam optimizer with the desired learning rate
# the loss function is sparse categorical crossentropy, which takes integer targets (no one-hot encoding needed)
# and accuracy is the metric reported at each epoch

adam_optimizer = tf.keras.optimizers.Adam(learning_rate = n)

model.compile(optimizer=adam_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
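
What the sparse categorical crossentropy computes can be sketched in a few lines of NumPy: for each sample, take the predicted probability of the correct class and average the negative logarithms. This is a simplified version that assumes the probabilities are strictly positive; it is not the Keras implementation, and the small arrays below are made-up values.

```python
import numpy as np

def sparse_categorical_crossentropy(probabilities, targets):
    """Mean of -log(p[target]) over the batch; targets are integer class ids."""
    rows = np.arange(len(targets))
    return float(np.mean(-np.log(probabilities[rows, targets])))

# made-up predictions for 2 samples and 3 classes
probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.8, 0.1]])
targets = np.array([0, 1])   # integer labels, not one-hot vectors

loss = sparse_categorical_crossentropy(probabilities, targets)
print(loss)
```

The targets are passed as plain integers, which is why the MNIST labels can be used as they are.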

### Training

Training the model...

In [206]:
# determine the maximum number of epochs
NUM_EPOCHS = 10

# fit the model, specifying
# the training data
# the total number of epochs
# and the validation data in the format: (inputs,targets)
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/10
270/270 - 2s - loss: 0.3480 - accuracy: 0.8987 - val_loss: 0.2086 - val_accuracy: 0.9413 - 2s/epoch - 8ms/step
Epoch 2/10
270/270 - 1s - loss: 0.1704 - accuracy: 0.9501 - val_loss: 0.1500 - val_accuracy: 0.9585 - 1s/epoch - 4ms/step
Epoch 3/10
270/270 - 1s - loss: 0.1207 - accuracy: 0.9644 - val_loss: 0.1110 - val_accuracy: 0.9700 - 1s/epoch - 4ms/step
Epoch 4/10
270/270 - 1s - loss: 0.0901 - accuracy: 0.9734 - val_loss: 0.0910 - val_accuracy: 0.9722 - 1s/epoch - 4ms/step
Epoch 5/10
270/270 - 1s - loss: 0.0704 - accuracy: 0.9788 - val_loss: 0.0734 - val_accuracy: 0.9782 - 1s/epoch - 4ms/step
Epoch 6/10
270/270 - 1s - loss: 0.0564 - accuracy: 0.9832 - val_loss: 0.0642 - val_accuracy: 0.9830 - 1s/epoch - 4ms/step
Epoch 7/10
270/270 - 1s - loss: 0.0461 - accuracy: 0.9863 - val_loss: 0.0582 - val_accuracy: 0.9827 - 1s/epoch - 4ms/step
Epoch 8/10
270/270 - 1s - loss: 0.0358 - accuracy: 0.9893 - val_loss: 0.0499 - val_accuracy: 0.9862 - 1s/epoch - 4ms/step
Epoch 9/10
270/270 - 1s 

<keras.callbacks.History at 0x2879fbeab30>

## Test the model

After training on the training data and validating on the validation data, we test the final predictive power of the model by running it on the test dataset, which the algorithm has NEVER seen before.

It is very important to realize that fiddling with the hyperparameters overfits the validation dataset, because we choose the hyperparameters that score best on it.

The test is the absolute final instance: after testing, the model should not be adjusted any further.

In [207]:
test_loss, test_accuracy = model.evaluate(test_data)



In [208]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.07. Test accuracy: 97.96%
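
The accuracy reported above is the share of samples whose most probable class equals the target. A minimal NumPy sketch on made-up probabilities (not the model's real outputs):

```python
import numpy as np

# made-up predicted probabilities for 4 samples and 3 classes
probabilities = np.array([[0.1, 0.9, 0.0],
                          [0.8, 0.1, 0.1],
                          [0.2, 0.3, 0.5],
                          [0.6, 0.3, 0.1]])
targets = np.array([1, 0, 2, 1])

predictions = probabilities.argmax(axis=1)          # most probable class per sample
accuracy = float(np.mean(predictions == targets))   # share of correct predictions
print('Accuracy: {0:.2f}%'.format(accuracy * 100.))
```

Here three of the four samples are classified correctly; the last one is predicted as class 0 while its target is 1.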


## Real world application

In [210]:
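from tensorflow.keras.preprocessing import image 
# "8.png" is expected to be in the current working directory
image_path= "8.png"
im = image.load_img(image_path, target_size=(28, 28), color_mode = "grayscale")
img = image.img_to_array(im)
# add a batch dimension: (28, 28, 1) -> (1, 28, 28, 1)
img = tf.expand_dims(img, axis=0)      
# scale to [0, 1], the same way as the training data
img /= 255
# predict_classes was removed from recent Keras versions; take the argmax of the predicted probabilities instead
np.argmax(model.predict(img), axis=-1)[0]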

FileNotFoundError: [Errno 2] No such file or directory: '8.png'
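
Independently of how the picture is loaded, the model expects the same kind of input it was trained on: a float array of shape (1, 28, 28, 1) with values between 0 and 1, where the digit is light on a dark background. The sketch below prepares a stand-in array in that way; prepare_for_model and page are hypothetical names, and whether inversion is needed depends on the picture being used.

```python
import numpy as np

# Stand-in for a loaded 28x28 grayscale picture: dark strokes (0) on a white page (255).
page = np.full((28, 28), 255, dtype=np.uint8)
page[6:22, 13:15] = 0     # a crude vertical stroke

def prepare_for_model(pixels, invert=False):
    """Return a float32 array of shape (1, 28, 28, 1) scaled to [0, 1]."""
    x = pixels.astype(np.float32) / 255.
    if invert:
        x = 1. - x            # dark-on-light -> light-on-dark, as in MNIST
    return x.reshape(1, 28, 28, 1)

model_input = prepare_for_model(page, invert=True)
print(model_input.shape, model_input.dtype)
```

After inversion the stroke pixels are 1 and the background is 0, matching the convention that 0 is black and 255 is white in the training images.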