## Using a Neural Network to solve the MNIST problem

The problem we’re trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9). We’ll use the MNIST
dataset, a classic in the machine-learning community, which has been around almost
as long as the field itself and has been intensively studied. It’s a set of 60,000 training
images, plus 10,000 test images, assembled by the National Institute of Standards and
Technology (the NIST in MNIST).

"Solving" MNIST can be thought of as the "Hello World" of Deep Learning.



In [2]:
# Load the MNIST dataset in Keras
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The training set, *train_images* and *train_labels*, is the data the model will learn from. We will then test the model on the test set: *test_images* and *test_labels*.

The images are encoded as NumPy arrays, and the labels are an array of digits ranging from 0 to 9. Images and labels have a one-to-one correspondence.

### The network architecture

A *layer* is a core building block of neural networks. Think of a *layer* as a data-processing module that acts as a filter for the data it takes in. Our aim is for data to come out of a layer (or a node of a layer) in a more useful form than it went in. More specifically, layers extract representations out of the data fed into them, and the aim is for each representation to be more meaningful for the problem at hand.

And that is the crux of neural networks. We pass data through layers, where each layer performs a simple matrix transformation, so the network's output is the result of chaining these simple transformations together.
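
To make the idea of chained transformations concrete, here is a minimal NumPy sketch of a forward pass through two fully connected layers. The weights are random stand-ins (not trained values), the layer sizes 784, 512 and 10 are chosen to mirror the network built in this section, and the helper names `relu` and `softmax` are my own rather than Keras functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Replace negative values with zero
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract each row's max for numerical stability, then normalise
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical, untrained weights and biases for the two layers
W1 = rng.normal(scale=0.01, size=(784, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10))
b2 = np.zeros(10)

# A fake batch of 4 flattened "images" with values in [0, 1)
batch = rng.random((4, 784))

hidden = relu(batch @ W1 + b1)      # first transformation
probs = softmax(hidden @ W2 + b2)   # second transformation

print(probs.shape)        # one row of 10 scores per image
print(probs.sum(axis=1))  # each row sums to 1
```

Each layer here is a matrix multiplication plus a bias, followed by an activation function. Training is what turns the random weights into useful ones.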

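We build the model with *models.Sequential()*, a linear stack of layers, and call *add* once per layer. The first *Dense* (fully connected) layer has 512 units and a ReLU activation (*relu*), which replaces negative values with zero. Its *input_shape=(28 * 28,)* says that each input is a flattened image of 784 values; later layers infer their input shape from the layer before them. We call *add* a second time for the output layer: a *Dense* layer with 10 units and a *softmax* activation, which returns 10 probability scores, one per digit class, that sum to 1.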

In [3]:
# The network architecture

from keras import models
from keras import layers
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
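
As a sanity check on the shapes involved, the number of trainable parameters in each *Dense* layer can be worked out by hand: one weight per input-unit pair plus one bias per unit. The helper `dense_param_count` below is a hypothetical name of my own, not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(28 * 28, 512)  # first Dense layer
output_params = dense_param_count(512, 10)       # softmax output layer
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 401920 5130 407050
```

These figures should agree with the per-layer parameter counts that `network.summary()` prints for this model.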

### The compilation step

For the network to be ready for training, we must set three more important parameters as part of the *compilation* step:

- A loss function (https://keras.io/api/losses/): how the network identifies errors (e.g. MSE, MAE) during training and steers itself in the right direction


- An optimiser (https://keras.io/api/optimizers/): the mechanism through which the network updates itself based on the data it sees and its loss function


- The metric to monitor success during training and testing (https://keras.io/api/metrics/): for this demonstration I will only look at accuracy, the fraction of images correctly classified
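
To show what the loss and the metric actually measure, here is a small NumPy sketch that computes categorical cross-entropy and accuracy by hand. The predictions are made-up numbers over 3 classes (rather than 10) to keep the example short, and the function names are my own; Keras computes these internally.

```python
import numpy as np

# Hypothetical predicted probabilities for 3 samples (each row sums to 1)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3]])

# One-hot true labels: the samples belong to classes 0, 1 and 2
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); the loss is -log of the probability
    # assigned to the true class
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=1)

def accuracy(y_true, y_pred):
    # Fraction of samples whose highest-scoring class is the true class
    return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))

losses = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)

print(losses)  # per-sample loss: small when the true class gets high probability
print(acc)     # the third sample is misclassified, so 2 of 3 are correct
```

The loss is a smooth quantity the optimiser can reduce step by step, whereas accuracy only changes when a prediction flips, which is why we train on the loss and merely monitor the accuracy.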

In [4]:
# The compilation step

network.compile(
    loss='categorical_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
    )
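
For intuition about what an optimiser does, below is a simplified sketch of an RMSprop-style update applied to a single weight, minimising the toy function f(w) = w². This is an illustration under my own assumptions (the function `rmsprop_step`, the learning rate and the toy objective are all hypothetical), not the Keras implementation, which has more options.

```python
import math

def rmsprop_step(w, grad, cache, lr=0.01, rho=0.9, eps=1e-7):
    # Keep a running average of squared gradients...
    cache = rho * cache + (1.0 - rho) * grad ** 2
    # ...and scale the step by its square root
    w = w - lr * grad / (math.sqrt(cache) + eps)
    return w, cache

start = 1.0
w, cache = start, 0.0
for _ in range(300):
    grad = 2.0 * w  # derivative of w**2
    w, cache = rmsprop_step(w, grad, cache)

print(abs(w))  # much closer to the minimum at 0 than the starting point
```

In the real network the same kind of update is applied to every weight at once, with the gradient of the loss taking the place of `2.0 * w`.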

### Preparing the image data (pre-processing)

Before we train the network, we need to preprocess the data by reshaping it into the shape the network expects. We then need to scale the data so all values are in the [0, 1] interval. Previously, our training images were stored in an array with dimensions (60000, 28, 28), as there were 60000 images of 28 x 28 pixels, of type *uint8* with values in the [0, 255] interval.

Below we transform our data into a *float32* array of shape (60000, 28 * 28) with values between 0 and 1.

In [5]:
# Preparing the image data

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
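
The same two steps can be checked on a small stand-in array without loading MNIST. The sketch below uses 3 randomly generated fake images (an assumption for illustration only) and prints the shape, dtype and value range before and after.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3 fake 28 x 28 grayscale images with uint8 pixel values in [0, 255]
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image into a vector of 784 values, then scale to [0, 1]
flat = fake_images.reshape((fake_images.shape[0], 28 * 28))
scaled = flat.astype('float32') / 255

print(fake_images.shape, fake_images.dtype)
print(scaled.shape, scaled.dtype)
print(scaled.min(), scaled.max())
```

Reshaping only changes how the pixels are laid out, not their values, so the first row of `flat` holds exactly the pixels of the first image read row by row.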

### Preparing the labels (pre-processing)

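We also need to categorically encode the labels, a step that’s explained in chapter 3. In short, each digit label becomes a vector of 10 values with a 1 at the index of the digit and 0 everywhere else, which matches the 10 outputs of the softmax layer and the *categorical_crossentropy* loss.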

In [6]:
# Preparing the labels

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
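
Categorical (one-hot) encoding is simple enough to reproduce by hand. The sketch below is a NumPy stand-in for illustration, using a handful of made-up labels; it is not how `to_categorical` is implemented.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # hypothetical digit labels

# Row i of the 10 x 10 identity matrix is the one-hot vector for digit i
one_hot = np.eye(10, dtype='float32')[labels]

print(one_hot[0])  # a single 1 at index 5, zeros elsewhere

# argmax recovers the original digit from its one-hot vector
recovered = one_hot.argmax(axis=1)
print(recovered)
```

The same `argmax` trick turns the network's 10 output probabilities back into a predicted digit.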

### Training & Testing the Neural Network!!

We train our model by calling the network's fit method (you need to study a bit of OOP to understand what's happening here). After calling this method, our model has been fit to the training data.  

Some terminology: an *epoch* is one complete pass over the training data. I have set the *epochs* kwarg to 10, meaning we do 10 passes, and *batch_size* to 128, meaning the network updates its weights after every batch of 128 images.
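
The relationship between epochs, batch size and the number of weight updates is a short piece of arithmetic. The sketch below assumes the 60,000 training images and the `batch_size=128` used in the `fit` call in this section; the last batch of each epoch is simply smaller than the rest.

```python
import math

n_train = 60000   # training images
batch_size = 128  # images per weight update
epochs = 10       # full passes over the training data

# The final batch of each epoch is partial, hence the ceiling
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 469 4690
```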

In [7]:
# Training the network

network.fit(train_images, train_labels, epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2560aa37640>

We can observe the neural network in action across each epoch. The two values displayed, *loss* and *accuracy*, are two of the three components of the compilation step described above. We don't have a figure for the optimiser, as that is the mechanism that continually updates the model's weights to reduce the *loss*.

We can see that our model reaches a final accuracy of 99.73% on the last epoch/run-through of the training data. Now we are ready to evaluate our model on the test data.

In [8]:
# Testing the network
 
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', round(test_acc*100, 2), '%')

test_acc: 98.32 %


The test-set accuracy comes out at 98.32%, meaning our network/model assigned the correct label (0-9) to 98.32% of the test images. However, there is a noticeable gap of about 1.4 percentage points between our training accuracy (99.73%) and our test accuracy.

The above phenomenon is an example of *overfitting*, which describes the scenario in which a machine learning model performs worse on new data (the test data) than on the data it was trained on.
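
The size of the gap is easier to appreciate as error rates rather than accuracies. The sketch below uses only the two figures reported in this section (99.73% training accuracy and 98.32% test accuracy); the variable names are my own.

```python
train_acc = 99.73  # final training accuracy reported above, in percent
test_acc = 98.32   # test accuracy reported above, in percent

gap = round(train_acc - test_acc, 2)     # difference in percentage points
train_error = round(100 - train_acc, 2)  # percent of training images misclassified
test_error = round(100 - test_acc, 2)    # percent of test images misclassified

print(gap, train_error, test_error)
print(round(test_error / train_error, 1))  # how many times larger the test error is
```

Seen this way, the model makes several times more mistakes on unseen images than on the images it trained on, even though both accuracies look high.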