## Using a Neural Network to solve the MNIST problem

The problem we’re trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9). We’ll use the MNIST
dataset, a classic in the machine-learning community, which has been around almost
as long as the field itself and has been intensively studied. It’s a set of 60,000 training
images, plus 10,000 test images, assembled by the National Institute of Standards and
Technology (the NIST in MNIST).

"Solving" MNIST can be thought of as the "Hello World" of Deep Learning.



### Before we jump in:

    Understanding the OOP paradigm

Object-oriented programming (OOP) is a programming paradigm that sits alongside the procedural/scripting and functional styles. OOP bundles data and logic into objects that can be reused quickly and in a reader-friendly way.

It is important to be familiar with the principles of OOP if you want to do ML in Python, as the models you use (e.g. Logistic Regression, KNN, Neural Networks) will be **Classes** and their associated behaviours (".fit" to train the model on training data, ".add" to add a layer to the model, configured through args and kwargs) will be **Methods** of that class.

A very short TLDR (use the link below or Google to find out more):
- Objects are instances of a Class
- Classes are a template for some Object you are programming (a Neural Network, in our case)
- If Classes are the template for some object, consider methods the possible behaviours of that Class
    
https://www.youtube.com/watch?v=B5O52_w-4Zg&feature=emb_logo
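
The class, object and method vocabulary above can be sketched in a few lines of plain Python. `TinyModel` is a hypothetical stand-in invented for this illustration; it is not part of Keras or scikit-learn and does no real learning. It only mirrors the shape of the `.add` and `.fit` calls used later.

```python
class TinyModel:
    """Hypothetical model class for illustration only (not a Keras class)."""

    def __init__(self, name):
        # Attributes hold the state of one particular object.
        self.name = name
        self.layers = []
        self.is_trained = False

    def add(self, layer_description):
        # A method is a behaviour of the class; this one records a layer.
        self.layers.append(layer_description)

    def fit(self, samples, targets):
        # A real model would update its weights here; this sketch only
        # checks the inputs and flips a flag.
        if len(samples) != len(targets):
            raise ValueError("samples and targets must have the same length")
        self.is_trained = True
        return self


# `network` is an object: one instance of the TinyModel class.
network = TinyModel("digit-classifier")
network.add("Dense(512, relu)")
network.add("Dense(10, softmax)")
network.fit(samples=[[0.0, 1.0]], targets=[1])

print(network.layers)
print(network.is_trained)
```

The class is the template, `network` is the object built from it, and `add` and `fit` are the behaviours that object can perform.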


    TLDR for Neural Networks

A neural network is a learning object that consists of layers, each of which takes in a tensor of data and outputs another. In between its input and output, a layer applies a simple transformation to the data structure (a NumPy array, explained below); for a Dense layer this is a linear transformation followed by an activation function. It can help to think of a *layer* as a data-preprocessing module that filters the data passing through it and outputs it in a slightly more useful form. You use args and kwargs to tune the model, aiming for it to perform nearly as well on testing data as on training data, all the while minimising overfitting, the cornerstone problem in machine learning.

    Purpose of a Neural Network

Chain together simple transformations on batches of random samples of data, designed to optimise the model towards the metric representing success on the problem at hand.
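
To make "chaining simple transformations" concrete, here is a rough NumPy sketch of a forward pass through two dense layers with the same sizes as the network built later (784 inputs, 512 hidden units, 10 outputs). The helper names `dense`, `relu` and `softmax`, the random weights and the random input batch are all assumptions for illustration; this is not how Keras implements its layers, and nothing here is trained.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def dense(inputs, weights, bias):
    # The linear part of a dense layer: matrix product plus bias.
    return inputs @ weights + bias


def relu(x):
    # Element-wise: negative values become 0.
    return np.maximum(x, 0.0)


def softmax(x):
    # Subtract the row maximum for numerical stability, then normalise
    # so that each row sums to 1.
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Stand-in batch of 128 flattened 28 x 28 "images" (random values, not MNIST).
batch = rng.random((128, 784)).astype("float32")

# Untrained, randomly initialised weights.
w1 = rng.normal(scale=0.01, size=(784, 512)).astype("float32")
b1 = np.zeros(512, dtype="float32")
w2 = rng.normal(scale=0.01, size=(512, 10)).astype("float32")
b2 = np.zeros(10, dtype="float32")

hidden = relu(dense(batch, w1, b1))      # first transformation
probs = softmax(dense(hidden, w2, b2))   # second transformation

print(hidden.shape, probs.shape)  # (128, 512) (128, 10)
```

Each row of `probs` is a set of 10 scores that sum to 1, one per digit class. Training is the process of adjusting `w1`, `b1`, `w2` and `b2` so that those scores become useful.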

    
    How we convert our data into machine learning friendly currency

We encode our data as NumPy arrays, also called tensors. **A matrix is a 2D tensor; NumPy arrays generalise matrices to any number of dimensions.** Here is how we encode different types of data:

 - *Vector data:* 2D tensor of shape (samples, features)
 - *Timeseries* or *sequence data:* 3D tensor of shape (samples, timesteps, features)
 - *Images:* 4D tensor of shape (samples, height, width, channels)
 - *Video:* 5D tensor of shape (samples, frames, height, width, channels)

This is way more fun than Year 12.
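
The shapes listed above can be checked directly in NumPy. The sizes below (100 samples, 30 timesteps, and so on) are arbitrary values picked for illustration, and the arrays are filled with zeros rather than real data.

```python
import numpy as np

# Arbitrary example sizes; only the number of axes matters here.
vector_data = np.zeros((100, 8))           # (samples, features)
timeseries = np.zeros((100, 30, 8))        # (samples, timesteps, features)
images = np.zeros((100, 28, 28, 1))        # (samples, height, width, channels)
video = np.zeros((4, 16, 28, 28, 3))       # (samples, frames, height, width, channels)

for name, tensor in [
    ("vector data", vector_data),
    ("timeseries", timeseries),
    ("images", images),
    ("video", video),
]:
    # ndim is the number of axes; shape is the size along each axis.
    print(name, tensor.ndim, tensor.shape)
```

Note that MNIST as loaded by Keras has shape (60000, 28, 28): because the images are grayscale, the channels axis is left out, so the array is 3D rather than 4D.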



    Anatomy of a Neural Network
    
- Layers (which form the network)
- Input data (samples) and associated targets (labels)
- The loss function, which defines the feedback signal used for learning
- The optimiser, which orchestrates how the learning proceeds
- The designated metric to optimise towards, often by reducing errors (e.g. MAE, MSE)

Together, these components form the model. I examine each area in more depth in my notes.
    
    Compiling a Neural Network

- A loss function (https://keras.io/api/losses/): how the network identifies errors (e.g. MSE, MAE) during training and steers itself in the right direction


- An optimiser (https://keras.io/api/optimizers/): the mechanism through which the network updates itself based on the data it sees and its loss function


- The metric to monitor success during training and testing (https://keras.io/api/metrics/): for this demonstration I will only look at accuracy, the fraction of images correctly classified

I think of it as one big feedback loop. I examine each area in more depth in my notes.
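
As a rough illustration of the loss and the metric described above, here is how categorical crossentropy and accuracy can be computed by hand in NumPy. The predictions and targets are made-up values for 4 samples and 3 classes, and the formulas are the textbook ones; Keras's own implementations add details (such as clipping) that this sketch leaves out.

```python
import numpy as np

# Made-up predicted probabilities: 4 samples, 3 classes, each row sums to 1.
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])

# One-hot targets; the third sample's true class is 0, but the model
# gives class 1 the highest probability, so it is misclassified.
targets = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
])

# Categorical crossentropy: minus the log of the probability assigned
# to the true class, averaged over the samples.
per_sample_loss = -np.sum(targets * np.log(predictions), axis=1)
loss = per_sample_loss.mean()

# Accuracy: fraction of samples whose most probable class is the true class.
accuracy = np.mean(predictions.argmax(axis=1) == targets.argmax(axis=1))

print(round(float(loss), 4))
print(accuracy)  # 0.75
```

The loss is what the optimiser minimises during training, while accuracy is only reported. The misclassified third sample contributes the largest share of the loss.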
   
    
    An ML Workflow



    The mathematical mechanisms behind a Neural Network learning
**The gears of neural networks: tensor operations**

- Element-wise operations
- Broadcasting
- Tensor dot
- Tensor reshaping
- Geometric interpretation of tensor operations
- Geometric interpretation of deep learning
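
The first four operations in the list above each correspond to a one-line NumPy expression. The small arrays here are arbitrary values chosen so the results are easy to verify by hand.

```python
import numpy as np

x = np.array([[1.0, -2.0, 3.0],
              [-4.0, 5.0, -6.0]])   # shape (2, 3)
y = np.array([10.0, 20.0, 30.0])    # shape (3,)

# Element-wise operation: applied independently to every entry.
relu_x = np.maximum(x, 0.0)

# Broadcasting: y (shape (3,)) is added to every row of x (shape (2, 3)).
broadcast_sum = x + y

# Tensor dot: (2, 3) @ (3, 4) -> (2, 4).
w = np.ones((3, 4))
dot = x @ w

# Reshaping: same 6 values, rearranged into 3 rows and 2 columns.
reshaped = x.reshape((3, 2))

print(relu_x)
print(broadcast_sum)
print(dot.shape)       # (2, 4)
print(reshaped.shape)  # (3, 2)
```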

**The engine of neural networks: gradient-based optimisation**
 
- What is a derivative?
- Derivatives of tensor operations: the gradient
- Stochastic gradient descent
- Chaining derivatives: the Backpropagation algorithm

The nitty-gritty that powers deep learning. I examine each area in more depth in my notes, and I admit I find the content challenging.
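
As a small taste of gradient-based optimisation, here is a minimal sketch of mini-batch stochastic gradient descent on a toy problem that is not part of this notebook: recovering the slope and intercept of the line y = 2x + 1 from noise-free samples. The learning rate, batch size and number of epochs are assumptions picked to make the toy converge, and the gradients are worked out by hand with the chain rule rather than by backpropagation through a real network.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy data: 256 points on the line y = 2x + 1 (no noise).
x = rng.uniform(-1.0, 1.0, size=256)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1
batch_size = 32

for epoch in range(100):
    # Shuffle once per epoch, then walk through the data in mini-batches.
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        error = (w * x[idx] + b) - y[idx]

        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2.0 * np.mean(error * x[idx])
        grad_b = 2.0 * np.mean(error)

        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(float(w), 3), round(float(b), 3))
```

The learned `w` and `b` should end up very close to 2 and 1. A neural network follows the same loop, only with far more parameters and with backpropagation computing the gradients.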

### Jumping in: Loading the dataset

In [3]:
# Load the MNIST dataset in Keras
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

*train_images* and *train_labels* form the training set, which is the data the model will learn from. We will then test the model on the test set: *test_images* and *test_labels*.

The images are encoded as NumPy arrays and the labels are an array of digits from 0 to 9. For more information, refer to the attached Notion notes.

TLDR: Data representations for neural networks.


Images and labels have a one-to-one correspondence. 

### The network architecture

In [4]:
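# The network architecture

from keras import models
from keras import layers

# A Sequential model is a linear stack of layers
network = models.Sequential()
# Hidden layer: 512 units with ReLU activation; expects flattened 28 * 28 images
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
# Output layer: 10 units with softmax, giving one probability per digit class
network.add(layers.Dense(10, activation='softmax'))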

### The compilation step

In [5]:
# The compilation step

network.compile(
    loss='categorical_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
    )

### Preparing the image data (data "pre-processing")

Before we train the network, we need to preprocess the data by reshaping it into the shape the network expects. We then need to scale the data so all values are in the [0, 1] interval. Previously, our training images were stored in an array of shape (60000, 28, 28) and of type *uint8*, with values in the [0, 255] interval: 60000 images of 28 x 28 pixels.

Below we transform our data into a *float32* array of shape (60000, 28 * 28) with values between 0 and 1.

In [6]:
# Preparing the image data

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
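
To see what the reshaping and scaling do without downloading MNIST, here is a small check on synthetic data. `fake_images` is a made-up stand-in of 5 random "images" with the same dtype and per-image shape as MNIST; it is not real data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in: 5 images of 28 x 28 pixels, uint8 values in [0, 255].
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28 x 28 image into a vector of 784 values.
flat = fake_images.reshape((5, 28 * 28))

# Convert to float32 and rescale from [0, 255] to [0, 1].
scaled = flat.astype("float32") / 255

print(fake_images.shape, fake_images.dtype)  # (5, 28, 28) uint8
print(scaled.shape, scaled.dtype)            # (5, 784) float32
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Writing the reshape as `reshape((-1, 28 * 28))` would let NumPy infer the number of images instead of hard-coding it.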

### Preparing the labels (also data "pre-processing")

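We also need to categorically encode the labels, a step that’s explained in chapter 3. Categorical (one-hot) encoding turns each digit label into a vector of 10 values, with a 1 at the index of the digit and 0 everywhere else.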

In [7]:
# Preparing the labels

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
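
For intuition about what categorical encoding produces, here is a hand-rolled NumPy version. The helper `one_hot` and the three sample labels are assumptions for illustration; this is a sketch of the idea, not Keras's `to_categorical` implementation.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    # One row per label, all zeros except a 1 at the label's index.
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


sample_labels = np.array([5, 0, 4])   # made-up digit labels
encoded = one_hot(sample_labels)

print(encoded.shape)  # (3, 10)
print(encoded[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Taking `encoded.argmax(axis=1)` recovers the original digits, which is also how a predicted probability vector is turned back into a digit.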

### Training & Testing the Neural Network!!

We train our model by calling the network's *fit* method. As with all the machine learning libraries I have encountered (scikit-learn and now Keras), it's important to understand object-oriented programming, as we are working with objects (instances of classes, and their methods) when we do ML with these libraries; see the OOP summary near the top.

When we train our model by calling the "fit" method, we are calling the training behaviour defined by the tf.keras.Model class. Here the variable "network" holds a Sequential model, which is a subclass of tf.keras.Model.

Give Google a crack if you want to learn more about OOP.

Read here for more about the Keras Model class: https://www.tensorflow.org/api_docs/python/tf/keras/Model 

In [8]:
# Training the network

network.fit(train_images, train_labels, epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2493ca51b80>

**Some terminology**: 

An *epoch* is one full pass of the neural network over the entire training dataset; the number of epochs is a hyperparameter (the *epochs* argument). I have ours set to 10, so the neural network will run through the dataset 10 times. Within each epoch the weights (parameters) are adjusted once per batch of 128 images, so every pass gives the network many opportunities to adjust the layer weights and -->> **learn** <<--
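
The relationship between epochs, batch size and weight updates is simple arithmetic. This sketch uses the values from the `fit` call above and assumes the final, smaller batch of each epoch is still used rather than dropped.

```python
import math

num_samples = 60000   # training images
batch_size = 128
epochs = 10

# Batches needed to see every training image once.
steps_per_epoch = math.ceil(num_samples / batch_size)

# One weight update per batch, repeated for every epoch.
total_updates = steps_per_epoch * epochs

# The last batch of each epoch holds whatever is left over.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 469
print(total_updates)    # 4690
print(last_batch_size)  # 96
```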

The last epoch achieved an accuracy of 0.997, meaning the network classified the training images correctly over 99% of the time.

In [9]:
# Testing the network
 
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', round(test_acc*100, 2), '%')

test_acc: 98.33 %


The test-set accuracy comes out at 98.33%, meaning our network/model correctly assigned the test images to the labels 0-9 98.33% of the time. However, there is a noticeable gap between our training accuracy (99.7%) and our testing accuracy.

The above phenomenon is an example of *overfitting*, which describes the scenario in which a machine learning model performs worse on new data (test data) than on the data it was trained on.
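
To put a number on that gap, here is a quick calculation using the two accuracies reported above (0.997 on the training set, 0.9833 on the test set). Looking at error rates rather than accuracies is one possible way to view it, not the only one.

```python
train_acc = 0.997    # accuracy reported for the last training epoch
test_acc = 0.9833    # accuracy reported by network.evaluate

# Gap in percentage points between training and test accuracy.
gap_points = (train_acc - test_acc) * 100

# Error rates: the fraction of images classified incorrectly.
train_error = 1 - train_acc
test_error = 1 - test_acc
error_ratio = test_error / train_error

print(round(gap_points, 2))   # 1.37
print(round(error_ratio, 1))  # 5.6
```

A gap of 1.37 percentage points sounds small, but it means the network makes mistakes roughly 5 to 6 times as often on images it has never seen.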