Convolutional Neural Networks
=========

Convolutional neural networks (CNNs) are a class of deep neural networks, usually used in computer vision applications.

*Convolutional* refers to the convolution operation, in which the network slides small filters over its input to extract features. Traditionally this kind of pre-processing was programmed by hand by data scientists, but a CNN can learn how to do a lot of it by *itself*, applying filters for things such as edge detection.
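
To make the idea of a filter concrete, here is a minimal NumPy sketch of a hand-written vertical edge filter sliding over a toy image. The helper `convolve2d`, the toy image, and the kernel values are illustrative assumptions, not part of the exercise; a CNN would learn its own kernel values during training rather than having them written by hand.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    Like the 'convolution' in deep learning libraries, the kernel is not flipped.
    """
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    output = np.zeros((out_rows, out_cols), dtype=np.float32)
    for r in range(out_rows):
        for c in range(out_cols):
            patch = image[r:r + k_rows, c:c + k_cols]
            output[r, c] = np.sum(patch * kernel)
    return output

# Toy 6 x 6 image: dark (0) on the left half, bright (1) on the right half
toy_image = np.zeros((6, 6), dtype=np.float32)
toy_image[:, 3:] = 1.0

# Hand-written vertical edge filter (Sobel-like values, chosen for illustration)
vertical_edge_kernel = np.array([[-1, 0, 1],
                                 [-2, 0, 2],
                                 [-1, 0, 1]], dtype=np.float32)

edges = convolve2d(toy_image, vertical_edge_kernel)
print(edges)  # every row is [0, 4, 4, 0]: the response is strongest at the edge
```

The output is smaller than the input (4 x 4 instead of 6 x 6) because the 3 x 3 filter only fits in positions where it lies fully inside the image.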

Here we will do a common CNN programming exercise - recognising handwritten digits using the MNIST digit dataset.

Step 1
------

Let's start by loading our libraries and dataset, and setting up our training, validation, and test sets.

In [None]:
# Run this!
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
# Sets up the graphing configuration
import matplotlib.pyplot as graph
%matplotlib inline
graph.rcParams['figure.figsize'] = (15,5)
graph.rcParams["font.family"] = 'DejaVu Sans'
graph.rcParams["font.size"] = '12'
graph.rcParams['image.cmap'] = 'rainbow'

In [None]:
from keras.datasets import mnist

# This is our training data, with 6400 samples.
###--- REPLACE THE ???s BELOW WITH initial_train_X AND THEN initial_train_Y ---###
??? = mnist.load_data()[0][0][:6400].astype('float32')
??? = mnist.load_data()[0][1][:6400]
###

# This is our test data, with 2000 samples.
###--- REPLACE THE ???s BELOW WITH initial_test_X AND THEN initial_test_Y ---###
??? = mnist.load_data()[1][0][-2000:].astype('float32')
??? = mnist.load_data()[1][1][-2000:]
###

# This is our validation data, with 1600 samples.
###--- REPLACE THE ???s BELOW WITH initial_valid_X AND THEN initial_valid_Y ---###
??? = mnist.load_data()[1][0][:1600].astype('float32')
??? = mnist.load_data()[1][1][:1600]
###

print('initial_train_X:', initial_train_X.shape, end = ' ')
print('initial_train_Y:', initial_train_Y.shape)
print('initial_test_X:', initial_test_X.shape, end = ' ')
print('initial_test_Y:', initial_test_Y.shape)
print('initial_valid_X:', initial_valid_X.shape, end = ' ')
print('initial_valid_Y:', initial_valid_Y.shape)

Expected output:

```
initial_train_X: (6400, 28, 28) initial_train_Y: (6400,)
initial_test_X: (2000, 28, 28) initial_test_Y: (2000,)
initial_valid_X: (1600, 28, 28) initial_valid_Y: (1600,)
```

__So we have:__
* 6400 training samples
* 1600 validation samples
* 2000 test samples
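
The validation and test sets are both cut from MNIST's 10,000-image test split: the first 1,600 images for validation and the last 2,000 for testing. As a quick check, this sketch uses stand-in indices (not the real images) to confirm that the two slices do not overlap. The variable names are made up for illustration.

```python
import numpy as np

# Stand-in for the positions of the 10,000 images in the MNIST test split
indices = np.arange(10000)

valid_idx = indices[:1600]    # same slice as the validation set
test_idx = indices[-2000:]    # same slice as the test set

# Any index that appears in both slices would mean the sets share images
overlap = np.intersect1d(valid_idx, test_idx)
print(len(valid_idx), len(test_idx), len(overlap))  # 1600 2000 0
```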

Step 2
------

Let's take a look at one of the images.

In [None]:
###--- REPLACE THE ??? BELOW WITH initial_train_X[0] (or another number you want to see) ---###
graph.imshow(???, cmap = 'gray', interpolation = 'nearest')
###

graph.show()

You should see a greyscale digit.

__Each image:__
* Is greyscale
* Is 28 pixels by 28 pixels
* Is represented by a 28 x 28 table of numbers (a matrix, stored here as a NumPy array)

__Each number in the 28 x 28 table that represents the image:__
* Represents one pixel
* Is on a scale of 0 to 255
* 0 is fully black
* 255 is fully white
* Values between 0 and 255 are shades of grey

Step 3
-------

We'll need to play around with our data to get it working well with our neural network. 

First off, let's reshape our `initial_train_X`, `initial_test_X` and `initial_valid_X` sets so that they fit the convolutional layers, which expect each image to have a channel dimension: (samples, height, width, channels).

We'll save them to new variables, so if you run the cell twice you won't get errors.

In [None]:
# We'll make a variable dim, to represent our image dimensions
# Then we'll reshape the data sets using reshape

# Image dimensions
dim = initial_train_X[0].shape[0] # 28

# Here reshape will change the data sets shapes using our dim variable

###--- REPLACE THE ???s BELOW WITH reshape ---###
train_X = initial_train_X.???(initial_train_X.shape[0], dim, dim, 1)
test_X = initial_test_X.???(initial_test_X.shape[0], dim, dim, 1)
valid_X = initial_valid_X.???(initial_valid_X.shape[0], dim, dim, 1)
###

# It's more efficient if we scale our values so they're between 0 and 1
# Not 0 and 255

# Here we use feature scaling
train_X = train_X / 255
valid_X = valid_X / 255
test_X = test_X / 255

print("Shapes of train, test and validation sets: ")
print("Train: ", train_X.shape)
print("Test: ", test_X.shape)
print("Validation: ", valid_X.shape)
print("Range: ", np.min(train_X), "to", np.max(train_X))

Expected output:

```
Shapes of train, test and validation sets:
Train:  (6400, 28, 28, 1)
Test:  (2000, 28, 28, 1)
Validation:  (1600, 28, 28, 1)
Range:  0.0 to 1.0
```
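
The reshape and scaling steps can be tried in isolation on made-up data. This is a small sketch that uses random stand-in images (`fake_images` is a hypothetical name, not part of the exercise) rather than MNIST, so it runs without downloading anything.

```python
import numpy as np

# Stand-in for the real data: 4 fake 28 x 28 images with pixel values 0 to 255
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28)).astype('float32')

dim = fake_images[0].shape[0]  # 28

# Add a trailing channel axis: (samples, 28, 28) -> (samples, 28, 28, 1)
fake_X = fake_images.reshape(fake_images.shape[0], dim, dim, 1)

# Scale pixel values from the 0 to 255 range into the 0 to 1 range
fake_X = fake_X / 255

print(fake_X.shape)  # (4, 28, 28, 1)
print(fake_X.min() >= 0.0, fake_X.max() <= 1.0)  # True True
```

Reshaping only changes how the numbers are indexed; the pixel values themselves are untouched until the division by 255.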

In [None]:
# Let's take a look at our expected output
###--- WRITE print(initial_train_Y[0]) BELOW ---###

###

Expected output: `5`

Our expected output (the label) is represented by a number - the number that is shown in the training image.

Step 4
------

As with the dog dataset in exercises 8 and 9, the neural network needs this number represented as a one-hot vector.

If we were to give this to the neural network as-is, we would be implying that there is some relationship between the classes (i.e. that 5 is more closely related to 4 than it is to 3).
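
As a rough illustration of what a one-hot vector looks like, here is a NumPy-only sketch. It is not how `keras.utils.to_categorical` is implemented, and the example labels are made up.

```python
import numpy as np

labels = np.array([5, 0, 4])  # made-up digit labels
num_classes = 10              # digits 0 to 9

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels encodes all of them at once
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # all zeros except a 1 at index 5
```

Every class gets its own position, so no digit is numerically "closer" to another.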

In [None]:
# This converts the output  to categorical one-hot vectors

###--- REPLACE THE ???s BELOW WITH to_categorical ---###
train_Y = keras.utils.???(initial_train_Y, 10)
valid_Y = keras.utils.???(initial_valid_Y, 10)
test_Y = keras.utils.???(initial_test_Y, 10)
###

# 10 being the number of classes (digits 0 to 9)

print(train_Y[0])

Step 5
------

Train a network!

Here we'll do the convolutional layers.

In [None]:
# Sets a randomisation seed for reproducibility.
np.random.seed(6)

###--- REPLACE THE ??? BELOW WITH Sequential ---###
model = ???()
###

The *convolutional* in convolutional neural networks refers to the convolution layers, which learn filters that do the pre-processing for you.

Keras makes this easy.

In [None]:
# Our input is a 2D image, so we'll use Conv2D

###--- REPLACE THE ???s BELOW WITH Conv2D ---###
model.add(???(28, kernel_size = (3, 3), activation = 'relu', input_shape = (dim, dim, 1)))
model.add(???(56, (3, 3), activation = 'relu'))
###

# Next up we'll use MaxPooling
# This helps simplify the data

###--- REPLACE THE ??? BELOW WITH MaxPooling2D ---###
model.add(???(pool_size = (2, 2)))
###

# Next we'll use Dropout
# Dropout is a method that helps prevent overfitting
# It 'drops out' (disables) nodes in the network

###--- REPLACE THE ??? BELOW WITH Dropout ---###
model.add(???(0.125))
###

# The higher the dropout rate, the more nodes are turned off during training.
# Dropout tends to increase training time

# Next we flatten the data set so the rest of the network can use it

###--- REPLACE THE ??? BELOW WITH Flatten ---###
model.add(???())
###


Here we applied the convolutional layers to do the pre-processing for us, and flattened the data so we can analyse it and output labels.
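
To see how these layers change the shape of the data, here is a NumPy-only sketch. It assumes Keras's default settings (no padding and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`). The helper `max_pool_2x2` and the example feature map are made up for illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 block."""
    rows, cols = feature_map.shape
    blocks = feature_map.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)  # 2 x 2 result: 4 and 2 on the top row, 2 and 8 on the bottom row

# Shape arithmetic for the layers in this step
dim = 28
after_conv1 = dim - 3 + 1          # 3 x 3 filter, no padding -> 26
after_conv2 = after_conv1 - 3 + 1  # second 3 x 3 filter -> 24
after_pool = after_conv2 // 2      # 2 x 2 max pooling -> 12
flattened_size = after_pool * after_pool * 56  # 56 filters in the second Conv2D
print(after_conv1, after_conv2, after_pool, flattened_size)  # 26 24 12 8064
```

Dropout does not change the shape, so `Flatten` hands 8064 values per image to the dense layers.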


In [None]:
# Regular dense layer, with some additional dropout
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.25))

# Now we add our output layer to return our target probability vector.

###--- REPLACE ??? BELOW WITH 10 - THE NUMBER OF CLASSES (DIGITS 0 TO 9) ---###
model.add(Dense(???, activation = tf.nn.softmax))
###

# And finally, we compile.
model.compile(loss = 'categorical_crossentropy', optimizer = 'Adamax', metrics = ['accuracy'])
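
The softmax output layer turns ten raw scores into ten probabilities that sum to 1, and the `categorical_crossentropy` loss measures how far those probabilities are from the one-hot label. The following is a simplified NumPy sketch of both ideas, not Keras's implementation. The function names, the raw scores, and the small constant added inside the logarithm are assumptions chosen for illustration.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the correct class."""
    return -np.sum(one_hot_target * np.log(probabilities + 1e-7))

# Made-up raw scores for the ten digit classes; class 5 scores highest
raw_scores = np.array([1.0, 2.0, 0.5, 0.1, 0.0, 4.0, 0.3, 0.2, 1.5, 0.7])
probabilities = softmax(raw_scores)

target = np.eye(10)[5]  # one-hot label for the digit 5
loss = categorical_crossentropy(target, probabilities)

print(np.argmax(probabilities))  # 5
print(probabilities.sum())       # 1.0, up to floating point rounding
print(loss)                      # shrinks as the probability for class 5 grows
```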

Step 6
------

Let's train it! (this might take a little while)

In [None]:
###--- REPLACE THE ???s BELOW WITH train_X, train_Y, valid_X, AND THEN valid_Y ---###
training_stats = model.fit(???, ???, batch_size = 128, epochs = 12, verbose = 1, validation_data = (???, ???))
###

###--- REPLACE THE ??? BELOW WITH evaluate ---###
evaluation = model.???(test_X, test_Y, verbose=0)
###

print('Test Set Evaluation: loss = %0.6f, accuracy = %0.2f%%' %(evaluation[0], 100 * evaluation[1]))

# We can plot our training statistics to see how it developed over time
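# Older Keras versions store accuracy under 'acc'; newer ones use 'accuracy'
accuracy_key = 'acc' if 'acc' in training_stats.history else 'accuracy'
accuracy, = graph.plot(training_stats.history[accuracy_key], label = 'Accuracy')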
training_loss, = graph.plot(training_stats.history['loss'], label = 'Training Loss')
graph.legend(handles = [accuracy, training_loss])
loss = np.array(training_stats.history['loss'])
xp = np.linspace(0,loss.shape[0],10 * loss.shape[0])
graph.plot(xp, np.full(xp.shape, 1), c = 'k', linestyle = ':', alpha = 0.5)
graph.plot(xp, np.full(xp.shape, 0), c = 'k', linestyle = ':', alpha = 0.5)
graph.show()

Step 7
------

Let's test it on a new sample that it hasn't seen, and see how it classifies it!

In [None]:
###--- REPLACE THE ??? BELOW WITH test_X[0] (or any other number between 0 and 1999) ---###
sample = ???.reshape(dim, dim)
###

graph.imshow(sample, cmap = 'gray', interpolation = 'nearest')
graph.show()

prediction = model.predict(sample.reshape(1, dim, dim, 1))
print('prediction: %i (%s)' %(np.argmax(prediction), prediction))

How is the prediction? Does it look right?
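
The prediction is a vector of ten probabilities, one per digit, and `np.argmax` picks the position of the largest. A small sketch with a made-up prediction (the numbers below are invented, not model output) shows how to read off the predicted digit, its probability, and the runner-up guesses:

```python
import numpy as np

# Made-up prediction with the same shape as the model's output: (1, 10)
example_prediction = np.array([[0.01, 0.02, 0.01, 0.05, 0.01,
                                0.80, 0.02, 0.04, 0.02, 0.02]])

# The index of the largest probability is the predicted digit
predicted_digit = int(np.argmax(example_prediction))
confidence = float(example_prediction[0, predicted_digit])

# The three most likely digits, from most to least likely
top_3 = np.argsort(example_prediction[0])[::-1][:3]

print(predicted_digit, confidence)  # 5 0.8
print(top_3)                        # [5 3 7]
```

If the top probability is only slightly larger than the second, the network is unsure about that sample.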

Conclusion
-------

We've built a convolutional neural network that is able to recognise handwritten digits with very high accuracy. Well done!