In [1]:
from keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

(X_train, y_train), (X_test, y_test) = mnist.load_data() # first we load all our data from mnist

Using TensorFlow backend.


In [2]:
print('TRAINING SET DIGIT OCCURRENCES')
train_counts = np.bincount(y_train) # counts how often each digit occurs (computed once)
for i in range(10):
    print('{}:{}'.format(i, train_counts[i]))
print('\nTEST SET DIGIT OCCURRENCES')
test_counts = np.bincount(y_test)
for i in range(10):
    print('{}:{}'.format(i, test_counts[i]))
# we check that the distribution of digits is not heavily skewed

TRAINING SET DIGIT OCCURRENCES
0:5923
1:6742
2:5958
3:6131
4:5842
5:5421
6:5918
7:6265
8:5851
9:5949

TEST SET DIGIT OCCURRENCES
0:980
1:1135
2:1032
3:1010
4:982
5:892
6:958
7:1028
8:974
9:1009
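
To put a number on "not heavily skewed", the counts printed above can be turned into class shares and a max/min ratio. This is a small sketch in plain Python; the helper names `class_shares` and `imbalance_ratio` are made up for illustration and are not part of any library.

```python
# Digit counts copied from the output above (list index = digit).
train_counts = [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]
test_counts = [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]


def class_shares(counts):
    """Fraction of the data set taken up by each class."""
    total = sum(counts)
    return [c / total for c in counts]


def imbalance_ratio(counts):
    """Size of the most frequent class divided by the least frequent one."""
    return max(counts) / min(counts)


for name, counts in [("train", train_counts), ("test", test_counts)]:
    shares = class_shares(counts)
    print(f"{name}: total={sum(counts)}, "
          f"min share={min(shares):.3f}, max share={max(shares):.3f}, "
          f"max/min ratio={imbalance_ratio(counts):.2f}")
```

A perfectly balanced 10-class set would give every digit a share of 0.1 and a ratio of 1.0, so values close to those suggest that plain accuracy is a reasonable metric here.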


In [3]:
# DEFINING VARIABLES
IMG_WIDTH = 28
IMG_HEIGHT = 28
IMG_CHANNELS = 1
IMG_SHAPE = (IMG_WIDTH, IMG_HEIGHT, IMG_CHANNELS)

NUMBER_OF_TRAIN = len(X_train)
NUMBER_OF_TEST = len(X_test)
NUMBER_OF_CLASSES = 10
EPOCHS = 50

In [15]:
from keras.models import Sequential, load_model
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Activation, Dropout, Input

######### WE BUILD TWO MODELS AND SEE HOW THEY FARE AGAINST ONE ANOTHER ##########

# dense layers model #
modeldense = Sequential()
modeldense.add(Flatten(input_shape=(28,28,1))) 
# need to flatten input because each sample is 3d, and our dense layer accepts 1d samples 

modeldense.add(Dense(784))
modeldense.add(Activation('relu'))
modeldense.add(Dropout(0.4))

modeldense.add(Dense(128))
modeldense.add(Activation('relu'))

modeldense.add(Dense(128))
modeldense.add(Activation('relu'))
modeldense.add(Dropout(0.4))

modeldense.add(Dense(10))
modeldense.add(Activation('softmax')) 
# softmax is here because we have 10 classes: it turns the 10 outputs into probabilities
# that sum to 1, one for each digit the input could be

# conv2d model #
modelconv = Sequential()
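modelconv.add(Conv2D(32, (3,3), input_shape=(28,28,1)))
# note: no activation is passed, so the Conv2D layers in this model are linear (Keras defaults to activation=None)
modelconv.add(Conv2D(32, (3,3)))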
modelconv.add(MaxPooling2D(pool_size=(2,2)))
modelconv.add(Dropout(0.4))

modelconv.add(Conv2D(64, (3,3)))
modelconv.add(Conv2D(64, (3,3)))
modelconv.add(MaxPooling2D(pool_size=(2,2)))
modelconv.add(Dropout(0.4))

modelconv.add(Flatten()) # always flatten when connecting a maxpooling/conv2d layer to a dense layer
modelconv.add(Dense(128))
modelconv.add(Activation('relu'))
modelconv.add(Dropout(0.4))
modelconv.add(Dense(128))
modelconv.add(Activation('relu'))
modelconv.add(Dense(10))
modelconv.add(Activation('softmax'))

modeldense.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
modelconv.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

print(modeldense.summary())
print('\n'*4)
print(modelconv.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_3 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 784)               615440    
_________________________________________________________________
activation_8 (Activation)    (None, 784)               0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 784)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 128)               100480    
_________________________________________________________________
activation_9 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 128)               16512     
__________
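
The `Param #` column in the summary can be checked by hand. The sketch below assumes the standard counting rules (a dense layer has one weight per input/output pair plus one bias per unit; a convolutional layer has one kernel per filter plus one bias per filter). The helper names are made up for illustration, and the convolutional counts are derived from the layer definitions above rather than read from the truncated output.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def conv2d_params(kernel_h, kernel_w, channels_in, filters):
    """One kernel_h x kernel_w x channels_in kernel per filter, plus one bias each."""
    return kernel_h * kernel_w * channels_in * filters + filters


# Dense model: 784 -> 784 -> 128 -> 128 -> 10
dense_layers = [(784, 784), (784, 128), (128, 128), (128, 10)]
dense_counts = [dense_params(n_in, n_out) for n_in, n_out in dense_layers]
print(dense_counts)  # the first three values can be compared with the summary above

# Conv2D layers of modelconv: channels 1 -> 32 -> 32 -> 64 -> 64, all 3x3 kernels
conv_layers = [(1, 32), (32, 32), (32, 64), (64, 64)]
conv_counts = [conv2d_params(3, 3, c_in, f) for c_in, f in conv_layers]
print(conv_counts)
```

Most of the dense model's parameters sit in its first 784x784 layer, whereas each convolutional layer reuses the same small kernels across the whole image.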

First, we one-hot encode our labels. The final layer of our NN has 10 outputs, one per class, and the `categorical_crossentropy` loss compares those outputs against a target of the same shape, so each label must become a 10-dimensional vector.

After that, we preprocess our X data by dividing it by 255 (the maximum value of a pixel), so that every pixel value lies in [0, 1]. Keeping the inputs in a small, consistent range generally helps gradient-based training converge faster and more stably.
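
To make the two steps concrete, here is a tiny NumPy-only sketch on made-up values (three example labels and three example pixels). It illustrates the idea and is not the Keras helper used in the notebook.

```python
import numpy as np

labels = np.array([5, 0, 4])  # three example digit labels (illustrative values)
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per sample.
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot.shape)  # one row per label, one column per class
print(one_hot[0])     # 1.0 at index 5, zeros elsewhere

# Pixel values are stored as unsigned 8-bit integers in [0, 255].
pixels = np.array([[0, 128, 255]], dtype="uint8")
scaled = pixels / 255  # true division promotes the result to float
print(scaled.min(), scaled.max())
```

Taking `argmax` along the class axis recovers the original integer labels, which is handy later for comparing predictions with the ground truth.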

In [5]:
# PREPROCESSING OUR DATA
from keras.utils import to_categorical # imported directly; the np_utils module is not available in newer Keras releases

# ONE-HOT ENCODING OUR LABELS
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

X_train = X_train/255 # normalize our data
X_test = X_test/255

X_train = np.reshape(X_train, (-1, 28, 28, 1)) # reshaping to fit convolutional model
X_test = np.reshape(X_test, (-1, 28, 28, 1))

Why do we have a validation set and a test set? 
https://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-netwo

**Training Set**: this data set is used to adjust the weights on the neural network.

**Validation Set**: this data set is used to minimize overfitting. You're not adjusting the weights of the network with this data set, you're just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before, or at least the network hasn't trained on it (i.e. validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you're overfitting your neural network and you should stop training.

**Testing Set**: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.

In [6]:
from sklearn.model_selection import train_test_split

# hold out 25% of the training data (15,000 of the 60,000 samples) as a validation set
X_train, X_validate, y_train, y_validate = train_test_split(X_train, y_train, test_size=0.25)
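
Conceptually, the split shuffles the sample indices and cuts off a hold-out portion. The sketch below shows that idea in plain Python; `split_indices` is a hypothetical helper written for illustration, not scikit-learn's implementation, and the fixed `seed` is an assumption added so the result is repeatable.

```python
import random


def split_indices(n_samples, holdout_fraction, seed=0):
    """Shuffle 0..n_samples-1 and cut off a hold-out portion.

    Returns (train_indices, holdout_indices).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(n_samples * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


train_idx, val_idx = split_indices(60000, 0.25)
print(len(train_idx), len(val_idx))
```

The two index lists never overlap, which is what keeps the validation samples unseen during weight updates.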

We use callbacks to control training and guard against overfitting.

`EarlyStopping` stops the model from training once `val_loss` has not improved for `patience` consecutive epochs (5 below).

`ReduceLROnPlateau` multiplies the learning rate by `factor` (0.2 below) when `val_loss` has not improved for `patience` epochs (3 below), so that smaller steps let the optimizer settle into a local minimum.
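
The interplay of the two patience counters can be replayed on a list of validation losses. This is a simplified sketch in plain Python and not the Keras code: it ignores `min_delta`, `cooldown` and `min_lr`, and the function name, the starting learning rate of 0.001 and the loss values are all made up for illustration.

```python
def simulate_callbacks(val_losses, lr=0.001, stop_patience=5,
                       lr_patience=3, factor=0.2):
    """Replay per-epoch validation losses.

    Returns (epochs_run, final_learning_rate).
    """
    best = float("inf")
    wait_stop = 0  # epochs without improvement, counted for early stopping
    wait_lr = 0    # epochs without improvement, counted for the LR reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Any improvement resets both counters.
            best = loss
            wait_stop = 0
            wait_lr = 0
            continue
        wait_stop += 1
        wait_lr += 1
        if wait_lr >= lr_patience:
            lr *= factor  # shrink the step size and start counting again
            wait_lr = 0
        if wait_stop >= stop_patience:
            return epoch, lr  # training stops after this epoch
    return len(val_losses), lr


# Made-up losses: improvement for three epochs, then a plateau.
losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.18, 0.16, 0.19, 0.20, 0.21]
epochs_run, final_lr = simulate_callbacks(losses)
print(epochs_run, final_lr)
```

Because the learning-rate patience is shorter than the stopping patience, the learning rate is reduced at least once before training is abandoned, which gives the model a chance to improve with smaller steps.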

In [7]:
# Using callbacks to improve fitting of the model
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

earlystop = EarlyStopping(monitor='val_loss', 
                         patience=5,
                         )

reduceLR = ReduceLROnPlateau(monitor='val_loss', 
                             factor=0.2, 
                             patience=3,
                            )

callbacks = [earlystop, reduceLR]

In [16]:
modelconv.fit(X_train, y_train, epochs=EPOCHS, callbacks=callbacks, validation_data=(X_validate, y_validate))

Train on 45000 samples, validate on 15000 samples
Epoch 1/50
...
Epoch 36/50


<keras.callbacks.History at 0x1fdfb52f390>

In [11]:
modeldense.fit(X_train, y_train, epochs=EPOCHS, callbacks=callbacks, validation_data=(X_validate, y_validate))

Train on 45000 samples, validate on 15000 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50


<keras.callbacks.History at 0x1fdfb501e10>

In [17]:
# saving our models
modelconv.save('convolutional_mnist_digit.h5')
modeldense.save('dense_mnist_digit.h5')

In [18]:
modelconv.evaluate(X_test, y_test)



[0.022484434038856124, 0.9936]

In [14]:
modeldense.evaluate(X_test, y_test)



[0.07698074098548184, 0.9869]

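The first number returned by `evaluate` is the test loss and the second is the test accuracy. The Convolutional Neural Network reaches 99.36% accuracy (loss 0.0225) against 98.69% (loss 0.0770) for the dense network, so it is the better of the two at identifying handwritten digits on this test set.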
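
Accuracies this close to 100% are easier to compare as error counts. The sketch below converts the two evaluation results above into misclassified test images, assuming the 10,000-image test set counted earlier; the dictionary layout and helper name are made up for illustration.

```python
n_test = 10000  # size of the MNIST test set used above

# [loss, accuracy] pairs copied from the evaluate() outputs above.
results = {
    "conv": {"loss": 0.022484434038856124, "accuracy": 0.9936},
    "dense": {"loss": 0.07698074098548184, "accuracy": 0.9869},
}


def misclassified(accuracy, n_samples):
    """Number of wrongly classified samples implied by an accuracy."""
    return round((1 - accuracy) * n_samples)


errors = {name: misclassified(r["accuracy"], n_test) for name, r in results.items()}
print(errors)

# Share of the dense model's errors that the convolutional model avoids.
relative_reduction = 1 - errors["conv"] / errors["dense"]
print(f"relative error reduction: {relative_reduction:.0%}")
```

Seen this way, the gap of well under one percentage point in accuracy corresponds to the convolutional model making roughly half as many mistakes.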