In [None]:
import numpy as np
import matplotlib.pyplot as plt
import math

In [None]:
import keras
from keras.datasets import mnist
from keras.layers import Input, Dense, Dropout, Flatten, MaxPooling2D, MaxPooling1D, Conv2D, BatchNormalization
from keras.models import Model, Sequential

Keras is a great place to start: it has a relatively simple design and uses TensorFlow under the hood. It also ships with the MNIST data already available.


In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train.shape

We do a tiny bit of data prep. First, normalization: we divide by the max value (255 for MNIST's 8-bit grayscale pixels) so that all the pixels fall in the range 0 to 1, which helps the model learn faster. For fun, you can take out the ` / np.max(...)` and see how much longer it takes for the accuracy to rise.

Then we reshape. This is about meeting Keras expectations: the convolution layers expect each image as 3D data, meaning (x, y, color channel). The source MNIST data is just (x, y), so we add one extra dimension at the end to hold the single grayscale channel, turning each 28 x 28 image into 28 x 28 x 1.
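To see both prep steps on something small, here is a sketch on a made-up NumPy batch. The `fake_images` array is hypothetical and only stands in for `x_train`; the `/ np.max(...)` and `np.expand_dims(..., -1)` calls are the same ones the next cell uses.

```python
import numpy as np

# A hypothetical batch of two 4x4 "images" with uint8 pixel values 0-255,
# standing in for the (60000, 28, 28) MNIST training array.
fake_images = np.array([
    [[0, 64, 128, 255]] * 4,
    [[255, 128, 64, 0]] * 4,
], dtype=np.uint8)

# Normalize: divide by the largest pixel value so everything lands in [0, 1].
scaled = fake_images / np.max(fake_images)

# Add a trailing channel axis: (batch, height, width) -> (batch, height, width, 1).
with_channel = np.expand_dims(scaled, -1)

print(fake_images.shape, with_channel.shape)  # (2, 4, 4) (2, 4, 4, 1)
print(scaled.min(), scaled.max())             # 0.0 1.0
```

The pixel values themselves are unchanged apart from scaling; `expand_dims` only inserts a size-1 axis, so no data is copied into new positions.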

In [None]:
x_train = np.expand_dims(x_train / np.max(x_train), -1)
x_test = np.expand_dims(x_test / np.max(x_test), -1)
x_train.shape

For the output y labels, we need to convert the digit identifiers 0, 1, ... 8, 9 to one-hot encodings: vectors with 10 slots, where a 1 in one slot flags the digit and every other slot is 0.
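A minimal NumPy sketch of what one-hot encoding does, assuming integer labels like `y_train`. The `one_hot` helper is hypothetical, not part of Keras; it should produce the same float32 matrix that `keras.utils.to_categorical` returns for these labels, but treat it as an illustration rather than the library's implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row i of the identity matrix has a single 1 in slot i, so indexing
    # np.eye with the label array picks out one such row per label.
    return np.eye(num_classes, dtype=np.float32)[labels]

digits = np.array([5, 0, 4, 1])
encoded = one_hot(digits)
print(encoded[0])  # the digit 5: a 1 in slot 5, zeros elsewhere

# np.argmax undoes the encoding, recovering the original digits.
recovered = np.argmax(encoded, axis=1)
print(recovered)
```

The `argmax` trick in the last lines is also how you turn the model's 10 softmax outputs back into a single predicted digit.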

In [None]:
y_train[0:10]

In [None]:
train_labels = keras.utils.to_categorical(y_train, 10)
test_labels = keras.utils.to_categorical(y_test, 10)

train_labels[0:10]

And now, an actual deep network. This is a now-classic design using convolution, pooling, and dropout. The model ends in a dense layer with softmax, which should look familiar: this softmax output works just like our logistic regression.

The difference here is that we have stacked many layers into a deep learning model.

Using Keras, we build a model that has convolution, pooling, dropout, and a final softmax classification.
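To make the softmax connection concrete, here is a small NumPy sketch of the function the final `Dense(..., activation='softmax')` layer applies to its 10 raw scores. The `scores` values are made up for illustration; they are not outputs of the model below.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row max keeps np.exp from overflowing;
    # it cancels out in the ratio, so the result is unchanged.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Hypothetical raw scores for one image across the 10 digit classes.
scores = np.array([[1.0, 2.0, 0.5, 0.0, 3.0, 0.1, 0.2, 0.3, 0.4, 0.6]])
probs = softmax(scores)
print(probs.sum())       # approximately 1.0
print(np.argmax(probs))  # → 4, the class with the highest score
```

Softmax is the same output used by multinomial logistic regression: every output is positive, they sum to 1, and the largest raw score gets the largest probability.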


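One thing to note here is Flatten. The convolution and pooling layers output a 3D stack of feature maps (height, width, channels), while the Dense layers, and our output of a single class 0-9, work on one-dimensional vectors. Flatten reshapes each stack of feature maps into one long vector so the Dense layers can consume it.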
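It helps to work out how long that flattened vector is. The sketch below traces the spatial size through the layers, assuming the Keras defaults for this model: `padding='valid'` and stride 1 for `Conv2D`, and a stride equal to the pool size for `MaxPooling2D`. The two helper functions are hypothetical; the result should agree with the Flatten row of `model.summary()`.

```python
def conv_output_size(size, kernel, stride=1):
    # 'valid' padding adds no border, so each 3x3 convolution trims the edges.
    return (size - kernel) // stride + 1

def pool_output_size(size, pool):
    # Non-overlapping pooling: the stride equals the pool size.
    return (size - pool) // pool + 1

side = 28                         # MNIST images are 28 x 28
side = conv_output_size(side, 3)  # after Conv2D(32, 3x3): 26
side = conv_output_size(side, 3)  # after Conv2D(64, 3x3): 24
side = pool_output_size(side, 2)  # after MaxPooling2D(2x2): 12
channels = 64                     # filters in the last Conv2D

flattened = side * side * channels
print((side, side, channels), "->", flattened)  # (12, 12, 64) -> 9216
```

So the first `Dense(128)` layer sees 9216 inputs per image, which is why it holds most of the model's parameters.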


In [None]:
input_shape = x_train[0].shape
num_classes = 10
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

With the model assembled, we compile it, which configures it for training with a loss function, an optimizer (Adam here), and the metrics to report. Then we fit it, using the training data and labels to learn parameters, and the testing data and labels to check how well the model works.

This is an important point: hold out part of the data for testing. If you use all of your data in training, you have no way to tell whether the model has merely memorized its input data rather than learned to make predictions about new, unseen data. This phenomenon is known as *overfitting*.
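MNIST already comes with a separate test set, which is what this notebook holds out. If your own data arrives as one pile, a random split does the same job. Here is a minimal sketch; the `holdout_split` helper and its 10% default are hypothetical choices, not part of Keras (which offers a `validation_split` argument on `model.fit` for a similar purpose).

```python
import numpy as np

def holdout_split(x, y, holdout_fraction=0.1, seed=0):
    # Shuffle the indices once, then carve off the first slice as a
    # held-out set that the model never trains on.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_holdout = int(len(x) * holdout_fraction)
    holdout_idx, train_idx = order[:n_holdout], order[n_holdout:]
    return x[train_idx], y[train_idx], x[holdout_idx], y[holdout_idx]

# Toy data: 50 examples with 2 features each, labeled 0..49.
x = np.arange(100).reshape(50, 2)
y = np.arange(50)
x_tr, y_tr, x_ho, y_ho = holdout_split(x, y, holdout_fraction=0.2)
print(len(x_tr), len(x_ho))  # 40 10
```

Shuffling before splitting matters when the data is sorted (for example, by label); otherwise the held-out set could contain classes the model never saw in training.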

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(x_train, train_labels,
                    batch_size=64,
                    epochs=8,
                    verbose=1,
                    validation_data=(x_test, test_labels))
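The `loss='categorical_crossentropy'` that Adam minimizes is simple to write out: for each example, take minus the log of the probability the model assigned to the true class, then average. Below is a NumPy sketch with made-up predictions. The epsilon clip is an assumption to keep `log` away from 0; Keras's own implementation may differ in such details.

```python
import numpy as np

def categorical_crossentropy(one_hot_labels, probs, eps=1e-7):
    # Clip so np.log never sees exactly 0 or 1.
    probs = np.clip(probs, eps, 1.0 - eps)
    # The one-hot row zeroes out every term except the true class.
    per_example = -np.sum(one_hot_labels * np.log(probs), axis=-1)
    return per_example.mean()

labels = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
confident = np.array([[0.05, 0.9, 0.05], [0.8, 0.1, 0.1]])  # mostly right
wrong = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])      # mostly wrong

loss_good = categorical_crossentropy(labels, confident)
loss_bad = categorical_crossentropy(labels, wrong)
print(loss_good, loss_bad)
```

Confident correct predictions give a small loss and confident wrong ones a large loss, which is the signal that pushes the weights during `fit`, one batch of 64 images at a time.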

And now we'll get a report as to how well we're classifying.

In [None]:
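import sklearn.metrics
# predict_classes was removed in newer Keras versions; take the argmax of the softmax outputs instead
predictions = np.argmax(model.predict(x_test), axis=1)
# Compare against the integer labels y_test, not the one-hot test_labels
print(sklearn.metrics.classification_report(y_test, predictions))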
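The report lists precision (of the images predicted as a digit, how many really were that digit) and recall (of the images that really were a digit, how many the model found). As a sanity check on what those columns mean, here is a hand-rolled NumPy sketch on tiny made-up labels; the helper is hypothetical and reports 0 when a class is never predicted or never present.

```python
import numpy as np

def per_class_precision_recall(y_true, y_pred, num_classes=10):
    precision = np.zeros(num_classes)
    recall = np.zeros(num_classes)
    for c in range(num_classes):
        true_pos = np.sum((y_pred == c) & (y_true == c))
        predicted_c = np.sum(y_pred == c)
        actual_c = np.sum(y_true == c)
        # Avoid dividing by zero for classes that never appear.
        precision[c] = true_pos / predicted_c if predicted_c else 0.0
        recall[c] = true_pos / actual_c if actual_c else 0.0
    return precision, recall

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
p, r = per_class_precision_recall(y_true, y_pred, num_classes=3)
print(p)  # precision per class
print(r)  # recall per class
```

Here class 1 has perfect recall but only 2/3 precision: the model found both real 1s, but also called one 0 a 1.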