In [2]:
import numpy as np
import mnist
import keras

Using TensorFlow backend.


In [3]:
# The first time you run this might be a bit slow, since the
# mnist package has to download and cache the data.
train_images = mnist.train_images()
train_labels = mnist.train_labels()

In [4]:
print(train_images.shape) # (60000, 28, 28)
print(train_labels.shape) # (60000,)

(60000, 28, 28)
(60000,)


## Preparing the Data

Before we begin, we’ll normalize the image pixel values from [0, 255] to [-0.5, 0.5] to make our network easier to train (using smaller, centered values usually leads to better results). We’ll also reshape each image from (28, 28) to (28, 28, 1) because the Keras Conv2D layer expects each image to have a third (channels) dimension.

In [5]:
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

In [6]:
# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

In [7]:
# Reshape the images.
train_images = np.expand_dims(train_images, axis=3)
test_images = np.expand_dims(test_images, axis=3)

In [8]:
print(train_images.shape) # (60000, 28, 28, 1)
print(test_images.shape)  # (10000, 28, 28, 1)

(60000, 28, 28, 1)
(10000, 28, 28, 1)
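
As a quick sanity check of the two preprocessing steps above, the sketch below applies them to a small synthetic array and confirms the resulting value range and shape. The random `fake_images` array is an assumption standing in for MNIST pixels, used only so the snippet runs without downloading any data.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 4 images, 28x28, uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Force both extremes to be present so the range check is meaningful.
fake_images[0, 0, 0] = 0
fake_images[0, 0, 1] = 255

# Same normalization as above: [0, 255] -> [-0.5, 0.5].
normalized = (fake_images / 255) - 0.5

# Same reshape as above: add a trailing channel axis.
reshaped = np.expand_dims(normalized, axis=3)

print(normalized.min(), normalized.max())  # -0.5 0.5
print(reshaped.shape)                      # (4, 28, 28, 1)
```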


## Building the model

Every Keras model is built using either the Sequential class, which represents a linear stack of layers, or the functional Model class, which is more customizable. We’ll be using the simpler Sequential model, since our CNN will be a linear stack of layers.
<br>We start by importing the Sequential class and the layers we’ll need:

In [9]:
from keras.models import Sequential

In [10]:
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten

The Sequential constructor takes a list of Keras layers. We’ll use 3 types of layers for our CNN: Convolutional, Max Pooling, and Softmax (implemented here as a Dense layer with a softmax activation, preceded by a Flatten layer).

In [11]:
num_filters = 8
filter_size = 3
pool_size = 2

In [12]:
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])

<br> 1) num_filters, filter_size, and pool_size are self-explanatory variables that set the hyperparameters for our CNN.
<br> 2) The first layer in any Sequential model must specify the input_shape, so we do so on Conv2D. Once this input shape is specified, Keras will automatically infer the shapes of inputs for later layers.
<br> 3) The output Softmax layer has 10 nodes, one for each class.
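
To make the shape inference in point 2 concrete, here is a rough pure-Python sketch that traces the layer output shapes and trainable parameter counts by hand. It assumes the layer defaults used above (stride 1 and no padding for Conv2D, non-overlapping windows for MaxPooling2D); the helper names are hypothetical and not part of Keras.

```python
def conv2d_valid(height, width, kernel):
    """Output height/width of a stride-1 convolution with no padding."""
    return height - kernel + 1, width - kernel + 1


def max_pool(height, width, pool):
    """Output height/width of non-overlapping max pooling."""
    return height // pool, width // pool


num_filters, filter_size, pool_size = 8, 3, 2

h, w = conv2d_valid(28, 28, filter_size)
conv_shape = (h, w, num_filters)            # (26, 26, 8)

h, w = max_pool(h, w, pool_size)
pool_shape = (h, w, num_filters)            # (13, 13, 8)

flat_size = h * w * num_filters             # 1352

# Trainable parameters: weights plus one bias per filter / output node.
conv_params = filter_size * filter_size * 1 * num_filters + num_filters  # 80
dense_params = flat_size * 10 + 10                                       # 13530

print(conv_shape, pool_shape, flat_size)
print(conv_params + dense_params)           # 13610
```

These hand-computed numbers can be compared against the output of `model.summary()`.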

## Compiling the Model

Before we can begin training, we need to configure the training process. We decide 3 key factors during the compilation step:
<br> 1) The __optimizer__. We’ll stick with a pretty good default: the Adam gradient-based optimizer. Keras has many other optimizers you can look into as well.
<br> 2) The __loss function__. Since we’re using a Softmax output layer, we’ll use the Cross-Entropy loss. Keras distinguishes between __binary_crossentropy__ (2 classes) and __categorical_crossentropy__ (>2 classes), so we’ll use the latter. The Keras documentation lists all available losses.
<br> 3) A list of metrics. Since this is a classification problem, we’ll just have Keras report on the __accuracy__ metric.
<br>
<br>Here’s what that compilation looks like:

In [13]:
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)
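
To make the loss a little less abstract, here is a minimal NumPy sketch of categorical cross-entropy for one-hot targets. The `categorical_crossentropy` helper and its `eps` clipping value are assumptions for illustration, not the Keras implementation; the point is only that a confident correct prediction is penalized less than an unsure one.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    # Clip to avoid log(0); the exact epsilon is an illustrative choice.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))


# One sample whose true class is index 2 (3 classes, made-up probabilities).
y_true = np.array([[0.0, 0.0, 1.0]])
confident = np.array([[0.05, 0.05, 0.90]])
unsure = np.array([[0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)  # equals -log(0.9)
loss_unsure = categorical_crossentropy(y_true, unsure)        # equals -log(0.3)

print(loss_confident < loss_unsure)  # True
```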





## Training the model

Training a model in Keras consists of calling fit() with a few parameters. There are many possible parameters, but we’ll supply only these:
<br> 1) The __training data__ (images and labels), commonly known as X and Y, respectively.
<br> 2) The __number of epochs__ (iterations over the entire dataset) to train for.
<br> 3) The __validation data__ (or test data), which is used during training to periodically measure the network’s performance against data it hasn’t seen before.
<br>
<br>
There’s one thing we have to be careful about: Keras expects the training targets to be _10-dimensional vectors_ , since there are 10 nodes in our Softmax output layer. Right now, our __train_labels__ and __test_labels__ arrays contain single integers representing the class for each image:

In [14]:
print(train_labels[0]) # 5

5


Conveniently, Keras has a utility method that fixes this exact issue: to_categorical. It turns our array of class integers into an array of one-hot vectors instead. For example, 2 would become [0, 0, 1, 0, 0, 0, 0, 0, 0, 0] (it’s zero-indexed).
<br> Here’s what that looks like:

In [15]:
from keras.utils import to_categorical
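
Before passing the labels to fit(), it may help to see the one-hot encoding spelled out. The `one_hot` helper below is a hypothetical NumPy stand-in written for illustration, not Keras's own implementation, and it assumes integer labels in the range 0 to 9.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Turn integer class labels into rows of a one-hot matrix."""
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]


encoded = one_hot([5, 0, 2])

print(encoded.shape)  # (3, 10)
print(encoded[2])     # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```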

In [16]:
model.fit(
  train_images,
  to_categorical(train_labels),
  epochs=3,
  validation_data=(test_images, to_categorical(test_labels)),
)

Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x1d6b903ef60>

In [17]:
# Predict on the first 5 test images
predictions = model.predict(test_images[:5])

In [18]:
print(np.argmax(predictions, axis=1)) # [7, 2, 1, 0, 4]

[7 2 1 0 4]


In [19]:
print(test_labels[:5]) # [7, 2, 1, 0, 4]

[7 2 1 0 4]
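
The comparison above can be turned into an accuracy figure by hand. The sketch below uses a small made-up probability array (hypothetical values, not real model output) to show how argmax converts softmax outputs into class predictions and how those are scored against the labels.

```python
import numpy as np

# Hypothetical softmax outputs for 3 images over 4 classes (made-up numbers).
probs = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
])
labels = np.array([1, 0, 3])

# The most probable class in each row is the prediction.
predicted = np.argmax(probs, axis=1)

# Accuracy is the fraction of predictions that match the labels.
accuracy = np.mean(predicted == labels)

print(predicted)  # [1 0 2]
print(accuracy)
```

The same two lines should work on the real `predictions` and `test_labels` arrays if you predict on the whole test set.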


In [21]:
# Save the model's weights to disk (the architecture is not saved).
model.save_weights('cnn.h5')

## Extensions
There’s much more we can do to experiment with and improve our network: the official Keras MNIST CNN example achieves 99.25% test accuracy after 12 epochs. Some examples of modifications you could make to our CNN include:

## Network Depth
<br> What happens if we add or remove Convolutional layers? How does that affect training and/or the model’s final performance?

In [22]:
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  Conv2D(num_filters, filter_size),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])

## Dropout
What if we tried adding Dropout layers, which are commonly used to prevent overfitting?

In [23]:
from keras.layers import Dropout

model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Dropout(0.5),
  Flatten(),
  Dense(10, activation='softmax'),
])
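
For intuition about what a Dropout layer does, here is a minimal NumPy sketch of "inverted" dropout: during training, each input is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), while at inference time the input passes through unchanged. The `dropout` function is a hypothetical illustration, not the Keras implementation.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction of inputs and rescale the rest."""
    if not training:
        # At inference time the layer is a no-op.
        return x
    keep_mask = rng.random(x.shape) >= rate
    # Rescale so the expected value of each output matches its input.
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((2, 8))

train_out = dropout(x, rate=0.5, training=True, rng=rng)
infer_out = dropout(x, rate=0.5, training=False, rng=rng)

print(train_out)  # every entry is either 0.0 or 2.0
print(infer_out)  # unchanged: all ones
```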




## Fully-connected Layers
What if we add fully-connected layers between the Convolutional outputs and the final Softmax layer? This is something commonly done in CNNs used for Computer Vision.

In [24]:
from keras.layers import Dense

model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])
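
As a rough illustration of what the two Dense layers compute, the NumPy sketch below runs one flattened feature map through a ReLU layer and a softmax layer. The weights are random placeholders (an assumption, since real weights come from training), and 1352 is the flattened size of the 13x13x8 pooling output in this architecture.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)

# One flattened 13x13x8 feature map (random placeholder values).
flat = rng.normal(size=(1, 1352))

# Random placeholder weights standing in for trained parameters.
w1, b1 = rng.normal(scale=0.01, size=(1352, 64)), np.zeros(64)
w2, b2 = rng.normal(scale=0.01, size=(64, 10)), np.zeros(10)

hidden = relu(flat @ w1 + b1)       # Dense(64, activation='relu')
probs = softmax(hidden @ w2 + b2)   # Dense(10, activation='softmax')

print(hidden.shape, probs.shape)  # (1, 64) (1, 10)
print(probs.sum())                # approximately 1.0
```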

## Convolution Parameters

What if we play with the Conv2D parameters? For example:

In [25]:
# These can be changed, too!
num_filters = 8
filter_size = 3

model = Sequential([
  # See https://keras.io/layers/convolutional/#conv2d for more info.
  Conv2D(
    num_filters,
    filter_size,
    input_shape=(28, 28, 1),
    strides=2,
    padding='same',
    activation='relu',
  ),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])
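
Changing `strides` and `padding` changes the output size of the convolution, which in turn changes the size of everything downstream. The sketch below computes that size by hand using the usual 'valid' and 'same' conventions; `conv_output_size` is a hypothetical helper, not a Keras function.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # 'same' pads the input so only the stride shrinks the output.
        return math.ceil(size / stride)
    if padding == "valid":
        # 'valid' uses no padding, so the kernel must fit entirely inside.
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


default_conv = conv_output_size(28, 3)                            # 26
strided_same = conv_output_size(28, 3, stride=2, padding="same")  # 14

# A 2x2 max pool then halves each spatial dimension.
pooled = strided_same // 2                                        # 7
flat_size = pooled * pooled * 8                                   # 392

print(default_conv, strided_same, pooled, flat_size)  # 26 14 7 392
```

With these settings the Flatten layer would output 392 values instead of 1352, so the final Dense layer has far fewer weights.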

In [27]:
# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)


In [29]:
model.fit(
  train_images,
  to_categorical(train_labels),
  epochs=12,
  validation_data=(test_images, to_categorical(test_labels)),
)

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x1d6bc376be0>