## MNIST dataset with Convolutional Neural Networks!

This week we are going to play with a CNN using our old friend, the MNIST dataset! As usual, please add the MNIST dataset and run the first cell to get its path.


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

As in the exercise with NNs, we'll use the first 10000 training samples from the MNIST dataset and translate the ground truth y_train and y_test from a single digit to one-hot encoding, i.e. 0 -> [1,0,...,0], 1 -> [0,1,0,...,0], ..., 9 -> [0,0,...,1], using np.eye.

You can check what we've done in Week 06.
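
As a small illustration of the encoding described above, here is a minimal sketch on three made-up labels (the `labels` array is hypothetical, not taken from MNIST). It also shows how `argmax` recovers the digit from a one-hot row.

```python
import numpy as np

# Hypothetical labels, for illustration only (not taken from MNIST)
labels = np.array([3, 0, 9])

# Row n of the 10x10 identity matrix is the one-hot vector for digit n,
# so indexing with an integer array encodes all labels at once
one_hot = np.eye(10)[labels]
print(one_hot.shape)   # (3, 10)
print(one_hot[0])      # 1.0 at index 3, zeros elsewhere

# argmax along the last axis recovers the original digits
decoded = one_hot.argmax(axis=1)
print(decoded)         # [3 0 9]
```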

In [None]:
# Use np.eye for one-hot encoding
# 0 -> [1,0,...,0], 1-> [0,1,0,..0],..., 9->[0,0,...,1] 

mnist = np.load('/kaggle/input/mnist-numpy/mnist.npz')
x_train = mnist['x_train'][:10000]/255.
# Indexing np.eye(10) with the label array one-hot encodes all labels at once
y_train = np.eye(10)[mnist['y_train'][:10000]]
x_test = mnist['x_test']/255.
y_test = np.eye(10)[mnist['y_test']]

print('x_train shape is: ', x_train.shape)
print('y_train shape is: ', y_train.shape)

Now we'll build a CNN model with a convolutional layer of 16 filters of size 5x5, followed by a 2x2 max-pooling layer. The max-pooled output is flattened and fed into a fully connected layer with 10 outputs (one per digit, 0-9).

Note that Conv2D expects a channel axis, so we reshape each input from (28,28) to (28,28,1); grayscale images have a single channel.
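
To anticipate what `model.summary()` should report, the layer output sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults (no padding and stride 1 for Conv2D, pooling stride equal to the pool size); the variable names are only for illustration.

```python
# Shape and parameter bookkeeping for the model described above.
# Assumes Keras defaults: 'valid' padding and stride 1 for Conv2D,
# and a pooling stride equal to the pool size for MaxPooling2D.
in_size, channels = 28, 1
n_filters, k = 16, 5
pool = 2
n_classes = 10

conv_size = in_size - k + 1                      # 28 - 5 + 1 = 24
pool_size = conv_size // pool                    # 24 // 2 = 12
flat_size = pool_size * pool_size * n_filters    # 12 * 12 * 16 = 2304

conv_params = n_filters * (k * k * channels + 1)  # weights + one bias per filter
dense_params = flat_size * n_classes + n_classes  # weights + one bias per output

print(conv_size, pool_size, flat_size)   # 24 12 2304
print(conv_params, dense_params)         # 416 23050
```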

In [None]:
from tensorflow.keras.models import Sequential
# Import all layers!
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import SGD, Adadelta

model = Sequential()
# build a CNN model, input shape channel = 1 for gray scale
model.add(Reshape((28,28,1), input_shape=(28,28)))
model.add(Conv2D(16, kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
#model.add(Dropout(0.1))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(),
              metrics=['accuracy'])

# Print model architecture
model.summary()

Let's fit the model!

In [None]:
model.fit(x_train, y_train, epochs=100, batch_size=50,
          validation_data=(x_test, y_test))

print('Performance (training)')
print('Loss: %.5f, Acc: %.5f' % tuple(model.evaluate(x_train, y_train)))
print('Performance (testing)')
print('Loss: %.5f, Acc: %.5f' % tuple(model.evaluate(x_test, y_test)))

## Visualization of the filters 

What do the filters and the feature maps look like? We can use the following code to fetch the weights (the elements of the filters) and plot them. There are 16 filters of size 5x5 in our model.

In [None]:
# Visualize the filters
# Weights come in the form filters, biases = model.layers[1].get_weights()
w = model.layers[1].get_weights()
print('weights from Conv2D layer shape is:', np.array(w[0]).shape)
fig = plt.figure(figsize=(8,8), dpi=80)
for i in range(16):
    plt.subplot(4,4,i+1)
    plt.imshow(w[0][:,:,0,i], cmap='Greys')  # channel 0: imshow needs a 2D (5,5) array
plt.show()

In the plots above, the light squares are the smaller weights and the dark squares are the larger weights (each plot is scaled to its own range). Each filter has a different arrangement of weights, so the filters pick up different kinds of patterns in the image.
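
To see how a filter "picks up" a pattern, here is a minimal NumPy sketch of the sliding-window sum that a convolutional layer computes, leaving out the bias and the ReLU activation. The 6x6 image and the 3x3 edge filter are hand-made examples, not weights from the trained model, and `correlate2d_valid` is a hypothetical helper written for this illustration.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide kernel over image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise product of the window and the filter, then sum
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made 3x3 filter that responds to a dark-to-bright vertical edge
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = correlate2d_valid(image, kernel)
print(response.shape)  # (4, 4)
print(response[0])     # [0. 3. 3. 0.]
```

The response is large only where the window straddles the edge and zero over the flat regions, which is the sense in which a filter detects a pattern.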

## Visualization of the feature maps

The feature maps represent how the CNN 'sees' the input image. It will be interesting to see what the feature maps look like! In the code below, we first build a test model whose output is the output of the first hidden layer (the only Conv2D layer in our model), then we plot that output. There will be 16 feature maps for a single input image.

We'll use the first sample in x_test as the input to this test model. Which digit does the image show?

In [None]:
# redefine model to output right after the first hidden layer
from tensorflow.keras.models import Model
model_test = Model(inputs=model.inputs, outputs=model.layers[1].output)

# Use the 1st image in x_test as input
# What is the digit this image represents?
plt.imshow(x_test[0])
plt.show()

We first need to transform the shape of x_test[0] from (28,28) to (1,28,28),
so that it looks like an input batch containing only one image.

Use any method you like to define an input 'img' of shape (1,28,28) from x_test[0].
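
For reference, here are three equivalent NumPy ways to add a leading axis, shown on a zero-filled stand-in array rather than the real x_test[0] (the name `sample` is only for illustration).

```python
import numpy as np

# Stand-in for x_test[0]: any array of shape (28, 28) behaves the same way
sample = np.zeros((28, 28))

a = sample[np.newaxis]               # index with np.newaxis (same as None)
b = np.expand_dims(sample, axis=0)   # explicit helper function
c = sample.reshape(1, 28, 28)        # reshape to the target shape

print(a.shape, b.shape, c.shape)     # (1, 28, 28) (1, 28, 28) (1, 28, 28)
```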

In [None]:
# Add a new axis/dimension so that the shape of the input will be (1,28,28), 
# i.e. an input sample with only one image
img = x_test[0][np.newaxis]
print('img shape is:', img.shape)

Now we pass the input image through the test model and plot its output, i.e. the 16 feature maps.

In [None]:
# get feature map for first hidden layer
feature_maps = model_test.predict(img)
#print('output shape is:', feature_maps.shape)

fig = plt.figure(figsize=(8,8), dpi=80)
for i in range(16):
    plt.subplot(4,4,i+1)
    plt.imshow(feature_maps[0,:,:,i], cmap='Greys')
plt.show()

As you can see, some feature maps capture a lot of detail in the image, while others show much less. This lets the CNN abstract more general concepts for classifying the images, and thus generalize to other samples.
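
One rough way to put a number on "how much detail" a feature map shows is the fraction of its pixels with a non-zero activation. The sketch below is only an illustration under stated assumptions: it runs on random synthetic data with the same shape as the Conv2D output, (1,24,24,16), not on the real `feature_maps`, and the name `fake_maps` is hypothetical.

```python
import numpy as np

# Synthetic stand-in for feature_maps, with the shape (1, 24, 24, 16)
# that the Conv2D layer produces; values are random, not real outputs
rng = np.random.default_rng(0)
fake_maps = np.maximum(rng.normal(size=(1, 24, 24, 16)), 0.0)  # mimic ReLU

# Fraction of pixels with a non-zero activation, one value per feature map
active_fraction = (fake_maps[0] > 0).mean(axis=(0, 1))
print(active_fraction.shape)  # (16,)
```

Replacing `fake_maps` with the real `feature_maps` array would give one value per filter for the chosen test image.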