In [None]:
%matplotlib inline

To help you understand the fundamentals of deep learning, this demo walks through the basic steps of building a toy model that classifies handwritten digits with an accuracy surpassing 95%. The model is a deep network that introduces the concepts of convolution and pooling.
(Strongly inspired by this notebook, with some adaptation: https://colab.research.google.com/github/AviatorMoser/keras-mnist-tutorial/blob/master/MNIST%20in%20Keras.ipynb#scrollTo=IDL7UYq7Gz5u)

# First play with some kernels

https://setosa.io/ev/image-kernels/
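
To make the idea concrete, here is a minimal NumPy sketch of what applying a 3x3 kernel to an image means. The helper name `apply_kernel`, the toy image and the edge kernel are illustrative assumptions, not part of the original notebook. Like Keras' `Conv2D`, it slides the kernel without flipping it (strictly speaking a cross-correlation), and it uses no padding.

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy 5x5 image: dark on the left, bright on the right (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hypothetical vertical-edge kernel (Sobel-like), chosen for illustration
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

feature_map = apply_kernel(image, kernel)
print(feature_map.shape)  # (3, 3): a 5x5 input shrinks by kernel_size - 1
print(feature_map)        # large values where the edge is, 0 in the flat region
```

The output is smaller than the input because the kernel can only be centred where it fits entirely inside the image.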

## Prerequisite Python Modules


In [None]:
import numpy as np                   # advanced math library
import matplotlib.pyplot as plt      # MATLAB like plotting routines
import random                        # for generating random numbers

from tensorflow.keras.datasets import mnist     # MNIST dataset is included in Keras
from tensorflow.keras.models import Sequential  # Model type to be used

from tensorflow.keras.layers import Dense, Dropout, Activation # Types of layers to be used in our model
from tensorflow.keras.utils import to_categorical              # A useful helper function for one-hot encoding
from skimage.transform import resize

## Loading Training Data

The MNIST dataset is conveniently bundled within Keras, and we can easily analyze some of its features in Python.

In [None]:
# The MNIST data is split between 60,000 28 x 28 pixel training images and 10,000 28 x 28 pixel test images
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)

Using matplotlib, we can plot some sample images from the training set directly into this notebook.

In [None]:
plt.rcParams['figure.figsize'] = (9,9) # Make the figures a bit bigger

for i in range(9):
    plt.subplot(3,3,i+1)
    num = random.randint(0, len(X_train) - 1)  # randint includes both endpoints
    plt.imshow(X_train[num], cmap='gray', interpolation='none')
    plt.title("Class {}".format(y_train[num]))

plt.tight_layout()

Let's examine a single digit a little closer, and print out the array representing it.

In [None]:
# just a little function for pretty printing a matrix
def matprint(mat, fmt="g"):
    col_maxes = [max([len(("{:"+fmt+"}").format(x)) for x in col]) for col in mat.T]
    for x in mat:
        for i, y in enumerate(x):
            print(("{:"+str(col_maxes[i])+fmt+"}").format(y), end="  ")
        print("")

# now print!
matprint(X_train[1])

Each pixel is an 8-bit integer from 0 to 255: 0 is full black, while 255 is full white. This is what we call a single-channel pixel, also known as monochrome.

Fun-fact! Your computer screen has three channels for each pixel: red, green, blue. Each of these channels also likely takes an 8-bit integer. 3 channels -- 24 bits total -- 16,777,216 possible colors!
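
The arithmetic behind that figure can be checked directly. This is just a small sketch, with variable names chosen for illustration:

```python
bits_per_channel = 8
levels_per_channel = 2 ** bits_per_channel   # intensity levels per channel: 0..255
channels = 3                                 # red, green, blue

total_bits = bits_per_channel * channels
total_colors = levels_per_channel ** channels

print(levels_per_channel)  # 256
print(total_bits)          # 24
print(total_colors)        # 16777216
```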

## Formatting the input data layer

We'll normalize the inputs to be in the range [0, 1] rather than [0, 255]. Normalizing inputs is generally recommended, so that any additional dimensions (for other network architectures) are of the same scale.

In [None]:
X_train = X_train.reshape(60000, 28, 28, 1) # add an additional dimension to represent the single-channel
X_test = X_test.reshape(10000, 28, 28, 1)

X_train = X_train.astype('float32')         # change integers to 32-bit floating point numbers
X_test = X_test.astype('float32')

X_train /= 255                              # scale every pixel value from [0, 255] to [0, 1]
X_test /= 255

print("Training matrix shape", X_train.shape)
print("Testing matrix shape", X_test.shape)

We then modify our classes (unique digits) to be in the one-hot format, i.e.


```
0 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
etc.
```

If the final output of our network is very close to one of these classes, then it is most likely that class. For example, if the final output is:

```
[0, 0.94, 0, 0, 0, 0, 0.06, 0, 0, 0]
```
then it is most probable that the image is that of the digit `1`.
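
As a rough sketch of the encoding, and of how a prediction is read back, the snippet below builds one-hot vectors with NumPy and decodes a network output with `argmax`. It uses `np.eye` as a stand-in for illustration (the notebook itself relies on Keras' `to_categorical`), and the `output` vector is made up for the example.

```python
import numpy as np

num_classes = 10
labels = np.array([0, 1, 2])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=int)[labels]
print(one_hot)

# Decoding: the predicted class is the index of the largest output value
output = np.array([0, 0.94, 0, 0, 0, 0, 0.06, 0, 0, 0])
predicted_digit = int(np.argmax(output))
print(predicted_digit)  # 1
```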

In [None]:
nb_classes = 10 # number of unique digits, 0 to 9

Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)

## Building a "Deep" Convolutional Neural Network



> 💡 Don't forget to activate GPUs on your notebook!!



In [None]:
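# import some additional tools
# (imported from tensorflow.keras, like the imports above, to avoid mixing two Keras packages)

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, GlobalAveragePooling2D, Flatten
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras import backend as K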

In [None]:
model = Sequential()                                 # Linear stacking of layers

# 💡 Choose your activation!
activation = 'relu'

# Convolution Layer 1
model.add(Conv2D(32, (3, 3), input_shape=(28,28,1))) # 32 different 3x3 kernels -- so 32 feature maps
convLayer01 = Activation(activation)                 # activation
model.add(convLayer01)

# Convolution Layer 2
model.add(Conv2D(32, (3, 3)))                        # 32 different 3x3 kernels -- so 32 feature maps
model.add(Activation(activation))                    # activation
convLayer02 = MaxPooling2D(pool_size=(2,2))          # Pool the max values over a 2x2 kernel
model.add(convLayer02)

# Convolution Layer 3
model.add(Conv2D(64,(3, 3)))                         # 64 different 3x3 kernels -- so 64 feature maps
convLayer03 = Activation(activation)                 # activation
model.add(convLayer03)

# Convolution Layer 4
model.add(Conv2D(64, (3, 3)))                        # 64 different 3x3 kernels -- so 64 feature maps
model.add(Activation(activation))                    # activation
convLayer04 = MaxPooling2D(pool_size=(2,2))          # Pool the max values over a 2x2 kernel
model.add(convLayer04)

model.add(Flatten())  # Dense layers expect 1D vectors, so the feature maps are flattened first

# Fully Connected Layer 5
model.add(Dense(512))                                # 512 FCN nodes
model.add(Activation(activation))                    # activation

# Fully Connected Layer 6
model.add(Dense(10))                                 # final 10 FCN nodes
model.add(Activation('softmax'))                     # softmax activation in order to get probabilities per class
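
For intuition about that final activation, here is a minimal pure-Python sketch of softmax. The function name and the sample scores are illustrative assumptions; Keras' own implementation operates on tensors rather than lists.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                     # hypothetical raw outputs for 3 classes
probs = softmax(scores)
print(probs)
print(sum(probs))                            # approximately 1.0
```

The ordering of the scores is preserved, so the largest raw score always gets the largest probability.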

💡TODO: before running the next cell, can you guess the shape and number of parameters of each layer?

In [None]:
model.summary()
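
Once you have made your guess, the figures in the summary can be cross-checked by hand. The sketch below assumes the Keras defaults used above (stride 1, no padding, one bias per filter or unit); the helper names are illustrative and not part of Keras.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(in_channels, out_channels, kernel=3):
    """kernel * kernel * in_channels weights per filter, plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels

def dense_params(in_units, out_units):
    """One weight per input-output pair, plus one bias per output unit."""
    return (in_units + 1) * out_units

size = 28
size = conv_output_size(size)        # conv 1: 28 -> 26
size = conv_output_size(size)        # conv 2: 26 -> 24
size //= 2                           # 2x2 max pooling: 24 -> 12
size = conv_output_size(size)        # conv 3: 12 -> 10
size = conv_output_size(size)        # conv 4: 10 -> 8
size //= 2                           # 2x2 max pooling: 8 -> 4
flat_units = size * size * 64        # Flatten: 4 * 4 * 64

params = [
    conv_params(1, 32),              # 320
    conv_params(32, 32),             # 9248
    conv_params(32, 64),             # 18496
    conv_params(64, 64),             # 36928
    dense_params(flat_units, 512),   # 524800
    dense_params(512, 10),           # 5130
]
print(flat_units)   # 1024
print(sum(params))  # 594922
```

Note that activation, pooling and flatten layers have no trainable parameters, and that most of the weights sit in the first fully connected layer.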

In [None]:
# Let's visualize the model with all the dimensions!
from tensorflow.keras.utils import plot_model
plot_model(model, show_shapes=True, dpi=90)

In [None]:
# 💡 Choose some stuff yourself!
optimizer = 'adam'
loss = 'categorical_crossentropy'  # helper: https://keras.io/api/losses/probabilistic_losses/

model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
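
To see what the default loss measures, here is a minimal pure-Python sketch of categorical cross-entropy for a single sample. The function and the two example predictions are illustrative assumptions, not the Keras implementation, which works on batches of tensors.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # eps guards against log(0) when a predicted probability is exactly zero
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0]                 # the true class is 1
confident = [0.05, 0.90, 0.05]     # hypothetical good prediction
unsure = [0.40, 0.30, 0.30]        # hypothetical poor prediction

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident)  # small loss
print(loss_unsure)     # larger loss
```

With a one-hot target, only the probability assigned to the true class contributes, so the loss is simply the negative log of that probability.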

In [None]:
# Data augmentation helps prevent overfitting by slightly changing the data randomly
# Keras has a built-in feature for automatic augmentation
# To begin with a simple model, no augmentation is applied yet. It will come later...

gen = ImageDataGenerator()

# Of course, no data augmentation for the test set.
test_gen = ImageDataGenerator()

In [None]:
# We can then feed our data to the network in batches.
# The generator prepares each batch on the fly just before it is processed,
# which matters for memory when datasets or networks get larger.

# Note that here the MNIST arrays are already fully loaded in memory; only the batching is done by the generator.

BATCH_SIZE = 128

train_generator = gen.flow(X_train, Y_train, batch_size=BATCH_SIZE)
test_generator = test_gen.flow(X_test, Y_test, batch_size=BATCH_SIZE)

In [None]:
# We can now train our model which is fed data by our batch loader
# Steps per epoch should always be total size of the set divided by the batch size

# SIGNIFICANT MEMORY SAVINGS (important for larger, deeper networks)

NB_EPOCHS = 5

model.fit(train_generator,
          steps_per_epoch=len(X_train)//BATCH_SIZE,
          epochs=NB_EPOCHS,
          verbose=1,
          validation_data=test_generator,
          validation_steps=len(X_test)//BATCH_SIZE)
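
A quick check of the step arithmetic used above (a sketch; the variable names are illustrative). Floor division leaves the final partial batch out of the step count:

```python
num_train, num_test, batch_size = 60000, 10000, 128

# divmod returns (number of full batches, leftover samples)
train_steps, train_leftover = divmod(num_train, batch_size)
test_steps, test_leftover = divmod(num_test, batch_size)

print(train_steps, train_leftover)  # 468 96
print(test_steps, test_leftover)    # 78 16
```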

In [None]:
test_loss, test_accuracy = model.evaluate(X_test, Y_test)

In [None]:
# Let's run predictions on the whole test set
prediction = model.predict(X_test)

Let's visualize the performance!

In [None]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# Since the last layer is a softmax, it provides us probabilities for each class.
# We need to take the argmax in order to get the real prediction of the model.
y_pred = np.argmax(prediction, axis=1)

def plot_cm(labels, predictions):
    cm = confusion_matrix(labels, predictions)
    plt.figure(figsize=(5, 5))
    sns.heatmap(cm, annot=True, fmt="d")
    plt.title("Confusion matrix (non-normalized)")
    plt.ylabel("Actual label")
    plt.xlabel("Predicted label")

plot_cm(y_test, y_pred)
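
A confusion matrix also gives summary metrics directly. The sketch below uses a small made-up 3-class matrix (not MNIST results) with the same convention as `sklearn.metrics.confusion_matrix`: rows are actual labels, columns are predicted labels.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = actual, columns = predicted
cm = np.array([[50,  2,  0],
               [ 3, 45,  2],
               [ 1,  4, 43]])

accuracy = np.trace(cm) / cm.sum()            # correct predictions lie on the diagonal
recall = np.diag(cm) / cm.sum(axis=1)         # per actual class (row-wise)
precision = np.diag(cm) / cm.sum(axis=0)      # per predicted class (column-wise)

print(round(float(accuracy), 3))  # 0.92
print(recall.round(3))
print(precision.round(3))
```

Off-diagonal cells show which pairs of classes get confused with each other, which the overall accuracy alone does not reveal.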

# Let's visualize the layers!

In [None]:
# 💡 Choose any image you want by specifying its index; here, index 3 (the 4th image of the test set).
img_index = 3
img = X_test[img_index]
img = np.expand_dims(img, axis=0) # Keras requires the image to be in 4D, so we add an extra dimension to it.

In [None]:
# It is not important to understand how this function works -- it just plots the output of a layer

def visualize(layer):
    inputs = model.inputs

    _convout1_f = K.function(inputs, [layer.output])

    def convout1_f(X):
        # K.function takes a list of input arrays and returns a list of outputs
        return _convout1_f([X])

    convolutions = convout1_f(img)
    convolutions = np.squeeze(convolutions)

    print ('Shape of conv:', convolutions.shape)

    m = convolutions.shape[2]
    n = int(np.ceil(np.sqrt(m)))

    # Visualization of each filter of the layer
    fig = plt.figure(figsize=(15,12))
    for i in range(m):
        ax = fig.add_subplot(n,n,i+1)
        ax.imshow(convolutions[:,:,i], cmap='gray')

In [None]:
# Print the current image
plt.imshow(X_test[img_index].reshape(28,28), cmap='gray', interpolation='none');

In [None]:
# 💡 BEFORE RUNNING: can you guess the shape of the first feature map?
visualize(convLayer01) # visualize first set of feature maps

In [None]:
# 💡 BEFORE RUNNING: can you guess the shape of the second set of feature maps?
visualize(convLayer02) # visualize second set of feature maps

In [None]:
# 💡 BEFORE RUNNING: can you guess the shape of the third set of feature maps?
visualize(convLayer03) # visualize third set of feature maps

In [None]:
# 💡 BEFORE RUNNING: can you guess the shape of the fourth set of feature maps?
visualize(convLayer04) # visualize fourth set of feature maps

If you want to go deeper into visualization, have a look at this great notebook: https://colab.research.google.com/github/nguyenhoa93/cnn-visualization-keras-tf2/blob/master/visualization.ipynb

💡**YOUR TURN**

Now, please try to play with the different activation functions!

- Which one trains faster?
- Which one gives the best results?

Please also play with the number of layers: can you remove some of them and see the impact?

# Now, let's try data augmentation!

In each epoch, the ImageDataGenerator applies a random transformation to each of your images and uses the transformed images for training. The set of transformations includes rotation, zooming, etc. By doing this you are in effect creating new data (this is called data augmentation), although the generated images are obviously not totally different from the original ones. This way the learned model may be more robust and accurate, as it is trained on different variations of the same image.
In other words, a different transformation of each image is used in each epoch.
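
To illustrate the principle outside Keras, here is a minimal NumPy sketch of one such transformation, a random shift. The function `random_shift` and the toy image are assumptions made for this example; this is not how `ImageDataGenerator` is implemented, only the same idea in miniature.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_shift(image, max_shift=2):
    """Shift a 2D image by a random number of pixels, filling the gap with zeros."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.zeros_like(image)
    h, w = image.shape
    # Copy the part of the image that stays inside the frame after the shift
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted

image = np.zeros((28, 28), dtype=np.float32)
image[10:18, 12:16] = 1.0                     # toy "digit": a bright rectangle

# A freshly drawn random shift of the same image for each of 3 epochs
variants = [random_shift(image) for _ in range(3)]
for v in variants:
    # Same shape; the sum is unchanged because the rectangle stays in frame
    print(v.shape, v.sum())
```

The label stays the same for every variant, which is why the shifts must remain small enough for the digit to stay recognizable.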

In [None]:
# We'll change the image data generator so that it generates images in real time.
# See all the possible parameters here, and then visualize their impact:
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator

# 💡 Play with it and visualize it!
gen = ImageDataGenerator(rotation_range=8,
                         width_shift_range=0.08,
                         shear_range=0.3,
                         height_shift_range=0.08,
                         zoom_range=0.08,
                         vertical_flip=False,)

In [None]:
# Take one image to visualize the transformations applied
img = X_train[0]
plt.imshow(img[:,:, -1], cmap='gray', interpolation='none');

In [None]:
i = 0
# Look at 9 augmentations for that image
for img_batch in gen.flow(img.reshape(1, 28, 28, 1), batch_size=9):
    for img_ in img_batch:
        plt.subplot(3,3,i+1)
        plt.imshow(img_[:,:,-1], cmap='gray')
        i = i + 1
    if i >= 9:
        break

plt.tight_layout()

💡 Now, relaunch the training with this data augmentation, and try to see the impact!

💡 What if the data augmentation is of poor quality (for example, too much zoom)?