# Deep convolutional neural networks for image classification
<span style="font-size:9pt;">
author: MWLafarge (m.w.lafarge@tue.nl); affiliation: Eindhoven University of Technology; created: Feb 2020
</span>

## Preliminaries
In the previous exercise, we saw how to tackle a classification task by modeling the target likelihood function with a fully connected neural network. To exploit the structured nature of image data, we will implement a deep convolutional neural network (CNN) by replacing dense layers with __convolution layers__. This should result in a more efficient model with a significantly smaller number of parameters.

This exercise will also show how to efficiently reduce the dimensionality of the feature maps along the depth of the network by using __pooling layers__. You will see how to visualize the learned convolutional kernels and intermediate feature maps, and gain insight into why CNNs are a powerful framework for image processing and analysis.

This exercise follows the same structure as the second exercise and uses the same classification problem as an example. The main differences are in the "Defining the graph operations" section and the "Visualizing the intermediate feature maps" subsections.

## Import the required libraries

In [None]:
# environment setup
%load_ext autoreload
%autoreload 2
#%matplotlib widget
%matplotlib inline

# system libraries
import sys
import os

# computational libraries
import numpy as np
import tensorflow as tf

# utility functions for this exercise
from utils_ex3 import plot_image_batch, plot_featureMaps, Monitoring

## Importing the dataset

The dataset for this exercise is derived from the __PCam__ benchmark dataset created by B. Veeling et al. ([github.com/basveeling/pcam](https://github.com/basveeling/pcam)) that in turn originates from the [CAMELYON16 challenge](https://camelyon16.grand-challenge.org/). 

The PCam dataset consists of RGB color images of size 96 $\times$ 96 pixels extracted from histological sections of sentinel lymph node tissue of breast cancer patients. Each image is annotated with a binary label indicating the presence of metastatic tissue in the patch.

For the purpose of this exercise, the original PCam dataset was subsampled to 20000 training images and 2000 test images, balanced across the two classes. Furthermore, to enable faster processing the images were cropped to the central region of 64 $\times$ 64 pixels.
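
The preprocessing code that produced the 64 $\times$ 64 crops is not part of this notebook. As a minimal sketch, assuming the images are stored as a `[batch, height, width, channels]` array, a central crop could look like this (`center_crop` and `dummy_patches` are hypothetical names used for illustration only):

```python
import numpy as np

def center_crop(images, size):
    """Crop the central size x size region of a batch shaped [B, H, W, C]."""
    height, width = images.shape[1], images.shape[2]
    top = (height - size) // 2
    left = (width - size) // 2
    return images[:, top:top + size, left:left + size, :]

# Hypothetical stand-in for a batch of original 96 x 96 patches
dummy_patches = np.zeros((4, 96, 96, 3), dtype=np.uint8)
cropped = center_crop(dummy_patches, 64)
print(cropped.shape)  # (4, 64, 64, 3)
```

With 96 $\times$ 96 inputs, the crop discards a border of 16 pixels on each side.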

In [None]:
PATH_BASE_DIR = os.getcwd() # current working directory

def load_data_helper(path):
    data_dict = np.load(path) 
    data_points = data_dict["images"]
    data_labels = data_dict["labels"]
    
    return data_points, data_labels

path_train = PATH_BASE_DIR + os.sep + "../data/data_smallPCam_training.npz"
path_test  = PATH_BASE_DIR + os.sep + "../data/data_smallPCam_test.npz"
data_images_train, data_labels_train = load_data_helper(path_train)
data_images_test, data_labels_test   = load_data_helper(path_test)
    
print("Imported training points: ", data_images_train.shape)
print("Imported training labels: ", data_labels_train.shape)

print("Imported test points: ", data_images_test.shape)
print("Imported test labels: ", data_labels_test.shape)

First, let's visualize a few examples. As can be seen, the class label of a patch (1: tumor, 0: not tumor) is not obvious to non-experts (this task requires experience with analyzing histopathology images).

In [None]:
random_indices = [np.random.randint(data_images_train.shape[0]) for _i in range(8)]

batch_images = data_images_train[random_indices,]
batch_labels = data_labels_train[random_indices,]

plot_image_batch(batch_images, batch_labels)

## Creating a classifier
We want to create a model that takes actual tensor images of shape \[64,64,3\] (64 $\times$ 64 RGB image) as input and outputs the likelihood of the class label.

### Defining the graph operations
As in the previous exercises, let's first instantiate all the components that we will use to construct the network. This time, instead of fully connected layers, we will use convolutional layers, which require specific parameters. The network will be a straightforward sequence of alternating convolutional and max-pooling layers until a bottleneck is reached. Finally, a fully connected layer will produce the network output.

- a __tf.keras.Input()__ placeholder that takes image tensors of size \[64, 64, 3\] as input.
- several [__tf.keras.layers.Conv2D()__](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) layers to filter intermediate feature maps (convolution operation + non-linearity).
- several [__tf.keras.layers.MaxPool2D()__](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) layers to down-sample the intermediate feature maps.
- a __tf.keras.layers.Dense()__ layer that constitutes the final layer of the network and is activated by a sigmoid.
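
To make the down-sampling step concrete, here is a minimal NumPy sketch of what a 2 $\times$ 2 max-pooling layer with stride 2 computes on a single feature map. The helper `max_pool_2x2` is an illustrative name, not part of Keras, and it assumes even spatial dimensions:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on one map shaped [H, W, C].
    Assumes H and W are even."""
    height, width, channels = feature_map.shape
    # Split each spatial axis into (block index, position inside the block)
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2, channels)
    # Keep the maximum over the two "inside the block" axes
    return blocks.max(axis=(1, 3))

toy_map = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
pooled = max_pool_2x2(toy_map)
print(pooled[:, :, 0])
```

Each output value is the maximum of a non-overlapping 2 $\times$ 2 block, so the spatial size is halved while the number of channels is unchanged.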

As in the previous exercise, we will use a regularizer to prevent overfitting.
It is important to keep track of how the shape of the feature maps changes along the sequence of layers: the shape is indicated in a comment at the instantiation of each layer.
This way, a bottleneck is reached after the fifth convolutional layer (spatial dimensions are reduced to \[1,1\]).
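
The shape bookkeeping can be checked with a few lines of plain Python. This is only a sketch: `conv_valid` and `pool_2x2` are hypothetical helpers that apply the standard size rules for stride-1 "valid" convolutions and 2 $\times$ 2 pooling, using the kernel sizes of the five convolutional layers defined in this section:

```python
def conv_valid(size, kernel_size):
    """Spatial size after a stride-1 convolution with "valid" padding."""
    return size - kernel_size + 1

def pool_2x2(size):
    """Spatial size after 2 x 2 pooling with stride 2 and "valid" padding."""
    return size // 2

size = 64
trace = [size]
for kernel_size in (5, 3, 3, 3):  # conv1..conv4, each followed by pooling
    size = conv_valid(size, kernel_size)
    trace.append(size)
    size = pool_2x2(size)
    trace.append(size)
size = conv_valid(size, 2)        # conv5 has no pooling layer after it
trace.append(size)

print(trace)  # [64, 60, 30, 28, 14, 12, 6, 4, 2, 1]
```

Re-running this trace after changing a kernel size is a quick way to see whether the bottleneck is still reached.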


In [None]:
# regularizer object
weight_decay = 0.0001
regularizer = tf.keras.regularizers.l2(l=weight_decay)

# placeholder for the image data
image_size = [64, 64, 3]
inputs = tf.keras.Input(shape=image_size)

# model hyper-parameters
N = 8 # number of feature maps in each convolutional layer
M = 16 # number of feature maps in the last layer (prior to the output)
activation = "relu" # chosen non-linearity for the convolutional layers

# model components
# feature map shape: [64,64,3] -> [60,60,N]
conv1 = tf.keras.layers.Conv2D(
    filters     = N,
    kernel_size = 5,
    strides     = (1, 1),
    padding     = "valid",
    activation  = activation,
    kernel_regularizer = regularizer)

# feature map shape: [60,60,N] -> [30,30,N]
maxPool1 = tf.keras.layers.MaxPool2D(
    pool_size = (2, 2),
    strides   = None,
    padding   = "valid")

# feature map shape: [30,30,N] -> [28,28,N]
conv2 = tf.keras.layers.Conv2D(
    filters     = N,
    kernel_size = 3,
    strides     = (1, 1),
    padding     = "valid",
    activation  = activation,
    kernel_regularizer = regularizer)

# feature map shape: [28,28,N] -> [14,14,N]
maxPool2 = tf.keras.layers.MaxPool2D(
    pool_size = (2, 2),
    strides   = None,
    padding   = "valid")

# [14,14,N] -> [12,12,N]
conv3 = tf.keras.layers.Conv2D(
    filters     = N,
    kernel_size = 3,
    strides     = (1, 1),
    padding     = "valid",
    activation  = activation,
    kernel_regularizer = regularizer)

# [12,12,N] -> [6,6,N]
maxPool3 = tf.keras.layers.MaxPool2D(
    pool_size = (2, 2),
    strides   = None,
    padding   = "valid")

# [6,6,N] ->  [4,4,N]
conv4 = tf.keras.layers.Conv2D(
    filters     = N,
    kernel_size = 3,
    strides     = (1, 1),
    padding     = "valid",
    activation  = activation,
    kernel_regularizer = regularizer)

# [4,4,N] -> [2,2,N]
maxPool4 = tf.keras.layers.MaxPool2D(
    pool_size = (2, 2),
    strides   = None,
    padding   = "valid")

# [2,2,N] -> [1,1,M]
conv5 = tf.keras.layers.Conv2D(
    filters     = M,
    kernel_size = 2,
    strides     = (1, 1),
    padding     = "valid",
    activation  = activation,
    kernel_regularizer = regularizer)

# [1,1,M] -> [M]
flatten = tf.keras.layers.Flatten()

# [M] ->  [1]
layerOut = tf.keras.layers.Dense(
    units       = 1,
    activation  = "sigmoid",
    kernel_regularizer = regularizer)

### Connecting the graph and instantiating the model
We can now join the graph components to define the model __output__ as a function of the __input placeholder__.

In [None]:
featureMap1 = maxPool1(conv1(inputs))
featureMap2 = maxPool2(conv2(featureMap1))
featureMap3 = maxPool3(conv3(featureMap2))
featureMap4 = maxPool4(conv4(featureMap3))
featureMap5 = flatten(conv5(featureMap4))
outputs = layerOut(featureMap5)

model = tf.keras.Model(inputs, outputs)

### Preparing to train the model

As before, we will use binary cross entropy loss and stochastic gradient descent with momentum.
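
As a reminder of what the loss measures, the following is a minimal pure-Python sketch of the mean binary cross-entropy. It is not the Keras implementation; the helper name and the clipping constant `eps` are assumptions made for this illustration:

```python
import math

def binary_cross_entropy(labels, probabilities, eps=1e-7):
    """Mean binary cross-entropy; eps keeps probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Made-up predictions for one positive and one negative sample
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
unsure = binary_cross_entropy([1, 0], [0.5, 0.5])
print(confident, unsure)
```

Confident correct predictions give a loss close to 0, while a prediction of 0.5 for every sample gives $\log 2 \approx 0.693$.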

In [None]:
# cross-entropy loss between the distribution of ground truth labels and the model predictions
loss = tf.keras.losses.BinaryCrossentropy()

# stochastic gradient descent with momentum
optimizer = tf.keras.optimizers.SGD(
    learning_rate = 0.01,
    momentum      = 0.9)

### Compiling the model
We finally configure the model for training by indicating the loss, the optimizer and performance metrics to be computed during training.  

We use __model.summary()__ to display and check the model architecture we implemented.
Note that, compared with the fully connected network of the previous exercise, the total number of trainable parameters is much smaller.
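
The parameter count can also be worked out by hand. The sketch below uses the standard formulas for convolutional and dense layers with biases; the helper names are hypothetical, and the single dense layer used for comparison is an assumption for illustration, not the architecture of the previous exercise:

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    """Weights plus one bias per unit for a Dense layer."""
    return in_features * units + units

N, M = 8, 16
layer_params = {
    "conv1": conv2d_params(5, 3, N),
    "conv2": conv2d_params(3, N, N),
    "conv3": conv2d_params(3, N, N),
    "conv4": conv2d_params(3, N, N),
    "conv5": conv2d_params(2, N, M),
    "layerOut": dense_params(M, 1),
}
total_cnn = sum(layer_params.values())  # pooling and flatten add no parameters

# Hypothetical comparison: one Dense layer with N units on the flattened image
dense_on_pixels = dense_params(64 * 64 * 3, N)

print(total_cnn, dense_on_pixels)  # 2905 98312
```

Under these assumptions, the whole CNN has fewer parameters than a single small dense layer applied directly to the pixels.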

In [None]:
model.compile(
    optimizer = optimizer,
    loss      = loss,
    metrics   = ["accuracy"])

model.summary()

### Creating a data generator

Again, for memory and training-efficiency reasons, we will define generators that can be called to produce __mini-batches__ of samples when needed.

This time, images will keep their tensor shape, and we will rescale their intensity to be in \[0,1\].

In [None]:
class dataGenerator(tf.keras.utils.Sequence):
    """ Data generator inherited from tf.keras.utils.Sequence
        Input: image data, label data
        __getitem__: Returns mini-batches of consecutive samples, looping over the data
    """
    
    def __init__(self, data_images, data_labels):
        self.batch_size = 32
        
        self.data_images = data_images
        self.data_labels = data_labels
        self.data_size   = data_images.shape[0]
        
        self.scan_idx    = 0

    def __len__(self):
        return self.data_size // self.batch_size

    def __getitem__(self, idx):
        batch_images = []
        batch_labels = []
        for _i in range(self.batch_size):
            batch_images.append(self.data_images[self.scan_idx,])
            batch_labels.append(self.data_labels[self.scan_idx])
        
            self.scan_idx += 1 # Loop over available images
            self.scan_idx %= self.data_size
            
        batch_images = np.stack(batch_images, axis=0)
        batch_labels = np.array(batch_labels)
        
        batch_images = batch_images / 255.0 # Images are rescaled in [0,1]
        
        return batch_images, batch_labels

Then, we can instantiate 2 generators: one to generate mini-batches of the training data, one to generate mini-batches of the test data.

In [None]:
dataGenerator_train = dataGenerator(data_images_train, data_labels_train)
dataGenerator_test  = dataGenerator(data_images_test, data_labels_test)

Finally, let's check what Keras receives when the generators are called:

In [None]:
arbitrary_iteration_idx = 42
train_images_batch, train_labels_batch = dataGenerator_train[arbitrary_iteration_idx]

print("Mini-batch of images has shape: ", train_images_batch.shape)

## Training the model and monitoring the training process

Now that our generators are instantiated, we can train our model. We will use the __tf.keras.Model.fit\_generator__ function to start the training procedure by feeding data generators instead of data arrays. Look at the documentation of [tf.keras.Model.fit\_generator](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit\_generator) for more details. Note that in later TensorFlow 2 releases this function is deprecated in favor of __tf.keras.Model.fit__, which accepts the same generator objects.
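
It helps to know how much data the training call of this section actually consumes. The following back-of-the-envelope sketch uses the batch size set in the generator (32), 10 steps per epoch, 500 epochs, and the 20000 training images; the variable names are chosen for this illustration:

```python
batch_size = 32        # set in dataGenerator.__init__
steps_per_epoch = 10
nb_epochs = 500
train_size = 20000

images_per_epoch = batch_size * steps_per_epoch
total_images = images_per_epoch * nb_epochs
passes_over_data = total_images / train_size

print(images_per_epoch, total_images, passes_over_data)  # 320 160000 8.0
```

An "epoch" here is therefore a short block of 320 images rather than a full pass over the training set; the whole run amounts to 8 passes.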

To get better insight into the learned kernels, the Monitoring callback for this exercise will show visualizations of the kernels of the first layer. Note that, before the non-linearity is applied, the output of each neuron in the first layer is: kernel $\ast$ image + bias.
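
The expression kernel $\ast$ image + bias can be spelled out with explicit loops. This is a minimal NumPy sketch for a single filter, assuming stride 1 and "valid" padding as in `conv1`; like Keras, it slides the kernel without flipping it. The function name and the random toy data are assumptions for illustration:

```python
import numpy as np

def conv2d_valid(image, kernel, bias):
    """Response map of one filter: stride 1, "valid" padding, no flipping.
    image: [H, W, C], kernel: [k, k, C], bias: scalar."""
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    response = np.empty((out_h, out_w), dtype=np.float64)
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + k, col:col + k, :]
            response[row, col] = np.sum(patch * kernel) + bias
    return response

rng = np.random.default_rng(0)
toy_image = rng.random((64, 64, 3))          # stand-in for a rescaled RGB patch
toy_kernel = rng.standard_normal((5, 5, 3))  # stand-in for one learned kernel
pre_activation = conv2d_valid(toy_image, toy_kernel, bias=0.1)
activated = np.maximum(pre_activation, 0.0)  # ReLU, as configured for conv1

print(pre_activation.shape)  # (60, 60)
```

Each kernel spans all three color channels, which is why the learned first-layer kernels can be displayed as small RGB images.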

In [None]:
nbEpochs  = 500

model.fit_generator(
    generator = dataGenerator_train,
    steps_per_epoch = 10,
    epochs          = nbEpochs,

    validation_data  = dataGenerator_test,
    validation_freq  = 5,
    validation_steps = 1,

    verbose   = 1,
    callbacks = [Monitoring(layerTracking=conv1)])

## Evaluating the model
Once the model is trained, we want to quantify its performance on the hold-out test set.
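
For a sigmoid output, the accuracy metric compares thresholded probabilities with the labels. Here is a minimal sketch assuming a threshold of 0.5; the helper and the toy values are made up for illustration:

```python
def binary_accuracy(labels, probabilities, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = 0
    for y, p in zip(labels, probabilities):
        predicted = 1 if p > threshold else 0
        correct += int(predicted == y)
    return correct / len(labels)

toy_labels = [1, 0, 1, 0]
toy_probabilities = [0.8, 0.3, 0.4, 0.6]  # made-up sigmoid outputs
print(binary_accuracy(toy_labels, toy_probabilities))  # 0.5
```

Because the test set is balanced across the two classes, an accuracy near 0.5 corresponds to chance level.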

In [None]:
# We evaluate the model on test data
eval_out = model.evaluate_generator(
    generator = dataGenerator_test,
    verbose   = 1)

print("Accuracy on test set: ", eval_out[1])

## Visualizing some predictions
Let's select a sample of test images and compare the predictions with the true class labels.

In [None]:
# First we use the test generator to create batches of test images
arbitrary_iteration_idx = 42
batch_test_images, batch_test_labels = dataGenerator_test[arbitrary_iteration_idx]
batch_test_images = batch_test_images[:8] # Selection of a small sample
batch_test_labels = batch_test_labels[:8]

# Then we get the prediction of the model
tensor_predictions = model.predict(
    x = batch_test_images)

# We format the results in a list for visualization
list_results = ["True Class [{}] \n P(y=1|x) = {}".format(true_y, str(pred_y[0])[:5]) 
                for true_y, pred_y in zip(batch_test_labels, tensor_predictions)]

# We rescale the images back to their original range
batch_test_images = 255.0 * batch_test_images #-- Rescaling
batch_test_images = batch_test_images.astype(np.uint8) #-- 8-bit conversion

plot_image_batch(batch_test_images, list_results) #-- We finally visualize the results

## Visualizing the intermediate feature maps
The nature of the convolution operation preserves the spatial structure of the input images within the intermediate feature maps. It is possible to extract and visualize them for a given input. Although they can be difficult to interpret, they reveal how the input images are transformed to produce the obtained output.

This can be implemented with __tf.keras__ by creating a new auxiliary __tf.keras.Model__ that outputs the intermediate map of interest.
We can see the returned feature maps with the function __plot_featureMaps__ (from utils_ex3.py).

In [None]:
# We define an auxiliary model

model_aux = tf.keras.Model(inputs, featureMap1) # output of the first conv + max-pool block as a target
#model_aux = tf.keras.Model(inputs, featureMap2) # output of the second conv + max-pool block as a target
#model_aux = tf.keras.Model(inputs, featureMap3) # output of the third conv + max-pool block as a target

In [None]:
# We first select a random training image
random_index = np.random.randint(data_images_train.shape[0])
demo_image = data_images_train[random_index,]
demo_image = np.expand_dims(demo_image, axis=0) # enforce the shape of a batch of 1 image

In [None]:
# We feed the selected image to the auxiliary network
# (rescaled to [0,1], as the model was trained on rescaled images)
featureMaps = model_aux.predict(demo_image / 255.0) # returns a tensor with shape [1,Height,Width,Channels]

plot_featureMaps(demo_image, featureMaps, maxNbChannels = 8)

# Exercises

## The effect of the model architecture

Experiment with different neural network architectures and observe the training process (using the training and test loss curves), the kernels of the first layer (using the visualization of the kernels) and the classification accuracy.
You can create new neural network architectures by varying the width of the network (__number of channels__), changing the __pooling operations__ (average pooling, global pooling).  

You can try varying the __kernel size__, __padding__ or __stride__ parameters of the convolutional layers. Observe the effect on the learning curves and intermediate feature maps. Make sure that the shape of the feature maps stays coherent and that the full network always produces a single scalar output for a given input image.
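
When changing these parameters, the resulting spatial size can be predicted before building the model. The sketch below applies the usual output-size rules for the "valid" and "same" padding modes; `output_size` is a hypothetical helper, not a Keras function:

```python
import math

def output_size(size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "valid":
        return (size - kernel_size) // stride + 1
    if padding == "same":
        return math.ceil(size / stride)
    raise ValueError("padding must be 'valid' or 'same'")

print(output_size(64, 5))                  # 60, as in conv1
print(output_size(64, 5, padding="same"))  # 64
print(output_size(64, 5, stride=2))        # 30
```

With "same" padding the convolutions no longer shrink the feature maps, so the number of pooling steps or the final layers must be adapted to still end with a single scalar output.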

## The effect of the training procedure and regularization

Experiment with different parameters of the optimizer (the learning rate and the momentum) and observe the training process using the training and test loss curves. Try varying the $L_2$ regularization factor. What is the effect of increasing the regularization on the appearance of the convolutional kernels? You can also experiment with using different optimizers (a list of supported optimizers is available [here](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)).
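
To get a feeling for the regularization factor, recall that an $L_2$ regularizer adds the factor times the sum of squared weights to the loss. The following is a minimal NumPy sketch with a random stand-in for the first-layer kernel; the helper name and the toy weights are assumptions for illustration:

```python
import numpy as np

def l2_penalty(weights, weight_decay):
    """Penalty added to the loss for one weight tensor: factor * sum(w ** 2)."""
    return weight_decay * float(np.sum(np.square(weights)))

rng = np.random.default_rng(0)
toy_kernel = rng.standard_normal((5, 5, 3, 8))  # same shape as the conv1 kernel

for weight_decay in (0.0001, 0.01):
    print(weight_decay, l2_penalty(toy_kernel, weight_decay))
```

The penalty scales linearly with the factor, so larger factors push the optimizer more strongly towards small kernel weights.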
