# Deep fully connected neural networks for image classification
<span style="font-size:9pt;">
author: MWLafarge (m.w.lafarge@tue.nl); affiliation: Eindhoven University of Technology; created: Feb 2020
</span>

## Preliminaries
In this exercise you will implement a deep fully connected neural network for image classification using the __Keras__ framework with a TensorFlow backend.  
The goal is to analyze the training of __deep__ neural network models for complex problems with high dimensionality.

This exercise will focus on a binary classification problem with histopathology images. 

## Import the required libraries

In [None]:
# environment setup
%load_ext autoreload
%autoreload 2
%matplotlib widget
#%matplotlib inline

# system libraries
import sys
import os

# computational libraries
import numpy as np
import tensorflow as tf

# utility functions for this exercise
from utils_ex2 import plot_image_batch, Monitoring

## Importing the dataset

The dataset for this exercise is derived from the __PCam__ benchmark dataset created by B. Veeling et al. ([github.com/basveeling/pcam](https://github.com/basveeling/pcam)) that in turn originates from the [CAMELYON16 challenge](https://camelyon16.grand-challenge.org/). 

The PCam dataset consists of RGB color images of size 96 $\times$ 96 pixels extracted from histological sections of sentinel lymph node tissue of breast cancer patients. Each image is annotated with a binary label indicating the presence of metastatic tissue in the patch.

For the purpose of this exercise, the original PCam dataset was subsampled to 20000 training images and 2000 test images, balanced across the two classes.  
Furthermore, to enable faster processing the images were cropped to the central region of 64 $\times$ 64 pixels.
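
The central cropping described above can be sketched with NumPy slicing. This is only an illustration of the operation, not the script that was used to prepare the dataset; the function name `center_crop` and the dummy array are assumptions made for this example.

```python
import numpy as np

def center_crop(images, crop_size):
    """Crop the central crop_size x crop_size region of a batch of images.

    images: array of shape (N, H, W, C)
    """
    height, width = images.shape[1], images.shape[2]
    top = (height - crop_size) // 2
    left = (width - crop_size) // 2
    return images[:, top:top + crop_size, left:left + crop_size, :]

# Hypothetical stand-in for a few 96 x 96 RGB patches
dummy_patches = np.zeros((4, 96, 96, 3), dtype=np.uint8)
cropped = center_crop(dummy_patches, 64)
print(cropped.shape)  # (4, 64, 64, 3)
```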

In [None]:
PATH_BASE_DIR = os.getcwd() # current working directory

def load_data_helper(path):
    data_dict = np.load(path) 
    data_points = data_dict["images"]
    data_labels = data_dict["labels"]
    
    return data_points, data_labels

path_train = PATH_BASE_DIR + os.sep + "../data/data_smallPCam_training.npz"
path_test  = PATH_BASE_DIR + os.sep + "../data/data_smallPCam_test.npz"
data_images_train, data_labels_train = load_data_helper(path_train)
data_images_test, data_labels_test   = load_data_helper(path_test)
    
print("Imported training points: ", data_images_train.shape)
print("Imported training labels: ", data_labels_train.shape)

print("Imported test points: ", data_images_test.shape)
print("Imported test labels: ", data_labels_test.shape)

First, let's visualize a few examples. As can be seen, the class label of a patch (1: tumor, 0: not tumor) is not obvious to non-experts (this task requires experience with analyzing histopathology images). 

In [None]:
random_indices = [np.random.randint(data_images_train.shape[0]) for _i in range(8)]

batch_images = data_images_train[random_indices,]
batch_labels = data_labels_train[random_indices,]

plot_image_batch(batch_images, batch_labels)

## Creating a classifier
We want to create a model that takes high-dimensional vectors (flattened images) as input and outputs the predicted probability of the positive class label.

### Defining the graph operations
As with the first exercise, let's first instantiate all the components that we will use to construct the network:
- __tf.keras.Input()__ placeholder that takes vectors of size 64$\times$64$\times$3.  
- several __tf.keras.layers.Dense()__ layers that will constitute the network.

  
Note that the high dimensionality of the input creates a risk that the model will overfit the training dataset, so some sort of mitigation strategy is required. One option is to use regularization such as $L_2$ regularization. We will therefore define an $L_2$ regularizer (__tf.keras.regularizers.l2()__) that will be used during training. The regularizer can be passed as an argument during the instantiation of the layers. __tf.keras__ includes other regularizers that can be easily associated with the model components (see [tf.keras.regularizers documentation](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers)).
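
To make the penalty concrete, here is a minimal NumPy sketch of what an $L_2$ regularizer adds to the loss for one weight matrix: the regularization factor multiplied by the sum of the squared weights. The function name `l2_penalty` and the example weights are assumptions for illustration only.

```python
import numpy as np

def l2_penalty(weights, factor):
    """L2 penalty: factor times the sum of the squared weights."""
    return factor * np.sum(np.square(weights))

# Hypothetical small weight matrix
example_weights = np.array([[0.5, -1.0],
                            [2.0,  0.0]])

# Sum of squares is 0.25 + 1.0 + 4.0 = 5.25, so the penalty is about 0.0525
penalty = l2_penalty(example_weights, factor=0.01)
print(penalty)
```

Larger weights are penalized quadratically, so the optimizer is pushed toward solutions with small weight values.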

In [None]:
# regularizer object
weight_decay = 0.01
regularizer = tf.keras.regularizers.l2(weight_decay)  # factor passed positionally: the keyword name differs between versions

# placeholder for the image data
# note that the image data is flattened to one dimension
images_size = 64 * 64 * 3
inputs = tf.keras.Input(shape=(images_size,))

# model components
layer1   = tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=regularizer)
layer2   = tf.keras.layers.Dense(16, activation='relu', kernel_regularizer=regularizer)
layer3   = tf.keras.layers.Dense(8, activation='relu', kernel_regularizer=regularizer)

layerOut = tf.keras.layers.Dense(1, activation='sigmoid')

### Connecting the graph and instantiating the model
We can now join the graph components to define the model __output__ as a function of the __input placeholder__.

In [None]:
outputs = layerOut(layer3(layer2(layer1(inputs))))

model = tf.keras.Model(inputs, outputs)

### Preparing to train the model

As before, we will use binary cross entropy loss and stochastic gradient descent with momentum.
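
As a reminder of what these two components compute, the following NumPy sketch writes out the binary cross-entropy for a batch of predicted probabilities and a single parameter update of gradient descent with momentum (velocity form). The function names and the clipping constant `eps` are assumptions for this illustration; the Keras implementations handle many more details.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob)
                    + (1.0 - y_true) * np.log(1.0 - y_prob))

def sgd_momentum_step(weights, velocity, gradient,
                      learning_rate=0.01, momentum=0.9):
    """One update: the velocity accumulates past gradients, then moves the weights."""
    velocity = momentum * velocity - learning_rate * gradient
    return weights + velocity, velocity

# An uninformative prediction of 0.5 gives a loss of log(2), about 0.693
bce = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
print(bce)

# Two updates with the same gradient: the second step is larger than the first
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, v, gradient=2.0)
w, v = sgd_momentum_step(w, v, gradient=2.0)
print(w, v)
```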

In [None]:
# cross-entropy loss between the distribution of ground truth labels and the model predictions
loss = tf.keras.losses.BinaryCrossentropy()

# stochastic gradient descent with momentum
optimizer = tf.keras.optimizers.SGD(
    learning_rate = 0.01,
    momentum      = 0.9)

### Compiling the model
We finally configure the model for training by indicating the loss, the optimizer and performance metrics to be computed during training.  

We use __model.summary()__ to display and check the model architecture we implemented.
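
The parameter counts reported in the summary can be cross-checked by hand: a dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases. The helper below is a small sketch written for this check (its name is an assumption), applied to the layer sizes used in this exercise.

```python
def count_dense_parameters(input_size, layer_sizes):
    """Total number of weights and biases in a stack of dense layers."""
    total = 0
    fan_in = input_size
    for units in layer_sizes:
        total += fan_in * units + units  # weights + biases of this layer
        fan_in = units
    return total

# Flattened 64 x 64 x 3 input, followed by layers of 64, 16, 8 and 1 neurons
total_parameters = count_dense_parameters(64 * 64 * 3, [64, 16, 8, 1])
print(total_parameters)  # 787681
```

Almost all of the parameters (786496 of them) belong to the first layer, which is a direct consequence of the high dimensionality of the input.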

In [None]:
model.compile(
    optimizer = optimizer,
    loss      = loss,  # loss object defined in the previous cell
    metrics   = ["accuracy"])

model.summary()

### Creating a data generator

Since this is a large dataset, it is __expensive in terms of memory and unnecessary to process the whole dataset at once__. Therefore, a solution is to create a generator that will provide a __mini-batch__ of samples at every training iteration.

We can define a __"generator"__ object with __tf.keras__ that will be called automatically during the training loop. We can implement our custom generator as a class that inherits from __tf.keras.utils.Sequence__. At every iteration, Keras expects the method __\_\_getitem\_\___ to return two arrays: 
- a batch of images
- a batch of the corresponding labels.
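
The interface can be illustrated without Keras by a plain Python class that implements the same methods. The sketch below is a hypothetical variant, not the generator used in this exercise: it draws shuffled mini-batches by indexing a random permutation with `idx`, and reshuffles in `on_epoch_end`. The class name, the `seed` argument and the dummy data are assumptions; in practice the class would inherit from __tf.keras.utils.Sequence__.

```python
import numpy as np

class ShuffledBatches:
    """Sketch of the Sequence interface with shuffled mini-batches."""

    def __init__(self, data_images, data_labels, batch_size=32, seed=0):
        self.data_images = data_images
        self.data_labels = data_labels
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self.order = self.rng.permutation(data_images.shape[0])

    def __len__(self):
        # Number of complete mini-batches in one pass over the data
        return self.data_images.shape[0] // self.batch_size

    def __getitem__(self, idx):
        start = idx * self.batch_size
        batch_idx = self.order[start:start + self.batch_size]
        images = self.data_images[batch_idx].astype(np.float32)
        images = images.reshape(len(batch_idx), -1)  # flatten each image
        images = 2.0 * (images / 255.0 - 0.5)        # rescale to [-1, 1]
        return images, self.data_labels[batch_idx]

    def on_epoch_end(self):
        # Draw a new sample order for the next pass over the data
        self.order = self.rng.permutation(self.data_images.shape[0])

# Hypothetical dummy data with the same layout as the exercise data
dummy_rng = np.random.default_rng(1)
dummy_images = dummy_rng.integers(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)
dummy_labels = dummy_rng.integers(0, 2, size=100)

batches = ShuffledBatches(dummy_images, dummy_labels)
first_images, first_labels = batches[0]
print(len(batches), first_images.shape, first_labels.shape)  # 3 (32, 12288) (32,)
```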

In [None]:
class dataGenerator(tf.keras.utils.Sequence):
    """ DataGenerator that inherits from tf.keras.utils.Sequence
        Input: image data, label data
        __getitem__: Returns mini-batches of consecutive samples from the data,
                     wrapping around at the end (the idx argument is not used)
    """
    
    def __init__(self, data_images, data_labels):
        self.batch_size = 32
        
        self.data_images = data_images
        self.data_labels = data_labels
        self.data_size   = data_images.shape[0]
        
        self.scan_idx    = 0

    def __len__(self):
        return self.data_size // self.batch_size  # number of complete mini-batches

    def __getitem__(self, idx):
        batch_images = []
        batch_labels = []
        for _i in range(self.batch_size):
            batch_images.append(self.data_images[self.scan_idx,])
            batch_labels.append(self.data_labels[self.scan_idx])
        
            self.scan_idx += 1 # Loop over available images
            self.scan_idx %= self.data_size
            
        batch_images = np.stack(batch_images, axis=0)
        batch_labels = np.array(batch_labels)
        
        shape_flat = [-1, batch_images.shape[1] * batch_images.shape[2] * batch_images.shape[3]]
        batch_images = np.reshape(batch_images, shape_flat)
        
        batch_images = 2.0 * (batch_images / 255.0 - 0.5) # Images are rescaled in [-1,1]
        
        return batch_images, batch_labels

Then, we can instantiate 2 generators: one to generate mini-batches of the training data, one to generate mini-batches of the test data.

In [None]:
dataGenerator_train = dataGenerator(data_images_train, data_labels_train)
dataGenerator_test  = dataGenerator(data_images_test, data_labels_test)

Finally, let's check what Keras receives when the generators are called:

In [None]:
arbitrary_iteration_idx = 42
train_images_batch, train_labels_batch = dataGenerator_train[arbitrary_iteration_idx]

print("Mini-batch of images has shape: ", train_images_batch.shape)

## Training the model and monitoring the training process

Now that our generators are instantiated, we can train our model.
This time, we will use the __tf.keras.Model.fit\_generator__ function to start the training procedure, feeding it data generators instead of data arrays. Look at the documentation of [tf.keras.Model.fit\_generator](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit\_generator) for more details. Note that __fit\_generator__ is deprecated since TensorFlow 2.1, where __tf.keras.Model.fit__ accepts generators directly.
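
Before starting the training, it helps to work out how much data the model will see. The short calculation below uses the values that appear in this notebook (mini-batches of 32, 10 steps per epoch, 500 epochs, 20000 training images); the variable names are chosen for this illustration.

```python
batch_size = 32
steps_per_epoch = 10
epochs = 500
training_set_size = 20000

total_steps = steps_per_epoch * epochs                  # parameter updates
samples_seen = total_steps * batch_size                 # images processed
passes_over_data = samples_seen / training_set_size     # full passes over the set
batches_per_full_pass = training_set_size // batch_size

print(total_steps, samples_seen, passes_over_data, batches_per_full_pass)
# 5000 160000 8.0 625
```

An "epoch" in this setup is therefore a short reporting interval of 10 updates rather than a full pass over the training set.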

To get better insight into the learned weights, the Monitoring callback for this exercise will show visualizations of the weights of the first layer. The weights of each neuron are visualized as an image. Note that, before the activation function is applied, the output of each neuron in the first layer is the inner product of the input image and the weights image, plus a bias.
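
The relation between a first-layer neuron and its "weights image" can be sketched in NumPy. The kernel, bias and input below are random stand-ins (assumptions for this example), and the rescaling to [0, 1] for display is one possible choice; how the Monitoring callback renders the weights is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
image_shape = (64, 64, 3)
input_size = 64 * 64 * 3
num_neurons = 64

# Hypothetical stand-ins for the first-layer kernel, bias and one input image
kernel = rng.normal(size=(input_size, num_neurons))
bias = np.zeros(num_neurons)
flat_image = rng.uniform(-1.0, 1.0, size=input_size)

# Output of all first-layer neurons for this image
pre_activation = flat_image @ kernel + bias
activation = np.maximum(pre_activation, 0.0)  # ReLU

# Weights of neuron 0 reshaped back to the image layout
weight_image = kernel[:, 0].reshape(image_shape)

# Same value as pre_activation[0], written as an element-wise product of images
neuron_0 = np.sum(flat_image.reshape(image_shape) * weight_image) + bias[0]

# Rescale the weights to [0, 1] so that they can be displayed as an RGB image
display_image = (weight_image - weight_image.min()) / (weight_image.max() - weight_image.min())
print(pre_activation.shape, weight_image.shape)  # (64,) (64, 64, 3)
```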

In [None]:
nbEpochs  = 500

model.fit_generator(
    generator = dataGenerator_train,
    steps_per_epoch = 10,
    epochs          = nbEpochs,

    validation_data  = dataGenerator_test,
    validation_freq  = 10,
    validation_steps = 1,

    verbose   = 1,
    callbacks = [Monitoring(layerTracking=layer1)])

## Evaluating the model
Once the model is trained, we want to quantify its performance on the hold-out test set. 
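
For a sigmoid output, the accuracy metric amounts to thresholding the predicted probabilities at 0.5 and counting how often the result matches the label. A minimal NumPy sketch of this computation, with a hypothetical function name and made-up example values:

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predicted = (y_prob > threshold).astype(int)
    return np.mean(predicted == y_true)

example_labels = np.array([1, 0, 1, 0])
example_probs = np.array([0.9, 0.2, 0.4, 0.6])

# Thresholded predictions are [1, 0, 0, 1]: two of the four are correct
example_accuracy = binary_accuracy(example_labels, example_probs)
print(example_accuracy)  # 0.5
```

Since the test set is balanced across the two classes, a model that always predicts the same class reaches an accuracy of 0.5, which is the baseline to compare against.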

In [None]:
# We evaluate the model on test data
eval_out = model.evaluate_generator(
    generator = dataGenerator_test,
    verbose   = 1)

print("Accuracy on test set: ", eval_out[1])

## Visualizing some predictions
Let's select a sample of test images and compare the predictions with the true class labels.

In [None]:
# First we use the test generator to create batches of test images
arbitrary_iteration_idx = 42
batch_test_images, batch_test_labels = dataGenerator_test[arbitrary_iteration_idx]
batch_test_images = batch_test_images[:8] # Selection of a small sample
batch_test_labels = batch_test_labels[:8]

# Then we get the predictions of the model
tensor_predictions = model.predict(
    x = batch_test_images)

# We format the results in a list for visualization
list_results = ["True Class [{}] \n P(y=1|x) = {}".format(true_y, str(pred_y[0])[:5]) 
                for true_y, pred_y in zip(batch_test_labels, tensor_predictions)]

# We reshape the images to their format of origin
batch_images_unflattened = np.reshape(batch_test_images, [-1,64,64,3]) #-- Unflattening
batch_images_unflattened = 255.0 * (batch_images_unflattened + 1.0) / 2.0 #-- Rescaling
batch_images_unflattened = batch_images_unflattened.astype(np.uint8) #-- 8-bit conversion

plot_image_batch(batch_images_unflattened, list_results) #-- We finally visualize the results

# Exercises

## The effect of the model architecture

Experiment with different neural network architectures and observe the training process (using the training and test loss curves), the parameters of the first layer (using the visualization of the weights) and the classification accuracy. You can create new neural network architectures by varying the depth of the network (number of layers) and/or the number of neurons in each layer.

## The effect of the training procedure and regularization

Experiment with different parameters of the optimizer (the learning rate and the momentum) and observe the training process using the training and test loss curves. Try varying the $L_2$ regularization factor. What is the effect of increasing the regularization on the appearance of the weight images?
