<table class="tfo-notebook-buttons" align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/PracticalDL/Practical-Deep-Learning-Book/blob/master/code/chapter-5/1-develop-tool.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/PracticalDL/Practical-Deep-Learning-Book/blob/master/code/chapter-5/1-develop-tool.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

# Chapter 5 - From Novice to Master Predictor: Maximizing Convolutional Neural Network Accuracy


We explore strategies to maximize the accuracy that our classifier can achieve, with the help of a range of tools including TensorBoard, What-If Tool, tf-explain, TensorFlow Datasets, AutoKeras, and AutoAugment. Along the way, we conduct experiments to develop an intuition of what parameters might or might not work for your AI task. [Read online here.](https://learning.oreilly.com/library/view/practical-deep-learning/9781492034858/ch05.html)

## Tools

In this notebook, we develop a tool to experiment with various parameter settings of a model. You can choose among different augmentation techniques, use any of the image datasets available in TensorFlow Datasets, and train either from scratch or by fine-tuning MobileNet (or any other model of your choice), all in the browser without installing any framework on your system.

In [None]:
# Perform all installations
!pip install tensorflow-gpu==2.0.0
!pip install tensorflow-datasets
!pip install tensorwatch

# Get TensorBoard to run 
%load_ext tensorboard

In [None]:
# Import necessary packages
import tensorflow as tf
import tensorflow_datasets as tfds

# tfds shows many progress bars that take up a lot of screen space, so let's disable them
tfds.disable_progress_bar()

import math
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.callbacks import CSVLogger

To make experiments reproducible across runs, we limit the sources of randomness. Randomness enters through the initialization of model weights and the shuffling of data.

Random number generators can be made reproducible by initializing a seed and that’s exactly what we will do. Various frameworks have their own ways of setting a random seed, some of which are shown below:

In [None]:
tf.random.set_seed(1234)
np.random.seed(1234)
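A common pattern is to seed every source of randomness in one place. The sketch below is our own assumption, not part of the original tool: the helper name `set_global_seed` is hypothetical, and even with fixed seeds some GPU kernels (for example in cuDNN) can remain nondeterministic. It reseeds twice and checks that Python, NumPy and TensorFlow each reproduce the same draw.

```python
import os
import random

import numpy as np
import tensorflow as tf


def set_global_seed(seed):
    """Seed Python, NumPy and TensorFlow random number generators (hypothetical helper)."""
    # Only affects hash randomization in subprocesses started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)  # also resets TensorFlow's internal op-seed counter


set_global_seed(1234)
first = (random.random(), np.random.rand(), float(tf.random.uniform([])))

set_global_seed(1234)
second = (random.random(), np.random.rand(), float(tf.random.uniform([])))

# Reseeding reproduces the same sequence of draws in eager mode
print(first == second)
```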

In [None]:
# Variables
BATCH_SIZE = 32
NUM_EPOCHS = 100
IMG_H = IMG_W = 224
IMG_SIZE = 224
LOG_DIR = './log'
DEGREES = 10  #for rotation
SHUFFLE_BUFFER_SIZE = 1024
IMG_CHANNELS = 3

## Dataset

Choose the dataset that we want to experiment on. We have tried to build this tool in such a way that it works with all the image datasets available in `TensorFlow Datasets`.

To see all available datasets in `TensorFlow Datasets`, use the following `print` command.

In [None]:
# View all available datasets
print(tfds.list_builders())

In [None]:
# Choose dataset

#dataset_name = "colorectal_histology"
#dataset_name = "caltech101"
#dataset_name = "oxford_flowers102"
dataset_name = "cats_vs_dogs"

### Dataset Preprocessing and Augmenting

Let's define some preprocessing and augmentation functions.

Note that bicubic resizing functionality is not available in TFDS yet.

All `tf.image` augmentations are documented at https://www.tensorflow.org/api_docs/python/tf/image.

In [None]:
def preprocess(ds):
    x = tf.image.resize_with_pad(ds['image'], IMG_SIZE, IMG_SIZE)
    x = tf.cast(x, tf.float32)
    x = (x / 127.5) - 1
    return x, ds['label']


def augmentation(image, label):
    #image = tf.image.resize_with_crop_or_pad(image, IMG_SIZE, IMG_SIZE)

    # Random Crop: randomly crops an image and fits to given size
    #image = tf.image.random_crop(image,[IMG_SIZE, IMG_SIZE, IMG_CHANNELS])

    # Brightness: Adjust brightness by a given max_delta
    image = tf.image.random_brightness(image, .1)

    # Random Contrast: Add a random contrast to the image
    image = tf.image.random_contrast(image, lower=0.0, upper=1.0)

    # Flip: Left and right
    image = tf.image.random_flip_left_right(image)

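    # Rotation: tf.image only supports rotations by multiples of 90 degrees (tf.image.rot90).
    # Many images no longer look natural after a 90 degree rotation,
    # so small 10-30 degree tilts are a more common augmentation.
    # Note: tf.keras.preprocessing.image.random_rotation works on NumPy arrays, not tensors
    #image = tf.keras.preprocessing.image.random_rotation(image,10)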

    # Finally return the augmented image and label
    return image, label
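To see what `preprocess` does to a non-square image, the sketch below applies the same `resize_with_pad` and scaling steps to a synthetic all-white 100x200 image (made up purely for illustration). The padding is filled with zeros before scaling, so after the `(x / 127.5) - 1` normalization the letterbox bars end up at -1 while white pixels end up at about 1.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 224

# Synthetic all-white image, 100 pixels tall and 200 wide (illustrative only)
image = tf.fill([100, 200, 3], tf.constant(255, dtype=tf.uint8))

# Same steps as preprocess(): resize keeping the aspect ratio, pad, then scale
x = tf.image.resize_with_pad(image, IMG_SIZE, IMG_SIZE)
x = tf.cast(x, tf.float32)
x = (x / 127.5) - 1

print(x.shape)                 # (224, 224, 3)
print(float(x[0, 0, 0]))       # padded row at the top: -1.0
print(float(x[112, 112, 0]))   # inside the resized image: about 1.0
```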

### Dataset Loading

Develop handy functions to load training and validation data.

Some of the datasets in `TensorFlow Datasets` do not have a `validation` split. For those datasets, we take a small percentage of samples from the `training` set and treat it as the `validation` set. We split the data with TFDS split slicing (for example, `'train[:90%]'` and `'train[90%:]'`). Because TFDS typically shuffles examples when it prepares a dataset, each slice is effectively a random sample, and the same slice is selected on every run.
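TFDS split strings are one way to carve out a validation set. As an alternative sketch (not what `get_dataset` uses), any finite `tf.data.Dataset` of known size can be split with `take` and `skip`. The helper `split_dataset` is hypothetical. Unlike TFDS slicing, it cuts the dataset in its current order, so it shuffles once with a fixed seed and `reshuffle_each_iteration=False`; otherwise the two halves could see different orders and overlap.

```python
import tensorflow as tf


def split_dataset(ds, num_examples, train_fraction=0.9, seed=1234):
    """Split a finite dataset into train/validation parts (hypothetical helper)."""
    # Shuffle once with a fixed order so take() and skip() see the same permutation
    ds = ds.shuffle(num_examples, seed=seed, reshuffle_each_iteration=False)
    num_train = int(num_examples * train_fraction)
    return ds.take(num_train), ds.skip(num_train)


ds = tf.data.Dataset.range(100)
train_ds, val_ds = split_dataset(ds, 100)

train_ids = {int(x) for x in train_ds}
val_ids = {int(x) for x in val_ds}
print(len(train_ids), len(val_ids))     # 90 10
print(train_ids.isdisjoint(val_ids))    # no example lands in both splits
```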

In [None]:
def get_dataset(dataset_name, *split_value):
    # See all possible splits in dataset
    _, info = tfds.load(dataset_name, with_info=True)
    #print(info)
    num_train_examples = info.splits['train'].num_examples
    if "validation" in info.splits:
        # Then load train and validation
        if len(split_value) == 1:
            print(
                "INFO: Splitting train dataset according to splits provided by user"
            )
            percent = split_value[0]
            if "test" in info.splits:
                print('INFO: Test dataset is available')
                # Take the same percentage of the test and train splits and merge them
                split_train = 'test[:{0}%]+train[:{0}%]'.format(percent)
                num_train_examples += info.splits['test'].num_examples
                print("INFO: Added test data to train data")
            else:
                split_train = 'train[:{}%]'.format(percent)
            # Load train with the new split
            train, info_train = tfds.load(dataset_name,
                                          split=split_train,
                                          with_info=True)
            NUM_EXAMPLES = int(num_train_examples * percent / 100)
        else:
            # Load the training dataset with its standard split
            print("INFO: Loading standard splits for training dataset")
            train, info_train = tfds.load(dataset_name,
                                          split='train',
                                          with_info=True)
            NUM_EXAMPLES = num_train_examples
        # Load validation dataset as is standard
        val, info_val = tfds.load(dataset_name,
                                  split='validation',
                                  with_info=True)
        NUM_CLASSES = info_train.features['label'].num_classes
    else:
        # Validation not in default datasets
        print(
            "INFO: Defining a 90-10 split between training and validation as no default split exists."
        )
        # Here we have defined how to split the original train dataset into train and val
        # Use 90% as train dataset and 10% as validation dataset
        train, info_train = tfds.load(dataset_name,
                                      split='train[:90%]',
                                      with_info=True)
        val, info_val = tfds.load(dataset_name,
                                  split='train[90%:]',
                                  with_info=True)
        NUM_CLASSES = info_train.features['label'].num_classes
        # The total number of classes in training dataset should either be equal to or more than the total number of classes in validation dataset
        assert NUM_CLASSES >= info_val.features['label'].num_classes
        NUM_EXAMPLES = int(num_train_examples * 0.9)

    # Standard processing for training and validation set
    IMG_H, IMG_W, IMG_CHANNELS = info_train.features['image'].shape
    # preprocess() resizes every image to IMG_SIZE x IMG_SIZE
    IMG_H = IMG_W = IMG_SIZE
    if IMG_CHANNELS is None:
        IMG_CHANNELS = 3

    # Training specific processing: augment each image individually, then batch,
    # so every image in a batch gets its own random brightness/contrast/flip
    train = train.map(preprocess,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
    train = train.map(augmentation,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
    train = train.repeat().shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
    train = train.prefetch(tf.data.experimental.AUTOTUNE)

    # Validation specific processing
    val = val.map(preprocess).repeat().batch(BATCH_SIZE)
    val = val.prefetch(tf.data.experimental.AUTOTUNE)

    return train, info_train, val, info_val, IMG_H, IMG_W, IMG_CHANNELS, NUM_CLASSES, NUM_EXAMPLES

Now that we have defined all our helper functions, let's use them to get the dataset.

In [None]:
train, info_train, val, info_val, IMG_H, IMG_W, IMG_CHANNELS, NUM_CLASSES, NUM_EXAMPLES = get_dataset(
    dataset_name, 100)

print("\n\nIMG_H, IMG_W", IMG_H, IMG_W)
print("IMG_CHANNELS", IMG_CHANNELS)
print("NUM_CLASSES", NUM_CLASSES)
print("BATCH_SIZE", BATCH_SIZE)
print("NUM_EXAMPLES", NUM_EXAMPLES)
print("NUM_EPOCHS", NUM_EPOCHS)

# If you want to print even more information on both the splits, uncomment the lines below:
#print(info_train)
#print(info_val)

Great! 

## Training

Here we decide what kind of training to perform.

We have defined a model trained from scratch as well as a model that performs transfer learning on MobileNet.

Depending on which dataset we chose, and how different it is from the ImageNet dataset, we can learn a lot from experimenting with both kinds of training.

In [None]:
# Choose "scratch" to train a new model from scratch
#training_format = "scratch"

# Choose "transfer_learning" to fine-tune MobileNet
training_format = "transfer_learning"

In [None]:
# Allow TensorBoard callbacks
# histogram_freq=1 logs weight histograms every epoch;
# write_images=True logs layer weights as images
tensorboard_callback = tf.keras.callbacks.TensorBoard(LOG_DIR,
                                                      histogram_freq=1,
                                                      write_graph=True,
                                                      write_images=True)

### Model definition for training from scratch

In [None]:
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3),
                               activation='relu',
                               input_shape=(IMG_SIZE, IMG_SIZE, IMG_CHANNELS)),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(rate=0.3),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(rate=0.3),
        tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
    ])
    return model


def scratch(train, val, learning_rate):
    model = create_model()
    model.summary()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    earlystop_callback = tf.keras.callbacks.EarlyStopping(
        monitor='val_accuracy', min_delta=0.0001, patience=5)
    csv_logger = CSVLogger(dataset_name + '-scratch-' + 'log.csv',
                           append=True,
                           separator=';')

    model.fit(train,
              epochs=NUM_EPOCHS,
              steps_per_epoch=int(NUM_EXAMPLES / BATCH_SIZE),
              validation_data=val,
              validation_steps=1,
              validation_freq=1,
              callbacks=[tensorboard_callback, earlystop_callback, csv_logger])
    return model
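Most of the scratch model's parameters sit in its first `Dense` layer. The plain-Python sketch below (no TensorFlow needed) traces the spatial size through the three 'valid' 3x3 convolutions and 2x2 max-pools of `create_model`, assuming `IMG_SIZE = 224`; this is the same bookkeeping that `model.summary()` reports.

```python
IMG_SIZE = 224
FILTERS = 32

size = IMG_SIZE
for _ in range(3):
    size = size - 3 + 1   # 3x3 convolution with 'valid' padding shrinks each side by 2
    size = size // 2      # 2x2 max-pooling with stride 2 halves it, rounding down

flattened = size * size * FILTERS
dense_params = flattened * 128 + 128  # weights plus biases of Dense(128)

print(size, flattened, dense_params)  # 26 21632 2769024
```

Because the flattened feature map has 21,632 values, the `Dense(128)` layer alone holds about 2.8 million parameters. The transfer-learning model avoids this by using `GlobalAveragePooling2D`, which reduces each feature map to a single value.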

### Model definition for fine-tuning

We will be using the MobileNet model for fine-tuning. We can decide how many layers of the model to train by unfreezing the top layers of the model as follows:

```
import tensorflow as tf

# Load MobileNet without its classification head
mobile_net = tf.keras.applications.MobileNet(input_shape=(224, 224, 3),
                                             include_top=False)

# Our unfreeze_percentage variable (a fraction between 0 and 1) decides how many layers to unfreeze
unfreeze_percentage = 0.75

# Keep the model itself trainable: a Keras model with trainable=False
# exposes no trainable weights, even for layers individually marked trainable
mobile_net.trainable = True

# Find total number of layers in base model
num_layers = len(mobile_net.layers)
print("Total number of layers in MobileNet: ", num_layers)

# Freeze the first layers and keep the last few trainable
num_frozen = int(num_layers - unfreeze_percentage * num_layers)
for layer_index, layer in enumerate(mobile_net.layers):
    layer.trainable = layer_index >= num_frozen
    if layer.trainable:
        print(layer_index, layer)
```

All you need to do is keep the last few layers of `mobile_net` trainable and freeze the rest. Then recompile the model (necessary for these changes to take effect) and resume training.
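The Keras `trainable` rules are easy to get wrong, so here is a small self-contained check on a toy four-layer model. The model and the helper `unfreeze_top` are hypothetical stand-ins for MobileNet, chosen so the check runs without downloading weights. It shows that freezing lower layers individually works, while setting `trainable = False` on the whole model hides every weight, even from a layer that is later marked trainable.

```python
import tensorflow as tf


def build_toy_base():
    """A tiny stand-in for MobileNet: four Dense layers (hypothetical)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8),
        tf.keras.layers.Dense(8),
        tf.keras.layers.Dense(8),
        tf.keras.layers.Dense(8),
    ])


def unfreeze_top(base, unfreeze_fraction):
    """Keep the model trainable, then freeze all but the top fraction of its layers."""
    base.trainable = True
    num_layers = len(base.layers)
    first_trainable = int(num_layers - unfreeze_fraction * num_layers)
    for index, layer in enumerate(base.layers):
        layer.trainable = index >= first_trainable
    return base


base = unfreeze_top(build_toy_base(), 0.5)
# Kernel and bias of the last two Dense layers
print(len(base.trainable_weights))  # 4

# Pitfall: a frozen model hides its weights even if one of its layers is marked trainable
frozen = build_toy_base()
frozen.trainable = False
frozen.layers[-1].trainable = True
print(len(frozen.trainable_weights))  # 0
```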

In [None]:
def transfer_learn(train, val, unfreeze_percentage, learning_rate):
    mobile_net = tf.keras.applications.MobileNet(input_shape=(IMG_SIZE,
                                                              IMG_SIZE,
                                                              IMG_CHANNELS),
                                                 include_top=False)
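    # Use mobile_net.summary() to view the model
    # Keep the base model trainable: when a Keras model has trainable=False,
    # it exposes no trainable weights, even for layers marked trainable
    mobile_net.trainable = True
    # Freeze the lower layers according to the dataset being used, so that
    # only the top `unfreeze_percentage` fraction of layers is trained
    num_layers = len(mobile_net.layers)
    num_frozen = int(num_layers - unfreeze_percentage * num_layers)
    for layer in mobile_net.layers[:num_frozen]:
        layer.trainable = False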
    model_with_transfer_learning = tf.keras.Sequential([
        mobile_net,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
    ], )
    model_with_transfer_learning.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=["accuracy"])
    model_with_transfer_learning.summary()
    earlystop_callback = tf.keras.callbacks.EarlyStopping(
        monitor='val_accuracy', min_delta=0.0001, patience=10)
    csv_logger = CSVLogger(dataset_name + '-transferlearn-' + 'log.csv',
                           append=True,
                           separator=';')
    model_with_transfer_learning.fit(
        train,
        epochs=NUM_EPOCHS,
        steps_per_epoch=int(NUM_EXAMPLES / BATCH_SIZE),
        validation_data=val,
        validation_steps=1,
        validation_freq=1,
        callbacks=[tensorboard_callback, earlystop_callback, csv_logger])
    return model_with_transfer_learning

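Recall that `tf.keras.layers.Dropout` is active only during training and is turned off when calculating validation loss. `tf.keras.layers.BatchNormalization` also behaves differently between the two phases: during training it normalizes with the statistics of the current batch, while during validation it uses the moving averages it accumulated during training.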

Also, remember that training metrics report the average for an epoch, while validation metrics are evaluated after the epoch, so validation metrics see a model that has trained slightly longer.
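The two layers' training-versus-inference behavior can be checked directly by passing the `training` flag yourself. In this sketch, the all-ones input and the two-value batch are made-up examples. Keras `Dropout` uses inverted dropout, so kept values are scaled by `1 / (1 - rate)`. A freshly created `BatchNormalization` layer starts with a moving mean of 0 and a moving variance of 1, so at inference it leaves the input nearly unchanged.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)

# Dropout: identity at inference, random zeroing plus 1 / (1 - rate) scaling in training
dropout = tf.keras.layers.Dropout(rate=0.3)
x = tf.ones((1, 1000))
drop_eval = dropout(x, training=False).numpy()
drop_train = dropout(x, training=True).numpy()
print("fraction dropped in training:", np.mean(drop_train == 0.0))  # roughly 0.3

# BatchNormalization: moving averages at inference, batch statistics in training.
# Call the inference mode first, before training mode updates the moving averages.
bn = tf.keras.layers.BatchNormalization()
batch = tf.constant([[1.0], [3.0]])
bn_eval = bn(batch, training=False).numpy()   # roughly [[1], [3]]
bn_train = bn(batch, training=True).numpy()   # batch mean 2, variance 1: roughly [[-1], [1]]
print(bn_eval.ravel(), bn_train.ravel())
```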

Run the following cells to start the training and then come back up to look at TensorBoard.

## TensorBoard

Let's focus on what we can learn from TensorBoard:

1. Visualize the training and validation accuracy and loss.
2. View the weights of each layer rendered as images in the `Images` tab (enabled by `write_images=True`). This takes some time to load, so grab a coffee!

   In this tab, we can adjust the size of the displayed images and tweak their contrast and brightness.

3. Visualize the graph of the network that we just trained.
4. The `Distributions` tab shows how the weights of each layer are distributed, logged every epoch because `histogram_freq=1`. This is very useful when quantizing a model, which we cover in later chapters. The `Histograms` tab shows the same distributions as histograms.





Note: You can zoom in and out with ALT+Scroll.
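The Keras callback logs these summaries for us, but the same tabs can show any tensor we log ourselves with the `tf.summary` API. The sketch below writes a made-up, normally distributed tensor (illustrative only, not model weights) plus its mean to a temporary directory that stands in for `LOG_DIR`. Pointing `%tensorboard --logdir` at that directory would show the tensor under the `Distributions` and `Histograms` tabs and the mean under `Scalars`.

```python
import os
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()  # stand-in for LOG_DIR
writer = tf.summary.create_file_writer(log_dir)

with writer.as_default():
    for step in range(3):
        # A distribution whose mean drifts with the step, like weights changing over epochs
        values = tf.random.normal([1000], mean=float(step), stddev=1.0)
        tf.summary.histogram("example_weights", values, step=step)
        tf.summary.scalar("example_mean", tf.reduce_mean(values), step=step)
writer.flush()

# The directory now holds an events.out.tfevents.* file
print(os.listdir(log_dir))
```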

In [None]:
# Start TensorBoard
%tensorboard --logdir ./log

In [None]:
# Select the fraction (0.0 to 1.0) of MobileNet layers to train when using
# transfer learning. The selected layers are the ones closest to the output.
# With 0, only the newly added classification layers are trained.
unfreeze_percentage = 0

learning_rate = 0.001

if training_format == "scratch":
    print("Training a model from scratch")
    model = scratch(train, val, learning_rate)
elif training_format == "transfer_learning":
    print("Fine Tuning the MobileNet model")
    model = transfer_learn(train, val, unfreeze_percentage, learning_rate)

In [None]:
# Save the model to load it in the What-If tool

tf.saved_model.save(model, "tmp/model/1/")

In [None]:
# Load the saved model
loaded = tf.saved_model.load("tmp/model/1")
print(list(loaded.signatures.keys()))  # ["serving_default"]

In [None]:
# Zip the directory so that we can download it
!zip -r model.zip tmp/model/
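As an alternative to the shell command, Python's standard library can archive a directory tree recursively with `shutil.make_archive`, including the `variables/` subdirectory that a SavedModel export contains. The helper `zip_directory` is hypothetical, and the demo builds a throwaway directory with placeholder files shaped like a SavedModel, not a real model.

```python
import os
import shutil
import tempfile
import zipfile


def zip_directory(src_dir, archive_base):
    """Recursively zip src_dir into archive_base + '.zip' and return its path (hypothetical helper)."""
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)


# Demo on a throwaway directory shaped like a SavedModel export
workdir = tempfile.mkdtemp()
model_dir = os.path.join(workdir, "model", "1")
os.makedirs(os.path.join(model_dir, "variables"))
for name in ["saved_model.pb", os.path.join("variables", "variables.index")]:
    with open(os.path.join(model_dir, name), "w") as f:
        f.write("placeholder")

archive = zip_directory(os.path.join(workdir, "model"), os.path.join(workdir, "model"))
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(sorted(names))
```

In the notebook, `zip_directory('tmp/model', 'model')` would write `model.zip` to the current directory.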

In [None]:
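# If you are running this in Google Colab, model.zip is saved in the current
# working directory (usually /content). Download it from the file browser, or with:
# from google.colab import files; files.download('model.zip')

!pwd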

## Summary

In our experiment, we were able to control how many layers of the MobileNet base model we wanted to train. Training nudged the weights from a more generic feature set toward features specific to the dataset at hand. This is especially relevant for datasets that are quite different from the ImageNet dataset, or much smaller.

As we learnt in this and the previous chapters, the higher up a layer is, the more specialized it is to the task at hand. The initial layers learn very simple features, like distinguishing an edge, a feature that is common to almost all images. Then, as we proceed to higher layers, features become more specific to the training dataset. 

Through fine-tuning, we attempted to nudge these higher-layer/specific features to work with a new dataset while still making use of the generic layers.