# Bonus Homework 7: CIFAR10 Image Classification problem

We train two small convolutional neural networks on the CIFAR10 dataset. 

Data can be downloaded from the [University of Toronto](https://www.cs.toronto.edu/~kriz/cifar.html) website.
This homework assumes you have extracted the contents of the file `https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz` somewhere accessible on your hard drive.


This optional homework requires you to train a Convolutional Neural Network on the CIFAR10 dataset. Training will be **much faster** if you run this notebook on a machine with a `GPU`. Follow the instructions provided with this notebook to provision a machine on `gcloud` and configure it to run this notebook.

## Preliminaries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)

import pickle

%matplotlib inline

from sklearn.model_selection import train_test_split

import sys
sys.path.append("../..")
from E4525_ML.notebook_utils import get_logger,LoggingCallback

We set up a logger so that intermediate results are saved to a file as they are computed in the notebook.

In [None]:
logger=get_logger("Bonus_Homework_CIFAR10.log")

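The following lines avoid crashes on certain `NVIDIA` `GPU`s with some versions of `tensorflow`. They may not be needed on the `gcloud` machines as configured. They use the TensorFlow 1.x session API; under TensorFlow 2.x, `ConfigProto` and `Session` are available as `tf.compat.v1.ConfigProto` and `tf.compat.v1.Session`.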

In [None]:
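# Let TensorFlow allocate GPU memory as needed instead of reserving it all up front (TensorFlow 1.x API)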
from tensorflow.keras.backend import set_session

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
#config.log_device_placement = True  # to log device placement (on which device the operation ran)
                                    # (nothing gets printed in Jupyter, only if you run it standalone)

sess = tf.Session(config=config)

set_session(sess)  # set this TensorFlow session as the default session for Keras

## Data Preparation

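`CIFAR10` is split into five training batches (`data_batch_1` to `data_batch_5`) and one test batch (`test_batch`), each holding 10,000 images. We read all the training batches and concatenate them.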

<div class="alert alert-block alert-info"> Problem 0.0 </div>
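Download the CIFAR-10 image data set from `https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz` and extract its contents into the `raw_data_dir` directory defined below. The archive unpacks into a `cifar-10-batches-py` folder; the code below expects the files `data_batch_1` through `data_batch_5`, `test_batch` and `batches.meta` to sit directly in `raw_data_dir`.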


In [None]:
raw_data_dir="../../raw/CIFAR-10-Images/"
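A minimal sketch for Problem 0.0 using only the Python standard library, assuming the notebook has network access. The helper name `fetch_cifar10` and its `archive_path` parameter are hypothetical. Because `process` reads files such as `raw_data_dir+"data_batch_1"`, the helper drops the archive's top-level `cifar-10-batches-py/` folder and writes every file directly into the destination directory.

```python
import os
import shutil
import tarfile
import urllib.request

CIFAR10_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"


def fetch_cifar10(dest_dir, url=CIFAR10_URL, archive_path=None):
    """Hypothetical helper: download (if missing) and extract CIFAR-10 into dest_dir.

    Files stored under the archive's top-level folder are written directly
    into dest_dir, so that e.g. dest_dir/data_batch_1 exists afterwards.
    """
    os.makedirs(dest_dir, exist_ok=True)
    if archive_path is None:
        archive_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)  # large download, done only once
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directory entries
            # Keep only the file name, dropping "cifar-10-batches-py/" from the path
            target = os.path.join(dest_dir, os.path.basename(member.name))
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
    return sorted(os.listdir(dest_dir))
```

Usage would be `fetch_cifar10(raw_data_dir)`; the archive itself is kept in `raw_data_dir`, so re-running the cell skips the download.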

<div class="alert alert-block alert-info"> Problem 0.1 </div>
Process the main and test datasets using the functions below.

In [None]:
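def unpickle(filename):
    """Load a CIFAR-10 batch file: a pickled dict whose keys are bytes (e.g. b"data")."""
    with open(filename, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')  # avoid shadowing the built-in `dict`
    return batch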

Note that `process` rescales pixel intensities from integers in $\{0,\dots,255\}$ to floats in $[0,1]$.

In [None]:
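def process(filename):
    """Load one CIFAR-10 batch: float32 images (n, 32, 32, 3) in [0, 1] and integer labels."""
    dic=unpickle(filename)
    labels=dic[b"labels"]
    data=dic[b"data"]  # shape (n, 3072): 1024 red, then 1024 green, then 1024 blue values
    # (n, 3072) -> (n, 3, 32, 32) -> channels-last (n, 32, 32, 3), as expected by keras
    images=data.reshape(-1, 3, 32, 32).transpose(0,2,3,1)
    # float32 halves memory compared to the float64 produced by dividing uint8 by 255.0
    return images.astype("float32")/255.0,labels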
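The reshape in `process` relies on the layout documented on the CIFAR-10 page: each row of `b"data"` holds 3072 values, first the 1024 red values, then the 1024 green ones, then the 1024 blue ones, each channel stored row by row. The sketch below checks on a small synthetic array (not real CIFAR-10 data) that `reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)` moves the value at `c*1024 + r*32 + j` to position `[r, j, c]`; `flat_index` is a hypothetical helper introduced only for this check.

```python
import numpy as np

# Synthetic stand-in for dic[b"data"]: 2 "images", every slot holding a distinct value
n = 2
row_major = np.arange(n * 3072).reshape(n, 3072)

# Same transformation as in `process`: (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3)
images_hwc = row_major.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


def flat_index(channel, row, col):
    """Position of pixel (row, col) of the given channel inside one 3072-long row."""
    return channel * 1024 + row * 32 + col


print(images_hwc.shape)  # (2, 32, 32, 3)
# Green value of pixel (5, 7) in image 1 comes from the expected slot of the flat row
print(images_hwc[1, 5, 7, 1] == row_major[1, flat_index(1, 5, 7)])  # True
```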

In [None]:
images=[]
labels=[]
for idx in range(1,6):
    filename=raw_data_dir+f"data_batch_{idx}"
    img,lab=process(filename)
    images.append(img)
    labels.append(lab)

images=np.concatenate(images)
labels=np.concatenate(labels)

images.shape,labels.shape

In [None]:
filename=raw_data_dir+"test_batch"
images_test,labels_test=process(filename)

In [None]:
filename=raw_data_dir+"batches.meta"
meta_data=unpickle(filename)
label_names=meta_data[b"label_names"]
label_names

The images are truly tiny ($32\times 32$ pixels) and have very low resolution!

In [None]:
plt.figure(figsize=(12,6))
for i in range(8):
    plt.subplot(2, 4, i+1)
    plt.imshow(images[i])
    label=label_names[labels[i]].decode("ascii")
    plt.title(label)

<div class="alert alert-block alert-info"> Problem 0.2 </div>
Separate the main dataset into a training set and a validation set containing 20% of the data.
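A minimal sketch of the split using only NumPy; the name `split_train_validation` and the fixed `seed` are illustrative choices, not part of the assignment. The `train_test_split` function imported in the preliminaries does the same job with `train_test_split(images, labels, test_size=0.2)` and returns the four arrays in the same order.

```python
import numpy as np


def split_train_validation(images, labels, val_fraction=0.2, seed=42):
    """Hypothetical helper: randomly hold out `val_fraction` of the samples for validation."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))  # shuffled sample indices
    n_val = int(round(val_fraction * len(labels)))
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return images[train_idx], images[val_idx], labels[train_idx], labels[val_idx]
```

For example, `images_train, images_val, labels_train, labels_val = split_train_validation(images, labels)` keeps 40,000 training and 10,000 validation images.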

## Simple Model

<div class="alert alert-block alert-info"> Problem 1.0 </div>
Using the `keras` library, build a convolutional network containing the following layers:
1. A Convolutional layer with a $5\times 5$ kernel, 32 output channels and `relu` activation
2. A Max pooling layer with a $2\times 2$ stride.
3. A 30% dropout layer
4. A Convolutional layer with a $5\times 5$ kernel, 64 output channels and `relu` activation
5. A Max pooling layer with a $2\times 2$ stride.
6. A 30% dropout layer
7. A Dense layer with 128 hidden units and `relu` activation, applied to the flattened feature maps
8. A 25% dropout layer
9. A Dense layer with 64 hidden units and `relu` activation
10. A 25% dropout layer
11. A final softmax layer that outputs the probabilities of the sample belonging to each one of the image classes.

<div class="alert alert-block alert-info"> Problem 1.1 </div>
Compile the model. Set it up so that it uses the `Adam` optimizer and reports `accuracy` as a metric.

<div class="alert alert-block alert-info"> Problem 1.2 </div>
How many learnable parameters are in each layer, and in total?
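The counts follow from two formulas. A `Conv2D` layer with a $k\times k$ kernel, $C_{in}$ input channels and $C_{out}$ output channels has $(k^2 C_{in}+1)\,C_{out}$ parameters (one bias per output channel), and a `Dense` layer from $n_{in}$ to $n_{out}$ units has $(n_{in}+1)\,n_{out}$. Pooling, dropout and flatten layers have none. The helper names below are hypothetical; compare their numbers with the output of `model.summary()`, and keep in mind that the input size of the first `Dense` layer depends on the padding you chose for the convolutions.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights + biases of a Conv2D layer with a square kernel."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights + biases of a Dense (fully connected) layer."""
    return (in_units + 1) * out_units


# Pooling, dropout and flatten layers contribute no learnable parameters.
print(conv2d_params(5, 3, 32))  # first conv layer on RGB input: 2432
print(dense_params(64, 10))     # final softmax layer over 10 classes: 650
```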

<div class="alert alert-block alert-info"> Problem 1.3 </div>
Train the model for 100 epochs with a batch size of 128.

Use the validation set you created in Problem 0.2 to monitor performance as you optimize.

Make sure to keep the `History` object returned by the `fit` method.

<div class="alert alert-block alert-info"> Problem 1.4 </div>
Plot the training history of loss and accuracy as a function of training epoch for both the training and validation sets.
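A sketch of a plotting helper, assuming the model was compiled with `metrics=["accuracy"]` so that accuracy is recorded. `plot_history` is a hypothetical name; pass it `history.history`, the dictionary stored on the `History` object returned by `fit`. Depending on the Keras version, accuracy is logged under `acc`/`val_acc` or under `accuracy`/`val_accuracy`, so the helper checks for both.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot training/validation loss and accuracy curves from `History.history`."""
    # Older Keras logs 'acc'/'val_acc'; newer versions use 'accuracy'/'val_accuracy'
    acc_key = "accuracy" if "accuracy" in history_dict else "acc"
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
    for ax, key, name in [(ax_loss, "loss", "loss"), (ax_acc, acc_key, "accuracy")]:
        ax.plot(epochs, history_dict[key], label="training")
        ax.plot(epochs, history_dict["val_" + key], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(name)
        ax.legend()
    return fig
```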

<div class="alert alert-block alert-info"> Problem 1.5 </div>
What accuracy did you achieve on the validation set?
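The last epoch is not necessarily the best one, so it can help to report both the final validation accuracy (`history.history[key][-1]`) and the best one. A hypothetical helper returning the 1-based epoch with the highest validation accuracy, with the same `val_acc`/`val_accuracy` key caveat as above:

```python
def best_validation_accuracy(history_dict):
    """Return (epoch, value) of the highest validation accuracy, epochs counted from 1."""
    key = "val_accuracy" if "val_accuracy" in history_dict else "val_acc"
    values = list(history_dict[key])
    best = max(range(len(values)), key=lambda i: values[i])  # first index of the maximum
    return best + 1, values[best]
```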

## Larger Model

<div class="alert alert-block alert-info"> Problem 2.0 </div>
Using the `keras` library, build a convolutional network containing the following layers:
1. A Convolutional layer with a $3\times 3$ kernel, 32 output channels and `relu` activation
2. A 25% dropout layer
3. A Convolutional layer with a $3\times 3$ kernel, 32 output channels and `relu` activation
4. A Max pooling layer with a $2\times 2$ stride.
5. A 25% dropout layer
6. A Convolutional layer with a $3\times 3$ kernel, 32 output channels and `relu` activation
7. A 25% dropout layer
8. A Convolutional layer with a $3\times 3$ kernel, 64 output channels and `relu` activation
9. A Max pooling layer with a $2\times 2$ stride.
10. A 50% dropout layer
11. A Convolutional layer with a $3\times 3$ kernel, 64 output channels and `relu` activation
12. A 25% dropout layer
13. A Convolutional layer with a $3\times 3$ kernel, 1024 output channels and `relu` activation
14. An Average pooling layer with an $8\times 8$ stride.
15. A 25% dropout layer
16. A Dense layer with 512 hidden units and `relu` activation, applied to the flattened feature maps
17. A 40% dropout layer
18. A Dense layer with 256 hidden units and `relu` activation
19. A 40% dropout layer
20. A final softmax layer that outputs the probabilities of the sample belonging to each one of the image classes.
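The layer list does not specify padding, but it matters here: in `keras`, `Conv2D` defaults to `padding="valid"`, which shrinks the feature map by 2 pixels in each dimension for a $3\times 3$ kernel. The pure-Python sketch below (hypothetical helper names; dropout layers are omitted because they do not change shapes) tracks the spatial size up to the average pooling layer. Only with `padding="same"` are the feature maps still $8\times 8$ at that point, so that the $8\times 8$ average pooling reduces them to $1\times 1\times 1024$.

```python
def conv_out(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1


def pool_out(size, pool):
    """Spatial size after non-overlapping pooling (stride = pool size, no padding)."""
    return size // pool


def larger_model_size(padding, size=32):
    # conv, conv, maxpool, conv, conv, maxpool, conv, conv
    for layer in ["conv", "conv", "pool", "conv", "conv", "pool", "conv", "conv"]:
        size = conv_out(size, 3, padding) if layer == "conv" else pool_out(size, 2)
    return size  # spatial size fed into the 8x8 average pooling


print(larger_model_size("same"))   # 8
print(larger_model_size("valid"))  # 1, too small for an 8x8 average pool
```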

<div class="alert alert-block alert-info"> Problem 2.1 </div>
Compile the model. Set it up so that it uses the `Adam` optimizer and reports `accuracy` as a metric.

<div class="alert alert-block alert-info"> Problem 2.2 </div>
How many parameters are there in total? Which layers contribute the most learnable parameters?

<div class="alert alert-block alert-info"> Problem 2.3 </div>
Train the model for 100 epochs with a batch size of 128.

Use the validation set you created in Problem 0.2 to monitor performance as you optimize.

Make sure to keep the `History` object returned by the `fit` method.

<div class="alert alert-block alert-info"> Problem 2.4 </div>
Plot the training history of loss and accuracy as a function of training epoch for both the training and validation sets.

<div class="alert alert-block alert-info"> Problem 2.5 </div>
What accuracy did you achieve on the validation set?

## Test of Best Model

<div class="alert alert-block alert-info"> Problem 3.0 </div>
Of the two models that you have trained, select the one that performed better on the validation set.
Refit it (100 epochs) on all the training + validation data and measure its performance on the test dataset.

[**Note**: for faster convergence, you can continue training the model you already fit on the training set instead of starting from random weights.]

<div class="alert alert-block alert-info"> Problem 3.1 </div>
What accuracy did you achieve on the test set?