# Convolutional Neural Network Tutorial

Build a convolutional neural network with Keras.

This example shows how to build a convolutional neural network classifier to classify digits 0-9 in the MNIST dataset.

**Author: Vikas Nataraja**

## CNN Overview

![CNN](http://personal.ie.cuhk.edu.hk/~ccloy/project_target_code/images/fig3.png)

## MNIST Dataset Overview

This example uses the MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels). As loaded by Keras, pixel values are integers from 0 to 255; we rescale them to the range 0 to 1 before training.

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

More info: http://yann.lecun.com/exdb/mnist/

# Objective

The goal of this tutorial is to show you how a convolutional neural network (CNN) functions by walking you through the steps involved in building it, testing it, and deploying it. In this example, we will try to train a CNN so that it learns how to identify digits. So, given an image of a digit (0 - 9), the CNN should tell us what digit is displayed in the image.

## Import the necessary libraries

In [None]:
# future is imported to allow the use of different versions of Python
from __future__ import division, print_function, absolute_import
import warnings
warnings.filterwarnings('ignore')

# Force use of CPUs instead of GPUs
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import numpy as np
import matplotlib.pyplot as plt
from random import randint

# Keras is a deep learning library that is used to build the neural network
import keras
import tensorflow as tf
from keras.layers import Input, Conv2D, MaxPooling2D, Dense, Flatten

## Read in the dataset 

* Training images are loaded into `train_images` as a NumPy array; the corresponding labels (ground truth) are read into another NumPy array called `train_labels`.
* The same procedure is followed for the test images.
* The arrays are cast to 32-bit float because Keras expects floating-point inputs.
* We also divide the pixels by 255 so that every input image has pixel values between 0 and 1.

* As loaded, `train_images` is a 3D array - (num_of_images, height, width). We then add a channel dimension, making it 4D.
* As loaded, `train_labels` is a 1D array of integers 0-9 because we are trying to predict digits 0-9. The labels are the solutions/ground truth for the corresponding `train_images`. We then convert them to one-hot vectors of length 10.
* `test_images` has the same layout as `train_images` - (num_of_images, height, width) before the channel dimension is added.
* `test_labels` has the same layout as `train_labels`. The labels are the solutions/ground truth for the corresponding `test_images`.
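As a minimal sketch of the preprocessing described above, the snippet below applies the same steps to a tiny made-up array instead of MNIST. The names `fake_images`, `fake_labels`, `scaled`, and `one_hot`, and the use of `np.eye` to mimic one-hot encoding, are illustrative assumptions and not part of the tutorial's pipeline.

```python
import numpy as np

# Two made-up 2x2 "images" with 8-bit pixel values (stand-ins for MNIST digits)
fake_images = np.array([[[0, 255], [128, 64]],
                        [[255, 255], [0, 0]]], dtype=np.uint8)
fake_labels = np.array([3, 7])  # made-up digit labels

# Cast to 32-bit float and rescale from 0-255 to 0-1
scaled = np.float32(fake_images) / 255.

# Add a trailing channel dimension: (num_images, height, width) -> (num_images, height, width, 1)
scaled = np.expand_dims(scaled, -1)

# One-hot encode: row i has a 1 in column fake_labels[i] and 0 everywhere else
one_hot = np.eye(10, dtype=np.float32)[fake_labels]

print(scaled.shape)   # (2, 2, 2, 1)
print(one_hot.shape)  # (2, 10)
```

The real cell does the same thing on the full dataset, using `keras.utils.to_categorical` for the one-hot step.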


In [None]:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
num_classes = 10           # we have 10 digits so we need 10 classes

train_images = np.float32(train_images)/255. # normalize pixel values
train_images = np.expand_dims(train_images, -1) # add a channel dimension as this is what Keras expects
test_images = np.float32(test_images)/255.  
test_images = np.expand_dims(test_images, -1)

# convert class vectors to binary class matrices
train_labels = keras.utils.to_categorical(train_labels, num_classes)
test_labels  = keras.utils.to_categorical(test_labels, num_classes)

## Size of data

In [None]:
print('Number of training images: ',train_images.shape[0])
print('Each training image is of size: ',train_images.shape[1:])
print('Number of training labels: ',train_labels.shape[0],'\n')
print('Number of test images: ',test_images.shape[0])
print('Each test image is of size: ',test_images.shape[1:])
print('Number of test labels: ',test_labels.shape[0])

## Visualizing the dataset - what does the MNIST dataset look like?

* Here, we visualize the training set. Each time this cell block is run, random images from the training set will be displayed

In [None]:
fig = plt.figure(figsize=(16, 16))
columns = 4
rows = 5
for i in range(1, columns*rows +1):
    # show random images from the dataset
    random_range = randint(0, train_images.shape[0]-1)
    img = train_images[random_range]
    fig.add_subplot(rows, columns, i)
    plt.imshow(img, cmap='gray')
    plt.title('Ground truth label = {}'.format(np.argmax(train_labels[random_range])))
    plt.axis('off')
plt.show()

## Model directory

This is where the entire model will be saved on your computer. 

In [None]:
#########################################################
# CHANGE THIS DIRECTORY TO ANY DIRECTORY ON YOUR SYSTEM!!!
# This is where the model will be saved after it is run
#########################################################
model_dir = os.getcwd()

## Hyperparameters

These are the parameters that can be tuned depending on how training goes. For example:

* If training runs out of memory, decrease the batch size.
* If the loss is still high after training, increase the number of epochs.
* If the model is learning too slowly, try increasing the learning rate.

        
**Batch size** 
- The number of images the model processes before the weights are updated.
- This is done to reduce memory consumption. Instead of training on all images at the same time, we do it batch-by-batch.
- Batch sizes are usually powers of 2, e.g. 16, 32, 64, 128 ...
- For example, if your total number of training images is 2000 and the batch size is 128, then we get 15 full batches and the final batch will have the remaining 80 images.
- A larger batch size gives a smoother gradient estimate and fewer weight updates per epoch, but it needs more memory per step and does not guarantee better accuracy.
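The batch arithmetic in the example above can be checked with a few lines of Python. This is only a sketch; the variable names `example_num_images` and `example_batch_size` are made up for illustration.

```python
import math

example_num_images = 2000   # dataset size used in the example above
example_batch_size = 128

# How many full batches fit, and how many images are left for the last batch
full_batches, remainder = divmod(example_num_images, example_batch_size)

# Total number of batches (weight updates) in one pass over the data
total_batches = math.ceil(example_num_images / example_batch_size)

print(full_batches, remainder, total_batches)  # 15 80 16
```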

**Learning rate**
- It is usually a small positive number between 0 and 1.
- Dictates how large a step the optimizer takes each time it updates the weights. A popular optimization algorithm is gradient descent.
- Learning rates are usually set in powers of ten such as 0.1, 0.01, 0.001, etc. It is a good idea to start with a very low value and gradually increase it.
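To get a feel for what the learning rate does, here is a minimal sketch of plain gradient descent on the toy function f(x) = x², whose gradient is 2x. The function `gradient_descent`, the starting point, and the step count are made up for illustration; the tutorial's model is trained with Keras, not with this loop.

```python
def gradient_descent(learning_rate, start=5.0, steps=10):
    """Minimize f(x) = x**2 (gradient 2*x) from a made-up starting point."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # step against the gradient
    return x

# The minimum is at x = 0; compare how close each learning rate gets in 10 steps
for lr in (0.001, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With the smallest rate, x barely moves from its starting value; with 0.1 it gets much closer to 0; with 1.1 each step overshoots and x moves further away from the minimum.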

**Number of epochs**
- 1 epoch = the model has seen one full pass of the entire training set.
- Since we use batches, 1 epoch will be completed after (dataset_size/batch_size) steps, rounded up.
- Larger models and datasets are often trained for many more epochs. For this example we choose a small number - 10.
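As a rough sketch, the number of weight updates in this tutorial can be worked out from values that appear in its cells: 60,000 training images, `validation_split=0.1`, a batch size of 16, and 10 epochs. The variable names below are made up so they do not collide with the notebook's own.

```python
import math

total_images = 60000        # MNIST training set size
validation_fraction = 0.1   # share of images held out for validation during fit
images_per_batch = 16
epochs_to_run = 10

# Images actually used for weight updates after holding out the validation share
num_train = int(total_images * (1 - validation_fraction))

steps_per_epoch = math.ceil(num_train / images_per_batch)
total_steps = steps_per_epoch * epochs_to_run

print(num_train, steps_per_epoch, total_steps)  # 54000 3375 33750
```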


# Tuning the hyperparameters
Feel free to play around with some of these "hyperparameters" which are the knobs you turn to affect performance. For instance, you could modify the `batch_size` to be 8 instead of 16.

You could also run the entire notebook as is first, then come back here, change one value, and run it again to see what changed.

In [None]:
## define architecture details
input_shape = (28, 28, 1)     # because our images are 28 x 28 pixels across
num_filters = [16, 32, 64] # filters for convolution => it's usually a good idea to double the number of filters in each step
kernel_size = (3, 3)       # this is the convolution kernel
num_epochs  = 10         # number of epochs to train for
learning_rate = 0.001      # learning rate for the model weights. recommended to start with a low number like 0.0001
batch_size = 16           # batch size of the training set

## Define the layers in the network

* The convolutional layer, max pooling layer, fully connected layer are all defined here.
* This is essentially the bulk of the work in creating your model
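To see why this stack of layers fits a 28 x 28 input, the sketch below traces the feature-map size through three blocks of a 3 x 3 convolution followed by 2 x 2 max pooling. It assumes the Keras defaults used in this tutorial's model (no padding and stride 1 for `Conv2D`, pooling stride equal to the pool size); the helper `trace_spatial_size` is made up for illustration.

```python
def trace_spatial_size(size, kernel=3, pool=2, num_blocks=3):
    """Return the feature-map width/height after each conv and pooling layer."""
    sizes = [size]
    for _ in range(num_blocks):
        size = size - kernel + 1   # unpadded ('valid') convolution with stride 1
        sizes.append(size)
        size = size // pool        # max pooling, leftover border pixels are dropped
        sizes.append(size)
    return sizes

sizes = trace_spatial_size(28)
print(sizes)  # [28, 26, 13, 11, 5, 3, 1]

# The last convolution has 64 filters, so Flatten sees 1 x 1 x 64 values
flattened_length = sizes[-1] * sizes[-1] * 64
print(flattened_length)  # 64
```

A fourth conv/pool block would not fit: the 1 x 1 feature map is already smaller than the 3 x 3 kernel.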

In [None]:
# Create the neural network

def model_architecture(input_shape:tuple, num_filters:list, kernel_size:tuple, num_classes:int):
    model = keras.Sequential(
    [
        Input(shape=input_shape),
        Conv2D(num_filters[0], kernel_size=kernel_size, activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(num_filters[1], kernel_size=kernel_size, activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(num_filters[2], kernel_size=kernel_size, activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(100, activation="relu"),
        Dense(num_classes, activation="softmax"), # we want probability as the output
    ])

    return model

# Let's train the model!

* This takes time. Depending on the dataset, the complexity of the network, and the number of epochs, training can take anywhere from seconds to days or even weeks.
* For example, very large models such as the ones behind ChatGPT reportedly take weeks to months to train.
* Often in atmospheric science, we train models on GPUs on supercomputers, which can speed up training substantially (this notebook is set to run on the CPU).
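The training cell in this section compiles the model with categorical cross-entropy as its loss. As a minimal NumPy sketch of what that loss measures for a single example (the function below is a hand-written illustration with made-up inputs, not the Keras implementation):

```python
import numpy as np

def categorical_crossentropy(one_hot_label, predicted_probs, eps=1e-7):
    """Cross-entropy for one example: -sum(label * log(probability))."""
    predicted_probs = np.clip(predicted_probs, eps, 1.0)  # avoid log(0)
    return float(-np.sum(one_hot_label * np.log(predicted_probs)))

label = np.array([0., 0., 1.])                  # true class is index 2 (3 classes for brevity)
confident_right = np.array([0.05, 0.05, 0.90])  # most probability on the true class
confident_wrong = np.array([0.90, 0.05, 0.05])  # most probability on a wrong class

loss_right = categorical_crossentropy(label, confident_right)
loss_wrong = categorical_crossentropy(label, confident_wrong)
print(loss_right, loss_wrong)
```

The loss is small when the model puts high probability on the true class and large when it is confidently wrong, which is what training tries to reduce.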

In [None]:
# Build the model
model = model_architecture(input_shape, num_filters, kernel_size, num_classes)

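# callbacks to the model
# 1. stop training early if the validation loss stops improving
#    (patience must be smaller than num_epochs, otherwise early stopping can never trigger)
# 2. save the model (checkpoint) into model_dir only if the validation loss has improved
callbacks = [keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, verbose=1),
             keras.callbacks.ModelCheckpoint(filepath=os.path.join(model_dir, 'cnn_model.h5'), monitor="val_loss", save_best_only=True, verbose=1)]

# our cost function or loss function is cross-entropy, which is probabilistic
# the Adam optimizer is given the learning_rate set in the hyperparameters cell
model.compile(loss="categorical_crossentropy", optimizer=keras.optimizers.Adam(learning_rate=learning_rate), metrics=["accuracy"])

# fit the model to the dataset X: train_images, y: train_labels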
history = model.fit(train_images, train_labels, 
          batch_size=batch_size, epochs=num_epochs, 
          validation_split=0.1, verbose=1,
          callbacks=callbacks)

# Analyze how the training went

Training and validation loss should be close to each other. 

- If training loss is low, but validation loss is much higher, that means the model "overfitted"
    - That means the model probably memorized the training data and when confronted with a new, unseen image, did not know how to respond
- If training loss is low, and validation loss is also low, that means the model did a good job
- If both losses are high, that means the model did not learn anything and "underfitted"
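One way to read the loss curves numerically is sketched below. The helper `summarize_history` and the numbers in `toy_history` are made up for illustration; the dictionary only mimics the `'loss'` and `'val_loss'` keys that `history.history` holds after training.

```python
def summarize_history(history_dict):
    """Return the epoch with the lowest validation loss and the final val/train gap."""
    train_loss = history_dict['loss']
    val_loss = history_dict['val_loss']
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    final_gap = val_loss[-1] - train_loss[-1]
    return best_epoch, final_gap

# Made-up curves: training loss keeps falling while validation loss turns upward
toy_history = {'loss':     [0.9, 0.5, 0.3, 0.2, 0.1],
               'val_loss': [0.95, 0.6, 0.45, 0.5, 0.6]}

best_epoch, final_gap = summarize_history(toy_history)
print(best_epoch, round(final_gap, 2))  # 2 0.5
```

In this made-up case the validation loss is lowest at epoch index 2 and then rises while the training loss keeps dropping, which is the overfitting pattern described above.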

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20, 6))

# first plot will be of accuracies
ax[0].plot(history.history['accuracy'], color='blue', label='training')
ax[0].plot(history.history['val_accuracy'], color='orange', label='validation')
ax[0].set_title('Model Accuracy')
ax[0].set_ylabel('Accuracy')
ax[0].set_xlabel('Epoch')
ax[0].legend()

# second plot will be of the errors
ax[1].plot(history.history['loss'], color='blue', label='training')
ax[1].plot(history.history['val_loss'], color='orange', label='validation')
ax[1].set_title('Model Loss')
ax[1].set_ylabel('Loss')
ax[1].set_xlabel('Epoch')
ax[1].legend()

plt.show()

# Test the trained model on test set

Now that the model has been trained, you can test it on images it has never seen, i.e., the test images.
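The accuracy reported for the test set is the fraction of images whose most probable predicted class matches the true class. A minimal NumPy sketch with made-up probabilities and one-hot labels (three classes for brevity; `toy_probs` and `toy_labels` are illustrative, not model output):

```python
import numpy as np

# Made-up softmax outputs for 4 images
toy_probs = np.array([[0.1, 0.7, 0.2],
                      [0.8, 0.1, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.2, 0.5, 0.3]])
# Made-up one-hot ground-truth labels for the same 4 images
toy_labels = np.array([[0, 1, 0],
                       [1, 0, 0],
                       [0, 1, 0],
                       [0, 1, 0]])

predicted_classes = np.argmax(toy_probs, axis=1)   # most probable class per image
true_classes = np.argmax(toy_labels, axis=1)       # position of the 1 in each label

accuracy = float(np.mean(predicted_classes == true_classes))
print(accuracy)  # 0.75
```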

In [None]:
score = model.evaluate(test_images, test_labels, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

## Visualize the results

In [None]:
# Use the model to predict the class of each test image
predictions = model.predict(test_images, verbose=0)

# Display
fig = plt.figure(figsize=(16, 16))
columns = 4
rows = 5
for i in range(1, columns*rows +1):
    fig.add_subplot(rows, columns, i)
    plt.imshow(test_images[i], cmap='gray')
    plt.title('Predicted label = {}\nConfidence = {:0.2f}'.format(np.argmax(predictions[i]), np.max(predictions[i])), pad=5)
    plt.axis('off')
fig.subplots_adjust(hspace=0.3)
plt.show()

# Take a picture and test your CNN!

Let's see if the model can recognize a live image. 

1. Draw a single digit (dark on a light background, since the image is inverted before prediction), take a picture of it, and store it somewhere on your computer (must be a png or jpg file)
2. Update `filepath` to that image (full path required)
3. Run the cell that calls `test_real_digit(filepath, model)` to see what the model predicts!


NOTE: You might need to install the package pillow using this command: `conda install -c anaconda pillow`

In [None]:
filepath = "some/file/path/here/" # can be jpg or png

After updating the filepath in the above cell, run the cell below to test the model for your image!

In [None]:
def test_real_digit(filepath, model):
    try:
        from PIL import Image, ImageOps
    except ModuleNotFoundError as err:
        # stop here with a clear message instead of failing later with a NameError
        raise ImportError("Please install the `pillow` package to proceed. Use `conda install -c anaconda pillow`") from err

    img = Image.open(filepath)
    img = img.convert('L') # convert to grayscale
    img = img.resize((28, 28), Image.Resampling.LANCZOS) # resize to 28 x 28
    img = ImageOps.invert(img) # invert image so the digit is white, background is black
    img = np.asarray(img, dtype=np.float32)/255. # scale pixels to 0-1 exactly as we did during training
    img = np.expand_dims(img, -1) # add extra dimension for Keras
    img = np.expand_dims(img, 0)  # add extra dimension for faking batch size
    prediction = model.predict(img, verbose=0)[0]
    return img, prediction

im, test_pred = test_real_digit(filepath, model)

# Visualize it
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.imshow(im[0, :, :, 0], cmap='gray')
ax.set_xticks([])
ax.set_yticks([])
ax.set_title('Predicted label = {}\nConfidence = {:0.2f}'.format(np.argmax(test_pred), np.max(test_pred)), pad=5)
plt.show()