# A Simple Convolutional Neural Network

"Deep Learning" is a general term that usually refers to the use of neural networks with multiple layers that synthesize the way the human brain learns and makes decisions.

## Some (Very) Basic Neural Network Theory

Your brain works by connecting networks of neurons, each of which receives electrochemical stimuli from multiple inputs, which cause the neuron to fire under certain conditions. When a neuron fires, it creates an electrochemical charge that is passed as an input to one or more other neurons, creating a complex *feed-forward* network made up of layers of neurons that pass the signal on. An artificial neural network uses the same principles, but the inputs are numeric values with associated *weights* that reflect their relative importance. The neuron takes these input values and weights and applies them to an *activation function* that determines whether the artificial neuron should pass an output on to the next layer:

$$ \rightrightarrows\oint\rightarrow $$
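
As a rough illustration of the idea above, the following sketch computes the output of a single artificial neuron with NumPy. The input values, weights, and the `neuron_output` helper are hypothetical, and the bias term is an assumption that is common in practice but not described in the text above. The rectified linear unit (ReLU) is used as the activation function because it is the one used later in this notebook.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: pass positive signals through, output 0 otherwise
    return np.maximum(0.0, z)

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus a bias, passed through the activation function
    z = np.dot(inputs, weights) + bias
    return activation(z)

inputs = np.array([0.5, 0.2, 0.9])    # hypothetical input values
weights = np.array([0.4, -0.6, 0.3])  # hypothetical weights (relative importance of each input)
bias = 0.1                            # hypothetical bias term (an assumption, see above)

output = neuron_output(inputs, weights, bias, relu)
print(output)
```

If the weighted sum were negative, ReLU would return 0 and the neuron would pass no signal on to the next layer.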

As the human brain learns from experience, the inputs to the neurons are strengthened or weakened depending on their importance to the decisions that the brain needs to make in response to stimuli. Similarly, you train an artificial neural network using a supervised learning technique in which a *loss function* is used to evaluate how well the multi-layered model predicts known labels. You can then find the derivative of the loss function to determine whether the level of error (loss) is reduced by increasing or decreasing the weights associated with the inputs, and then apply *backpropagation* to adjust the weights and improve the model iteratively over multiple training *epochs*. The result of this training is a deep learning model that consists of:
* An *input* layer to which the initial input variables are passed.
* One or more *hidden* layers in which the weights optimized by training determine the signal that is fed forward through the network.
* An *output* layer that presents the results.
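
To make the training idea more concrete, here is a minimal sketch of gradient descent on a model with a single weight. The data, learning rate, and number of epochs are all assumptions chosen for illustration. A real network applies the same principle to every weight in every layer, using backpropagation (the chain rule) to find each derivative.

```python
import numpy as np

# Hypothetical training data: the true relationship is y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0               # initial weight (an arbitrary starting point)
learning_rate = 0.05  # assumed step size

losses = []
for epoch in range(50):
    predictions = w * x
    error = predictions - y
    loss = np.mean(error ** 2)         # mean squared error loss
    gradient = np.mean(2 * error * x)  # derivative of the loss with respect to w
    w -= learning_rate * gradient      # move the weight in the direction that reduces the loss
    losses.append(loss)

print('learned weight:', round(w, 4))
print('first loss:', losses[0], 'last loss:', losses[-1])
```

The loss shrinks with each epoch as the weight moves toward the value that best fits the known labels.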

## Convolutional Neural Networks (CNNs)
Convolutional Neural Networks, or *CNNs*, are a particular type of artificial neural network that works well with matrix inputs, such as images (which are fundamentally just matrices of pixel intensity values). There are various kinds of layers in a CNN, but a common architecture is to build a sequence of *convolutional* layers that find patterns in individual areas of the input matrix and *pooling* layers that aggregate these patterns. Additionally, some layers may *drop* data (which helps avoid *overfitting* the model to the training data), and finally some layers will *flatten* the matrix data and a *dense*, or *fully connected* layer will perform classification and reshape the predictions to conform with the expected output format.

### Convolutional Layers
Convolutional layers apply filters to a subregion of the input image, and *convolve* the filter across the image to extract features (such as edges, corners, etc.). For example, suppose the following matrix represents the pixels in a 6x6 image:

$$\begin{bmatrix}255 & 255 & 255 & 255 & 255 & 255\\255 & 255 & 0 & 0 & 255 & 255\\255 & 0 & 0 & 0 & 0 & 255\\255 & 0 & 0 & 0 & 0 & 255\\255 & 255 & 0 & 0 & 255 & 255\\255 & 255 & 255 & 255 & 255 & 255\end{bmatrix}$$

And let's suppose that a filter matrix is defined like this:

$$\begin{bmatrix}0 & 1 & 0\\0 & 1 & 0\\0 & 1 & 0\end{bmatrix}$$

The convolution layer applies the filter to the image matrix one "patch" at a time; so the first operation would apply to the <span style="color:red">red</span> elements below:

$$\begin{bmatrix}\color{red}{255} & \color{red}{255} & \color{red}{255} & 255 & 255 & 255\\\color{red}{255} & \color{red}{255} & \color{red}{0} & 0 & 255 & 255\\\color{red}{255} & \color{red}{0} & \color{red}{0} & 0 & 0 & 255\\255 & 0 & 0 & 0 & 0 & 255\\255 & 255 & 0 & 0 & 255 & 255\\255 & 255 & 255 & 255 & 255 & 255\end{bmatrix}$$

To apply the filter, we multiply the patch area by the filter elementwise, and add the results:

$$\begin{bmatrix}255 & 255 & 255\\255 & 255 & 0\\255 & 0 & 0\end{bmatrix} \times \begin{bmatrix}0 & 1 & 0\\0 & 1 & 0\\0 & 1 & 0\end{bmatrix}= \begin{bmatrix}(255 \times 0) + (255 \times 1) + (255 \times 0) & +\\ (255 \times 0) + (255 \times 1) + (0 \times 0) & + \\ (255 \times 0) + (0 \times 1) + (0 \times 0)\end{bmatrix}  = 510$$

This result is then used as the value for the first element of a feature map that is the size of the original image minus the outside edge elements:

$$\begin{bmatrix}\color{red}{510} & ? & ? & ?\\? & ? & ? & ?\\? & ? & ? & ?\\? & ? & ? & ?\end{bmatrix}$$

Next we move the patch along one pixel and apply the filter to the new patch area:

$$\begin{bmatrix}255 & \color{red}{255} & \color{red}{255} & \color{red}{255} & 255 & 255\\255 & \color{red}{255} & \color{red}{0} & \color{red}{0} & 255 & 255\\255 & \color{red}{0} & \color{red}{0} & \color{red}{0} & 0 & 255\\255 & 0 & 0 & 0 & 0 & 255\\255 & 255 & 0 & 0 & 255 & 255\\255 & 255 & 255 & 255 & 255 & 255\end{bmatrix}$$

$$\begin{bmatrix}255 & 255 & 255\\255 & 0 & 0\\0 & 0 & 0\end{bmatrix} \times \begin{bmatrix}0 & 1 & 0\\0 & 1 & 0\\0 & 1 & 0\end{bmatrix}= \begin{bmatrix}(255 \times 0) + (255 \times 1) + (255 \times 0) & +\\ (255 \times 0) + (0 \times 1) + (0 \times 0) & + \\ (0 \times 0) + (0 \times 1) + (0 \times 0)\end{bmatrix}  = 255 $$

So we can fill in that value on our feature map:
$$\begin{bmatrix}510 & \color{red}{255} & ? & ?\\? & ? & ? & ?\\? & ? & ? & ?\\? & ? & ? & ?\end{bmatrix}$$

Then we just repeat the process, moving the patch across the entire image matrix until we have a completed feature map like this:

$$\begin{bmatrix}510 & 255 & 255 & 510\\255 & 0 & 0 & 255\\255 & 0 & 0 & 255\\510 & 255 & 255 & 510\end{bmatrix}$$
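
The walkthrough above can be reproduced with a few lines of NumPy. This is a minimal sketch that assumes a stride of one pixel and no padding, as in the example; `apply_filter` is a hypothetical helper name, not a library function.

```python
import numpy as np

def apply_filter(image, kernel):
    # Slide the filter over the image one pixel at a time (no padding),
    # multiplying each patch by the filter elementwise and summing the result
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=image.dtype)
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            feature_map[row, col] = np.sum(patch * kernel)
    return feature_map

# The 6x6 image and 3x3 filter from the example above
image = np.array([[255, 255, 255, 255, 255, 255],
                  [255, 255,   0,   0, 255, 255],
                  [255,   0,   0,   0,   0, 255],
                  [255,   0,   0,   0,   0, 255],
                  [255, 255,   0,   0, 255, 255],
                  [255, 255, 255, 255, 255, 255]])
kernel = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

feature_map = apply_filter(image, kernel)
print(feature_map)
```

A 6x6 image and a 3x3 filter give a 4x4 feature map, because the filter fits in 6 - 3 + 1 = 4 positions along each axis.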

### Pooling Layers
After using one or more convolution layers to create a feature map, you can use a pooling layer to reduce the size of the matrix. A common technique is to use *MaxPooling*, in which a patch is applied to the matrix and the maximum value within the patch is retained while the others are discarded.

For example, we could apply a 2x2 patch to our feature map to extract the largest value in each 2x2 subarea:

$$\begin{bmatrix}\color{blue}{510} & \color{blue}{255} & \color{green}{255} & \color{green}{510}\\\color{blue}{255} & \color{blue}{0} & \color{green}{0} & \color{green}{255}\\\color{magenta}{255} & \color{magenta}{0} & \color{orange}{0} & \color{orange}{255}\\\color{magenta}{510} & \color{magenta}{255} & \color{orange}{255} & \color{orange}{510}\end{bmatrix} \Longrightarrow \begin{bmatrix}\color{blue}{510} & \color{green}{510}\\\color{magenta}{510} & \color{orange}{510}\end{bmatrix}$$
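
The same pooling operation can be sketched in NumPy. This assumes non-overlapping patches (the patch moves by its own width each step), as in the example above; `max_pool` is a hypothetical helper name.

```python
import numpy as np

def max_pool(feature_map, pool_size=2):
    # Keep the largest value in each non-overlapping pool_size x pool_size patch
    height, width = feature_map.shape
    out_h, out_w = height // pool_size, width // pool_size
    # Discard any rows or columns that don't fill a complete patch
    trimmed = feature_map[:out_h * pool_size, :out_w * pool_size]
    # Group the matrix into blocks, then take the maximum within each block
    blocks = trimmed.reshape(out_h, pool_size, out_w, pool_size)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[510, 255, 255, 510],
                        [255,   0,   0, 255],
                        [255,   0,   0, 255],
                        [510, 255, 255, 510]])

pooled = max_pool(feature_map)
print(pooled)
```

Each 2x2 patch collapses to a single value, so the 4x4 feature map becomes a 2x2 matrix.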

### Dropout Layers
In any machine learning training process, there is a danger of *overfitting* the model to the training data. In other words, you might end up with a model that works extremely well with the data on which it was trained, but can't generalize effectively to classify new images. One way in which you can reduce the risk of overfitting is to randomly drop some features during training.
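
The following sketch shows one common way to drop features at random, sometimes called *inverted dropout*. It is an illustration under stated assumptions rather than the exact code any framework runs: the `dropout` helper, the seed, and the example values are hypothetical. The values that survive are scaled up so that the expected total signal stays the same.

```python
import numpy as np

def dropout(values, rate, rng):
    # Randomly zero out a fraction of the values (approximately `rate` of them)
    keep_prob = 1.0 - rate
    mask = rng.random(values.shape) < keep_prob
    # Scale the surviving values so the expected sum is unchanged
    return values * mask / keep_prob

rng = np.random.default_rng(seed=0)  # fixed seed so the example is repeatable
activations = np.ones((4, 5))        # hypothetical layer outputs
dropped = dropout(activations, rate=0.2, rng=rng)
print(dropped)
```

Dropout of this kind is applied only while training; when the trained model is used for prediction, all features are kept.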

### Dense (Fully-Connected) Layers
After the previous layers have created feature maps, a final *fully-connected* layer is used to generate class predictions - you can think of the fully-connected layer as being the endpoint of the classifier that determines which combination of features found in the previous layers "adds up" to a particular class.
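
As a minimal sketch of how features can "add up" to a class, the code below multiplies a flattened feature vector by a weight matrix and converts the resulting scores into probabilities with a *softmax* function (the activation used for the output layer later in this notebook). The feature values and weights are random placeholders, and the helper names are hypothetical.

```python
import numpy as np

def softmax(scores):
    # Subtract the maximum score for numerical stability before exponentiating
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

def dense_softmax(features, weights, biases):
    # Each class score is a weighted sum of every input feature
    return softmax(features @ weights + biases)

rng = np.random.default_rng(seed=1)
flattened = rng.random(8)          # hypothetical flattened feature maps (8 values)
weights = rng.normal(size=(8, 3))  # hypothetical weights: 8 features x 3 classes
biases = np.zeros(3)

probabilities = dense_softmax(flattened, weights, biases)
print(probabilities)
print('predicted class index:', probabilities.argmax())
```

The probabilities always sum to 1, and the class with the highest probability is taken as the prediction.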

## Building a CNN
There are several commonly used frameworks for creating CNNs, including the *Microsoft Cognitive Toolkit (CNTK)*, *TensorFlow*, and *Keras* (which is a high-level API that can use TensorFlow or CNTK as a back end).

### A Simple Example
In this notebook, we'll build a simple example CNN using Keras with a CNTK back end. The example is a classification model that can classify an image as a circle, a triangle, or a square.

First, we'll create some functions that can be used to generate the images for our classification model. Run the cell below to create these functions:

In [None]:
# function to generate an image of random size and color
def create_image (size, shape):
    from random import randint
    import numpy as np
    from PIL import Image, ImageDraw
    
    xy1 = randint(10,40)
    xy2 = randint(60,100)
    col = randint(10,200)

    img = Image.new("RGB", size, (255, 255, 255))
    draw = ImageDraw.Draw(img)
    
    if shape == 'circle':
        draw.ellipse([(xy1,xy1), (xy2,xy2)], fill=col)
    elif shape == 'triangle':
        draw.polygon([(xy1,xy1), (xy2,xy2), (xy2,xy1)], fill=col)
    else: # square
        draw.rectangle([(xy1,xy1), (xy2,xy2)], fill=col)
    del draw
    
    return np.array(img)

# function to create a dataset of images
def generate_image_data (shapes, size = (128,128), cases = 1000):
    images = []
    imagecodes = []
    
    i = 0
    while(i < cases / len(shapes)):
        for shape in shapes:
            images.append(create_image(size, shape))
            imagecodes.append(shapes.index(shape))
        i = i + 1
    
    return images, imagecodes

Now we'll create a set of images with which to train and validate a CNN model.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Our classes will be circles, triangles, and squares
classnames = ['circle', 'triangle', 'square']

# All images will be 128x128 pixels
img_size = (128,128)

# Generate 1500 random images.
images, imagecodes = generate_image_data(classnames, img_size, 1500)

print(len(images), 'images generated. Here are the first three:')

# Display the first three images
fig = plt.figure(figsize=(12, 8))
for i in range(3):
    ax = fig.add_subplot(1, 3, i + 1, xticks=[], yticks=[])
    ax.set_title(classnames[imagecodes[i]])
    ax.imshow(images[i])

### Setting up the Frameworks
Now that we have our data, we're ready to build a CNN. The first step is to install and configure the frameworks we want to use.

First, we'll install a version of CNTK that is compatible with Keras on Python 3.5.

> **Note**: The following code installs the correct version of CNTK on Azure Notebooks. To install CNTK on your own system, consult the CNTK documentation at https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine.

In [None]:
# Upgrade CNTK to 2.3.1 for Python 3.5
!pip install --upgrade --no-deps https://cntk.ai/PythonWheel/CPU-Only/cntk-2.3.1-cp35-cp35m-linux_x86_64.whl

Next, we'll install Keras.

In [None]:
# install Keras
!pip install keras

And finally we'll configure Keras to use CNTK as a back end.

In [None]:
# Set the Keras backend to CNTK
from keras import backend as K
import os
from importlib import reload

def set_keras_backend(backend):

    if K.backend() != backend:
        os.environ['KERAS_BACKEND'] = backend
        reload(K)
        assert K.backend() == backend

set_keras_backend("cntk")

### Preparing the Data
Before we can train the model, we need to prepare the data. When working with CNTK, Keras expects the numeric values to be 32-bit floating point numbers, so we'll cast our features and labels to that type. We'll also divide the feature values by 255 to normalize the values, and we'll convert the numeric labels into categories that match the number of classes in our data (in this case, three - *circle*, *triangle*, and *square*).

In [None]:
def preprocess_images(image_array):
    from PIL import Image, ImageOps
    
    # Pre-process all of the images to make them consistent
    images = []
    for img in image_array:
        # Equalize the pixel intensity to ensure consistent contrast
        img = ImageOps.equalize(Image.fromarray(img))
        # Convert the equalized image back to a NumPy array
        images.append(np.array(img))
        
        # (our images are all the same size - otherwise you should resize them!)
        
    return images


from keras.utils import to_categorical
import numpy as np

# The images are our features, the imagecodes are the labels
features = np.array(preprocess_images(images))
labels = np.array(imagecodes)

# Format features
features = features.astype('float32')
features /= 255

# Format labels
labels = to_categorical(labels, len(classnames))
labels = labels.astype('float32')

# Show the shape of the features array (num_images, height, width, channels)
print(features.shape)
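
To see what these two preparation steps do, here is a small NumPy-only sketch. The `one_hot` helper is hypothetical; it is intended to produce the same kind of encoding as `to_categorical`, in which each numeric label becomes a row with a 1 in the column for its class. The pixel values and label codes are made up for illustration.

```python
import numpy as np

def one_hot(codes, num_classes):
    # One row per label, with 1.0 in the column for that label's class and 0.0 elsewhere
    return np.eye(num_classes, dtype='float32')[codes]

# Dividing by 255 scales 8-bit pixel intensities into the range 0 to 1
pixels = np.array([0, 51, 255], dtype='uint8')
scaled = pixels.astype('float32') / 255
print(scaled)

# Labels 0, 1, and 2 stand for circle, triangle, and square
codes = np.array([0, 1, 2, 1])
encoded = one_hot(codes, 3)
print(encoded)
```

Encoding the labels this way lets the model output one probability per class and compare it directly with the known label.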

### Defining the CNN
Now we're ready to define our model. This involves defining the layers for our CNN, and compiling them for multi-class classification.

In [None]:
# Define a CNN classifier
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

# Define the model as a sequence of layers
model = Sequential()

# The input layer accepts an image and applies a convolution that uses 32 6x6 filters and a rectified linear unit activation function
model.add(Conv2D(32, (6, 6), input_shape=(features.shape[1], features.shape[2], features.shape[3]), activation='relu'))

# Next we'll add a max pooling layer with a 2x2 patch
model.add(MaxPooling2D(pool_size=(2,2)))

# A dropout layer randomly drops some nodes to reduce inter-dependencies (which can cause over-fitting)
model.add(Dropout(0.2))

# We can add as many layers as we think necessary - here we'll add another convolution, max pooling, and dropout layer
model.add(Conv2D(32, (6, 6), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

# And another set
model.add(Conv2D(32, (6, 6), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

# Now we'll flatten the feature maps and generate an output layer with a predicted probability for each class
model.add(Flatten())
model.add(Dense(len(classnames), activation='softmax'))

# With the layers defined, we can now compile the model for categorical (multi-class) classification
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
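
It can help to trace how the size of the data changes as it passes through these layers. The sketch below does the arithmetic for a 128x128 input, assuming the convolution layers use no padding and a stride of 1, and that each pooling layer moves its patch by the patch width (which we believe are the Keras defaults). The helper names are hypothetical.

```python
def conv_output(size, kernel):
    # Without padding, the filter must fit entirely inside the input
    return size - kernel + 1

def pool_output(size, pool):
    # Non-overlapping pooling; an incomplete patch at the edge is discarded
    return size // pool

size = 128   # input images are 128x128
sizes = []
for block in range(3):
    size = conv_output(size, 6)   # 6x6 filters
    sizes.append(('conv', size))
    size = pool_output(size, 2)   # 2x2 pooling
    sizes.append(('pool', size))

# Each convolution layer produces 32 feature maps
flattened_length = size * size * 32

for layer, layer_size in sizes:
    print(layer, layer_size, 'x', layer_size)
print('flattened length:', flattened_length)
```

Dropout layers are omitted from the trace because they do not change the size of the data. You can compare these numbers with the output of `model.summary()`.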



### Training the Model
With the layers of the CNN defined, we're ready to train the model using our image data. In the example below, we use 3 iterations (*epochs*) to train the model in 50-image batches, holding back 30% of the data for validation. For each batch, the loss function measures the error in the model's predictions, and the optimizer adjusts the weights (which were randomly initialized) to try to reduce that error. After each epoch, the loss and accuracy are reported for both the training and validation data.
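
The following arithmetic shows how the epoch, batch, and validation settings combine for our 1500 images. It is a sketch that assumes the weights are updated once per batch, which is how mini-batch training normally works.

```python
import math

num_images = 1500
validation_split = 0.3
batch_size = 50
epochs = 3

# Images held back for validation, and images left for training
num_validation = round(num_images * validation_split)
num_training = num_images - num_validation

# Number of batches (and therefore weight updates) in each epoch
batches_per_epoch = math.ceil(num_training / batch_size)
total_updates = batches_per_epoch * epochs

print(num_training, 'training images,', num_validation, 'validation images')
print(batches_per_epoch, 'batches per epoch,', total_updates, 'weight updates in total')
```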

> **Note**: We're only using 3 epochs to reduce the training time for this simple example. A real-world CNN is usually trained over more epochs than this (50 is a common starting point). CNN model training is processor-intensive, so it's recommended to perform this on a system that can leverage GPUs as well as CPUs (such as the Data Science Virtual Machine in Azure) to reduce training time. This will take a while to complete in Azure Notebooks (in which GPUs are not available) - status will be displayed as the training progresses. Feel free to go get some coffee!

In [None]:
# Train the model over 3 epochs using 50-image batches and holding back the last 30% of the data for validation
model.fit(features, labels, epochs=3, batch_size=50, validation_split=0.3)

# Note: The validation_split parameter holds back the last X% (in this case 30%) of the training data WITHOUT shuffling the data.
# Our data is ordered such that instances of the three classes are distributed evenly, so this is fine.
# If the data is ordered such that the last X% doesn't contain a mix of all classes, you should either:
#   - Shuffle the data before training (ensuring features still match corresponding labels.)
#   - Randomly split the data first and use the validation_data parameter instead of validation_split
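
If your data is ordered by class, the two options mentioned in the comments above can be combined along the following lines. This is a sketch, not part of the original example: `shuffle_and_split` is a hypothetical helper, and the demo arrays stand in for the real features and labels.

```python
import numpy as np

def shuffle_and_split(features, labels, validation_fraction=0.3, seed=0):
    # Shuffle features and labels with the same random order so they stay matched
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    features, labels = features[order], labels[order]
    # Hold back the last part of the shuffled data for validation
    num_validation = round(len(features) * validation_fraction)
    split_at = len(features) - num_validation
    return (features[:split_at], labels[:split_at]), (features[split_at:], labels[split_at:])

# Hypothetical stand-in data: feature row i belongs with label i
demo_features = np.arange(10).reshape(10, 1)
demo_labels = np.arange(10)

(train_x, train_y), (val_x, val_y) = shuffle_and_split(demo_features, demo_labels)
print(train_y, val_y)
```

The resulting arrays could then be passed to `model.fit` using `validation_data=(val_x, val_y)` in place of `validation_split`.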

### Using the Trained Model
Now that we've trained the model, we can use it to predict the class of an image.

In [None]:
from random import randint
from PIL import Image, ImageOps

# Create a random test image
img = create_image ((128,128), classnames[randint(0, len(classnames)-1)])
plt.imshow(img)

# Modify the image data to match the format of the training features
img = np.array(ImageOps.equalize(Image.fromarray(img)))
imgfeatures = img.reshape(1, img.shape[0], img.shape[1], img.shape[2])
imgfeatures = imgfeatures.astype('float32')
imgfeatures /= 255

# Use the classifier to predict the probability of each class
class_probabilities = model.predict(imgfeatures)
# Print the predicted probabilities for each class
print(class_probabilities)
# Find the class with the highest predicted probability
class_index = int(np.argmax(class_probabilities, axis=1)[0])
print(classnames[class_index])

## Learning More
* [CNTK Documentation](https://docs.microsoft.com/en-us/cognitive-toolkit/)
* [Keras Documentation](https://keras.io/)