# Introduction to Convolutional Neural Networks
We have been using **fully connected networks** (FCNs) to classify the MNIST dataset, and in the last assignment we designed a network that could do this with an accuracy of around 98%.   Convolutional Neural Networks (also called Convnets or CNNs) are another, even more powerful tool for classifying images such as those in MNIST.   You might ask: what do Convnets do that FCNs can't?

To understand this, let's take another look at our MNIST FCN.  If you have not already done so, examine and run the Jupyter notebook in assignment10_prep called **train_fcn_model_mnist.ipynb**.   After you run it, you will have a stored version of the compiled and fitted FCN called **fully_trained_model_fcn.h5**.

# Pull in the MNIST data

In [None]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

short = False
if short:
    train_images = train_images[:7000,:]
    train_labels = train_labels[:7000]
    test_images = test_images[:3000,:]
    test_labels = test_labels[:3000]
#
print("Train info",train_images.shape, train_labels.shape)
print("Test info",test_images.shape, test_labels.shape)
train_images = train_images.reshape((train_images.shape[0],28*28))
train_images = train_images.astype('float32')/255

test_images = test_images.reshape((test_images.shape[0],28*28))
test_images = test_images.astype('float32')/255
from keras.utils import to_categorical

train_labels_cat = to_categorical(train_labels)
test_labels_cat = to_categorical(test_labels)
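
To make the preprocessing above concrete, here is a minimal NumPy-only sketch of the same three steps (flatten, rescale, one-hot encode). It runs on a small synthetic array rather than on MNIST, and the names `fake_images`, `fake_labels`, and `one_hot` are made up for this illustration. The `np.eye` indexing is just one way to reproduce what `to_categorical` returns for integer labels.

```python
import numpy as np

# Synthetic stand-in for MNIST: 4 random 28x28 uint8 "images" and their labels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 3])

# Flatten each image to a 784-vector and rescale pixel values to [0, 1].
flat = fake_images.reshape((fake_images.shape[0], 28 * 28)).astype("float32") / 255

# One-hot encode: row i has a 1 in column fake_labels[i] and zeros elsewhere.
num_classes = 10
one_hot = np.eye(num_classes, dtype="float32")[fake_labels]

print(flat.shape, flat.min(), flat.max())
print(one_hot.shape)
print(one_hot[0])
```

Each label becomes a length-10 vector, which is the form the `categorical_crossentropy` loss expects.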


# Run on the test set using our previous network
We should get around 98% (will vary depending on the randomly initialized weights).

In [None]:
from keras.models import load_model
network_name = 'fully_trained_model_fcn.h5'
trained_network = load_model(network_name)

## A method to get performance numbers
The following method will be helpful later to get loss, accuracy, and the confusion matrix for our network.

In [None]:
import numpy as np
#
# Used to implement the multi-dimensional counter we need in the performance class
from collections import defaultdict
def autovivify(levels=1, final=dict):
    return (defaultdict(final) if levels < 2 else
            defaultdict(lambda: autovivify(levels-1, final)))
def getPerformance(network,images,labels_cat,labels):
    """Return (loss, accuracy, confusion matrix) for the network on the given samples."""
    # Get the overall loss and accuracy for the sample
    loss, acc = network.evaluate(images,labels_cat)
    # Get the individual predictions (class probabilities) for each sample
    predictions = network.predict(images)
    # Get the max probability for each row (only used by the optional printout below)
    probs = np.max(predictions, axis = 1)
    # Get the predicted class for each row
    classes = np.argmax(predictions, axis = 1)
    # Optional debugging: compare truth to prediction for the first twenty samples
    #print("Label\t Pred\t Prob")
    #for label,cl,pr in zip(labels[:20],classes[:20],probs[:20]):
    #    print(label,'\t',cl,'\t',round(pr,3))
    # Get the confusion matrix: cf[true label][predicted class] = number of samples
    cf = autovivify(2,int)
    for label,cl in zip(labels,classes):
        cf[label][cl] += 1
    return loss,acc,cf

In [None]:
loss,acc,cf = getPerformance(trained_network,test_images,test_labels_cat,test_labels)
print("   Results")
print("   Loss,acc",round(loss,4),round(acc,4))
for trueClass in range(10):
    print("   True: ",trueClass,end="")
    for predClass in range(10):
        print(" \t",cf[trueClass][predClass],end="")
    print()
print()
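
For readers who want to see the confusion-matrix bookkeeping in isolation, here is a small NumPy-only sketch. It is not the `getPerformance` code above: it uses a dense array instead of nested `defaultdict`s, and the `predictions` values are invented for illustration (5 samples, 3 classes) rather than produced by a network.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_classes, num_classes=10):
    """Return an array cm where cm[t, p] counts samples of true class t predicted as p."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (t, p) pair repeats.
    np.add.at(cm, (true_labels, predicted_classes), 1)
    return cm

# Hypothetical softmax outputs for 5 samples and 3 classes (illustrative values only).
predictions = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1],
                        [0.1, 0.2, 0.7]])
true_labels = np.array([0, 1, 1, 0, 2])

classes = np.argmax(predictions, axis=1)   # predicted class for each row
cm = confusion_matrix(true_labels, classes, num_classes=3)
accuracy = np.trace(cm) / cm.sum()         # diagonal entries are the correct predictions
print(cm)
print("accuracy:", accuracy)
```

Rows are true classes and columns are predicted classes, matching the layout of the printout above. Off-diagonal entries show which digits get confused with which.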


# Can we improve on this?
When we ask about how we might improve on this performance, we should think about a few things:
1.  How big is our network?   Does it scale to images larger than our 28x28x1 digit images?
2.  How sensitive is our network to small (or even large) changes in our inputs?
3.  Even if the previous two points are a problem, we can still ask if it is possible to improve our performance over the FCN.

## How big is our network
Keras gives us a tool to get summary information about our network:

In [None]:
print(trained_network.input)
print(trained_network.output)
# summary() prints the table itself and returns None, so we do not wrap it in print()
trained_network.summary()

## Calculating the Number of parameters
Notice that the two layers are called **dense**: these are fully connected layers, meaning there is a connection from every output of one layer to every input of the next layer.   Here is how we get the parameter counts:
1.  dense_1 (the "_1" is just a label from when the network was created, it has no significance): We have 784 inputs each connected to 400 hidden nodes: 784*400=313600 parameters, plus another 400 "bias" parameters (1 for each node) which gives us a total of 314,000 parameters for the hidden layer.
2.  dense_2: we have 400 inputs (one each from the hidden layer) connected to 10 outputs: 400*10 + 10(bias)=4010 parameters for the output layer.

So we have 318,010 total parameters for a network which is used to classify small 28x28x1 grayscale images.   If we went to megapixel color images, we would have 1000x1000x3 = 3,000,000 input pixels, and with a 400 node hidden layer (which is probably too small), we would end up with more than 1.2 billion parameters. This does not scale!
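
The arithmetic above is easy to check in a few lines of plain Python. The helper name `dense_params` is ours, not a Keras function. It simply counts one weight per input-node pair plus one bias per node.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per input/unit pair) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 400)   # 784*400 weights + 400 biases
output = dense_params(400, 10)        # 400*10 weights + 10 biases
print(hidden, output, hidden + output)

# The same 400-node hidden layer fed by a 1000x1000x3 color image.
big_hidden = dense_params(1000 * 1000 * 3, 400)
print(big_hidden + output)
```

The first line of output should agree with the per-layer and total counts reported by `summary()`.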

## Sensitivity to variations in the input
The types of variations we want to consider include:
1.  Shifts of the input image (up/down and/or right/left).
2.  Scaling of the input image (making it bigger or smaller, while staying within the 28x28 pixel window).
3.  Rotations of the input image.

We could also include shearing of the input image, but for now let's just consider the first three.

Keras includes a method for performing all of these operations on an image.   Let's define a method to do this, using a single image as an input, and also define a method to display the image:

In [None]:
import keras.preprocessing.image as kpi
import matplotlib.pyplot as plt
import numpy as np

data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = kpi.ImageDataGenerator(**data_gen_args)

def transform_image(img,tx=0,ty=0,zoom=1.0,rotation=0.0,shear=0.0):
    transform_parameters = {}
    orig_image = np.array(img, copy=True).reshape(28,28,1)
    transform_parameters['theta'] = rotation
    transform_parameters['zx'] = zoom
    transform_parameters['zy'] = zoom
    transform_parameters['tx'] = tx
    transform_parameters['ty'] = ty
    transform_parameters['shear'] = shear
    orig_image = image_datagen.apply_transform(x=orig_image, transform_parameters=transform_parameters)
    return orig_image

def plot_image(img):
    one_image = img.reshape(28,28)
    plt.imshow(one_image, cmap='hot')
    plt.colorbar()
    plt.show()


## Task 1: Pick a random image from our test dataset, and do the following:
1.  Rotate 45 degrees.
2.  Rotate 45 degrees plus zoom out (make the digit smaller)
3.  Rotate 45 degrees plus zoom out plus translate the image to the upper corner.

In [None]:
# Your code here!

In [None]:
# Your code here!

In [None]:
# Your code here!

## Stability of the FCN to Variations in the input
We are now ready to systematically answer the question: how well does the FCN handle images that are slight (or not-so-slight) variations of the data it was trained on?

Here is what we will do for each shift size from 0 to 4 pixels (in increments of 1):
1.  Loop over every image in the test set.
2.  Choose the sign (+/-) of the shift at random, independently in x and y.
3.  Shift the image by that amount.
4.  Store the transformed image in a list.

For each shift size, we then run that list of images through our original FCN and note the performance, comparing it to the original (unshifted) performance.

In [None]:
import random
import numpy as np

for shift in [0,1,2,3,4]:
    print()
    print("Shift ",shift)
    imgList = []
    count = 0
    for img in test_images[:]:
        if random.uniform(0.0,1.0) > 0.5:
            tx = shift
        else:
            tx = -shift
        if random.uniform(0.0,1.0) > 0.5:
            ty = shift
        else:
            ty = -shift
#        tx = random.uniform(-shift,shift)
#        ty = random.uniform(-shift,shift)
        trans_image = transform_image(img,tx=tx,ty=ty,zoom=1.0,rotation=0.0,shear=0.0)
        imgList.append(trans_image)
#
# Convert to np array
    npa_images = np.asarray(imgList, dtype=np.float32)
    npa_images = npa_images.reshape((npa_images.shape[0],28*28))
#
    smear_loss,smear_acc,smear_cf = getPerformance(trained_network,npa_images,test_labels_cat,test_labels)
    print("   Results")
    print("   Loss,acc",smear_loss,smear_acc)
    for trueClass in range(10):
        print("   True: ",trueClass,end="")
        for predClass in range(10):
            print(" \t",smear_cf[trueClass][predClass],end="")
        print()


**This is bad.** With a shift of just 2 pixels, the accuracy drops by almost 30%!   So the FCN is not at all stable against small variations in its input.
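
One way to build intuition for this sensitivity is to look at how little a flattened image overlaps with a shifted copy of itself. The sketch below is NumPy-only and uses a synthetic 2-pixel-wide vertical stroke rather than a real MNIST digit. The helper `shift_image` and its sign convention for `tx`/`ty` are our own and are not guaranteed to match the Keras `apply_transform` convention. It is meant as an illustration, not as an explanation of the exact numbers above.

```python
import numpy as np

def shift_image(img, tx, ty):
    """Shift a 2D image by ty rows and tx columns, filling vacated pixels with 0."""
    out = np.zeros_like(img)
    h, w = img.shape
    # Source and destination windows that stay inside the frame after the shift.
    src_rows = slice(max(0, -ty), min(h, h - ty))
    dst_rows = slice(max(0, ty), min(h, h + ty))
    src_cols = slice(max(0, -tx), min(w, w - tx))
    dst_cols = slice(max(0, tx), min(w, w + tx))
    out[dst_rows, dst_cols] = img[src_rows, src_cols]
    return out

def cosine(a, b):
    """Cosine similarity between two images, treated as flat pixel vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic "digit": a vertical stroke, 16 pixels tall and 2 pixels wide.
stroke = np.zeros((28, 28), dtype=np.float32)
stroke[6:22, 13:15] = 1.0

for shift in range(5):
    moved = shift_image(stroke, tx=shift, ty=0)
    print(shift, round(cosine(stroke, moved), 3))
```

For this stroke the similarity falls from 1 to 0.5 after a one-pixel shift and to 0 once the shift reaches the stroke width. A dense layer sees only the flat pixel vector, so from its point of view the shifted stroke is an almost unrelated input.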

## Task 2: Test FCN Stability Further
Using a similar strategy as above, try the following:
1.  Rotations in the range: [0.0,20.0,40.0,60.0,80.0] (keep shifts=0)
2.  Zooms in the range: [1.0,1.25,1.5,1.75,2.0] (keep shifts and rotations=0)

In [None]:
# Your code here!

In [None]:
# Your code here!

## Shortcomings of FCNs
We see that standard, fully-connected neural networks, although powerful, have some clear shortcomings when applied to image classification:
1.  They do not scale well.  Reasonable-sized images would require an enormous number of parameters.   This in turn would require a corresponding increase in the number of training samples in order to determine the parameters accurately.
2.  They are dependent on the specific pixel relationships within the image.   Performance degrades substantially as soon as there is a minor deviation from these relationships.

Both of these issues are related: the FCN does not take advantage of the fact that - generally - in image classification, the images tend to be built from underlying common features.   In the case of MNIST images, these are the curves and lines and corners which make up the individual digits.   Convnets attempt to take advantage of these features.
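
As a rough preview of what "taking advantage of common features" can mean, the sketch below slides one small filter over an image and records its response at every position. Everything here is an assumption made for illustration: the 3x3 vertical-edge filter is written by hand (a convnet would learn its filters during training), the image is a synthetic stroke, and `correlate2d_valid` is our own helper, not a Keras function.

```python
import numpy as np

def correlate2d_valid(img, kernel):
    """Slide kernel over img (no padding) and return the map of dot products."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

# Hand-written 3x3 filter that responds to a dark-to-bright vertical edge.
edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=np.float32)

img = np.zeros((28, 28), dtype=np.float32)
img[6:22, 13:15] = 1.0                 # vertical stroke
shifted = np.roll(img, 3, axis=1)      # the same stroke, 3 columns to the right

resp = correlate2d_valid(img, edge_filter)
resp_shifted = correlate2d_valid(shifted, edge_filter)

print("peak response:", resp.max(), resp_shifted.max())
print("peak location:", np.unravel_index(resp.argmax(), resp.shape))
print("peak location (shifted):", np.unravel_index(resp_shifted.argmax(), resp_shifted.shape))
```

The same nine weights detect the edge wherever it appears: shifting the input moves the response map by the same amount and leaves the peak value unchanged. This also hints at why far fewer parameters are needed than in a dense layer.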

# Building a Convnet
Before explaining how convnets work, let's try to build a simple network to classify MNIST images.

For comparison, we first show the code we use to build and train an FCN, followed by similar code to build and train a CNN.

In [None]:
from keras import models
from keras import layers

#
# Make sure the shape of the input is correct
train_images = train_images.reshape((train_images.shape[0],28*28))
test_images = test_images.reshape((test_images.shape[0],28*28))

fcn_network = models.Sequential()
#
# Hidden
fcn_network.add(layers.Dense(400,activation='tanh',input_shape=(28*28,)))
#
# Output
fcn_network.add(layers.Dense(10,activation='softmax'))
#
# Compile
fcn_network.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
# 
# Fit/save/print summary
history = fcn_network.fit(train_images,train_labels_cat,epochs=15,batch_size=128,validation_data=(test_images,test_labels_cat))
fcn_network.save('fully_trained_model_fcn.h5')
fcn_network.summary()


In [None]:
from keras import models
from keras import layers
#
# Make sure the shape of the input is correct (the last ",1" is the number of "channels"=1 for grayscale)
train_images = train_images.reshape((train_images.shape[0],28,28,1))
test_images = test_images.reshape((test_images.shape[0],28,28,1))
#
cnn_network = models.Sequential()
#
# First convolutional layer
cnn_network.add(layers.Conv2D(30,(5,5),activation='relu',input_shape=(28,28,1)))
# Pool
cnn_network.add(layers.MaxPooling2D((2,2)))
#
# Second convolutional layer
cnn_network.add(layers.Conv2D(25,(5,5),activation='relu'))
# Pool
cnn_network.add(layers.MaxPooling2D((2,2)))
#
# Connect to a dense output layer - just like an FCN
cnn_network.add(layers.Flatten())
cnn_network.add(layers.Dense(64,activation='relu'))
cnn_network.add(layers.Dense(10,activation='softmax'))
#
# Compile
cnn_network.compile(optimizer='rmsprop',loss='categorical_crossentropy',metrics=['accuracy'])
#
# Fit/save/print summary
history = cnn_network.fit(train_images,train_labels_cat,epochs=5,batch_size=256,validation_data=(test_images,test_labels_cat))
cnn_network.save('fully_trained_model_cnn.h5')
cnn_network.summary()


In [None]:
#
# Get the overall performance for the test sample
test_loss, test_acc = cnn_network.evaluate(test_images,test_labels_cat)
print("Test sample loss: ",test_loss, "; Test sample accuracy: ",test_acc)


## Comparison of CNN and FCN
There are a couple of things to notice when comparing the output from the two code blocks above:
1.  The performance of the CNN is better than the FCN after 5 epochs.  A careful examination of the training set accuracies reveals that the CNN is still undertrained (and so can perform better if we increase the number of epochs).
2.  The number of parameters needed to specify the CNN is about 45.9k (780 + 18,775 + 25,664 + 650 = 45,869 for the four trainable layers defined above), roughly 7 times fewer than the FCN's 318k!
3.  The training time per step is much longer (about 10x) for the CNN than it is for the FCN.
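
The CNN parameter count can be worked out by hand in the same way as for the FCN. The sketch below assumes the Keras defaults for the layers defined above (no padding, stride 1, one bias per filter or node). The helper names are ours. Treat the `summary()` output as the authoritative figure and this as a cross-check.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Each filter has kernel_h*kernel_w*in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

def dense_params(n_inputs, n_units):
    """One weight per input/unit pair plus one bias per unit."""
    return (n_inputs + 1) * n_units

size, channels = 28, 1
total = 0
# (filters, kernel size) for the two Conv2D layers, each followed by 2x2 max pooling.
for n_filters, k in [(30, 5), (25, 5)]:
    total += conv2d_params(k, k, channels, n_filters)
    size = size - k + 1      # unpadded convolution shrinks each side by k-1
    size = size // 2         # 2x2 max pooling halves each side
    channels = n_filters

flat = size * size * channels            # length of the Flatten() output
total += dense_params(flat, 64) + dense_params(64, 10)
print(size, channels, flat, total)
```

Under these assumptions the feature maps shrink from 28x28 to 4x4 with 25 channels, so the first dense layer sees only 400 inputs. Note that the convolutional layers' parameter counts do not depend on the image size at all, only on the filter size and the number of channels.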

## Task 3: Test CNN Stability 
What we don't know yet is how stable the CNN is to variations in the input images.
Using a strategy similar to the one we used above for the FCN, calculate the performance under the following variations:
1.  Shifts in the range [0,1,2,3,4]
2.  Rotations in the range: [0.0,20.0,40.0,60.0,80.0] (keep shifts=0)
3.  Zooms in the range: [1.0,1.25,1.5,1.75,2.0] (keep shifts and rotations=0)

In [None]:
# Your code here!