
# Improving Computer Vision Accuracy using Convolutions

### Review of previous attempt
In the previous lessons you saw how to do fashion recognition using a Deep Neural Network (DNN) containing three layers: the input layer (in the shape of the data), the output layer (in the shape of the desired output) and a hidden layer. You experimented with the impact of different hidden-layer sizes, numbers of training epochs, etc. on the final accuracy.

For convenience, here's the entire code again. Run it and take note of the test accuracy that is printed out at the end.

In [None]:
# Import library
import tensorflow as tf
# Prepare data
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images / 255.0
test_images=test_images / 255.0
# Build and compile neural network model 
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train on Training data
model.fit(training_images, training_labels, epochs=5)
# Evaluate test (unseen) data
print("---Evaluating unseen images---")
test_loss = model.evaluate(test_images, test_labels)

While the accuracy of the deep neural network model (about 89% on the training data and 87% on the unseen test data) is not bad, convolutional neural networks usually improve the performance of computer vision models. Convolutions narrow down the content of the image to focus on specific, distinct details.

If you've ever done image processing using a filter (see https://en.wikipedia.org/wiki/Kernel_(image_processing)), convolutions will look very familiar.

![title](img/filter.png)

In short, you take a small array (usually 3x3 or 5x5, called a filter or kernel) and pass it over the image. By changing the underlying pixels based on the formula in that matrix, you can do things like edge detection. For example, a 3x3 filter commonly used for edge detection has 8 in the middle cell and -1 in each of its eight neighbors. In this case, for each pixel, you multiply its value by 8, then subtract the value of each neighbor. Do this for every pixel, and you'll end up with a new image that has the edges enhanced.
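To make the arithmetic concrete, here is a minimal NumPy sketch (not part of the original notebook) that slides the 8/-1 edge-detection kernel over a tiny synthetic image. The helper name `convolve2d_valid` and the 6x6 test image are illustrative assumptions. The sketch uses no padding and a stride of 1, so a 6x6 input gives a 4x4 output.

```python
import numpy as np

# 3x3 edge-detection kernel from the text: 8 in the centre, -1 for each neighbour
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1.

    Strictly this is a cross-correlation (the kernel is not flipped), which is
    also what Keras Conv2D layers compute; for this symmetric kernel the two
    are identical anyway.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # multiply the 3x3 patch element-wise by the kernel and sum it
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy 6x6 image: dark left half, bright right half (a vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = convolve2d_valid(image, kernel)
print(edges)
```

Flat regions (all dark or all bright) come out as 0, while the two columns either side of the dark-to-bright boundary come out non-zero: that is the "enhanced" edge.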

This is perfect for computer vision, because it's often features that can be highlighted like this that distinguish one item from another. The amount of information needed is then much smaller, because you only train on the highlighted features.

That's the concept of Convolutional Neural Networks: add some layers to do convolution before the dense layers, and the information going to the dense layers is more focused, and possibly more accurate.

Run the code in the following cells. It builds the same neural network as earlier, but this time with convolutional layers added first. Training will take longer, but look at the impact on the accuracy.

### Import Libraries

Let's start building our convolutional neural network model by importing TensorFlow.

In [None]:
import tensorflow as tf
print(tf.__version__)

### Provide Data 

As with 02_Intro_ComputerVision.ipynb, we will be using Fashion MNIST (https://github.com/zalandoresearch/fashion-mnist), consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes (types of clothes). The dataset is available directly in the tf.keras datasets API. Here, you'll notice that the training data needs to be reshaped. That's because the first convolution expects a single 4D tensor of shape (batch, height, width, channels). The images load as an array of shape 60,000x28x28 with no channel dimension, so we reshape it to 60,000x28x28x1, adding a single grayscale channel, and do the same for the test images. If you don't do this, you'll get an error when training, because the convolution layer does not recognize the input shape.
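The reshape only adds a trailing axis of size 1; no pixel values change. The sketch below uses a zero-filled NumPy stand-in with the same shape as the Fashion MNIST training images (an assumption, so nothing needs downloading) and also shows `np.expand_dims`, an equivalent alternative that avoids hard-coding the number of examples.

```python
import numpy as np

# Stand-in for training_images: same shape as the Fashion MNIST training set,
# (60000, 28, 28), but filled with zeros so no download is needed
training_images = np.zeros((60000, 28, 28), dtype=np.uint8)

# Conv2D expects (batch, height, width, channels); grayscale has 1 channel
reshaped = training_images.reshape(60000, 28, 28, 1)

# Equivalent result without hard-coding 60000: append a new last axis
expanded = np.expand_dims(training_images, axis=-1)

print(training_images.shape, reshaped.shape, expanded.shape)
```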

In [None]:

### Provide Data
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
# Add a channel dimension: (60000, 28, 28) -> (60000, 28, 28, 1)
training_images = training_images.reshape(60000, 28, 28, 1)
# Scale pixel values from 0-255 to 0-1
training_images = training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images / 255.0

### Define and Compile the network

Next, define the model. This time, instead of the Flatten input layer at the top, you're going to add a convolution:
```
 tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
 tf.keras.layers.MaxPooling2D(2, 2),
```

The parameters are:

1. The number of convolutions (filters) to generate. The value is somewhat arbitrary; a good starting point is something on the order of 32 (this example uses 64)
2. The size of the Convolution, in this case a 3x3 grid
3. The activation function, e.g., *relu*, which returns x when x > 0 and 0 otherwise
4. In the first layer, the shape of the input data.
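As a quick illustration of parameter 3, here is a minimal NumPy sketch of ReLU applied element-wise. The sample values are made up and only stand in for numbers coming out of a convolution.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: keep x where x > 0, otherwise output 0."""
    return np.maximum(x, 0)

# Hypothetical values, e.g. from a convolution's output
values = np.array([-3.0, -0.5, 0.0, 2.0, 7.5])
activated = relu(values)  # negatives become 0; positives pass through unchanged
print(activated)
```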

The convolution is followed by a MaxPooling layer, which is designed to compress the image while keeping the content of the features that were highlighted by the convolution. Specifying (2,2) for the MaxPooling quarters the size of the image. The idea is that it looks at a 2x2 block of pixels and keeps the biggest one, turning 4 pixels into 1. It repeats this across the image, halving the number of horizontal pixels and halving the number of vertical pixels, which reduces the image to 25% of its original size.
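The pooling step can be sketched in a few lines of NumPy. This is an illustrative re-implementation (the function name `max_pool_2x2` is hypothetical, not a Keras API) that mimics `MaxPooling2D(2, 2)` with its default 'valid' padding, where a leftover odd row or column is dropped.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single 2D feature map.

    An odd trailing row/column is dropped, like Keras' default 'valid' padding.
    """
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Group pixels into non-overlapping 2x2 blocks, then keep the max of each
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 3]])
pooled = max_pool_2x2(fm)
print(pooled)  # 4 pixels per block become 1

# The feature-map sizes in this notebook: 26x26 -> 13x13, and 11x11 -> 5x5
print(max_pool_2x2(np.zeros((26, 26))).shape, max_pool_2x2(np.zeros((11, 11))).shape)
```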

We add another convolution:
```
  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2)
```
Now flatten the output. After this you'll have the same DNN (deep neural network) structure as the non-convolutional version (see the review above or 02_Intro_ComputerVision.ipynb):
```
  tf.keras.layers.Flatten(),
```
Then the same 128-neuron dense layer and 10-neuron output layer as in the pre-convolution example:
```
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
```
As before, we compile the model with Adam as the optimizer and sparse categorical crossentropy as the loss.

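You can call **model.summary()** to see the size and shape of the network. You'll notice that after every MaxPooling layer the image is halved in each dimension, as described above. Each 3x3 convolution also trims a one-pixel border (28x28 becomes 26x26), because Keras' Conv2D uses no padding by default.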
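The shapes and parameter counts that `model.summary()` reports can be worked out by hand. The plain-Python sketch below traces the model defined in the next cell, assuming Keras defaults ('valid' padding and stride 1 for Conv2D; stride equal to the pool size for MaxPooling2D). The helper functions are illustrative only; if the assumptions hold, the total should match the summary's "Total params" line.

```python
# Trace output shapes and trainable parameters through the notebook's model.

def conv2d(shape, filters, k=3):
    h, w, c = shape
    params = (k * k * c + 1) * filters  # k*k*c weights per filter, plus one bias
    return (h - k + 1, w - k + 1, filters), params

def max_pool(shape, pool=2):
    h, w, c = shape
    return (h // pool, w // pool, c), 0  # pooling has no trainable weights

def flatten(shape):
    h, w, c = shape
    return (h * w * c,), 0

def dense(shape, units):
    (n_in,) = shape
    return (units,), (n_in + 1) * units  # weight matrix plus one bias per unit

steps = [
    ("Conv2D", lambda s: conv2d(s, 64)),
    ("MaxPooling2D", max_pool),
    ("Conv2D", lambda s: conv2d(s, 64)),
    ("MaxPooling2D", max_pool),
    ("Flatten", flatten),
    ("Dense", lambda s: dense(s, 128)),
    ("Dense", lambda s: dense(s, 10)),
]

shape = (28, 28, 1)  # one Fashion MNIST image with its channel axis
total = 0
for name, layer in steps:
    shape, params = layer(shape)
    total += params
    print(f"{name:<14}{str(shape):<16}{params}")
print("Total params:", total)
```

Most of the parameters sit in the first Dense layer, because it connects every one of the 1,600 flattened values to each of its 128 neurons.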

In [None]:
### Define and Compile the network
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

### Train the Network and Evaluate Model 

Call the fit method to do the training.

In [None]:
model.fit(training_images, training_labels, epochs=5)

It's likely gone up to about 93% on the training data.

### Evaluate unseen data

Next, evaluate the loss and accuracy on the test set.

In [None]:
print("Evaluating model using unseen images")
test_loss = model.evaluate(test_images, test_labels)

The test accuracy is likely around 91%, an improvement over the deep neural network.

Try running it for more epochs (e.g., 20) and explore the results. While the training results might seem really good, the test results may actually go down due to 'overfitting'. Overfitting occurs when the network learns the training data too well (it becomes too specialised to only that data) and as a result is less effective at seeing *other* data.

Let's see how well the model predicted the first 100 test images. Each image is labelled with its predicted class: blue text means the prediction is correct, red text means it is incorrect.

In [None]:
%matplotlib inline
# Plot the first 100 test images with their predicted labels
# Color correct predictions in blue, incorrect predictions in red
import numpy as np
import matplotlib.pyplot as plt
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

classifications = model.predict(test_images)

rows = 10
cols = 10
num_images = rows*cols
plt.figure(figsize=(cols, rows))

for i in range(num_images):
  plt.subplot(rows,cols,i+1)
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])
  plt.imshow(test_images[i].reshape(28, 28))
  predicted_label = np.argmax(classifications[i])
  true_label = test_labels[i]
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'
  plt.xlabel("{}".format(class_names[predicted_label]),color=color)

plt.tight_layout()

# Visualizing the Convolutions and Pooling

This code shows the convolutions graphically. Printing `test_labels[:100]` shows the first 100 labels in the test set, and you can see that the ones at index 0, index 23 and index 28 all have the same value (9). They're all ankle boots. Let's look at the result of running the convolution on each, and you'll begin to see common features between them emerge. When the dense layers train on that data, they're working with a lot less information, and they may be finding a commonality between shoes based on this convolution/pooling combination.

In [None]:
%matplotlib inline
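import matplotlib.pyplot as plt
# Show the first 100 test labels; indices 0, 23 and 28 are all 9 (ankle boot)
print(test_labels[:100])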
FIRST_IMAGE=0
SECOND_IMAGE=23
THIRD_IMAGE=28
CONVOLUTION_NUMBER = 1

rows = 3
cols = 1
num_images = rows*cols
plt.figure(figsize=(cols, rows))

for i,img in enumerate([FIRST_IMAGE,SECOND_IMAGE,THIRD_IMAGE]):
  plt.subplot(rows,cols,i+1)
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])
  plt.imshow(test_images[img].reshape(28, 28))

In [None]:
f, axarr = plt.subplots(3, 4)
# Build a model that returns the output of every layer for a given input
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.input, outputs=layer_outputs)
for row, img in enumerate([FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE]):
  # One forward pass per image; returns a list with one array per layer
  activations = activation_model.predict(test_images[img].reshape(1, 28, 28, 1))
  # Columns show the first 4 layers: Conv2D, MaxPooling2D, Conv2D, MaxPooling2D
  for x in range(0, 4):
    axarr[row, x].imshow(activations[x][0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[row, x].grid(False)
plt.tight_layout()