# Ch 5 - Deep Learning for Computer Vision

This chapter introduces convolutional neural networks, also known as convnets, a type of deep-learning model almost universally used in computer vision applications. 

## 5.1 Introduction to Convnets

We'll use the same MNIST digits dataset as in Ch 2.

In [1]:
from keras import layers
from keras import models

Using TensorFlow backend.


### Instantiating a small convnet

This is what a basic convnet looks like. It is a stack of Conv2D and MaxPooling2D layers. 

The convnet takes input tensors of shape (image_height, image_width, image_channels), not including the batch dimension. In this case, we'll configure the convnet to process inputs of size (28, 28, 1), which is the format of MNIST images.

In [2]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

#### Architecture of the Convnet:

In [3]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________


The output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels).

The width and height dimensions tend to shrink as you go deeper in the network.
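The shrinking can be checked by hand against the summary above. The following is a minimal sketch, assuming the Keras defaults that the model relies on (no padding and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`); the helper names are made up for illustration and are not part of Keras.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


# Trace the height (and, identically, the width) through the five layers.
stack = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2), ("conv", 3)]
size = 28
trace = [size]
for kind, k in stack:
    if kind == "conv":
        size = conv_output_size(size, k)
    else:
        size = pool_output_size(size, k)
    trace.append(size)

print(trace)  # [28, 26, 13, 11, 5, 3]
```

Each 3x3 convolution removes a one-pixel border (2 pixels per dimension), and each 2x2 pooling halves the size, rounding down, which is why 11 becomes 5.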

The number of channels is controlled by the first argument passed to the Conv2D layers (32 or 64).
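The channel counts also explain the `Param #` column in the summary. As a rough sketch (the function name is hypothetical, not a Keras API), each filter has one weight per kernel position per input channel, plus one bias:

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Number of trainable parameters in a 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# (input channels, filters) for the three Conv2D layers in the model above.
counts = [
    conv2d_param_count(3, 3, 1, 32),
    conv2d_param_count(3, 3, 32, 64),
    conv2d_param_count(3, 3, 64, 64),
]

print(counts)       # [320, 18496, 36928]
print(sum(counts))  # 55744
```

The pooling layers have no weights, so the sum matches the 55,744 total reported by `model.summary()`.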

#### Adding a Classifier on Top of the Convnet

The next step is to feed the last output tensor (of shape (3, 3, 64)) into a densely connected classifier network like those you're already familiar with: a stack of Dense layers.

These classifiers process vectors, which are 1D, whereas the current output is a 3D tensor. First we need to flatten the 3D outputs to 1D, and then add a few Dense layers on top.

In [4]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))


We’ll do 10-way classification, using a final layer with 10 outputs and a softmax activation.
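To make the role of softmax concrete, here is a minimal pure-Python sketch of the function itself. The input scores are invented for illustration; in the real model they would come from the last Dense layer, and Keras computes this internally.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow and does not change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Ten made-up scores, one per digit class 0-9.
scores = [0.5, 2.0, -1.0, 0.0, 3.0, 0.1, -0.3, 1.2, 0.7, -2.0]
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted_digit = probs.index(max(probs))
print(predicted_digit)  # 4
```

Because the outputs are non-negative and sum to 1, they can be read as the network's probability estimate for each of the 10 digits.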

Here’s what the network looks like now:

In [5]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
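_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________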

The (3, 3, 64) outputs are flattened into vectors of shape (576,), since 3 * 3 * 64 = 576, before going through the two Dense layers.
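The flattening step and the Dense parameter counts can be reproduced with plain Python lists. This is only an illustrative sketch: `flatten` and `dense_param_count` are hypothetical helpers, and the feature map is filled with zeros rather than real activations.

```python
def flatten(tensor):
    """Flatten a nested (height, width, channels) list in row-major order."""
    return [value for row in tensor for pixel in row for value in pixel]


def dense_param_count(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


# A dummy 3 x 3 x 64 feature map standing in for the last Conv2D output.
feature_map = [[[0.0] * 64 for _ in range(3)] for _ in range(3)]
flat = flatten(feature_map)

print(len(flat))                         # 576
print(dense_param_count(len(flat), 64))  # 36928
print(dense_param_count(64, 10))         # 650
```

The first count matches the 36,928 reported for `dense_1`; the second is what the same formula gives for the final 10-unit softmax layer.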

#### Training the Convnet on MNIST Images

In [6]:
from keras.datasets import mnist
from keras.utils import to_categorical

In [7]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
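The `to_categorical` helper imported above converts integer class labels into one-hot vectors, the label format a 10-way softmax output is usually trained against. A minimal pure-Python sketch of that encoding follows; the `one_hot` function and the sample labels are illustrative and are not part of Keras.

```python
def one_hot(labels, num_classes):
    """Encode integer labels as lists with a single 1.0 at the label index."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded


# Three example digit labels, encoded over the 10 digit classes.
sample_labels = [5, 0, 4]
encoded = one_hot(sample_labels, 10)

for label, row in zip(sample_labels, encoded):
    print(label, row)
```

Each row has length 10 and contains exactly one 1.0, at the position of the digit it represents.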