# Chapter 11: CNN - Convolutional Neural Networks

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from PIL import Image
from matplotlib.pyplot import imshow, imsave
import imageio
import random
import urllib.request  

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [None]:
Data_path="/content/drive/My Drive/DataSets/Chapter-11/Datasets/"

## ANNs for images
Standard ANNs have the flexibility to learn any pattern in the data. In an ANN, we take each pixel as an input while building the model. Considering each pixel as an input is not practically viable. As the first step, we flatten the images. Flattening the images removes a fundamental property of images called spatial dependency: images have a local correlation.

### Spatial dependence
In an ANN, the first step is to flatten the image into a single row. While building the ANN model, we do not use the image as it is; we convert it into a row and pass it on to the model as input.

In [None]:
(X_train, Y_train), (X_test, Y_test) = keras.datasets.mnist.load_data()
print("X_train shape", X_train.shape)
print("X_test shape", X_test.shape)

x_train = X_train.reshape(60000, 784)
x_test = X_test.reshape(10000, 784)

print("X_train new shape", x_train.shape)
print("X_test new shape", x_test.shape)

In the MNIST data, the images are of size 28X28, and there are 60,000 images in the training data. The first step is to flatten each 28X28 image matrix into a 784X1 array. We generally use the reshape function for this before building the ANN model.

The original data has 60,000 images and each image has 28X28 size. Original data is a three-dimensional tensor. We used reshape to flatten the 28X28 image into a single row of 784 pixels. The reshaped tensor has only two dimensions now. We then use this reshaped data and build the model.
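
The effect of reshape can be checked on a small array before touching the real data. The sketch below is only an illustration: the array `images` and its size of 5 are assumptions standing in for the MNIST tensor.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 grayscale images of 28x28 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a single row of 784 pixels
flat = images.reshape(images.shape[0], -1)

print(images.shape)  # (5, 28, 28)
print(flat.shape)    # (5, 784)

# Row-major order: pixel (r, c) of an image lands at position r*28 + c
r, c = 3, 10
same_pixel = flat[0, r * 28 + c] == images[0, r, c]
print(same_pixel)    # True
```

The three-dimensional tensor becomes a two-dimensional one, and every pixel keeps its value; only its position in the row encodes where it came from.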

In [None]:
num_classes=10
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(Y_train, num_classes)
y_test = keras.utils.to_categorical(Y_test, num_classes)

model = keras.Sequential()
model.add(layers.Dense(20, activation='sigmoid', input_shape=(784,)))
model.add(layers.Dense(20, activation='sigmoid'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,epochs=2)

Converting images to single rows, also known as flattening the images, removes the spatial
correlation between the surrounding pixels. Somehow we need to preserve this local correlation.
Loss of spatial dependency is the first issue that we observe with standard ANNs.
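
One way to see this loss is to look at where neighbouring pixels end up after flattening. The snippet below is a small sketch; the pixel position (10, 10) is an arbitrary choice for illustration.

```python
import numpy as np

side = 28
# Label every pixel position of a 28x28 image with its flattened index
index = np.arange(side * side).reshape(side, side)

center = index[10, 10]
right = index[10, 11]   # horizontal neighbour
below = index[11, 10]   # vertical neighbour

print(right - center)   # 1: still adjacent after flattening
print(below - center)   # 28: a full row apart after flattening
```

Two pixels that touch vertically in the image sit 28 positions apart in the flattened row, and a dense layer has no built-in knowledge that they were neighbours.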

## Filters
We can think of filters as subregions of the image. Until now we have considered individual pixels inside the image. A pixel is the most primitive subregion. Instead of taking the input information from individual pixels, we try to take the information from subregions of the image. We use filters to get the information from these subregions.

In [None]:
(X_train, Y_train), (X_test, Y_test) = keras.datasets.mnist.load_data()
print("X_train shape", X_train.shape)
print("X_test shape", X_test.shape)

### Kernel matrix for detecting features
The values of the kernel matrix decide the resultant convoluted features. In the above example, we
considered a kernel matrix that has all 1’s in it. Depending on the values in the matrix, each kernel
captures a certain type of feature. Some kernels capture straight lines in the image, some capture
circles, some capture sharp edges, and some capture curves.
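
To make the role of the kernel values concrete, here is a minimal NumPy sketch of the sliding-window calculation. The helper `convolve2d_valid` and the toy 6x6 image are assumptions made for illustration; like Keras, the helper multiplies the kernel with each sub-region without flipping it.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and one sub-region, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

ones_kernel = np.ones((3, 3))               # sums each 3x3 sub-region
edge_kernel = np.array([[-1, 0, 1]] * 3)    # responds to vertical edges

summed = convolve2d_valid(image, ones_kernel)
edges = convolve2d_valid(image, edge_kernel)
print(summed[0])  # [0. 3. 6. 9.]
print(edges[0])   # [0. 3. 3. 0.]
```

The all-ones kernel simply measures brightness in each sub-region, while the second kernel is non-zero only where the dark and bright halves meet.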

In [None]:
# Plot 4 images as gray scale

%matplotlib inline
import matplotlib.pyplot as plt
plt.subplot(221)
plt.imshow(X_train[1], cmap=plt.get_cmap('gray'))
plt.subplot(222)
plt.imshow(X_train[6], cmap=plt.get_cmap('gray'))
plt.subplot(223)
plt.imshow(X_train[7], cmap=plt.get_cmap('gray'))
plt.subplot(224)
plt.imshow(X_train[9], cmap=plt.get_cmap('gray'))

# show the plot
plt.show()

## The convolution layer
In ANN we have an input layer, and the first layer after the input layer is a fully connected hidden layer. But here, we are applying a filter on the input layer and creating convoluted features. The filter is nothing but a matrix of weights. These weights need to be determined while training. The
first layer in the convolutional neural network is the convolution layer. The convolution layer keeps the local correlation intact. The convolution layer captures all the features from the input.

### Convolution layer in keras
We use the Keras layer Conv2D(). This layer moves the kernel matrix along the rows and columns of the image. We usually initialize the kernel with random weights. Below is the code used for adding the convolution layer to the model.

In [None]:
#i=int(np.random.rand(1,1)*60000)
#634 #924 #952 #3611  #4458
import numpy as np
x=X_train[3611]


print("Actual Image")
plt.imshow(x, cmap=plt.get_cmap('gray'))
plt.show()


print("Random Weights Kernel")
from tensorflow.keras.layers import Conv2D
model=Sequential()
model.add(Conv2D(filters=1,
                 kernel_size=7,
                 input_shape=(28,28,1),
                 kernel_initializer='random_uniform'))

img_reshape=np.expand_dims(x, axis=0)
img_reshape=np.expand_dims(img_reshape, axis=3)
img_reshape=model.predict(img_reshape)
pixels = img_reshape[0, :, :, 0]  # drop the batch and channel axes to get a 2-D image
plt.imshow(pixels,cmap=plt.get_cmap('gray'))
plt.show()

Below are the parameters from the above code
* **Conv2D()** – The kernel matrix moves in two dimensions: rows and columns.
* **filters** – Number of filters or kernel matrices. We need to declare enough filters to capture
all the features. Here we mentioned 1. Usually, in practical problems, it is 8, 16, or 32.
* **kernel_size** – Size of the kernel matrix. Here we mentioned 7, which gives us a 7X7 matrix.
Usually, it is 3X3, 5X5, or 7X7.
* **input_shape** – Required parameter for the first layer only. From the second convolution layer
onwards, it is derived automatically. In our example, the input image shape is (28,28,1).
* **kernel_initializer** – Initial values of the kernel matrix. Usually, it is randomly initialized.
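
These parameters also fix the number of weights in the layer and the size of its output. The helper functions below are a hypothetical sketch (not part of Keras) that applies the usual formulas to the 28X28 image and the single 7X7 filter used above, assuming no padding and a stride of 1.

```python
def conv2d_output_side(input_side, kernel_size):
    """Side of the output feature map with no padding and stride 1."""
    return input_side - kernel_size + 1

def conv2d_param_count(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

side = conv2d_output_side(28, 7)
params = conv2d_param_count(kernel_size=7, in_channels=1, filters=1)
print(side)    # 22, so the output image is 22x22
print(params)  # 50 = 49 kernel weights + 1 bias
```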

In the above output, we saw the result of a randomly initialized kernel. During the training process,
the weights in the matrix are adjusted so that, finally, each kernel matrix identifies a feature. Now
we will create our own kernels for detecting horizontal and vertical lines. We pass these fixed kernels
to Keras through a constant initializer. Below is the code to create the two kernel matrices.

In [None]:
import numpy as np
filter1=np.array([[1,1,1,1,1,1,1],
           [1,1,1,1,1,1,1],
           [100,100,100,100,100,100,100],
           [100,100,100,100,100,100,100],
           [100,100,100,100,100,100,100],
           [1,1,1,1,1,1,1],
           [1,1,1,1,1,1,1]])
print("filter1 \n", filter1)

filter2=np.transpose(filter1)
print("filter2 \n",filter2)

We will now apply these two kernels using the below code.

In [None]:
#Try #634 #924 #952 #3611  #4458
x=X_train[3611]

print("Actual Image")
plt.imshow(x, cmap=plt.get_cmap('gray'))
plt.show()

print("Horizontal Line")

model=Sequential()
model.add(Conv2D(1,
                 kernel_size=7,
                 input_shape=(28,28,1),
                 kernel_initializer=keras.initializers.Constant(filter1)))

img_reshape=np.expand_dims(x, axis=0)
img_reshape=np.expand_dims(img_reshape, axis=3)
img_reshape=model.predict(img_reshape)
pixels = img_reshape[0, :, :, 0]  # drop the batch and channel axes to get a 2-D image
plt.imshow(pixels,cmap=plt.get_cmap('gray'))
plt.show()

print("Vertical Line")
model=Sequential()
model.add(Conv2D(1,
                 kernel_size=7,
                 input_shape=(28,28,1),
                 kernel_initializer=keras.initializers.Constant(filter2)))

img_reshape=np.expand_dims(x, axis=0)
img_reshape=np.expand_dims(img_reshape, axis=3)
img_reshape=model.predict(img_reshape)
pixels = img_reshape[0, :, :, 0]  # drop the batch and channel axes to get a 2-D image
plt.imshow(pixels,cmap=plt.get_cmap('gray'))
plt.show()

From the result, we can see the highlighted portions of horizontal lines and vertical lines. Usually, it
is not easy to observe such patterns in the images just by looking at the plots.

### Filters for colour images
Colour images have depth, so the filter also has depth. The filter is applied just as it is for black and white images, but the dot product also takes the depth into account. The whole calculation still ends up as a single number in the convoluted image.
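
The following sketch shows the calculation for one position of the filter. The random 3x3x3 `patch` and `kernel` are made-up values used only to show that the depth is summed away.

```python
import numpy as np

rng = np.random.default_rng(42)
patch = rng.random((3, 3, 3))    # hypothetical 3x3 image patch with 3 colour channels
kernel = rng.random((3, 3, 3))   # a filter with the same depth as the image

# One output value: multiply element-wise across rows, columns and depth, then sum
value = np.sum(patch * kernel)

# The same number, computed channel by channel and then added up
per_channel = [np.sum(patch[:, :, ch] * kernel[:, :, ch]) for ch in range(3)]
channel_total = sum(per_channel)

print(np.ndim(value))                     # 0: a single number
print(np.isclose(value, channel_total))   # True
```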

We will first import the image

In [None]:
github_link="https://raw.githubusercontent.com/venkatareddykonasani/ML_DL_py_TF/master/Chapter11_CNN/Datasets/"

In [None]:
#imp_path=Data_path+"43534.png"
#print("imp_path",imp_path)

#Image importing
import matplotlib.pyplot as plt
import urllib.request  
urllib.request.urlretrieve((github_link+"43534.png"), "43534.png")
x=plt.imread('43534.png')

%matplotlib inline
#x=plt.imread(imp_path)
print(x.shape)
y=x[10:15,10:15]
print(y*20)
print(y.shape)
plt.imshow((y*255).astype(np.uint8))

## Zero Padding 

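Below is the code for creating convolution layers with 7X7 and 5X5 kernels. Since no padding is applied, the printed output shape is smaller than the 28X28 input in both cases.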

In [None]:
#i=int(np.random.rand(1,1)*60000)
#634 #924 #952 #3611  #4458
import numpy as np
x=X_train[3611]

print("Actual Image Shape " ,  x.shape)
plt.imshow(x, cmap=plt.get_cmap('gray'))
plt.show()


print("Random Weights Kernel size 7X7")
from tensorflow.keras.layers import Conv2D
model=Sequential()
model.add(Conv2D(filters=1,
                 kernel_size=7,
                 input_shape=(28,28,1),
                 kernel_initializer='random_uniform'))

img_reshape=np.expand_dims(x, axis=0)
img_reshape=np.expand_dims(img_reshape, axis=3)
img_reshape=model.predict(img_reshape)
pixels = img_reshape[0, :, :, 0]  # drop the batch and channel axes to get a 2-D image
print("Output Shape " ,pixels.shape)
plt.imshow(pixels,cmap=plt.get_cmap('gray'))
plt.show()


print("Random Weights Kernel size 5X5")
from tensorflow.keras.layers import Conv2D
model=Sequential()
model.add(Conv2D(filters=1,
                 kernel_size=5,
                 input_shape=(28,28,1),
                 kernel_initializer='random_uniform'))

img_reshape=np.expand_dims(x, axis=0)
img_reshape=np.expand_dims(img_reshape, axis=3)
img_reshape=model.predict(img_reshape)
pixels = img_reshape[0, :, :, 0]  # drop the batch and channel axes to get a 2-D image
print("Output Shape " ,pixels.shape)
plt.imshow(pixels,cmap=plt.get_cmap('gray'))
plt.show()
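
Zero padding itself can be sketched with NumPy: a border of zeros is added around the image so that the kernel can also be centred on the edge pixels. The toy 4x4 image below is an assumption for illustration. In Keras, the corresponding option is the `padding='same'` argument of Conv2D, which is not used in the cells above.

```python
import numpy as np

image = np.arange(1, 17).reshape(4, 4)   # toy 4x4 "image"
kernel_size = 3
pad = (kernel_size - 1) // 2             # 1 ring of zeros for a 3x3 kernel

padded = np.pad(image, pad_width=pad, mode="constant", constant_values=0)
print(padded.shape)   # (6, 6)
print(padded[0])      # [0 0 0 0 0 0]: the new top border

# Output side = input side + 2*padding - kernel size + 1
unpadded_side = 28 - 7 + 1            # 22, as printed for the 7X7 kernel
padded_side = 28 + 2 * 3 - 7 + 1      # 28, the input size is preserved
print(unpadded_side, padded_side)
```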

## CNN code
The network we used for calculating weights is not a random CNN model. It is a model used for the classification of images in the CIFAR10 data. CIFAR10 and CIFAR100 are labeled subsets of the 80 million tiny images dataset; they were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR10 is available as part of the Keras sample datasets library. The CIFAR-10 data consists of 60,000 images; each image is of size 32x32x3, and there are ten classes in the output. The output classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

Below code is used for downloading the data and visualizing a few images

In [None]:
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

In [None]:
(X_train, y_train), (X_test, y_test) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
X_train=X_train/255
X_test=X_test/255

In [None]:
print("X_train.shape", X_train.shape)
print("y_train.shape", y_train.shape)
print("X_test.shape", X_test.shape)
print("y_test.shape", y_test.shape)

In [None]:
#Drawing Few images
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
plt.figure(figsize=(10,10))
for i in range(16):
    plt.subplot(4,4,i+1)
    plt.imshow(X_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i][0]])
    plt.xticks([])
    plt.yticks([])
plt.show()

The images in this dataset are low-resolution and pixelated. We can now go ahead and build the model.

Below is the code for creating the above CNN model.

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (5, 5), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.summary()

Most often, we use ReLU activation in image processing examples. Image processing networks are
usually very deep, and ReLU works better than sigmoid and tanh activations in deep networks.
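
One common explanation is the size of the gradients that flow back through many layers. The snippet below is a deliberately simplified sketch: it ignores the weights and multiplies only the activation derivatives across a hypothetical stack of 10 layers, with a made-up pre-activation value of 0.5.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    return np.where(np.asarray(z) > 0, 1.0, 0.0)

depth = 10   # hypothetical number of stacked layers
z = 0.5      # hypothetical pre-activation value in every layer

sigmoid_chain = float(sigmoid_grad(z) ** depth)
relu_chain = float(relu_grad(z) ** depth)

print(sigmoid_grad(0.0))   # 0.25, the largest value the sigmoid derivative can take
print(sigmoid_chain)       # a very small number
print(relu_chain)          # 1.0
```

With sigmoid, each layer multiplies the gradient by at most 0.25, so the signal shrinks quickly with depth; with ReLU, active units pass the gradient through unchanged.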

Now we will go ahead and train the model. Below is the code for training the model.

In [None]:
import time #To measure the execution time 
start = time.time()

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(X_train, y_train, 
          batch_size=16,
          epochs=12, 
          validation_data=(X_test, y_test))

end = time.time()
print("Execution time is", int(end - start), "seconds")

In the above code, we used the SGD optimizer. The loss is usually categorical cross-entropy. In this
data, we did not one-hot encode the target variable; it holds integer class labels. We have to use
SparseCategoricalCrossentropy() in such cases, where one-hot encoding is not performed.
from_logits=True tells the loss function that the predicted values are raw logits rather than
probabilities, which matches the final Dense(10) layer that has no softmax activation. Applying the
softmax inside the loss is also generally more numerically stable.
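
The relationship between the sparse and the one-hot versions of the loss can be sketched in NumPy. The functions below are hand-written approximations for illustration, not the Keras implementations, and the logits and labels are made-up values.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(labels, logits):
    """Integer labels: pick the probability of the true class directly."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked).mean()

def categorical_crossentropy(one_hot, logits):
    """One-hot labels: the zero entries cancel every other class."""
    probs = softmax(logits)
    return -(one_hot * np.log(probs)).sum(axis=1).mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])   # raw model outputs for 2 samples, 3 classes
labels = np.array([0, 2])              # integer targets
one_hot = np.eye(3)[labels]            # the same targets, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(labels, logits)
dense_loss = categorical_crossentropy(one_hot, logits)
print(np.isclose(sparse_loss, dense_loss))   # True
```

Both functions give the same loss; the sparse version only saves the one-hot encoding step.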

From the above output, we can see that the accuracy on the test data is around 70% after 10
epochs. If we run a few more epochs, the model will be exceedingly overfitted. By fine-tuning the
parameters and adding regularization, we can further increase the accuracy.

## Case study: Sign Language Reading from Images
This case study is about predicting the number based on the sign shown with fingers. We are going
to use the sign-language dataset in this case study. It is publicly available under the CC BY-SA 4.0
License. This dataset was originally prepared by students of Turkey Ankara Ayrancı Anadolu High School.

### Background and objective
The samples are hand gestures for the digits, collected from 218 participants. Each image
includes a one-handed display of a digit from 0 to 9. These are color images of size 100 x 100 pixels. The
objective is to build a model that will predict the digit shown by the hand gesture.

In [None]:
#Let us keep all the libraries ready
import tensorflow.keras
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input, AveragePooling2D,  Activation
from tensorflow.keras.layers import Reshape, Input, Lambda
from tensorflow.keras.layers import Conv2D, Convolution2D, MaxPooling2D, BatchNormalization
from tensorflow.keras.layers import Concatenate, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras import regularizers, initializers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import pandas as pd
import numpy as np

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
%matplotlib inline

from PIL import Image
from matplotlib.pyplot import imshow, imsave
import imageio
import random

Before going ahead with data importing, we will verify some random images from the data using the
below code.

In [None]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
downloaded = drive.CreateFile({'id':"1gGisF40WaqB3uWSBSTW9Y9CPERtoBT2X"})   
downloaded.GetContentFile('Datasets.zip') 

In [None]:
!unzip -qq 'Datasets.zip'

In [None]:
Data_path = "./"

In [None]:
#Few random Images
fig, ax = plt.subplots(2,2)
location=Data_path+'Sign_Language_Digits/Sign-Language-Digits-Dataset-master/Dataset/'
i=random.randint(0, 9)
img_id=18+i
img=imageio.imread(location+str(i)+"/IMG_11"+str(img_id)+".JPG")
ax[0,0].imshow(img)

i=random.randint(0, 9)
img_id=18+i
img=imageio.imread(location+str(i)+"/IMG_11"+str(img_id)+".JPG")
ax[0,1].imshow(img)

i=random.randint(0, 9)
img_id=18+i
img=imageio.imread(location+str(i)+"/IMG_11"+str(img_id)+".JPG")
ax[1,0].imshow(img)

i=random.randint(0, 9)
img_id=18+i
img=imageio.imread(location+str(i)+"/IMG_11"+str(img_id)+".JPG")
ax[1,1].imshow(img)

### Data importing
The image data importing is different from the standard numerical tabular data importing. Usually,
a collection of images is stored in folders by class. In our input data set, there are ten
folders, and each folder has around 200 sample images; here the folder names are the labels. The
function flow_from_directory() iterates through the data directory and creates random batches for
us. There are three major steps.
* tf.keras.preprocessing.image.ImageDataGenerator – for generating batches of
tensor image data. It has image scaling, reshaping, and various other options.
* flow_from_directory() – used for creating the train, validation, and test data generators.
* fit_generator() – takes the data generator as input and fits the model. In recent TensorFlow 2 releases, fit_generator() is deprecated and fit() accepts generators directly.

Let us see how to write the code for it. The syntax is new. This is just a new way of importing.

In [None]:
########################
# Generators
########################

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 256
target_size = (100,100)


########################
# Data Directory
########################

data_dir = location  # this is the image datasets directory
location_1=Data_path+"Sign_language_digits_dataset_64_64\\"
########################
# Data generator : Any preprocessing options/steps can be  defined here
########################
datagen = ImageDataGenerator(rescale = 1./255,  # scaling the images matrix(standard preprocessing step)
                             validation_split=0.2) # set validation split

########################
# Train generator
########################
train_generator = datagen.flow_from_directory(
    data_dir,
    target_size=target_size,   # resizing the input images to a specific size
    batch_size=batch_size,     # Batch size, iterator will generate a random batch with this size
    color_mode = 'grayscale',  # keeping the channel to grayscale for easy calculations
    class_mode='categorical',
    shuffle=True,
    subset="training") 

########################
# Validation generator
########################

validation_generator = datagen.flow_from_directory(
    data_dir, # same directory as training data
    target_size=target_size,
    batch_size=batch_size,
    color_mode = 'grayscale', 
    class_mode='categorical',
    shuffle=True,
    subset="validation") # set as validation data

Code explanation

* **data_dir = location** – the image dataset directory
* **Data generator** – any preprocessing options or steps can be defined here
  * rescale – scales the image matrix (a standard preprocessing step)
  * validation_split – the fraction of images held out for validation
* **train_generator** – created with flow_from_directory()
  * target_size – resizes the input images to a specific size
  * color_mode – keeps a single grayscale channel for easier calculations
  * batch_size – the iterator generates a random batch of this size
  * subset – marks the generator as training or validation data

The output shows the number of images in the training data and the validation data.

### Model building and validation
Below is the code for building the CNN model.

In [None]:
model1 = Sequential()

# Convolution layer
model1.add(Conv2D(64, (3, 3), input_shape = (100, 100, 1), activation = 'relu'))

# Pooling layer
model1.add(MaxPooling2D(pool_size = (2, 2)))

# Adding second convolutional layer
model1.add(Conv2D(64, (3, 3), activation = 'relu'))
model1.add(MaxPooling2D(pool_size = (2, 2)))

# Adding third convolutional layer
model1.add(Conv2D(64, (3, 3), activation = 'relu'))
model1.add(MaxPooling2D(pool_size = (2, 2)))

# Flattening
model1.add(Flatten())

# Step 4 - Fully connected dense layers
model1.add(Dense(units = 256, activation = 'relu'))
model1.add(Dense(units = 10, activation = 'softmax'))

model1.summary()

In the above code, we have added three convolution layers. Each layer has 64 filters, and each filter is of size 3X3. We also added a few pooling layers. The fully connected hidden layer has 256 nodes. Below is the summary of the above model; we need to take a look at the number of weights before going ahead with model building.

There are nearly 1.7 million parameters. We will now compile and fit the model. Training 1.7 million parameters will take a long time. We will measure the time, and we will also save the final model.

In [None]:
# model1 compilation
model1.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])

########################
# fit model and train
########################

import time
start = time.time()

model1.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), #total number of batches in one train epoch(train observation/batch size; also called iterations per epoch)
        epochs=20,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), #total number of batches in validation(validation observation/batch size)
        verbose=1)

model1.save_weights('m1_Sign_Language_20epochs.h5')

end = time.time()
print("Execution time is", int(end - start), "seconds")

We supply the train data, validation data, and epochs. Steps per epoch is the total number of batches in one train epoch. It is training records/batch size; also called iterations per epoch. 
We can see from the output that the model is overfitted. The model takes around 40 minutes to execute 20 epochs. 
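
The steps-per-epoch arithmetic can be checked by hand. The numbers below are assumptions: roughly 2,000 images in total (ten folders of about 200 images each), the 0.2 validation split, and the batch size of 256 from the generator code. The exact counts reported by the generators may differ slightly.

```python
import math

total_images = 2000        # assumption: ten folders of about 200 images each
validation_split = 0.2
batch_size = 256

val_images = int(total_images * validation_split)
train_images = total_images - val_images

# The last batch of an epoch may be smaller, so the division is rounded up
steps_per_epoch = math.ceil(train_images / batch_size)
validation_steps = math.ceil(val_images / batch_size)

print(train_images, steps_per_epoch)   # 1600 7
print(val_images, validation_steps)    # 400 2
```

With a batch size of 256, the weights are updated only about seven times per epoch, which is one reason many epochs are needed.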

Using the below code, we can load that model and execute two epochs on top of it. 

In [None]:
model1.load_weights(Data_path+"/Pre_trained_models/m1_Sign_Language_20epochs.h5")

model1.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=2,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

Using the below code, we can load the model with 50 epochs and execute two epochs on top of it. 

In [None]:
model1.load_weights(Data_path+"/Pre_trained_models/m1_Sign_Language_50epochs.h5")

model1.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=2,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

From the above results, it is evident that the model is overfitted. We can follow specific rules and build an optimal CNN model. In the next section, we will see how to configure the CNN model. 

## Scheming the ideal CNN Architecture
While building the CNN model in the previous example, we randomly took some filters and added three convolution layers. We have no logic or rationale for choosing the number of convolution layers and the number of filters. We can follow a useful trick while constructing the CNN model

In [None]:
########################
# Generators
########################

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 256
target_size = (64,64)


########################
# Data Directory
########################

data_dir = location  # this is the image datasets directory

########################
# Data generator : Any preprocessing options/steps can be  defined here
########################
datagen = ImageDataGenerator(rescale = 1./255,  # scaling the images matrix(standard preprocessing step)
                             validation_split=0.2) # set validation split

########################
# Train generator
########################
train_generator = datagen.flow_from_directory(
    data_dir,
    target_size=target_size,   # resizing the input images to a specific size
    batch_size=batch_size,     # Batch size, iterator will generate a random batch with this size
    color_mode = 'grayscale',  # keeping the channel to grayscale for easy calculations
    class_mode='categorical',
    shuffle=True,
    subset="training") # set as training data

########################
# Validation generator
########################

validation_generator = datagen.flow_from_directory(
    data_dir, # same directory as training data
    target_size=target_size,
    batch_size=batch_size,
    color_mode = 'grayscale', 
    class_mode='categorical',
    shuffle=True,
    subset="validation") # set as validation data

Now we will build a CNN model

In [None]:
model2 = Sequential()

# Convolution and Pooling layers
model2.add(Conv2D(16, (3, 3), input_shape = (64, 64, 1), activation = 'relu'))
model2.add(Conv2D(32, (3, 3), activation = 'relu'))
model2.add(MaxPooling2D(pool_size = (2, 2)))


model2.add(Conv2D(64, (3, 3), activation = 'relu'))
model2.add(Conv2D(64, (3, 3), activation = 'relu'))
model2.add(MaxPooling2D(pool_size = (2, 2)))

model2.add(Conv2D(128, (3, 3), activation = 'relu'))
model2.add(Conv2D(128, (3, 3), activation = 'relu'))
model2.add(MaxPooling2D(pool_size = (2, 2)))


# Flattening and  Fully connected dense layers
model2.add(Flatten())
model2.add(Dense(units = 32, activation = 'relu'))
model2.add(Dense(units = 10, activation = 'softmax'))

model2.summary()

There are 347,562 parameters in this model. We have now reduced the parameters from 1.7 million
to 0.34 million. We will compile and build the model using the below code
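
Both parameter counts can be reproduced by hand from the layer definitions. The helper functions below are a hypothetical sketch written for this comparison (they are not part of Keras); they assume 3X3 kernels without padding and a 2X2 max pooling layer at the end of each block, as in the two models.

```python
def conv_params(kernel, in_channels, filters):
    # Kernel weights plus one bias per filter
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input plus one bias, for every unit
    return (inputs + 1) * units

def count_parameters(input_side, channels, blocks, dense_units, kernel=3):
    """blocks: lists of filter counts; each block ends with 2x2 max pooling."""
    side, total = input_side, 0
    for block in blocks:
        for filters in block:
            total += conv_params(kernel, channels, filters)
            side = side - kernel + 1      # unpadded convolution shrinks the map
            channels = filters
        side //= 2                        # 2x2 max pooling halves the side
    features = side * side * channels     # number of values after Flatten
    for units in dense_units:
        total += dense_params(features, units)
        features = units
    return total

model1_total = count_parameters(100, 1, [[64], [64], [64]], [256, 10])
model2_total = count_parameters(64, 1, [[16, 32], [64, 64], [128, 128]], [32, 10])
print(model1_total)   # 1715722
print(model2_total)   # 347562
```

Most of the 1.7 million weights in the first model sit in the dense layer that follows Flatten (6,400 inputs times 256 nodes). The second model reaches Flatten with only 2,048 values and uses 32 dense nodes, which is where most of the reduction comes from.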

In [None]:
# model compilation
model2.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])

########################
# fit model and train
########################

import time
start = time.time()

model2.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), #total number of batches in one train epoch(train observation/batch size; also called iterations per epoch)
        epochs=50,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), #total number of batches in validation(validation observation/batch size)
        verbose=1)

model2.save_weights('m2_Receptive_field_50epochs.h5')

end = time.time()
print("Execution time is", int(end - start), "seconds")

Using the below code, we can load the model with 50 epochs and execute two epochs on top of it. 

In [None]:
model2.load_weights(Data_path+"Pre_trained_models/m2_Receptive_field_50epochs.h5")

model2.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), #total number of batches in one train epoch(train observation/batch size; also called iterations per epoch)
        epochs=2,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), #total number of batches in validation(validation observation/batch size)
        verbose=1)

From the output, we can see that the above model is also overfitted (100% accuracy on train data and 74% on test data). We have two options now. We can either reduce the number of convolution layers and nodes or keep the same architecture and introduce regularization. Usually, regularization is the preferred option. We can also decrease the batch size to 128 or 64 to increase the iterations per epoch, which will help us in reducing the overall epochs.

Below is the code for the model with regularization, which adds dropout layers to the above network.

In [None]:
########################
# Generators
########################

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 64
target_size = (64,64)


########################
# Data Directory
########################

data_dir = location  # this is the image datasets directory

########################
# Data generator : Any preprocessing options/steps can be  defined here
########################
datagen = ImageDataGenerator(rescale = 1./255,  # scaling the images matrix(standard preprocessing step)
                             validation_split=0.2) # set validation split

########################
# Train generator
########################
train_generator = datagen.flow_from_directory(
    data_dir,
    target_size=target_size,   # resizing the input images to a specific size
    batch_size=batch_size,     # Batch size, iterator will generate a random batch with this size
    color_mode = 'grayscale',  # keeping the channel to grayscale for easy calculations
    class_mode='categorical',
    shuffle=True,
    subset="training") # set as training data

########################
# Validation generator
########################

validation_generator = datagen.flow_from_directory(
    data_dir, # same directory as training data
    target_size=target_size,
    batch_size=batch_size,
    color_mode = 'grayscale', 
    class_mode='categorical',
    shuffle=True,
    subset="validation") # set as validation data

In [None]:
model2 = Sequential()

# Convolution and Pooling layers
model2.add(Conv2D(16, (3, 3), input_shape = (64, 64, 1), activation = 'relu'))
model2.add(Conv2D(32, (3, 3), activation = 'relu'))
model2.add(MaxPooling2D(pool_size = (2, 2)))
model2.add(Dropout(0.5))

model2.add(Conv2D(64, (3, 3), activation = 'relu'))
model2.add(Conv2D(64, (3, 3), activation = 'relu'))
model2.add(MaxPooling2D(pool_size = (2, 2)))
model2.add(Dropout(0.5))

model2.add(Conv2D(128, (3, 3), activation = 'relu'))
model2.add(Conv2D(128, (3, 3), activation = 'relu'))
model2.add(MaxPooling2D(pool_size = (2, 2)))
model2.add(Dropout(0.5))


# Flattening and  Fully connected dense layers
model2.add(Flatten())
model2.add(Dense(units = 32, activation = 'relu'))
model2.add(Dropout(0.5))

model2.add(Dense(units = 10, activation = 'softmax'))

model2.summary()

We can see the dropout layers in the above model code. The same can be observed in its output.
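
What a dropout layer does during training can be sketched in a few lines of NumPy. The function below is an illustrative approximation, not the Keras implementation: it zeroes a random fraction of the activations and rescales the survivors so that the expected value stays the same.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of the values and rescale the rest."""
    if not training:
        return activations            # nothing is dropped at prediction time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 32))     # hypothetical layer output

dropped = dropout(activations, rate=0.5, rng=rng)
zero_fraction = (dropped == 0).mean()
mean_after = dropped.mean()

print(zero_fraction)   # close to 0.5
print(mean_after)      # close to 1.0
```

Because a different random subset of nodes is silenced in every batch, the network cannot rely on any single node, which is how dropout reduces overfitting.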

We can now compile the model and execute it. We can also track the accuracy change in each epoch
by saving the model epochs in the history object.

In [None]:
# model compilation
model2.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])

########################
# fit model and train
########################

import time
start = time.time()

history=model2.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=50,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

model2.save_weights('m2_Dropout_Rec_fld_50epochs.h5')

end = time.time()
print("Execution time is", int(end - start), "seconds")

In [None]:
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.title("Train and Valid Accuracy by Epochs")
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0,1])
plt.legend(loc='lower right')

In [None]:
model2.load_weights(Data_path+"/Pre_trained_models/m2_Dropout_Rec_fld_50epochs.h5")

history=model2.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=2,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

In [None]:
model2.load_weights(Data_path+"/Pre_trained_models/m2_Dropout_Rec_fld_100epochs.h5")

history=model2.fit(  # fit() accepts generators; fit_generator() is deprecated in recent TensorFlow 2 releases
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=2,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

### Batch Normalization
Before building a model, we usually normalize or scale the input values. For example, if some inputs are on the scale of thousands and others are decimals, we bring them to a common scale. Inputs with very different distributions are generally a problem for training, and normalization avoids it. The formula is simple: subtract the mean of the input from each value and divide by its standard deviation. Normalizing the inputs speeds up the calculations and prevents a few large-valued inputs from dominating. Input normalization is applied to almost all models. In some cases, the training data is already normalized by default.
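
As a minimal sketch of input normalization, the snippet below standardizes two feature columns that live on very different scales. It assumes the common z-score form (subtract the mean, divide by the standard deviation). The column names and values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical inputs: one in the thousands, one in decimals.
income = [42000.0, 58000.0, 61000.0, 75000.0]
ratio = [0.12, 0.35, 0.40, 0.81]

income_scaled = standardize(income)
ratio_scaled = standardize(ratio)

# After scaling, both columns have mean 0 and standard deviation 1,
# so neither dominates purely because of its units.
print([round(v, 2) for v in income_scaled])
print([round(v, 2) for v in ratio_scaled])
```

In a real pipeline the mean and standard deviation would be computed on the training data only and then reused for the validation and test data.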

The sample code below builds a small model so that we can see how the number of parameters is calculated when batch normalization layers are present.

In [None]:
model = Sequential()
model.add(Conv2D(1, (3, 3), input_shape = (32, 32, 1)))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(BatchNormalization())

model.add(Conv2D(2, (3, 3)))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(BatchNormalization())

model.add(Conv2D(3, (3, 3)))
model.add(BatchNormalization())

model.summary()
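
The parameter counts reported by the summary can be reproduced by hand. The sketch below assumes the standard Keras conventions: a Conv2D layer holds kernel height x kernel width x input channels x filters weights plus one bias per filter; a BatchNormalization layer holds four values per channel (trainable gamma and beta, plus a non-trainable moving mean and moving variance); pooling layers have no parameters. The helper names are illustrative and are not part of Keras.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

# Layers of the sample model above, in order.
counts = [
    ("conv2d (1 filter)", conv2d_params(3, 1, 1)),
    ("batch_norm", batchnorm_params(1)),
    ("max_pooling", 0),
    ("batch_norm", batchnorm_params(1)),
    ("conv2d (2 filters)", conv2d_params(3, 1, 2)),
    ("batch_norm", batchnorm_params(2)),
    ("max_pooling", 0),
    ("batch_norm", batchnorm_params(2)),
    ("conv2d (3 filters)", conv2d_params(3, 2, 3)),
    ("batch_norm", batchnorm_params(3)),
]

total = sum(n for _, n in counts)
# Half of every batch normalization layer (the moving statistics) is non-trainable.
non_trainable = sum(n // 2 for name, n in counts if name == "batch_norm")

for name, n in counts:
    print(f"{name:20s} {n:4d}")
print("total:", total, "trainable:", total - non_trainable,
      "non-trainable:", non_trainable)
```

Under these assumptions the model has 123 parameters in total, of which 18 (the moving means and variances) are non-trainable.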

Now that we understand batch normalization, we return to our case study and add batch normalization layers to the model. After adding batch normalization, we can afford to reduce the depth of the network. Below is the updated code with batch normalization layers.

In [None]:
model3 = Sequential()

model3.add(Conv2D(16, (3, 3), input_shape = (64, 64, 1), activation = 'relu'))
model3.add(BatchNormalization())
model3.add(Dropout(0.5))

model3.add(Conv2D(16, (3, 3), activation = 'relu'))
model3.add(MaxPooling2D(pool_size = (2, 2)))
model3.add(BatchNormalization())
model3.add(Dropout(0.5))

model3.add(Conv2D(32, (3, 3), activation = 'relu'))
model3.add(MaxPooling2D(pool_size = (2, 2)))
model3.add(BatchNormalization())
model3.add(Dropout(0.5))

model3.add(Conv2D(32, (3, 3), activation = 'relu'))
model3.add(MaxPooling2D(pool_size = (2, 2)))
model3.add(BatchNormalization())
model3.add(Dropout(0.5))

model3.add(Conv2D(64, (3, 3), activation = 'relu'))
model3.add(BatchNormalization())
model3.add(Dropout(0.5))

model3.add(Flatten())
model3.add(Dense(units = 16, activation = 'relu'))
model3.add(Dropout(0.5))
model3.add(Dense(units = 10, activation = 'softmax'))

model3.summary()

This code has noticeably fewer convolution layers. The previous model had 347,562 weights. The new model has far fewer, since we removed two convolution layers with 128 nodes each.
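
The size of the new model can be estimated without running Keras. The sketch below walks through the layer stack of model3 while tracking the feature-map size. It assumes Keras defaults: 'valid' padding and stride 1 for the convolutions, 2x2 pooling that halves each side, and four values per channel in every batch normalization layer. The helper names are hypothetical.

```python
def conv(size, in_ch, filters, kernel=3):
    """Return (output size, parameter count) for a 'valid', stride-1 Conv2D."""
    return size - kernel + 1, kernel * kernel * in_ch * filters + filters

def batchnorm(channels):
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels

def dense(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

size, channels, total = 64, 1, 0

# (filters, followed by pooling?) for the five convolution blocks of model3.
blocks = [(16, False), (16, True), (32, True), (32, True), (64, False)]
for filters, pooled in blocks:
    size, n = conv(size, channels, filters)
    total += n
    channels = filters
    if pooled:
        size //= 2
    total += batchnorm(channels)

flat = size * size * channels          # output of Flatten()
total += dense(flat, 16) + dense(16, 10)

print("flattened features:", flat)
print("total parameters:", total)
print("ratio to previous model:", round(347562 / total, 2))
```

Under these assumptions the count comes to 52,074 parameters, roughly one seventh of the 347,562 weights of the previous model. Dropout layers add no parameters, so they are left out of the walk.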

From the model summary, we can see a considerable reduction in the number of parameters. Here we are attempting to build a simpler model that can give the same accuracy as the previous model.
Below is the code for training the model and plotting the results.

In [None]:
model3.compile(optimizer=SGD(learning_rate=0.03, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])

########################
# fit model and train
########################

import time
start = time.time()

history = model3.fit(
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=200,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

model3.save_weights('m3_BatchNorm_200epochs.h5')

end = time.time()
print("Execution time is", int(end - start), "seconds")

In [None]:
## Plotting the results 
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.title("Train and Valid Accuracy by Epochs")
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0,1])
plt.legend(loc='lower right')

From the output, we can see that accuracy has not improved much: we got the same accuracy of 80% on train and test data. The point to note here is the simplicity of the model. We achieved the same accuracy with almost seven times fewer parameters. As discussed earlier, batch normalization on its own may not give us perfect results. Batch normalization combined with other well-chosen parameters gives the best results. That brings us to the next topic: choosing the right optimizer.

### Choosing the optimizers
We can try different optimizers on a given dataset, but there is no guarantee that the accuracy will increase. What optimizers mainly help with is reducing the overall execution time. In practice, almost all the optimizers are in use and there is no universal preference; for a particular type of dataset, a particular optimizer tends to work best. Below, we apply the Adam optimizer to our data.

In [None]:
model3.load_weights(Data_path+"/Pre_trained_models/m3_BatchNorm_200epochs.h5")

history = model3.fit(
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=2,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

The model itself is kept the same. In the code below, we only change the optimization function from SGD to Adam.

In [None]:
model3.compile(optimizer=Adam(learning_rate=0.005, beta_1=0.9, beta_2=0.999), loss='categorical_crossentropy', metrics=['accuracy'])

########################
# fit model and train
########################

import time
start = time.time()

history = model3.fit(
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=100,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)

model3.save_weights('m3_BatchNorm_and_Adam_100epochs.h5')

end = time.time()
print("Execution time is", int(end - start), "seconds")

In [None]:
## Plotting the results 
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.title("Train and Valid Accuracy by Epochs")
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0,1])
plt.legend(loc='lower right')

From the output, we can see that the Adam optimizer performs almost the same as SGD in our example. However, we reached the same 80% accuracy within 100 epochs using Adam, compared with 200 epochs for SGD. There is a massive difference in execution time as well. In some cases, Adam gives a better result than SGD and RMSprop.
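
To make the difference between the two update rules concrete, here is a minimal sketch of SGD with momentum and Adam minimizing a one-dimensional toy loss, (w - 3)^2. The toy loss, the starting point, and the step count are assumptions made for illustration. The learning rates, momentum, and beta values are the ones used in this chapter's code. These are the textbook forms of the updates, not Keras's internal implementation, and behaviour on a toy problem does not predict which optimizer wins on a real dataset.

```python
import math

def grad(w):
    """Gradient of the toy loss (w - 3)**2."""
    return 2.0 * (w - 3.0)

def sgd_momentum(w, steps, lr=0.03, momentum=0.9):
    """SGD with momentum: the velocity accumulates past gradients."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w

def adam(w, steps, lr=0.005, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """Adam: the step is scaled by running gradient statistics."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g      # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)          # bias correction for early steps
        v_hat = v / (1 - beta_2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd_momentum(0.0, steps=5000)
w_adam = adam(0.0, steps=5000)
print("SGD with momentum:", w_sgd)
print("Adam:", w_adam)
```

Both runs move from the starting point 0 towards the minimum at w = 3. The key design difference is that Adam divides each step by a running estimate of the gradient magnitude, so its step size depends less on the raw scale of the gradients than plain SGD's does.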

In [None]:
model3.load_weights(Data_path+"/Pre_trained_models/m3_BatchNorm_and_Adam_100epochs.h5")

history = model3.fit(
        train_generator, 
        steps_per_epoch = len(train_generator), 
        epochs=3,
        validation_data = validation_generator,
        validation_steps = len(validation_generator), 
        verbose=1)