

# Biweekly Report

# Jake Watts

# Improving Model Performance


Here I implement a much deeper model in order to improve model performance. My first model used two convolutional layers, whereas this model contains 11 layers. In addition, this model uses data standardization, data augmentation, validation data, image padding, dropout, batch normalization, and dense layers. This model is also built with TensorFlow, while the first was built with PyTorch, which gave me a better understanding of how the two frameworks compare, as I hadn't used either before.

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dense, Flatten, Dropout, Activation, BatchNormalization
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import ReduceLROnPlateau
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd
import math
from tensorflow.keras.datasets import mnist

Loading in training data and reshaping.

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1))
x_test = x_test.reshape((x_test.shape[0], 28, 28, 1))

Creating validation data.

In [None]:
random_seed = 2
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size = int(len(x_train)*0.05), random_state=random_seed)
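
For reference, the split above holds out 5% of the 60,000 MNIST training images. The sketch below reproduces the same idea with a seeded NumPy permutation rather than `train_test_split`. The helper name `holdout_split` is hypothetical and the arrays are stand-ins, so the exact samples selected would differ from scikit-learn's.

```python
import numpy as np

def holdout_split(x, y, n_val, seed):
    """Shuffle indices with a fixed seed and hold out the first n_val samples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

# Stand-in arrays with the same number of samples as the MNIST training set.
x_demo = np.zeros((60000, 28, 28, 1), dtype=np.uint8)
y_demo = np.arange(60000) % 10

n_val = int(len(x_demo) * 0.05)  # number of validation samples
x_tr, x_va, y_tr, y_va = holdout_split(x_demo, y_demo, n_val, seed=2)
print(x_tr.shape, x_va.shape)
```

Passing an integer as the hold-out size, as the cell above does with `test_size`, fixes the validation set at an exact sample count rather than a fraction.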

Padding data to go from 28x28 images to 32x32.

In [None]:
x_train = np.pad(x_train, ((0,0), (2,2), (2,2), (0,0)), 'constant')
x_val = np.pad(x_val, ((0,0), (2,2), (2,2), (0,0)), 'constant')
x_test = np.pad(x_test, ((0,0), (2,2), (2,2), (0,0)), 'constant')
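
To see what the pad widths do, here is a small sketch on a single stand-in image (the all-ones array is made up for illustration). Each tuple in the second argument gives the padding before and after one axis, so only the height and width axes grow.

```python
import numpy as np

# One all-ones 28x28 grayscale image in (batch, height, width, channels) layout.
batch = np.ones((1, 28, 28, 1), dtype=np.uint8)

# Pad 2 pixels on each side of the height and width axes only.
padded = np.pad(batch, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')

print(padded.shape)        # (1, 32, 32, 1)
print(padded[0, 0, 0, 0])  # 0, the default constant fill value
print(padded[0, 2, 2, 0])  # 1, the first original pixel
```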

Standardizing data and adding one-hot encoding for the labels.

In [None]:
mean_px = x_train.mean().astype(np.float32)
std_px = x_train.std().astype(np.float32)
x_train = (x_train - mean_px)/(std_px)

mean_px = x_test.mean().astype(np.float32)
std_px = x_test.std().astype(np.float32)
x_test = (x_test - mean_px)/(std_px)

mean_px = x_val.mean().astype(np.float32)
std_px = x_val.std().astype(np.float32)
x_val = (x_val - mean_px)/(std_px)

y_train = to_categorical(y_train, num_classes = 10)
y_val = to_categorical(y_val, num_classes = 10)
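
The cell above standardizes each split with its own mean and standard deviation. A common alternative is to compute the statistics on the training set only and reuse them for the validation and test sets, so that every split passes through the same transformation. The sketch below illustrates that variant on random stand-in data. The helper name `standardize_with_train_stats` is hypothetical, and this is not the procedure used for the results in this report.

```python
import numpy as np

def standardize_with_train_stats(x_train, *others):
    """Standardize every split using the training set's mean and std."""
    mean_px = x_train.mean().astype(np.float32)
    std_px = x_train.std().astype(np.float32)
    return tuple((x.astype(np.float32) - mean_px) / std_px for x in (x_train, *others))

# Random stand-in images with the padded 32x32 shape.
rng = np.random.default_rng(0)
x_train_demo = rng.integers(0, 256, size=(100, 32, 32, 1)).astype(np.uint8)
x_test_demo = rng.integers(0, 256, size=(20, 32, 32, 1)).astype(np.uint8)

x_train_std, x_test_std = standardize_with_train_stats(x_train_demo, x_test_demo)
print(x_train_std.mean(), x_train_std.std())
```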

Adding data augmentation

In [None]:
datagen = ImageDataGenerator(
        featurewise_center = False,  # set input mean to 0 over the dataset
        samplewise_center = False,  # set each sample mean to 0
        featurewise_std_normalization = False,  # divide inputs by std of the dataset
        samplewise_std_normalization = False,  # divide each input by its std
        zca_whitening = False,  # apply ZCA whitening
        rotation_range = 10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range = 0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range = 0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip = False,  # randomly flip images
        vertical_flip = False)  # randomly flip images

datagen.fit(x_train)
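
The `datagen` object only takes effect if training batches are drawn from it. A minimal sketch of how that is commonly wired up in Keras, using `datagen.flow` as the data source for `fit`, is shown below. The helper name `fit_with_augmentation` is hypothetical, and this is an illustration rather than a description of how the results in this report were produced.

```python
def fit_with_augmentation(model, datagen, x_train, y_train, x_val, y_val,
                          epochs=30, batch_size=64, callbacks=None):
    """Train on randomly augmented batches while validating on unaugmented data."""
    # datagen.flow yields (images, labels) batches with random transforms applied.
    train_batches = datagen.flow(x_train, y_train, batch_size=batch_size)
    return model.fit(
        train_batches,
        epochs=epochs,
        validation_data=(x_val, y_val),
        callbacks=callbacks or [],
    )
```

Note that `datagen.fit` is only required when featurewise centering, featurewise standardization, or ZCA whitening is enabled. With all of those set to `False`, as above, the call has no effect on the generated batches.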

Setting the learning rate to decrease when val_loss stops improving over a period of two epochs. Each reduction multiplies the learning rate by a factor of 0.2.

In [None]:
variable_learning_rate = ReduceLROnPlateau(monitor='val_loss', factor = 0.2, patience = 2)
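
To make the schedule concrete, the sketch below simulates a simplified version of the plateau rule in plain Python. It assumes an initial learning rate of 0.001 (the Keras default for Adam) and a `min_delta` of 1e-4 (the callback's default), and it ignores the callback's `cooldown` and `min_lr` options. The function name and the validation losses are made up for illustration.

```python
def simulate_reduce_on_plateau(val_losses, initial_lr=0.001, factor=0.2,
                               patience=2, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified plateau rule)."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the learning rate.
                lr *= factor
                wait = 0
        history.append(lr)
    return history

demo_losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]  # hypothetical validation losses
for epoch, lr in enumerate(simulate_reduce_on_plateau(demo_losses), start=1):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

In this made-up run the loss stalls after the second epoch, so the learning rate drops from 0.001 to 0.0002 after two stalled epochs and to 0.00004 after two more.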

The following is the most important part of the code: the implementation of the neural network. The structure of this model was created by Jay Gupta for a Kaggle competition, and the model is called LeNet5v2. The architecture is as follows:

$ConvNet \rightarrow ConvNet \rightarrow BatchNorm \rightarrow Pool \rightarrow (Dropout) \rightarrow ConvNet \rightarrow ConvNet \rightarrow BatchNorm \rightarrow Pool \rightarrow (Dropout) \rightarrow (Flatten) \rightarrow FullyConnected \rightarrow FullyConnected \rightarrow FullyConnected \rightarrow Softmax$

This network differs from the network I implemented previously in several ways. There are significantly more layers in this model, which can hopefully pick up on more complex patterns. There are also four rather than two convolutional layers, with 32 filters in the first two and 64 in the last two.

Batch normalization is added for speed and stability, while dropout is added to prevent overfitting. Max pooling with a stride of 2 is the same as in the previous model. The kernel size in the first two convolutional layers is still 5 with a stride of 1, while the last two convolutional layers use a kernel size of 3.
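
As a quick check on how the spatial size shrinks through the network, the sketch below walks the padded 32x32 input through the convolution and pooling layers, using the kernel sizes from the model definition (5 for the first two convolutions, 3 for the last two) and assuming Keras's default unpadded ('valid') convolutions. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2, stride=2):
    """Output width/height of an unpadded max-pooling layer."""
    return (size - pool) // stride + 1

size = 32                      # padded input is 32x32
size = conv_out(size, 5)       # convolution_1 -> 28
size = conv_out(size, 5)       # convolution_2 -> 24
size = pool_out(size)          # max_pool_1    -> 12
size = conv_out(size, 3)       # convolution_3 -> 10
size = conv_out(size, 3)       # convolution_4 -> 8
size = pool_out(size)          # max_pool_2    -> 4
flattened = size * size * 64   # 4 * 4 * 64 = 1024 inputs to the first dense layer
print(size, flattened)
```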

In [None]:
def LeNet5v2(input_shape = (32, 32, 1), classes = 10):
    """
    Implementation of a modified LeNet-5.
    Only those layers with learnable parameters are counted in the layer numbering.
    
    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """
    
    model = Sequential([
        
    # Layer 1
    Conv2D(filters = 32, kernel_size = 5, strides = 1, activation = 'relu', input_shape = input_shape, kernel_regularizer=l2(0.0005), name = 'convolution_1'),
    
    # Layer 2
    Conv2D(filters = 32, kernel_size = 5, strides = 1, name = 'convolution_2', use_bias=False),
    
    # Layer 3    
    BatchNormalization(name = 'batchnorm_1'),
        
    # -------------------------------- #  
    Activation("relu"),
    MaxPooling2D(pool_size = 2, strides = 2, name = 'max_pool_1'),
    Dropout(0.25, name = 'dropout_1'),
    # -------------------------------- #  
        
    # Layer 3
    Conv2D(filters = 64, kernel_size = 3, strides = 1, activation = 'relu', kernel_regularizer=l2(0.0005), name = 'convolution_3'),
        
    # Layer 4
    Conv2D(filters = 64, kernel_size = 3, strides = 1, name = 'convolution_4', use_bias=False),
        
    # Layer 5
    BatchNormalization(name = 'batchnorm_2'),
        
    # -------------------------------- #  
    Activation("relu"),
    MaxPooling2D(pool_size = 2, strides = 2, name = 'max_pool_2'),
    Dropout(0.25, name = 'dropout_2'),
    Flatten(name = 'flatten'),
    # -------------------------------- #  
        
    # Layer 6
    Dense(units = 256, name = 'fully_connected_1', use_bias=False),
        
    # Layer 7
    BatchNormalization(name = 'batchnorm_3'),
    
    # -------------------------------- #  
    Activation("relu"),
    # -------------------------------- #  
        
    # Layer 8
    Dense(units = 128, name = 'fully_connected_2', use_bias=False),
        
    # Layer 9
    BatchNormalization(name = 'batchnorm_4'),
        
    # -------------------------------- #  
    Activation("relu"),
    # -------------------------------- #  
        
    # Layer 10
    Dense(units = 84, name = 'fully_connected_3', use_bias=False),
        
    # Layer 11
    BatchNormalization(name = 'batchnorm_5'),
        
    # -------------------------------- #  
    Activation("relu"),
    Dropout(0.25, name = 'dropout_3'),
    # -------------------------------- #  

    # Output
    Dense(units = classes, activation = 'softmax', name = 'output')
        
    ], name = 'LeNet5v2')

    return model

Specifying input shape and classes for training

In [None]:
LeNet5Model = LeNet5v2(input_shape = (32, 32, 1), classes = 10)

Specifying the Adam optimizer and the categorical cross-entropy loss function, which were also used in the last model.

In [None]:
LeNet5Model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
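
For intuition about what the loss measures, here is a minimal NumPy sketch of categorical cross-entropy on one-hot targets. The function and the example probabilities are made up for illustration. Keras's own implementation differs in details such as clipping and reduction options.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred), axis=1).mean())

# Two samples, three classes (made-up numbers for illustration).
y_true_demo = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred_demo = np.array([[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]])
print(categorical_crossentropy(y_true_demo, y_pred_demo))
```

Because the targets are one-hot, only the probability assigned to the correct class contributes to each sample's loss, which is why the labels are passed through `to_categorical` before training.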

Printing the model summary lets us see the details, output shape, and number of parameters for each layer of the network.

In [None]:
LeNet5Model.summary()

Model: "LeNet5v2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
convolution_1 (Conv2D)       (None, 28, 28, 32)        832       
_________________________________________________________________
convolution_2 (Conv2D)       (None, 24, 24, 32)        25600     
_________________________________________________________________
batchnorm_1 (BatchNormalizat (None, 24, 24, 32)        128       
_________________________________________________________________
activation_5 (Activation)    (None, 24, 24, 32)        0         
_________________________________________________________________
max_pool_1 (MaxPooling2D)    (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 32)        0         
_________________________________________________________________
convolution_3 (Conv2D)       (None, 10, 10, 64)        184
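
The parameter counts in the first rows of the summary can be checked by hand. The sketch below uses two hypothetical helpers and assumes square kernels. `convolution_2` is built with `use_bias=False`, so it has no bias terms.

```python
def conv2d_params(kernel, in_channels, filters, use_bias=True):
    """Weights (plus optional biases) in a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + (filters if use_bias else 0)

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels

print(conv2d_params(5, 1, 32))                   # convolution_1: 832
print(conv2d_params(5, 32, 32, use_bias=False))  # convolution_2: 25600
print(batchnorm_params(32))                      # batchnorm_1: 128
```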

Training the model with the validation data for 30 epochs, longer than the 10 epochs used for the previous model.

In [None]:
history = LeNet5Model.fit(x_train, y_train, epochs = 30, batch_size = 64, callbacks = [variable_learning_rate], validation_data = (x_val,y_val))


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


One-hot encoding y_test

In [None]:
y_test = to_categorical(y_test, num_classes = 10)
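
As an illustration of what one-hot encoding produces, the sketch below builds the same kind of matrix with plain NumPy. The helper `one_hot` is hypothetical and only similar in spirit to `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows by indexing an identity matrix."""
    return np.eye(num_classes, dtype=np.float32)[labels]

labels_demo = np.array([5, 0, 4])  # made-up digit labels
encoded = one_hot(labels_demo, num_classes=10)
print(encoded.shape)   # (3, 10)
print(encoded[0])      # 1.0 at index 5, zeros elsewhere
```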

Testing the model to check accuracy

In [None]:
print("Evaluate on test data")
results = LeNet5Model.evaluate(x_test, y_test, batch_size=128)
print("test loss, test acc:", results)

Evaluate on test data
test loss, test acc: [0.018291659653186798, 0.9959999918937683]


The test accuracy is 99.6%, an improvement over the previous, much simpler model. The test accuracy is also on par with the training accuracy, which suggests that adding dropout to the network helped prevent overfitting.
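
For reference, the accuracy metric is the fraction of images whose most probable predicted class matches the true class. On the 10,000-image MNIST test set, 99.6% corresponds to about 40 misclassified digits. A minimal sketch with made-up probabilities follows. The helper name is hypothetical.

```python
import numpy as np

def accuracy_from_probs(y_true_onehot, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_onehot, axis=1)
    return float(np.mean(predicted == actual))

# Made-up predictions for four samples and three classes.
y_true_acc = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob_acc = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.7, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1]])
print(accuracy_from_probs(y_true_acc, y_prob_acc))  # 0.75
```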

# Summary

Overall, it is not a big surprise that this model's accuracy was 1.6 percentage points higher than the previous model's, given the number of features added to improve it. If you care about pushing accuracy above 98% and have the time to implement and train a larger model, then this model is definitely the better choice. Having now worked with both PyTorch and TensorFlow, I think I slightly prefer TensorFlow, but I want to keep trying both frameworks in future projects.