# Histopathologic Cancer Detection
Authors: Abdul Qadir, Asmaa Aly, Wei-Ting Yap, Nathan Torento

Course: Practical Data Science at Minerva Schools at KGI

# Introduction
This paper's dataset is taken from the Kaggle competition on Histopathologic Cancer Detection. It uses the PatchCamelyon (PCam) dataset, around 300k fixed-size histopathology (the study of tissue disease)
colored scans of lymph nodes all around the body. 

The specific challenge in the original dataset and competition is to train a model that can most accurately detect metastatic cancer.

The overall .zip file contains pictures and train-test csv files. The .csv files contains only two columns: id, and label, where the id contains the unique id or name of the picture, and the label determines whether the picture is indeed indicative of metastatic cancer.

This paper is created by Abdul Qadir, Asmaa Alaa Aly, Wei-Ting Yap, and Nathan Torento. For their and the reader's convenience, code and text are all written in this Google Colab notebook. It consists of four parts that they've split amongst themselves.

1. Data preparation and exploration

2. Data pre-processing

3. Model creation assessment

4. Presentation of findings

# 1. Data preparation and exploration

### Downloading Dataset (Guide)

We are not allowed to share the dataset ourselves. Simply follow the instructions below, and at some point, you will be gain permission to download the data from the Kaggle website yourself.

In [1]:
from numpy.random import seed
seed(101)

import pandas as pd
import numpy as np


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dense, Dropout, Flatten, Activation
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from tensorflow.keras.optimizers import Adam

import os
import cv2

from sklearn.utils import shuffle
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
import itertools
import shutil
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Install a kaggle package to download the dataset
! pip install -q kaggle
! pip install --upgrade --force-reinstall --no-deps kaggle

Collecting kaggle
[?25l  Downloading https://files.pythonhosted.org/packages/fc/14/9db40d8d6230655e76fa12166006f952da4697c003610022683c514cf15f/kaggle-1.5.8.tar.gz (59kB)
[K     |█████▌                          | 10kB 15.6MB/s eta 0:00:01[K     |███████████                     | 20kB 2.8MB/s eta 0:00:01[K     |████████████████▋               | 30kB 3.6MB/s eta 0:00:01[K     |██████████████████████▏         | 40kB 4.0MB/s eta 0:00:01[K     |███████████████████████████▊    | 51kB 3.3MB/s eta 0:00:01[K     |████████████████████████████████| 61kB 2.2MB/s 
[?25hBuilding wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... [?25l[?25hdone
  Created wheel for kaggle: filename=kaggle-1.5.8-cp36-none-any.whl size=73275 sha256=0d1a7b3f6e51336b1c692678919ae6f5af59f5c755059326981aee1635c54ea8
  Stored in directory: /root/.cache/pip/wheels/94/a7/09/68dc83c7c14fdbdf5d3f2b2da5b87e587bfc1e85df69b1130c
Successfully built kaggle
Installing collected packages: k

Follow the steps in the following link. 

You should have a **kaggle.json** file at the end of it.

https://www.kaggle.com/general/74235

In [None]:
# Run this cell, then upload your "kaggle.json" file when prompted.

from google.colab import files
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"ntorento","key":"323b6b306242b2dbea4604c48c54ee8d"}'}

In [None]:
IMAGE_SIZE = 96
IMAGE_CHANNELS = 3

SAMPLE_SIZE = 80000 # the number of images we use from each of the two classes

In [None]:
# Below is code to gain permission to download the dataset

! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
#! kaggle datasets list

mkdir: cannot create directory ‘/root/.kaggle’: File exists


In [None]:
# Download the desired dataset (in the default zip format)

! kaggle competitions download -c histopathologic-cancer-detection

Downloading histopathologic-cancer-detection.zip to /content
100% 6.30G/6.31G [01:35<00:00, 80.1MB/s]
100% 6.31G/6.31G [01:35<00:00, 70.7MB/s]


In [None]:
# Unzip and load the dataset onto your colab runtime
import zipfile
zip = zipfile.ZipFile('histopathologic-cancer-detection.zip')
zip.extractall()

In [None]:
# See image count in each folder?
print(len(os.listdir('../content/train')))
print(len(os.listdir('../content/test')))

220025
57458


# 2. Data pre-processing

Now that the data has been properly loaded and set-up, we must now pre-process our data: in our case, we mainnly subset the data, augment the images, and split it into train and test.

Note: The data takes hours to download in its full size, leaving a high possibility of crashing the kernel, not to mention the time required to train the model. Thus, we slightly modified someone else's preprocessing instructions throughout this entire stage to get a smaller subset of the data. This will make it faster and easier for us to run and train our model.

https://www.kaggle.com/vbookshelf/cnn-how-to-use-160-000-images-without-crashing

In [None]:
# Create a Dataframe containing all images
data = pd.read_csv('../content/train_labels.csv')

# Removing this image because it caused a training error previously
data[data['id'] != 'dd6dfed324f9fcb6f93f46f32fc800f2ec196be2']

# Removing this image because it's black
data[data['id'] != '9369c7278ec8bcc6c880d99194de09fc2bd4efbe']

print(data.shape)

(220025, 2)


Justification for augmentation

This Kaggle challenge is a Machine Learning challenge. Machine learning however, requires plenty and diverse training data to accurately predict future data points without overfitting or underfitting. We have a lot of data points already, but also want to diversity them to prevent overfitting to the original data that may be too similar to each other already. We decided to augment the images in order to increase the diversity of images as inspired by this github repo below.

https://github.com/aleju/imgaug

In [None]:
#Function for augmenting data
from skimage.transform import rotate, AffineTransform
import cv2
from skimage.util import random_noise
import random
import os
from skimage import io
from skimage import img_as_ubyte

ORIGINAL_SIZE = 96      # original size of the images - do not change

# AUGMENTATION VARIABLES
CROP_SIZE = 90          # final size after crop
RANDOM_ROTATION = 3    # range (0-180), 180 allows all rotation variations, 0=no change
RANDOM_SHIFT = 2        # center crop shift in x and y axes, 0=no change. This cannot be more than (ORIGINAL_SIZE - CROP_SIZE)//2 
RANDOM_BRIGHTNESS = 7  # range (0-100), 0=no change
RANDOM_CONTRAST = 5    # range (0-100), 0=no change
RANDOM_90_DEG_TURN = 1  # 0 or 1= random turn to left or right

def readCroppedImage(path, augmentations = True):
    '''
    This is a custom function to convert an input image, augment it through
    random rotation, random x or y shift, random cropping, random flipping, 
    random changes in brightness and contrast, and returning it as an rgb tensor.
    '''
    # augmentations parameter is included for counting statistics from images, where we don't want augmentations
    
    # OpenCV reads the image in bgr format by default
    bgr_img = cv2.imread(path)
    # We flip it to rgb for visualization purposes
    b,g,r = cv2.split(bgr_img)
    rgb_img = cv2.merge([r,g,b])
    
    if(not augmentations):
        return rgb_img / 255
    
    #random rotation
    rotation = random.randint(-RANDOM_ROTATION,RANDOM_ROTATION)
    if(RANDOM_90_DEG_TURN == 1):
        rotation += random.randint(-1,1) * 90
    M = cv2.getRotationMatrix2D((48,48),rotation,1)   # the center point is the rotation anchor
    rgb_img = cv2.warpAffine(rgb_img,M,(96,96))
    
    #random x,y-shift
    x = random.randint(-RANDOM_SHIFT, RANDOM_SHIFT)
    y = random.randint(-RANDOM_SHIFT, RANDOM_SHIFT)
    
    # crop to center and normalize to 0-1 range
    start_crop = (ORIGINAL_SIZE - CROP_SIZE) // 2
    end_crop = start_crop + CROP_SIZE
    rgb_img = rgb_img[(start_crop + x):(end_crop + x), (start_crop + y):(end_crop + y)] / 255
    
    # Random flip
    flip_hor = bool(random.getrandbits(1))
    flip_ver = bool(random.getrandbits(1))
    if(flip_hor):
        rgb_img = rgb_img[:, ::-1]
    if(flip_ver):
        rgb_img = rgb_img[::-1, :]
        
    # Random brightness
    br = random.randint(-RANDOM_BRIGHTNESS, RANDOM_BRIGHTNESS) / 100.
    rgb_img = rgb_img + br
    
    # Random contrast
    cr = 1.0 + random.randint(-RANDOM_CONTRAST, RANDOM_CONTRAST) / 100.
    rgb_img = rgb_img * cr
    
    # clip values to 0-1 range
    rgb_img = np.clip(rgb_img, 0, 1.0)
    
    return img_as_ubyte(rgb_img)

    #Augment test images randomly

images_path="train" #path to original images
augmented_path="train" # path to store augmented images
images=[] # to store paths of images from folder

for im in os.listdir(images_path):  # read image name from folder and append its path into "images" array     
    images.append(os.path.join(images_path,im))

images_to_generate=10000 #you can change this value according to your requirement

for i in range(images_to_generate):    
    image=random.choice(images)
    id = image[6:-4]
    label = data[data['id'] == id].iloc[0]['label']
    data = data.append({"id":'augmented_'+id,'label':label},ignore_index=True)
    transformed_image= readCroppedImage(image)
    new_image_path= "train/augmented_%s.tif" %(id)
    cv2.imwrite(new_image_path, transformed_image) # save transformed image to path

#Save new label file which has the augmented images
data.to_csv('new_train_labels.csv')

In [None]:
# Load the new csv that now includes the augmented images 
df_data = pd.read_csv('../content/new_train_labels.csv')

# Check the class distribution
df_data['label'].value_counts()

0    137223
1     93627
Name: label, dtype: int64

#### Balance the target distribution
As decided earlier with the variable SAMPLE_SIZE, we will subset our original data into 160000 images half labelled 0, the other labelled 1.

In [None]:
# take a random sample of class 0 with size equal to num samples in class 1
df_0 = df_data[df_data['label'] == 0].sample(SAMPLE_SIZE, random_state = 101)
# filter out class 1
df_1 = df_data[df_data['label'] == 1].sample(SAMPLE_SIZE, random_state = 101)

# concat the dataframes
df_data = pd.concat([df_0, df_1], axis=0).reset_index(drop=True)
# shuffle
df_data = shuffle(df_data)

df_data['label'].value_counts()

1    80000
0    80000
Name: label, dtype: int64

In [None]:
df_data.head()

Unnamed: 0.1,Unnamed: 0,id,label
44093,199391,81623b9afe6c5f48ceb3ea0819b3880bccbeb628,0
125344,189356,eb80ed7ca26a6a72ae2ef08951f15f0a789ada2f,1
63003,221637,augmented_2085718fe57cdd057aac28b0f270a2052250...,0
142531,10013,9c8a15a45c21b51c911d9978ed019381a8c37d6e,1
96910,163741,8645c490d084778555ca96d8a7e92acee5987fa4,1


In [None]:
# train_test_split

# stratify=y creates a balanced validation set.
y = df_data['label']

df_train, df_val = train_test_split(df_data, test_size=0.10, random_state=101, stratify=y)

print(df_train.shape)
print(df_val.shape)

(144000, 3)
(16000, 3)


In [None]:
# Check the training set counts
df_train['label'].value_counts()

1    72000
0    72000
Name: label, dtype: int64

In [None]:
# Check the validation set counts
df_val['label'].value_counts()

1    8000
0    8000
Name: label, dtype: int64

### Create a Directory Structure

In [None]:
# Create a new directory
base_dir = 'base_dir'
os.mkdir(base_dir)


#[CREATE FOLDERS INSIDE THE BASE DIRECTORY]

# now we create 2 folders inside 'base_dir':

# train_dir
    # a_no_tumor_tissue
    # b_has_tumor_tissue

# val_dir
    # a_no_tumor_tissue
    # b_has_tumor_tissue


# create a path to 'base_dir' to which we will join the names of the new folders
# train_dir
train_dir = os.path.join(base_dir, 'train_dir')
os.mkdir(train_dir)

# val_dir
val_dir = os.path.join(base_dir, 'val_dir')
os.mkdir(val_dir)



# [CREATE FOLDERS INSIDE THE TRAIN AND VALIDATION FOLDERS]
# Inside each folder we create seperate folders for each class

# create new folders inside train_dir
no_tumor_tissue = os.path.join(train_dir, 'a_no_tumor_tissue')
os.mkdir(no_tumor_tissue)
has_tumor_tissue = os.path.join(train_dir, 'b_has_tumor_tissue')
os.mkdir(has_tumor_tissue)


# create new folders inside val_dir
no_tumor_tissue = os.path.join(val_dir, 'a_no_tumor_tissue')
os.mkdir(no_tumor_tissue)
has_tumor_tissue = os.path.join(val_dir, 'b_has_tumor_tissue')
os.mkdir(has_tumor_tissue)

In [None]:
# check that the folders have been created
os.listdir('base_dir/train_dir')

['a_no_tumor_tissue', 'b_has_tumor_tissue']

### Transfer the images into the folders

In [None]:
# Set the id as the index in df_data
df_data.set_index('id', inplace=True)

In [None]:
for image in train_list:
    
    # the id in the csv file does not have the .tif extension therefore we add it here
    fname = image + '.tif'
    # get the label for a certain image
    target = df_data.loc[image,'label']
    
    print(int(target))
    break

0


In [None]:
train_list[1]

'3921341dcabd03dc16dfd373e8821e490a2479bd'

In [None]:
df_data.head(n=1)

Unnamed: 0_level_0,Unnamed: 0,label
id,Unnamed: 1_level_1,Unnamed: 2_level_1
81623b9afe6c5f48ceb3ea0819b3880bccbeb628,199391,0


In [None]:
# Get a list of train and val images
train_list = list(df_train['id'])
val_list = list(df_val['id'])


# Transfer the train images
count = 0
for image in train_list:
    count += 1
    # the id in the csv file does not have the .tif extension therefore we add it here
    fname = image + '.tif'
    # get the label for a certain image
    target = df_data.loc[image,'label']

    # these must match the folder names
    if target.any() == 0:
        label = 'a_no_tumor_tissue'
    if target.any() == 1:
        label = 'b_has_tumor_tissue'
    
    # source path to image
    src = os.path.join('../content/train', fname)
    # destination path to image
    dst = os.path.join(train_dir, label, fname)
    # copy the image from the source to the destination
    shutil.copyfile(src, dst)


# Transfer the val images

for image in val_list:
    
    # the id in the csv file does not have the .tif extension therefore we add it here
    fname = image + '.tif'
    # get the label for a certain image
    target = df_data.loc[image,'label']
    
    # these must match the folder names
    if target.any() == 0:
        label = 'a_no_tumor_tissue'
    if target.any() == 1:
        label = 'b_has_tumor_tissue'
    

    # source path to image
    src = os.path.join('../content/train', fname)
    # destination path to image
    dst = os.path.join(val_dir, label, fname)
    # copy the image from the source to the destination
    shutil.copyfile(src, dst)

In [None]:
# check how many train images we have in each folder
print(len(os.listdir('base_dir/train_dir/a_no_tumor_tissue')))
print(len(os.listdir('base_dir/train_dir/b_has_tumor_tissue')))


In [None]:
# check how many val images we have in each folder
print(len(os.listdir('base_dir/val_dir/a_no_tumor_tissue')))
print(len(os.listdir('base_dir/val_dir/b_has_tumor_tissue')))


### Set Up the Generators

In [None]:
train_path = 'base_dir/train_dir'
valid_path = 'base_dir/val_dir'

num_train_samples = len(df_train)
num_val_samples = len(df_val)
train_batch_size = 10
val_batch_size = 10


train_steps = np.ceil(num_train_samples / train_batch_size)
val_steps = np.ceil(num_val_samples / val_batch_size)

In [None]:
datagen = ImageDataGenerator(rescale=1.0/255)

train_gen = datagen.flow_from_directory(train_path,
                                        target_size=(IMAGE_SIZE,IMAGE_SIZE),
                                        batch_size=train_batch_size,
                                        class_mode='categorical')

val_gen = datagen.flow_from_directory(valid_path,
                                        target_size=(IMAGE_SIZE,IMAGE_SIZE),
                                        batch_size=val_batch_size,
                                        class_mode='categorical')

Found 144000 images belonging to 2 classes.
Found 16000 images belonging to 2 classes.
Found 16000 images belonging to 2 classes.


# 3. Model creation assessment

This is the stage where we create a model that trains on the data.
In our case, we chose to manually create a neural network model implemented with the keras library. We created a feedforward neural network with 24 layers all using the relu activation function.

In [None]:
# Model architecture
kernel_size = (3,3)
pool_size= (2,2)
first_filters = 32
second_filters = 64
third_filters = 128

dropout_conv = 0.3
dropout_dense = 0.3

# Neural network creation and layer adding
model = Sequential()
model.add(Conv2D(first_filters, kernel_size, activation = 'relu', input_shape = (96, 96, 3)))
model.add(Conv2D(first_filters, kernel_size, activation = 'relu'))
model.add(Conv2D(first_filters, kernel_size, activation = 'relu'))
model.add(MaxPooling2D(pool_size = pool_size)) 
model.add(Dropout(dropout_conv))

model.add(Conv2D(second_filters, kernel_size, activation ='relu'))
model.add(Conv2D(second_filters, kernel_size, activation ='relu'))
model.add(Conv2D(second_filters, kernel_size, activation ='relu'))
model.add(MaxPooling2D(pool_size = pool_size))
model.add(Dropout(dropout_conv))

model.add(Conv2D(third_filters, kernel_size, activation ='relu'))
model.add(Conv2D(third_filters, kernel_size, activation ='relu'))
model.add(Conv2D(third_filters, kernel_size, activation ='relu'))
model.add(MaxPooling2D(pool_size = pool_size))
model.add(Dropout(dropout_conv))

model.add(Flatten())
model.add(Dense(256, activation = "relu"))
model.add(Dropout(dropout_dense))
model.add(Dense(2, activation = "softmax"))

# Key details for each layer
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 94, 94, 32)        896       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 92, 92, 32)        9248      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 90, 90, 32)        9248      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 45, 45, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 45, 45, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 43, 43, 64)        18496     
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 41, 41, 64)        3

### Train the Model

In [None]:
# Compile the model
model.compile(Adam(lr=0.0001), loss='binary_crossentropy', 
              metrics=['accuracy'])

In [None]:
# Get the labels that are associated with each index
print(val_gen.class_indices)

{'a_no_tumor_tissue': 0, 'b_has_tumor_tissue': 1}


In [None]:
# Train through 20 epochs in hopes to improve accuracy
filepath = "model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, 
                             save_best_only=True, mode='max')

reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=2, 
                                   verbose=0, mode='max', min_lr=0.00001)
                                                      
callbacks_list = [checkpoint, reduce_lr]

history = model.fit_generator(train_gen, steps_per_epoch=train_steps, 
                    validation_data=val_gen,
                    validation_steps=val_steps,
                    epochs=20, verbose=0,
                   callbacks=callbacks_list)


Epoch 00001: val_accuracy improved from -inf to 0.91037, saving model to model.h5

Epoch 00002: val_accuracy did not improve from 0.91037

Epoch 00003: val_accuracy improved from 0.91037 to 0.91219, saving model to model.h5

Epoch 00004: val_accuracy improved from 0.91219 to 0.92200, saving model to model.h5

Epoch 00005: val_accuracy improved from 0.92200 to 0.92625, saving model to model.h5

Epoch 00006: val_accuracy improved from 0.92625 to 0.92706, saving model to model.h5

Epoch 00007: val_accuracy improved from 0.92706 to 0.92813, saving model to model.h5

Epoch 00008: val_accuracy did not improve from 0.92813

Epoch 00009: val_accuracy improved from 0.92813 to 0.93831, saving model to model.h5

Epoch 00010: val_accuracy did not improve from 0.93831

Epoch 00011: val_accuracy did not improve from 0.93831

Epoch 00012: val_accuracy did not improve from 0.93831

Epoch 00013: val_accuracy improved from 0.93831 to 0.94275, saving model to model.h5

Epoch 00014: val_accuracy did not 

In [None]:
from keras.applications.vgg16 import VGG16, preprocess_input

# VGG model without the last classifier layers (include_top = False)
vgg16_model = VGG16(include_top = False,
                    input_shape = (96,96,3),
                    #weights='../input/VGG16weights/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')
                    weights = 'imagenet')
    
# Freeze the layers 
for layer in vgg16_model.layers[:-12]:
    layer.trainable = False
    
# Check the trainable status of the individual layers
for layer in vgg16_model.layers:
    print(layer, layer.trainable)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
<tensorflow.python.keras.engine.input_layer.InputLayer object at 0x7f8912ae2cc0> False
<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f872cedfe80> False
<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f889e42f2b0> False
<tensorflow.python.keras.layers.pooling.MaxPooling2D object at 0x7f889e42ff60> False
<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f889e10f898> False
<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f872d0036a0> False
<tensorflow.python.keras.layers.pooling.MaxPooling2D object at 0x7f889e1109e8> False
<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f889e0dca58> True
<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f872d0fab38> True
<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7f889e10ccf8> True
<tensorflow.

In [None]:
from keras.models import Sequential
from keras.layers import Dense,Flatten,Dropout
from keras import optimizers

model = Sequential()
model.add(vgg16_model)
model.add(Flatten())
model.add(Dense(1024, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(2, activation="softmax"))

In [None]:
model.compile(loss='binary_crossentropy',optimizer=optimizers.SGD(lr=0.00001, momentum=0.95),metrics=['accuracy'])

In [None]:
### Evaluate the model using the val set

filepath_2 = "model.h5_2"
checkpoint_2 = ModelCheckpoint(filepath_2, monitor='val_accuracy', verbose=1, 
                             save_best_only=True, mode='max')

reduce_lr_2 = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=2, 
                                   verbose=0, mode='max', min_lr=0.00001)
                              
                              
callbacks_list_2 = [checkpoint_2, reduce_lr_2]

history_2 =  model.fit_generator(train_gen, steps_per_epoch=train_steps, 
                    validation_data=val_gen,
                    validation_steps=val_steps,
                    epochs=20, verbose=0,
                   callbacks=callbacks_list_2)


Epoch 00001: val_accuracy improved from -inf to 0.89844, saving model to model.h5_2
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: model.h5_2/assets

Epoch 00002: val_accuracy improved from 0.89844 to 0.92106, saving model to model.h5_2
INFO:tensorflow:Assets written to: model.h5_2/assets

Epoch 00003: val_accuracy did not improve from 0.92106

Epoch 00004: val_accuracy improved from 0.92106 to 0.93094, saving model to model.h5_2
INFO:tensorflow:Assets written to: model.h5_2/assets

Epoch 00005: val_accuracy improved from 0.93094 to 0.93831, saving model to model.h5_2
INFO:tensorflow:Assets written to: model.h5_2/assets

Epoch 00006: val_accuracy did not improve from 0.93831

Epoch 00007: val_accuracy did not improve from 0.93831

Epoch 00008: val_accuracy improved f

In [None]:
model.save("vgg16_model.h5")

#### Save the History Logs for both models as CSV files

In [None]:
import pandas as pd

# convert the history.history dict to a pandas DataFrame:     
hist_df_keras = pd.DataFrame(history.history) 
hist_df_vgg16 = pd.DataFrame(history_2.history) 
# or save to csv: 
hist_csv_file = 'history_keras_model.csv'
with open(hist_csv_file, mode='w') as f:
    hist_df_keras.to_csv(f)

hist_csv_file_16 = 'history_vgg16_model.csv'
with open(hist_csv_file_16, mode='w') as f:
    hist_df_vgg16.to_csv(f)

# 4. Presentation of findings

This is the section where we analyze the model.