The networks tried earlier didn't give the best results, so let's try the Inception V3 network, inspired by https://github.com/stratospark/food-101-keras/blob/master/Food%20Classification%20with%20Deep%20Learning%20in%20Keras.ipynb

Importing all the necessary packages

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as img
import numpy as np
from scipy.misc import imresize

%matplotlib inline

import os
from os import listdir
from os.path import isfile, join
import shutil
import stat
import collections
from collections import defaultdict

from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets

import h5py
from sklearn.model_selection import train_test_split
from keras.utils.np_utils import to_categorical
from keras.applications.inception_v3 import preprocess_input
from keras.models import load_model

We used image augmentation in the modeling.py file; multiprocessing.pool can be used to accelerate image augmentation during training.

The original code loads all the data into memory in one go. Instead, we load the data in batches, because we are working with much less RAM.
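
As a rough illustration of the batching idea, here is a minimal sketch in plain Python. The helper name `iter_batches` and the file names are hypothetical and only stand in for image paths; the notebook itself relies on Keras generators for this.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical file names standing in for image paths on disk.
paths = ["img_{:03d}.jpg".format(i) for i in range(10)]
batches = list(iter_batches(paths, batch_size=4))
print([len(b) for b in batches])
```

Only one batch needs to be held in memory at a time, which is the property we want when RAM is limited.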

In [None]:
%%time
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input, decode_predictions
from keras.preprocessing import image
from keras.layers import Input

from keras.models import Sequential, Model, load_model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D, GlobalAveragePooling2D, AveragePooling2D
from keras.layers.normalization import BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, CSVLogger, LearningRateScheduler, ReduceLROnPlateau
from keras.optimizers import SGD
from keras.regularizers import l2
import keras.backend as K
import math


K.clear_session()

n_classes = 101

base_model = InceptionV3(weights='imagenet', include_top=False, input_tensor=Input(shape=(299, 299, 3)))
x = base_model.output
x = AveragePooling2D(pool_size=(8, 8))(x)
x = Dropout(.4)(x)
x = Flatten()(x)
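# Keras 2 argument names (kernel_initializer, kernel_regularizer, inputs, outputs)
predictions = Dense(n_classes, kernel_initializer='glorot_uniform', kernel_regularizer=l2(.0005), activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)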

opt = SGD(lr=.01, momentum=.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='../../model/model4.{epoch:02d}-{val_loss:.2f}.hdf5', verbose=1, save_best_only=True)
csv_logger = CSVLogger('../../logs/model4.log')

def schedule(epoch):
    if epoch < 15:
        return .01
    elif epoch < 28:
        return .002
    else:
        return .0004
lr_scheduler = LearningRateScheduler(schedule)

# mixing the old code into GoogleNet
# the original batch size was 64, but it is reduced here because of limited GPU memory
batch_size = 32

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images
    vertical_flip=False, # randomly flip images
    zoom_range=[.8, 1],
    channel_shift_range=30,
    fill_mode='reflect')

# this is the configuration we will use for testing:
# no augmentation and no rescaling is applied
test_datagen = ImageDataGenerator()

# this is a generator that will read pictures found in
# subfolders of the training directory, and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
    '../../data/raw/food-101/smallersample/train/',  # this is the target directory
    target_size=(299, 299),  # all images will be resized to 299x299
    batch_size=batch_size,
    seed=42,
    class_mode='categorical')  

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
    '../../data/raw/food-101/smallersample/test/',
    target_size=(299, 299),
    batch_size=batch_size,
    seed=42,
    class_mode='categorical')

# model.fit_generator(
#     train_generator,
#     validation_data=validation_generator,
#     validation_steps=25250 // batch_size,
#     steps_per_epoch=75750 // batch_size,
#     epochs=32,
#     callbacks=[lr_scheduler, csv_logger, checkpointer])
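
The step counts and the piecewise learning-rate schedule from the cell above can be checked in isolation, without Keras. This is only a sketch: the image counts are the ones that appear in the commented-out `fit_generator` call, and `n_train` / `n_test` are names introduced here for illustration.

```python
def schedule(epoch):
    """Piecewise-constant learning rate, same thresholds as in the cell above."""
    if epoch < 15:
        return .01
    elif epoch < 28:
        return .002
    else:
        return .0004


batch_size = 32
n_train, n_test = 75750, 25250  # counts taken from the fit_generator call

# Integer division drops the last partial batch of each pass.
steps_per_epoch = n_train // batch_size
validation_steps = n_test // batch_size
print(steps_per_epoch, validation_steps)

# Learning rate around the epochs where the schedule changes (epochs are 0-based).
for epoch in (0, 14, 15, 27, 28, 31):
    print(epoch, schedule(epoch))
```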

This code works fine but is hard to train on the GPUs I have: the 1070M has heating issues and the 980Ti has too little memory. Let's try to run the code on Google Colab or another platform such as AWS SageMaker.

<h4>Model Evaluation</h4>

After training on Google Colab, we have the weights of the trained model. Let's load the trained model.

In [None]:
model = load_model('../../model/model4.08-0.67.hdf5')

We can now make a prediction for an image, but we first need to know the id of each class. This assumes that the model trained on Colab used the same ids.

In [None]:
train_generator.class_indices
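
As far as I know, `flow_from_directory` infers the classes from the subfolder names in alphanumeric order, which is why the same folder layout should give the same ids on Colab and locally. A minimal sketch of that mapping and its inverse, using a made-up subset of class folders:

```python
# Hypothetical subset of class folders, deliberately listed out of order.
class_folders = ["pad_thai", "apple_pie", "fried_rice", "baby_back_ribs"]

# Sorting the folder names and enumerating them gives name -> index.
class_indices = {name: idx for idx, name in enumerate(sorted(class_folders))}

# Inverting the dictionary gives index -> name, handy for decoding predictions.
index_to_class = {idx: name for name, idx in class_indices.items()}

print(class_indices)
print(index_to_class[2])
```

The ids in this toy example differ from the real ones because only four of the 101 classes are listed.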

In [None]:
from IPython.display import Image as Images, display

display(Images(filename="../../data/raw/food-101/images/fried_rice/260614.jpg"))

Let's take the above image of fried rice and make a prediction. As can be seen from the cell above, the predicted id should be 44. Before that, we need to convert the image data into a form the model understands.

In [None]:
import matplotlib.image as img
import numpy as np
from scipy.misc import imresize

# if either side is shorter than 299 pixels, upscale the image so that side becomes 299
min_side=299
img_arr = img.imread("../../data/raw/food-101/images/fried_rice/260614.jpg")
img_arr_rs = img_arr

try:
    # note: shape is (rows, columns, channels), so w is the number of rows
    w, h, _ = img_arr.shape
    if w < min_side:
        wpercent = (min_side/float(w))
        hsize = int((float(h)*float(wpercent)))
        #print('new dims:', min_side, hsize)
        img_arr_rs = imresize(img_arr, (min_side, hsize))
    elif h < min_side:
        hpercent = (min_side/float(h))
        wsize = int((float(w)*float(hpercent)))
        #print('new dims:', wsize, min_side)
        img_arr_rs = imresize(img_arr, (wsize, min_side))
except ValueError:
    # raised when the array does not have exactly three dimensions (e.g. grayscale)
    print('Skipping bad image')
    
# cropping the image to be 299 x 299
imageData = center_crop(img_arr_rs, (299, 299))
# changing the shape of the imageData to fit the prediction
imageData = imageData[np.newaxis,:,:,:]

y_pred = model.predict(imageData)
preds = np.argmax(y_pred, axis=1)
preds

Yipppeeee! We got index 44, which is the index of fried rice, so it worked.

We also want to evaluate the test set using multiple crops. This is expected to raise the accuracy by 5% compared to single crop evaluation. It is common to use the following crops: Upper Left, Upper Right, Lower Left, Lower Right, Center. We also take the same crops on the image flipped left to right, creating a total of 10 crops. 

In [None]:
def center_crop(x, center_crop_size, **kwargs):
    centerw, centerh = x.shape[0]//2, x.shape[1]//2
    halfw, halfh = center_crop_size[0]//2, center_crop_size[1]//2
    return x[centerw-halfw:centerw+halfw+1,centerh-halfh:centerh+halfh+1, :]
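
A quick check of the slicing in `center_crop` on a dummy array (the array size is arbitrary). Because of the `+1` in the slice, an odd crop size such as 299 comes out exact, while an even crop size would come out one pixel larger than requested.

```python
import numpy as np


def center_crop(x, center_crop_size):
    """Same slicing as the helper above, without the unused kwargs."""
    centerw, centerh = x.shape[0] // 2, x.shape[1] // 2
    halfw, halfh = center_crop_size[0] // 2, center_crop_size[1] // 2
    return x[centerw - halfw:centerw + halfw + 1,
             centerh - halfh:centerh + halfh + 1, :]


dummy = np.zeros((400, 520, 3), dtype=np.uint8)  # stand-in for a resized image

odd = center_crop(dummy, (299, 299))
even = center_crop(dummy, (300, 300))
print(odd.shape)
print(even.shape)
```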

In [None]:
def predict_10_crop(img, top_n=5, plot=False, preprocess=True, debug=False):
    flipped_X = np.fliplr(img)
    crops = [
        img[:299,:299, :], # Upper Left
        img[:299, img.shape[1]-299:, :], # Upper Right
        img[img.shape[0]-299:, :299, :], # Lower Left
        img[img.shape[0]-299:, img.shape[1]-299:, :], # Lower Right
        center_crop(img, (299, 299)),
        
        flipped_X[:299,:299, :],
        flipped_X[:299, flipped_X.shape[1]-299:, :],
        flipped_X[flipped_X.shape[0]-299:, :299, :],
        flipped_X[flipped_X.shape[0]-299:, flipped_X.shape[1]-299:, :],
        center_crop(flipped_X, (299, 299))
    ]
    if preprocess:
        crops = [preprocess_input(x.astype('float32')) for x in crops]

    if plot:
        fig, ax = plt.subplots(2, 5, figsize=(10, 4))
        ax[0][0].imshow(crops[0])
        ax[0][1].imshow(crops[1])
        ax[0][2].imshow(crops[2])
        ax[0][3].imshow(crops[3])
        ax[0][4].imshow(crops[4])
        ax[1][0].imshow(crops[5])
        ax[1][1].imshow(crops[6])
        ax[1][2].imshow(crops[7])
        ax[1][3].imshow(crops[8])
        ax[1][4].imshow(crops[9])
    
    y_pred = model.predict(np.array(crops))
    preds = np.argmax(y_pred, axis=1)
    top_n_preds= np.argpartition(y_pred, -top_n)[:,-top_n:]
    if debug:
        print('Top-1 Predicted:', preds)
        print('Top-5 Predicted:', top_n_preds)
    return preds, top_n_preds
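
One detail worth keeping in mind: `np.argpartition` returns the top-n indices in no particular order. If a ranked top-n list is needed, the selected indices can be sorted afterwards. A small sketch with made-up probabilities for 2 crops and 6 classes:

```python
import numpy as np

# Made-up class probabilities for 2 crops over 6 classes (no ties).
y_pred = np.array([
    [0.05, 0.40, 0.12, 0.30, 0.05, 0.08],
    [0.20, 0.05, 0.50, 0.04, 0.15, 0.06],
])
top_n = 3

# Unordered top-n indices per row, as in predict_10_crop.
top_unordered = np.argpartition(y_pred, -top_n)[:, -top_n:]

# Sort those indices by descending probability within each row.
row_ids = np.arange(y_pred.shape[0])[:, None]
order = np.argsort(-y_pred[row_ids, top_unordered], axis=1)
top_sorted = top_unordered[row_ids, order]

print(top_sorted)
```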

Let's see if it worked on this pad thai image.

In [None]:
from IPython.display import Image as Images, display

display(Images(filename="../../data/abc.jpg"))

An incoming image could be of any size, so we need a function that converts it to an appropriate size for the model.

In [None]:
import matplotlib.image as matimg
import numpy as np
from PIL import Image
import PIL

# filePath = '../../data/abc.jpg'
allowedAspectratio = 1.3
minAllowedSize = 299

def imageResize(filePath):
    img = Image.open(filePath)
    resizeSize = 400
    width, height = img.size
    mainImageToggle = True
    
    if(height > width):
        if(width < minAllowedSize):
            wpercent = (resizeSize/float(img.size[0]))
            hsize = int((float(img.size[1])*float(wpercent)))
            img = img.resize((resizeSize, hsize), PIL.Image.LANCZOS)  # LANCZOS is the filter formerly named ANTIALIAS
            img.save('../../data/updatedImage.jpg')
            im = Image.open('../../data/updatedImage.jpg')
            width, height = im.size   # Get dimensions
            mainImageToggle = False
        if((height/width) > allowedAspectratio):
            extraHeight = height - (width * 1.3)
            top = extraHeight/2
            bottom = height - extraHeight/2
            if(mainImageToggle):
                img = img.crop((0, top, width, bottom))
                img.save('../../data/updatedImage.jpg')
            else:
                im = im.crop((0, top, width, bottom))
                im.save('../../data/updatedImage.jpg')
        else:
            img.save('../../data/updatedImage.jpg')
    else:
        if(height < minAllowedSize):
            hpercent = (resizeSize/float(img.size[1]))
            wsize = int((float(img.size[0])*float(hpercent)))
            img = img.resize((wsize, resizeSize), PIL.Image.LANCZOS)  # LANCZOS is the filter formerly named ANTIALIAS
            img.save('../../data/updatedImage.jpg')
            im = Image.open('../../data/updatedImage.jpg')
            width, height = im.size   # Get dimensions
            mainImageToggle = False
        if((width/height) > allowedAspectratio):
            extraWidth = width - (height * 1.3)
            left = extraWidth/2
            right = width - extraWidth/2
            if(mainImageToggle):
                img = img.crop((left, 0, right, height))
                img.save('../../data/updatedImage.jpg')
            else:
                im = im.crop((left, 0, right, height))
                im.save('../../data/updatedImage.jpg')
        else:
            img.save('../../data/updatedImage.jpg')
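
The geometry behind `imageResize` can be tried out without touching any files. The sketch below only computes the target size and crop box for a given width and height; `plan_resize_and_crop` is a hypothetical helper written for illustration, with defaults copied from the constants above (299, 400 and 1.3).

```python
def plan_resize_and_crop(width, height, min_size=299, resize_to=400, max_ratio=1.3):
    """Return ((new_width, new_height), crop_box) for an image of the given size.

    crop_box is (left, top, right, bottom), or None when no crop is needed.
    """
    portrait = height > width
    short = width if portrait else height

    # Upscale so the shorter side becomes `resize_to` when it is too small.
    if short < min_size:
        scale = resize_to / float(short)
        if portrait:
            width, height = resize_to, int(height * scale)
        else:
            width, height = int(width * scale), resize_to

    # Trim the longer side symmetrically when the aspect ratio is too extreme.
    crop_box = None
    if portrait and height / width > max_ratio:
        extra = height - width * max_ratio
        crop_box = (0, extra / 2, width, height - extra / 2)
    elif not portrait and width / height > max_ratio:
        extra = width - height * max_ratio
        crop_box = (extra / 2, 0, width - extra / 2, height)
    return (width, height), crop_box


print(plan_resize_and_crop(800, 600))  # landscape, only cropped
print(plan_resize_and_crop(200, 600))  # narrow portrait, upscaled then cropped
print(plan_resize_and_crop(500, 500))  # square, left as it is
```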

Let's make a prediction now and get the top-1 and top-5 predictions for the 10 crops.

In [None]:
path = '../../data/raw/food-101/smallersample/newtest/baby_back_ribs/112142.jpg'
imageResize(path)
img_arr = matimg.imread("../../data/updatedImage.jpg")
prediction, topNPrediction = predict_10_crop(img_arr, top_n=5, plot=True, preprocess=False, debug=True)

Nice! We got the predictions. Let's see which prediction is the most common across all 10 crops.

In [None]:
counts = np.bincount(prediction)
mostCommonPrediction = np.argmax(counts)
print(mostCommonPrediction)

labelDictonary = train_generator.class_indices
print(list(labelDictonary.keys())[list(labelDictonary.values()).index(mostCommonPrediction)])
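
To make the voting step concrete, here is a small worked example with made-up crop predictions. It also shows what happens on a tie: `np.argmax` returns the first maximum, so the smallest class id among the tied classes wins.

```python
import numpy as np

# Made-up top-1 class ids for the 10 crops of one image.
crop_predictions = np.array([44, 44, 70, 44, 12, 44, 70, 44, 44, 12])

counts = np.bincount(crop_predictions)  # counts[i] = number of crops voting for class i
winner = int(np.argmax(counts))
print(winner, counts[winner])

# With a tie, argmax returns the smallest class id among the tied classes.
tied = np.array([70, 12, 70, 12])
print(int(np.argmax(np.bincount(tied))))
```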

Splitting the data into train and test sets so that we can do the performance analysis on the test data.

In [None]:
# Helper method to split dataset into train and test folders

from shutil import copy
from collections import defaultdict

def prepare_data(filepath, src, dest):
    classes_images = defaultdict(list)
    with open(filepath, 'r') as txt:
        paths = [read.strip() for read in txt.readlines()]
        for p in paths:
            food = p.split('/')
            classes_images[food[0]].append(food[1] + '.jpg')
            
    for food in classes_images.keys():
        print("\nCopying images into ",food)
        if not os.path.exists(os.path.join(dest,food)):
            os.makedirs(os.path.join(dest,food))
        for i in classes_images[food]:
            copy(os.path.join(src,food,i), os.path.join(dest,food,i))

    print("Copying Done!")

In [None]:
# Prepare test data by copying images from food-101/images to food-101/test using the file test.txt
print("Creating test data...")
prepare_data('../../data/raw/food-101/meta/test.txt', '../../data/raw/food-101/images', '../../data/raw/food-101/test')
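
The parsing step inside `prepare_data` can be tested on a few lines without copying any files. This is a sketch: `group_by_class` is a hypothetical helper, and the sample entries are only illustrative lines in the `<class>/<image id>` layout that the code above expects.

```python
from collections import defaultdict


def group_by_class(lines):
    """Map each class name to the list of image file names listed for it."""
    classes_images = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        class_name, image_id = line.split('/')
        classes_images[class_name].append(image_id + '.jpg')
    return dict(classes_images)


# Illustrative entries in the "<class>/<image id>" layout.
sample = ["apple_pie/1011328\n", "apple_pie/101251\n", "fried_rice/260614\n"]
print(group_by_class(sample))
```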

Reading the classes from the metadata file that ships with the dataset.

In [None]:
filename = '../../data/raw/food-101/meta/classes.txt'

with open(filename) as f:
    content = f.readlines()
    
# you may also want to remove whitespace characters like `\n` at the end of each line
classList = [x.strip() for x in content] 

We now go through each test image and make a prediction. Since we already know the number of test images, we can set up two fixed-size NumPy arrays, one 2D and one 3D. The first holds, for each image, the actual class, the guess for each of the 10 crops, and the most common value among those 10 guesses. The second holds the top 5 guesses for each of the 10 crops for all 25250 test images.

In [None]:
from tqdm import tqdm
import os
from PIL import Image
import matplotlib.image as matimg
import keras.backend as K

path = '../../data/raw/food-101/test/'

topNPred = 5
topPred2D = np.zeros((25250,12))
top5Pred3D = np.zeros((25250,10,topNPred))
imageCount = 0

for r, d, f in tqdm(os.walk(path)):
    for file in f:
        fileName = os.path.join(r, file)
        className = os.path.basename(r)  # class folder name, independent of the path separator
        try:
            imageResize(fileName)
            img_arr = matimg.imread("../../data/updatedImage.jpg")
            prediction, topNPrediction = predict_10_crop(img_arr, top_n=topNPred, plot=False, preprocess=False, debug=False)
            
            K.clear_session()
            # populating the 2D array
            topPred2D[imageCount][0] = classList.index(className) # actual class index from classes.txt file
            topPred2D[imageCount][1:-1] = prediction # prediction for all 10 crops
            counts = np.bincount(prediction) 
            mostCommonPrediction = np.argmax(counts)
            topPred2D[imageCount][-1] = mostCommonPrediction # most common value among all 10 predictions
        
            top5Pred3D[imageCount] = topNPrediction
        except Exception as e:
            print(fileName, e)
            
        
        imageCount = imageCount + 1
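
Once the two arrays are filled, accuracy figures can be derived from them. The sketch below uses tiny toy arrays with the same column layout (3 images, 10 crops, top-2 instead of top-5). Counting a top-n hit when the true class appears among the top-n guesses of at least one crop is an assumption made here; other definitions are possible.

```python
import numpy as np

# Toy stand-ins for topPred2D and top5Pred3D with 3 images, 10 crops, top-2.
# Column 0: true class, columns 1-10: per-crop guesses, column 11: majority vote.
top_pred_2d = np.array([
    [3] + [3] * 10 + [3],
    [1] + [2] * 10 + [2],
    [0] + [0] * 6 + [4] * 4 + [0],
], dtype=float)
top_n_3d = np.zeros((3, 10, 2))
top_n_3d[0, :, :] = [3, 5]
top_n_3d[1, :, :] = [2, 1]
top_n_3d[2, :, :] = [0, 4]

actual = top_pred_2d[:, 0]

# Top-1: the majority vote over the crops equals the true class.
top1_accuracy = np.mean(top_pred_2d[:, -1] == actual)

# Top-n (assumed definition): the true class is among the top-n guesses of any crop.
hits = (top_n_3d == actual[:, None, None]).any(axis=(1, 2))
topn_accuracy = np.mean(hits)

print(top1_accuracy, topn_accuracy)
```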

Let's store the arrays in an HDF5 file under the processed folder. HDF5 stores arrays in a compact binary format that is fast to read and write, which saves a lot of time whenever we reload the data for performance analysis.

In [None]:
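import os
import h5py

# Address to store the HDF5 file (built with os.path.join so it works on any OS)
hdf5Path = os.path.join('..', '..', 'data', 'processed', 'predictions.hdf5')

# the context manager closes the file even if writing fails
with h5py.File(hdf5Path, 'w') as h5f:
    h5f.create_dataset('topPred2D', data=topPred2D)
    h5f.create_dataset('top5Pred3D', data=top5Pred3D)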