### Pizza Detector model
Ran on an AWS EC2 instance using [this AMI](https://github.com/bitfusionio/amis/tree/master/awsmrkt-bfboost-ubuntu14-cuda75-tensorflow).

In [1]:
# image processing imports
from keras.preprocessing import image as image_utils 
from keras.preprocessing.image import ImageDataGenerator

# modeling imports
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.callbacks import ModelCheckpoint

Using TensorFlow backend.


---

## Modeling

In [2]:
# path to images for .flow_from_directory() to pass in images
train_data_dir = '/home/ubuntu/data/pizza_class_data/train/'
validation_data_dir = '/home/ubuntu/data/pizza_class_data/validation'

### Image preprocessing
Augments the images via random transformations so the model generalizes better.

In [57]:
# resize images to these dimensions
img_width, img_height = 150, 150

# augmented image generator for training set
train = ImageDataGenerator(
    rescale=1./255,
    rotation_range=90,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    width_shift_range=0.2,
    height_shift_range=0.2)

# image generator for validation set: only rescaling here, no augmentation
test = ImageDataGenerator(
    rescale=1./255)
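
To make two of the settings above concrete, here is a minimal pure-Python sketch of what `rescale=1./255` and `horizontal_flip=True` do to pixel values. The 2x3 single-channel `tiny_image` and the helper names are made up for illustration. Keras applies the flip randomly per image and works on arrays rather than nested lists.

```python
# Hypothetical 2x3 grayscale image with raw 0-255 pixel values.
tiny_image = [
    [0, 51, 255],
    [102, 204, 153],
]

def rescale(img, factor=1.0 / 255):
    """Multiply every pixel by `factor`, as the rescale argument does."""
    return [[pixel * factor for pixel in row] for row in img]

def horizontal_flip(img):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in img]

scaled = rescale(tiny_image)
flipped = horizontal_flip(scaled)

print(scaled[0])   # pixel values now lie between 0 and 1
print(flipped[0])  # same values, column order reversed
```

Only the training generator flips, rotates, shifts, and zooms. The validation generator rescales and nothing else, so validation scores are measured on unmodified images.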

### Generate training images
The generators are iterators: they return batches of image samples when requested. You get batches of images (and their labels) by calling the `.flow_from_directory()` method, which automatically labels the data based on folder structure and converts the images to the array format Keras/TensorFlow expects.

In [58]:
# generates training images
train_generator = train.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode='binary')

# generates validation images
validation_generator = test.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode='binary')

Found 3200 images belonging to 2 classes.
Found 800 images belonging to 2 classes.
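
The folder-based labeling can be sketched in plain Python. This assumes, as far as I know, that Keras sorts the subfolder names and numbers them from 0. The folder names `pizza` and `not_pizza` are hypothetical, since the notebook does not show the actual directory layout.

```python
import os
import tempfile

def class_indices_from_folders(root):
    """Map each subfolder name to an integer label, in sorted order."""
    classes = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    return {name: index for index, name in enumerate(classes)}

# Build a throwaway directory tree with two hypothetical class folders.
with tempfile.TemporaryDirectory() as root:
    for folder in ("pizza", "not_pizza"):
        os.makedirs(os.path.join(root, folder))
    labels = class_indices_from_folders(root)

print(labels)
```

The mapping Keras actually used is available as `train_generator.class_indices`, which is worth checking before interpreting the sigmoid output as "pizza" or "not pizza".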


### Instantiate model & layers
I'll be using four convolutional layers, each followed by a max-pooling layer, plus two fully connected layers. Generally, more complex tasks may call for more convolutional layers to extract progressively higher-level features.

###### Convolution layer
In a convolutional layer, a filter moves across the image, and the dot product at each position generates a map of where the feature occurs in the image. This is repeated with different filters (features) to create a stack of filtered images, which is the output of the convolutional layer.

**nb_filter:** Number of convolutional kernels (filters) to use = 32 (64 in the last two convolutional layers)
- A rough rule for the number of filters is the more complex the task, the more filters (but you don't need the same number of filters in each convolutional layer)

**nb_row, nb_col:** convolution filter size (i.e. n_conv x n_conv) = 3 x 3 (`filter_length` is the equivalent argument for 1D convolutions)
- You don't want these too large or too small, or the resulting matrix might not be very meaningful.
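
As a rough illustration of the sliding dot product described above, here is a pure-Python sketch on a single-channel image. The 5x5 `image` and the hand-written vertical-edge `kernel` are made up. A trained network learns its own filter values, and real layers also add a bias and handle multiple channels.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and take the dot product at each position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-written 3x3 vertical-edge filter.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The output is 3x3 rather than 5x5 because a 3x3 filter without padding loses one pixel on each side. The large values mark the positions where the window straddles the edge.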


###### ReLU Activation layer
Applies the ReLU function to the feature maps from the convolutional layers: any negative values become zero and positive values pass through unchanged.
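
A tiny sketch of ReLU on a made-up feature map, using nested lists in place of the arrays Keras works with:

```python
def relu(feature_map):
    """Replace every negative value with zero; leave the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]

# Hypothetical 2x3 feature map with some negative responses.
feature_map = [
    [3, -2, 0],
    [-1, 5, -4],
]
activated = relu(feature_map)
print(activated)
```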


###### Pooling layer
In a pooling layer a window is moved across filtered images (in strides) and the max value wins, making the filtered images smaller — this is good for performance and also makes the model less sensitive to position.

**pool_size:** tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the image in each dimension.
- Again, this shouldn't be too large, or too much information is lost.
- Pooling layer is max pooling, which can be thought of as a “feature detector”

**strides:** tuple of 2 integers, or None. The stride is how far the pooling window moves across the filtered image at each step. If None (as in this model), it defaults to `pool_size`, so the windows do not overlap.
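
Here is a minimal pure-Python sketch of max pooling with `pool_size` and `strides` as described above. The 4x4 `feature_map` is made up, and the function is an illustration rather than the Keras implementation.

```python
def max_pool2d(feature_map, pool_size=(2, 2), strides=None):
    """Take the max over each pooling window; strides default to pool_size."""
    pool_rows, pool_cols = pool_size
    stride_rows, stride_cols = strides if strides is not None else pool_size
    n_rows, n_cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, n_rows - pool_rows + 1, stride_rows):
        row = []
        for c in range(0, n_cols - pool_cols + 1, stride_cols):
            window = [
                feature_map[r + i][c + j]
                for i in range(pool_rows)
                for j in range(pool_cols)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 filtered image.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(feature_map)                        # non-overlapping 2x2 windows
overlapping = max_pool2d(feature_map, strides=(1, 1))   # windows move one pixel at a time
print(pooled)
print(len(overlapping), len(overlapping[0]))
```

With the default stride, the 4x4 input is halved to 2x2. With a stride of 1, the windows overlap and the output only shrinks to 3x3.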

###### Dropout layer
In the dropout layer, a random fraction of the input units (here 0.5, i.e. half of the outputs of the preceding fully connected layer) is "dropped", meaning set to zero, at each training update. This is essentially a regularization technique for reducing overfitting.
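
A rough sketch of dropout at training time. It assumes the "inverted" variant, where surviving values are scaled up by 1 / (1 - rate) so the expected sum stays the same. I believe this is what the Keras backends do, but treat it as an assumption. The `activations` list is made up.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    return [
        value / keep_prob if rng.random() >= rate else 0.0
        for value in values
    ]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

Each entry comes out as either zero or twice its original value. At prediction time dropout is switched off and the activations pass through unchanged.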

###### (Final) Fully connected layer
Stacking the convolution and pooling layers several times makes the images more filtered and smaller. The final fully connected layer takes the flattened result and uses its weights to “vote” on what the class will be.

_**note on dim-ordering:**_ "tf" mode means that the images should have shape (samples, height, width, channels), "th" mode means that the images should have shape (samples, channels, height, width). The default comes from the `image_dim_ordering` setting in the Keras config file (`~/.keras/keras.json`), which is "tf" unless you change it.


_**note on color:**_ A CNN handles the 3 color channels inside the first convolutional layer: each filter holds a separate 3 x 3 set of weights for every input channel, convolves each channel with its own weights, and adds the outputs into a single feature map. From there, all the color information is encoded and processed by the remaining layers. Some pipelines first transform images to the YUV color space, which separates the luminance (Y component) from the color components (U and V), but nothing in this notebook does that conversion; the generators load images as RGB by default.

In [7]:
#instantiate model
model = Sequential()

# four convolutional & pooling layers
model.add(Convolution2D(32, 3, 3, input_shape=(img_width, img_height, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))


# two fully-connected layers
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dropout(0.5))
model.add(Dense(1))
# sigmoid activation - good for a binary classification
model.add(Activation('sigmoid'))
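
To see how the stacked layers shrink the images, the feature-map sizes and parameter counts can be worked out by hand. This sketch assumes the Keras defaults for the layers above: no padding ('valid' border mode), convolution stride 1, and pooling stride equal to the pool size. The helper names are my own.

```python
def conv_output(size, kernel=3):
    """Side length after a 'valid' convolution (no padding, stride 1)."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Side length after non-overlapping pooling (leftover pixels dropped)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size, channels = 150, 3          # 150x150 RGB input
total_params = 0
shapes = []
for filters in (32, 32, 64, 64): # the four conv + pool blocks
    total_params += conv_params(channels, filters)
    size = pool_output(conv_output(size))
    channels = filters
    shapes.append((size, size, channels))

flattened = size * size * channels
total_params += flattened * 64 + 64   # Dense(64): weights + biases
total_params += 64 * 1 + 1            # Dense(1): weights + bias

print(shapes)
print(flattened, total_params)
```

Under these assumptions the 150x150 input ends up as 64 feature maps of 7x7, which flatten to 3136 values, and most of the parameters sit in the first fully connected layer. Calling `model.summary()` is the way to confirm the real numbers.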

### Compile model

Keras will compile the model using whatever backend you have configured (Theano or TensorFlow). Specify the loss function you want to minimize: here binary cross-entropy, the standard loss function for two-class classification with a single sigmoid output. It measures how different the predicted probabilities are from the actual labels. (Categorical cross-entropy is its multiclass counterpart.)

Also specify the optimization method to use, here rmsprop, which adapts the learning rate for each weight based on a running average of recent gradient magnitudes.

In [8]:
# configure model's learning process
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
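
For intuition about the `binary_crossentropy` loss passed to `compile` above, here is a small pure-Python version of the formula. The labels and predicted probabilities are made up, and the clipping constant `eps` is my own choice to avoid `log(0)`.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # hypothetical sigmoid outputs
mostly_wrong = [0.1, 0.9, 0.2, 0.8]

loss_right = binary_crossentropy(labels, mostly_right)
loss_wrong = binary_crossentropy(labels, mostly_wrong)
print(loss_right, loss_wrong)
```

Confident correct predictions give a small loss, and confident wrong predictions give a much larger one. That penalty is what training pushes down.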

In [9]:
# checkpointing used to output the model weights each time an improvement is observed during training
filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
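
The `save_best_only=True, mode='max'` behavior can be sketched as a simple loop that keeps track of the best value seen so far. The accuracies below are made up and are not results from this notebook.

```python
def epochs_that_save(val_acc_per_epoch):
    """Return the 1-based epochs where the monitored value beats the best so far."""
    best = float("-inf")
    saved = []
    for epoch, val_acc in enumerate(val_acc_per_epoch, start=1):
        if val_acc > best:       # mode='max': higher is better
            best = val_acc
            saved.append(epoch)  # the real callback would overwrite the weights file here
    return saved

# Made-up validation accuracies for seven epochs.
hypothetical_val_acc = [0.61, 0.70, 0.68, 0.74, 0.74, 0.73, 0.79]
saved_epochs = epochs_that_save(hypothetical_val_acc)
print(saved_epochs)
```

Because the file path is fixed (`weights.best.hdf5`), each improvement overwrites the previous file, so only the best weights are left at the end. A tie with the previous best does not trigger a save in this sketch.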

### Fit model
Fit the model on data generated batch-by-batch by the data generator. The generator runs in parallel to the model for efficiency, which allows real-time data augmentation on images while the model trains.
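
The batch-by-batch idea can be sketched with an ordinary Python generator. Integers stand in for the 3200 training images, and the batch size of 32 matches the generators above. The real Keras generator also shuffles and augments, which this sketch leaves out.

```python
import itertools

def batch_generator(samples, batch_size):
    """Yield batches forever, wrapping around at the end of the data."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]

samples = list(range(3200))   # stand-ins for the 3200 training images
batch_size = 32
batches_per_epoch = len(samples) // batch_size

gen = batch_generator(samples, batch_size)
first_epoch = list(itertools.islice(gen, batches_per_epoch))
next_batch = next(gen)        # the generator keeps going into the next epoch

print(batches_per_epoch, len(first_epoch[0]), next_batch[0])
```

Since the generator never stops on its own, the fit call needs to be told how many samples make up one epoch. That is the role of `samples_per_epoch` and `nb_val_samples`.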


In [11]:
# fit_generator returns a History object, stored here as pizza_model
pizza_model = model.fit_generator(
        train_generator,
        # number of training samples
        samples_per_epoch=3200,

        nb_epoch=100,

        validation_data=validation_generator,
        # number of validation samples
        nb_val_samples=800,

        # lets me save the best model's weights
        callbacks=callbacks_list
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [None]:
# save model to JSON
pizza_model_json = model.to_json()
with open("pizza_model.json", "w") as json_file:
    json_file.write(pizza_model_json)

In [16]:
# save weights to HDF5
model.save_weights("pizza_model.h5")

### History
The History object (returned by `fit_generator`) is a record of training loss and metric values at successive epochs, as well as validation loss and validation metric values. It is not part of the model object, so you have to pickle it (or use some other method) to use it elsewhere.

In [29]:
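# pickle history object
import pickle

# .history is a plain dict of per-epoch loss and metric values
history = pizza_model.history

file_name = "model_history"

# the with block closes the file even if pickling fails
with open(file_name, 'wb') as file_object:
    pickle.dump(history, file_object)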
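
Reading the pickled history back later might look like the sketch below. The dict here is a made-up stand-in for `pizza_model.history` (the values are not real results), written to a temporary location so the example runs on its own. The `acc` and `val_acc` keys are assumed from the `metrics=['accuracy']` setting and the `val_acc` monitor used earlier.

```python
import os
import pickle
import tempfile

# Made-up values standing in for pizza_model.history.
hypothetical_history = {
    "acc": [0.60, 0.72, 0.78],
    "val_acc": [0.58, 0.75, 0.71],
}

path = os.path.join(tempfile.mkdtemp(), "model_history")
with open(path, "wb") as file_object:
    pickle.dump(hypothetical_history, file_object)

# Later, possibly in another session: load the dict back.
with open(path, "rb") as file_object:
    history = pickle.load(file_object)

# Find the 1-based epoch with the highest validation accuracy.
val_acc = history["val_acc"]
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
print(best_epoch, val_acc[best_epoch - 1])
```

Only unpickle files you created yourself, since loading a pickle can run arbitrary code.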