## Image Classifier - From Siraj's Udacity Lecture

Here is my attempt at recreating the image classification model from Siraj's video lecture in the "Foundation in Deep Learning" course on Udacity. The original tutorial (that Siraj borrowed from) can be found here: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

The tutorial uses Keras to build the image classifier from just a few hundred pictures in each class. 

Keras is a high-level neural networks API written in Python. It can run on top of TensorFlow (or Theano). The API was designed to enable fast experimentation (which is awesome).
Keras provides access to a deep learning library that:
* Allows for easy and fast prototyping
* Supports both convolutional networks and recurrent networks, as well as combinations of the two
* Runs on CPU and GPU

(More about this on [Keras Documentation](http://keras.io))

In this notebook I attempt to understand the following key options:
* training a small network from scratch
* using the bottleneck features of a pre-trained network
* fine-tuning the top layers of a pre-trained network

And this is accomplished by using the following features from Keras:
* fit_generator for training a Keras model using Python data generators
* ImageDataGenerator for real-time data augmentation
* layer freezing and model fine-tuning


### Data

The data used here is available [on Kaggle](https://www.kaggle.com/c/dogs-vs-cats/data).

The recommended folder structure is as follows:

```python
data/
    train/
        dogs/ ### 1024 pictures
            dog001.jpg
            dog002.jpg
            ...
        cats/ ### 1024 pictures
            cat001.jpg
            cat002.jpg
            ...
    validation/
        dogs/ ### 416 pictures
            dog001.jpg
            dog002.jpg
            ...
        cats/ ### 416 pictures
            cat001.jpg
            cat002.jpg
            ...
```
Note: the Kaggle dataset linked above contains 25,000 images (12,500 per class). For this example I use only a small subset, roughly 2x1000 training images and 2x400 validation images (the exact counts are shown in the tree above). These can be found in the git repo from the tutorial linked above.
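
Before training, it can help to confirm that the folders on disk match the layout above. The following is a minimal sketch, assuming a `data/` tree organised as shown; the helper name `count_images` and the list of accepted file extensions are my own choices and are not part of the tutorial.

```python
import os

def count_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return {(split, class_name): image_count} for a data/ tree.

    `root` is expected to contain split folders (train/, validation/),
    each holding one sub-folder per class.
    """
    counts = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if not os.path.isdir(class_dir):
                continue
            # Count only files whose name ends with an image extension.
            counts[(split, class_name)] = sum(
                1 for name in os.listdir(class_dir)
                if name.lower().endswith(extensions)
            )
    return counts
```

Calling `count_images('data')` on the layout above should report one entry per split and class, which makes a missing or misnamed folder easy to spot.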



### Data Loading

In [1]:
##TensorFlow is the backend for Keras in this notebook
!pip install pillow
!KERAS_BACKEND=tensorflow python -c "from keras import backend"

Using TensorFlow backend.


Note: for the code above to work, first run the following commands in your terminal (I'm using a Mac):


conda create -n tensorflow python=3.5

source activate tensorflow


conda install pandas matplotlib jupyter notebook scipy scikit-learn


pip install tensorflow


pip install keras

This ensures that all the dependencies have been installed.
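
A quick way to confirm the environment is complete is to check that each module can be found before running the notebook. This is a small sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are my own, and note that some import names differ from the package names used with pip or conda (pillow is imported as `PIL`, scikit-learn as `sklearn`).

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Import names for the packages installed above, plus h5py,
# which is needed later to save the model weights.
REQUIRED = ["numpy", "pandas", "matplotlib", "scipy", "sklearn",
            "PIL", "h5py", "tensorflow", "keras"]

print("Missing:", missing_packages(REQUIRED) or "nothing")
```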

#### Next:

In [40]:
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras import optimizers
from keras.preprocessing import image


In [3]:
#dimensions of the images

img_width, img_height = 150, 150

train_data_dir = '/Users/vivek/data/train'
validation_data_dir = '/Users/vivek/data/validation'

### Data Generators

In [4]:
#used to rescale the pixel values from [0,255] to [0,1] interval
datagen = ImageDataGenerator(rescale=1./255)

#automatically retrieve images and their classes for training and validation
train_generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width,img_height),
        batch_size = 16,
        class_mode = 'binary')

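#NOTE: the recorded run passed train_data_dir here by mistake, which is why
#the output below reports 2048 images for both generators.
#The validation generator should read from validation_data_dir.
validation_generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width,img_height),
        batch_size = 32,
        class_mode = 'binary')

Found 2048 images belonging to 2 classes.
Found 2048 images belonging to 2 classes.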
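
Two things happen behind the scenes in the cell above: `flow_from_directory` infers each image's label from the name of its sub-folder (class names are sorted alphanumerically, so `cats` becomes 0 and `dogs` becomes 1), and `rescale=1./255` multiplies every pixel value by 1/255. The sketch below mimics both steps in plain Python to make them concrete; the function names are my own and this is not how Keras implements them internally.

```python
def class_indices(class_dirs):
    """Map sub-folder names to integer labels in alphanumeric order."""
    return {name: idx for idx, name in enumerate(sorted(class_dirs))}

def rescale(pixels, factor=1.0 / 255):
    """Multiply every pixel value by `factor`, as rescale=1./255 does."""
    return [p * factor for p in pixels]

labels = class_indices(["dogs", "cats"])
print(labels)                    # {'cats': 0, 'dogs': 1}
print(rescale([0, 128, 255]))    # values now lie in the [0, 1] interval
```

On a real generator the same mapping is available as `train_generator.class_indices`.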


##  Small Conv Net

### Model Architecture Definition

In [5]:
#Building the Model

#Sequential allows us to build a linear stack of layers,
#so we treat each layer as an object that feeds data to the next one.

#The alternative is a graph model (not used here), which would allow
#multiple separate inputs and outputs. This is a simpler example though
model = Sequential()

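#Now we add our first layer, the convolutional layer
#The first layer of a CNN is always a convolutional layer
#The input is a 150x150x3 array of pixel values (img_width x img_height x 3);
#3 refers to the RGB channels
#(Keras 1 signature: 32 filters with a 3x3 kernel; Keras 2 writes this as Conv2D(32, (3, 3)))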
model.add(Convolution2D(32, 3, 3, input_shape=(img_width, img_height,3)))


#Now we pass this feature map through an activation layer, ReLU
#ReLU is a non-linear operation that replaces all the negative
#values in the feature map with zero. This layer increases the non-linear
#properties of our model, meaning our neural net will be able to learn
#more complex functions
model.add(Activation('relu'))

#We now initialize our max pooling layer. This reduces the dimensionality
#of each feature map but retains the most important information
#It also reduces the computational complexity of the network
#Max pooling takes the maximum value from each window (here 2x2)
#as the window slides over the feature map
model.add(MaxPooling2D(pool_size=(2, 2)))

#A classic CNN looks like so:
#Input->Conv->ReLU->Pool->Conv->ReLU->Pool->FullyConnected

#Following that we have the next layers like so:
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))


#Dropout is a technique used to prevent overfitting, which is when a model
#fits the training data but isn't able to predict labels for novel data.
#A dropout layer drops out a random set of activations in that layer by
#setting them to zero as data flows through it during training

#we first flatten the feature maps into a 1D vector to feed the fully connected layers
model.add(Flatten())

#now initialize a fully connected layer with a dense function and
#apply ReLU to it
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))

#After dropout, we initialize one more fully connected layer
#For a multi-class problem this would output an n dimensional vector,
#where n is the number of classes. With only 2 classes (Dogs and Cats)
#a single output unit is enough: applying sigmoid to it converts the value
#into the probability of one class (the other class gets 1 minus that)
model.add(Dense(1))
model.add(Activation('sigmoid'))
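
To see how the image shrinks as it passes through the stack above, the arithmetic can be traced by hand. The sketch below assumes Keras's default settings for these layers (no padding and stride 1 for the convolutions, pooling stride equal to the pool size); the helper names are my own.

```python
def conv_out(size, kernel=3):
    """Output width/height of an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width/height of non-overlapping max pooling."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units

size, channels, total = 150, 3, 0
for filters in (32, 32, 64):          # the three Conv -> ReLU -> Pool blocks
    total += conv_params(channels, filters)
    size = pool_out(conv_out(size))
    channels = filters
    print("after block with %d filters: %dx%dx%d" % (filters, size, size, channels))

flat = size * size * channels          # length of the vector produced by Flatten
total += dense_params(flat, 64) + dense_params(64, 1)
print("flattened length:", flat)       # 18496
print("trainable parameters:", total)  # 1212513
```

Almost all of the parameters sit in the first fully connected layer, which is one reason dropout is applied there.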





An important question at this stage is how does the network learn? 

The key to this is that we want to minimize our loss function, which measures the difference between the target output and the predicted output. To accomplish this we take the derivative of the loss with respect to the weights in each layer, and adjust the weights in the direction that reduces the loss.
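
The idea can be made concrete for a single output unit, using binary cross-entropy (the loss this notebook compiles the model with) on top of a sigmoid. The sketch below is a plain-Python illustration, not Keras code: it compares the analytic derivative of the loss with a finite-difference estimate and then takes one gradient-descent step. The function names, the example values and the step size of 0.5 are all my own choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    """Loss for one example; y_true is 0 or 1, p is the predicted probability."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def loss_from_logit(y_true, z):
    return binary_crossentropy(y_true, sigmoid(z))

def grad_wrt_logit(y_true, z):
    """Analytic derivative of the loss with respect to z: sigmoid(z) - y."""
    return sigmoid(z) - y_true

# Check the analytic derivative against a finite-difference estimate.
y, z, eps = 1, 0.3, 1e-6
numeric = (loss_from_logit(y, z + eps) - loss_from_logit(y, z - eps)) / (2 * eps)
print(grad_wrt_logit(y, z), numeric)

# One gradient-descent step on z lowers the loss.
z_new = z - 0.5 * grad_wrt_logit(y, z)
print(loss_from_logit(y, z), "->", loss_from_logit(y, z_new))
```

In the real network the same derivative is propagated backwards through every layer to obtain the gradient for each weight.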

In [6]:
#The learning process can then be configured using the compile method,
#where our loss is defined as binary crossentropy (the preferred
#loss function for binary problems - Cat/Dog type)
#We will then use our optimizer, rmsprop, which will perform the
#gradient descent. And finally a list of metrics, set to accuracy
#as this is a classification problem.

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
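
To give a feel for what the rmsprop optimizer does, here is a minimal one-dimensional sketch of its update rule: each step divides the gradient by a running root-mean-square of recent gradients, so the step size adapts to the gradient's scale. The toy loss, the function name and the hyperparameter values are illustrative assumptions; Keras's own implementation and defaults may differ in detail.

```python
import math

def rmsprop_minimize(grad_fn, w, lr=0.01, rho=0.9, eps=1e-8, steps=2000):
    """Minimize a 1-D function with RMSprop, given its gradient function."""
    avg_sq = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        avg_sq = rho * avg_sq + (1 - rho) * g * g
        # Scale the step by the recent gradient magnitude.
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w

# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = rmsprop_minimize(lambda w: 2 * (w - 3), w=0.0)
print(w_final)  # ends up close to 3
```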

### Training

Now we write out our fit function to train the model, giving it the training and validation data and the number of epochs to run. We also save our weights, so we can use our trained model later.

In [18]:
nb_epoch = 10
nb_train_samples = 2048
nb_validation_samples = 832

In [20]:
model.fit_generator(
        train_generator,
        samples_per_epoch=nb_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=nb_validation_samples)
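
The call above uses the Keras 1 argument names, which count samples. In Keras 2, `fit_generator` counts batches instead, through the `steps_per_epoch`, `epochs` and `validation_steps` arguments. The sketch below shows the conversion for the sample counts and batch sizes used in this notebook; the helper name `steps_for` is my own.

```python
import math

def steps_for(n_samples, batch_size):
    """Number of generator batches needed to see n_samples once."""
    return math.ceil(n_samples / batch_size)

train_steps = steps_for(2048, 16)   # training generator uses batch_size=16
val_steps = steps_for(832, 32)      # validation generator uses batch_size=32
print(train_steps, val_steps)       # 128 26
```

With Keras 2 these values would be passed as `steps_per_epoch=train_steps` and `validation_steps=val_steps`.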

  


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1220fe978>

In [22]:
#Next I save the weights so I can load them and use them later if I want
#Note: This requires h5py to be installed
#model.save_weights('models/basic_cnn_20_epochs.h5')

#This can be loaded as follows
##model.load_weights('models_trained/basic_cnn_20_epochs.h5')

In terms of the number of epochs, I first tested it with 1 epoch and then ran it through 10. Up to a point, running more epochs tends to improve accuracy, although training for too long can lead to overfitting. I am currently running this on a MacBook Air, where 30 epochs would take forever. Hence the brevity.
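
Rather than fixing the number of epochs in advance, a common approach is to watch the validation loss and stop once it has not improved for a few epochs in a row. The sketch below shows that rule in plain Python; the function name, the `patience` value and the loss values are hypothetical and do not come from this notebook's run.

```python
def best_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the lowest validation loss, stopping
    once the loss has failed to improve for `patience` epochs in a row."""
    best, best_idx, waited = float("inf"), 0, 0
    for idx, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_idx, waited = loss, idx, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx

# Hypothetical validation losses, one per epoch (not from this notebook).
history = [0.69, 0.61, 0.55, 0.52, 0.54, 0.58, 0.50]
print(best_epoch(history))  # 4
```

Here training would stop after epoch 6, keeping the result from epoch 4, and the later improvement at epoch 7 is never seen; a larger `patience` trades extra training time for a lower chance of stopping too early.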

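My validation accuracy is still pretty strong, sitting at about 99.5%. Note, however, that in the recorded run the validation generator read from the training directory (both generators report 2048 images), so this figure is measured on images the model was trained on and likely overstates how well it does on unseen pictures.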

### Evaluating on validation set

Computing loss and accuracy:

In [24]:
model.evaluate_generator(validation_generator, nb_validation_samples)

[0.01594636875880549, 0.99515474759615385]

The first value is the loss (about 0.016) and the second is the accuracy (about 99.5%), so we have minimal loss here and a very high accuracy.
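
For reference, the accuracy figure is the fraction of images whose sigmoid output falls on the correct side of 0.5. The sketch below computes it by hand on a few made-up values; the function name, labels and probabilities are hypothetical, with 0 standing for cat and 1 for dog.

```python
def binary_accuracy(y_true, probabilities, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(1 for y, p in zip(y_true, probabilities)
                  if (p > threshold) == bool(y))
    return correct / len(y_true)

# Hypothetical labels (0 = cat, 1 = dog) and sigmoid outputs.
labels = [0, 0, 1, 1]
probs = [0.10, 0.60, 0.80, 0.95]
print(binary_accuracy(labels, probs))  # 0.75
```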