# Image Classification using GoogLeNet Architecture from Scratch

#### In this notebook we build an image classifier from scratch using a CNN modeled on [GoogLeNet](https://arxiv.org/pdf/1409.4842.pdf).
- We used **Google Cloud Platform (GCP)** to train and test the model.
- We also used image augmentation to boost the performance of the deep network.
- Because the model overfit and reached low accuracy on the [dog dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/) (a subset of ImageNet), we also tried a smaller GoogLeNet-style CNN on CIFAR-10 and got 68% accuracy in 50 epochs.

#### Steps to set up GCP (Google Cloud Platform) for Keras and TensorFlow

1. Set up a virtual machine instance in GCP by following this [link](https://cloud.google.com/compute/docs/instances/). Make sure your instance has a GPU.
2. Follow all the steps in this [link](https://medium.com/google-cloud/running-jupyter-notebooks-on-gpu-on-google-cloud-d44f57d22dbd) to set up Anaconda, TensorFlow and Keras with the GPU driver.
3. To copy the dataset to the VM, run the following from your local machine:
    - `gcloud compute scp --recurse ~/localdirectory/ example-instance:~/destinationdirectory`
4. Open Jupyter in your local browser and create a new notebook.


**Additional libraries needed to run the notebook:**
- PIL (`pip install pillow`): the Python Imaging Library, which adds image-processing capabilities to the Python interpreter.
- tqdm (`pip install tqdm`): shows progress bars.
- h5py (`pip install h5py`): used to save model weights locally.
- scikit-learn (`pip install scikit-learn`): provides `load_files`, used to load the dataset.

```python
# Import libraries
import keras
from keras.datasets import cifar10
from keras.models import Model, model_from_json
from keras.layers import Input, Dense, Dropout, Flatten, AveragePooling2D
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, GlobalAveragePooling2D
from keras.layers import Concatenate
from keras.optimizers import SGD, RMSprop  # RMSprop is used for the CIFAR-10 model

# Pre-processing images
from sklearn.datasets import load_files
from keras.utils import np_utils
import numpy as np
from glob import glob

from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image

# Let PIL load truncated image files instead of raising an error
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

from tqdm import tqdm
```

Using TensorFlow backend.


### Let TensorFlow allocate GPU memory as needed instead of pre-allocating all of it
- More details are in the [TensorFlow programmer's guide](https://www.tensorflow.org/programmers_guide/tensors)

```python
# Backend configuration (TensorFlow 1.x / standalone Keras 2.x API)
import tensorflow as tf
from keras import backend as k

# Don't pre-allocate GPU memory; allocate it as needed
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Create a session with the above options and register it with Keras
k.tensorflow_backend.set_session(tf.Session(config=config))
```

```python
# Function to load a dataset split
def load_dataset(path):
    # Load the file paths and integer class labels from the class subfolders
    data = load_files(path)
    # Array of image file paths
    dog_files = np.array(data['filenames'])
    # One-hot encode the labels (133 breeds)
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets
```
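`np_utils.to_categorical` turns integer class indices into one-hot rows. A minimal NumPy sketch of the same idea, assuming integer labels in `[0, num_classes)`; the helper name `one_hot` is ours and not part of Keras:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

targets = one_hot([0, 2, 132], 133)
print(targets.shape)           # (3, 133)
print(targets.argmax(axis=1))  # recovers the original labels 0, 2 and 132
```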


```python
train_files, train_targets = load_dataset('Dog/train')
valid_files, valid_targets = load_dataset('Dog/valid')
test_files, test_targets = load_dataset('Dog/test')
```

```python
# Get the breed names from the folder names (e.g. 'Dog/train/001.Affenpinscher/') and show the first 5
dog_names = [item.split('.')[1].rstrip('/\\') for item in sorted(glob("Dog/train/*/"))]
dog_names[:5]
```

```
['Affenpinscher',
 'Afghan_hound',
 'Airedale_terrier',
 'Akita',
 'Alaskan_malamute']
```
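The list comprehension above assumes every class folder is named `<number>.<Breed_name>` and that no other dot appears in the path. A slightly more defensive sketch under the same naming assumption uses `pathlib` to look only at the last folder name; `breed_from_dir` is a hypothetical helper, not part of the original notebook:

```python
from pathlib import Path

def breed_from_dir(dir_path):
    """Extract 'Affenpinscher' from a folder path like 'Dog/train/001.Affenpinscher/'."""
    folder = Path(dir_path).name        # '001.Affenpinscher' (trailing slash is ignored)
    return folder.split('.', 1)[1]      # drop the numeric prefix

print(breed_from_dir('Dog/train/001.Affenpinscher/'))        # Affenpinscher
print(breed_from_dir('data.v2/train/003.Airedale_terrier/'))  # Airedale_terrier
```

The second call shows the difference: splitting the whole path on `'.'` would return `v2/train/003` there.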

```python
print('There are %d total dog categories.' % len(dog_names))
print('There are %d total dog images.\n' % len(np.hstack([train_files, valid_files, test_files])))
print('There are %d training dog images.' % len(train_files))
print('There are %d validation dog images.' % len(valid_files))
print('There are %d test dog images.' % len(test_files))
```

```
There are 133 total dog categories.
There are 8341 total dog images.

There are 6670 training dog images.
There are 835 validation dog images.
There are 836 test dog images.
```


##### The dataset is already split into training, validation and test sets. Since the training set contains 6670 images across 133 breeds, there are only about 50 images per breed on average.

#### Preprocess the data
- `path_to_tensor` takes an image path, loads the image resized to 224 x 224, converts it to an array and returns a 4D tensor with shape (1, 224, 224, 3), i.e. (batch, height, width, channels).
- `paths_to_tensor` takes an array of image paths and returns a single tensor stacking all the images, with shape (N, 224, 224, 3).

```python
def path_to_tensor(img_path):
    # Load an RGB image as a PIL.Image.Image, resized to 224 x 224
    img = image.load_img(img_path, target_size=(224, 224))
    # Convert the PIL image to a 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # Add a batch axis: return a 4D tensor with shape (1, 224, 224, 3)
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    # Convert every image and stack them along the batch axis: (N, 224, 224, 3)
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
```
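To see what the two functions do to array shapes without reading any files, here is a NumPy-only sketch in which a zero array stands in for the decoded image returned by `image.img_to_array(img)`:

```python
import numpy as np

# Placeholder for one decoded 224x224 RGB image: (height, width, channels)
img_array = np.zeros((224, 224, 3), dtype=np.float32)

# What path_to_tensor does: add a leading batch axis -> (1, 224, 224, 3)
single = np.expand_dims(img_array, axis=0)

# What paths_to_tensor does: stack N such tensors along the batch axis -> (N, 224, 224, 3)
batch = np.vstack([single, single, single])

print(single.shape)  # (1, 224, 224, 3)
print(batch.shape)   # (3, 224, 224, 3)
```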


Rescale the images by dividing every pixel in every image by 255, so that pixel values lie in [0, 1].

```python
# ImageFile.LOAD_TRUNCATED_IMAGES = True is already set in the first cell
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255
```

```
100%|██████████| 6670/6670 [01:03<00:00, 104.63it/s]
100%|██████████| 835/835 [00:07<00:00, 119.19it/s]
100%|██████████| 836/836 [00:07<00:00, 116.94it/s]
```
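Loading every image into one float32 array is simple but memory-hungry. A back-of-the-envelope check for the training tensor (6670 images of 224 x 224 x 3 values, 4 bytes each); `tensor_bytes` is a hypothetical helper:

```python
def tensor_bytes(n_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Size in bytes of an (n_images, height, width, channels) array."""
    return n_images * height * width * channels * bytes_per_value

train_bytes = tensor_bytes(6670)            # float32 -> 4 bytes per value
print(train_bytes)                          # 4016087040
print(round(train_bytes / 2**30, 2), "GiB")  # 3.74 GiB
```

The `flow_from_directory` generators used for training below load batches from disk instead of holding the whole dataset in memory.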


## Image Augmentation
#### While training the CNN we found that it was overfitting heavily: after 70 epochs the training accuracy was 50% but the validation accuracy was only 18%. To reduce the overfitting we used image augmentation.
#### Augmentation applies random transformations to the training images, so the network rarely sees exactly the same image twice; this helps prevent overfitting and helps the model generalize better.


```python
# Augmentation configuration used for training
train_datagen = ImageDataGenerator(
        rescale=1./255,        # rescaling factor applied to every pixel
        shear_range=0.2,       # shear intensity (shear angle in counter-clockwise direction in degrees)
        zoom_range=0.2,        # range for random zoom
        horizontal_flip=True,  # randomly flip inputs horizontally
        fill_mode='nearest')   # strategy used to fill in newly created pixels
batch_size = 16
```

```python
# Augmentation configuration used for validation/testing: only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)
```
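To illustrate one of the random transformations, here is a NumPy sketch of a random horizontal flip followed by the 1/255 rescaling, assuming a (height, width, channels) image. It is not Keras's implementation, and shear and zoom are left out; `augment` is a hypothetical helper:

```python
import numpy as np

def augment(img, rng, flip_prob=0.5):
    """Randomly flip an HxWxC image left-right, then rescale pixels to [0, 1]."""
    if rng.random() < flip_prob:
        img = img[:, ::-1, :]               # reverse the width axis
    return img.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
img = np.arange(6, dtype=np.uint8).reshape(2, 3, 1)  # tiny 2x3 single-channel "image"
out = augment(img, rng, flip_prob=1.0)               # force a flip for the demo
print(out.shape)  # (2, 3, 1)
```

Validation images go through `test_datagen`, which only rescales, so the model is evaluated on unmodified images.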

##### This generator reads the pictures found in the subfolders of 'Dog/train' and indefinitely generates batches of augmented image data.

```python
train_generator = train_datagen.flow_from_directory(
        'Dog/train',               # target directory, one subfolder per class
        target_size=(224, 224),    # all images will be resized to 224 x 224
        batch_size=batch_size,
        class_mode='categorical')  # one-hot labels for categorical cross-entropy
```

Found 6670 images belonging to 133 classes.


##### This is a similar generator, for validation data

```python
validation_generator = test_datagen.flow_from_directory(
        'Dog/valid',
        target_size=(224, 224),  # all images will be resized to 224 x 224
        batch_size=batch_size,
        class_mode='categorical')
```

Found 835 images belonging to 133 classes.


**GoogLeNet Inception Architecture**

![GoogLeNet Inception architecture](http://yeephycho.github.io/blog_img/GoogLeNet.JPG)

#### It is generally difficult to decide which architecture will work well for a particular dataset; when building a CNN from scratch it is mostly trial and error. A pre-trained CNN usually reaches higher accuracy in fewer iterations than training from scratch, because a network trained from scratch has to learn everything from the beginning.

**This notebook only works with the TensorFlow backend, not with Theano.** **Theano uses channels-first ordering, whereas TensorFlow uses channels-last.**
- Let's start with the input tensor:

```python
input = Input(shape=(224, 224, 3))
```

## Now let's build the CNN for our dataset: the dog dataset, which contains 133 classes and 8341 images in total

- The first layer in the diagram is a `7 x 7` convolution with stride `(2, 2)` applied to the `224 x 224` input image, followed by `BatchNormalization` for faster learning and higher overall accuracy. [This](https://medium.com/deeper-learning/glossary-of-deep-learning-batch-normalisation-8266dcd2fa82) blog post has a good explanation of batch normalization.

```python
x = Conv2D(64, (7, 7), strides=(2, 2), padding='same', activation='relu')(input)
x = BatchNormalization()(x)  # default axis=-1 (channels last); with Theano's channels-first ordering use axis=1
```

- Next is `3 x 3` max pooling with stride `2`:

```python
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
```
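With `padding='same'`, TensorFlow pads so that the output size is ceil(input / stride); with `padding='valid'` (no padding) it is floor((input - kernel) / stride) + 1. A small sketch tracing the layers above; `same_size` and `valid_size` are our own helper names:

```python
import math

def same_size(n, stride):
    """Output length of a 'same'-padded conv/pool layer."""
    return math.ceil(n / stride)

def valid_size(n, kernel, stride):
    """Output length of a 'valid' (unpadded) conv/pool layer."""
    return (n - kernel) // stride + 1

n = 224
n = same_size(n, 2)   # 7x7 conv, stride 2 -> 112
n = same_size(n, 2)   # 3x3 max pool, stride 2 -> 56
print(n)              # 56
```

These match the `(None, 112, 112, 64)` and `(None, 56, 56, 64)` shapes in the model summary further down.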

- Next, GoogLeNet applies a `1 x 1` reduction convolution with 64 filters followed by a `3 x 3` convolution with 192 filters, both with stride 1. Because our dataset is much smaller than ImageNet, we shrink this stage: the code below uses a `1 x 1` convolution with 48 filters followed by a `1 x 1` convolution with 64 filters.

```python
x = Conv2D(48, (1, 1), strides=(1, 1), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (1, 1), strides=(1, 1), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
```

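### Inception 3a in the GoogLeNet architecture
- This layer takes a few steps to build, so we wrap it in a reusable function, `add_module(input, reduce_1, onex1, threex3, fivex5, pool)`. For Inception 3a:
- `1 x 1` convolution branch with 64 filters (`reduce_1`), followed by BatchNormalization
- Shared `1 x 1` bottleneck convolution with 80 filters (`onex1`), followed by BatchNormalization
- `3 x 3` convolution branch with 16 filters (`threex3`), whose input is the output of the `1 x 1` bottleneck, followed by BatchNormalization
- `5 x 5` convolution branch with 32 filters (`fivex5`), whose input is the output of the `1 x 1` bottleneck, followed by BatchNormalization
- Pooling branch: `2 x 2` max pooling followed by a `1 x 1` convolution with 32 filters (`pool`)
- Concatenate the outputs of the `1 x 1`, `3 x 3`, `5 x 5` and pooling branches along the last (channel) axis. All four branches use stride 2, so the module halves the height and width.


    So we call the function as add_module(input, 64, 80, 16, 32, 32)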

```python
def add_module(input, reduce_1, onex1, threex3, fivex5, pool):
    # 1x1 branch (stride 2)
    Conv2D_reduce = Conv2D(reduce_1, (1,1), strides=(2,2), activation='relu', padding='same')(input)
    Conv2D_reduce = BatchNormalization()(Conv2D_reduce)

    # Shared 1x1 bottleneck feeding the 3x3 and 5x5 branches (stride 1)
    Conv2D_1_1 = Conv2D(onex1, (1,1), activation='relu', padding='same')(input)
    Conv2D_1_1 = BatchNormalization()(Conv2D_1_1)

    # 3x3 branch (stride 2)
    Conv2D_3_3 = Conv2D(threex3, (3,3), strides=(2,2), activation='relu', padding='same')(Conv2D_1_1)
    Conv2D_3_3 = BatchNormalization()(Conv2D_3_3)

    # 5x5 branch (stride 2)
    Conv2D_5_5 = Conv2D(fivex5, (5,5), strides=(2,2), activation='relu', padding='same')(Conv2D_1_1)
    Conv2D_5_5 = BatchNormalization()(Conv2D_5_5)

    # Pooling branch: 2x2 max pooling (stride 2) followed by a 1x1 projection
    MaxPool2D_2_2 = MaxPooling2D(pool_size=(2,2), strides=(2,2))(input)
    Conv2D_Pool = Conv2D(pool, (1,1), activation='relu', padding='same')(MaxPool2D_2_2)
    Conv2D_Pool = BatchNormalization()(Conv2D_Pool)

    # Concatenate the four branches along the channel axis
    concat = Concatenate(axis=-1)([Conv2D_reduce, Conv2D_3_3, Conv2D_5_5, Conv2D_Pool])
    return concat
```
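Because `add_module` concatenates four branches along the channel axis, its output depth is `reduce_1 + threex3 + fivex5 + pool` (the `onex1` bottleneck only feeds the `3 x 3` and `5 x 5` branches), and its height and width are halved. A plain-Python sketch of that bookkeeping for the two calls in the final model, assuming a 28 x 28 input to the first module (224 after the two stride-2 layers and the second max pooling); `module_output_shape` is a hypothetical helper:

```python
def module_output_shape(h, w, reduce_1, onex1, threex3, fivex5, pool):
    """(height, width, channels) produced by add_module for an h x w input."""
    # Assumes even h and w: the 'same' stride-2 convolutions give ceil(h/2) while the
    # 'valid' 2x2 pooling branch gives floor(h/2), so odd sizes would not concatenate.
    assert h % 2 == 0 and w % 2 == 0, "add_module expects even spatial sizes"
    channels = reduce_1 + threex3 + fivex5 + pool   # onex1 is not an output branch
    return h // 2, w // 2, channels

shape_3a = module_output_shape(28, 28, 64, 80, 16, 32, 32)
shape_3b = module_output_shape(*shape_3a[:2], 48, 80, 16, 48, 64)
print(shape_3a)  # (14, 14, 144)
print(shape_3b)  # (7, 7, 176)
```

The 7 x 7 output of Inception 3b is why the final model uses a `7 x 7` average pooling layer.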

### Inception 3b
- Inception 3b reuses the same function with different filter counts:
- `1 x 1` convolution branch with 48 filters, followed by BatchNormalization
- Shared `1 x 1` bottleneck convolution with 80 filters, followed by BatchNormalization
- `3 x 3` convolution branch with 16 filters, whose input is the output of the `1 x 1` bottleneck, followed by BatchNormalization
- `5 x 5` convolution branch with 48 filters, whose input is the output of the `1 x 1` bottleneck, followed by BatchNormalization
- Pooling branch: `2 x 2` max pooling followed by a `1 x 1` convolution with 64 filters
- Concatenate the outputs of the `1 x 1`, `3 x 3`, `5 x 5` and pooling branches along the last axis

    So we call the function as add_module(input, 48, 80, 16, 48, 64)

In GoogLeNet, Inception 3b is followed by `3 x 3` max pooling with stride 2; in the final model below this pooling layer is commented out.
### So putting it all together
I am not implementing the full, more complex architecture shown in the diagram because it would likely overfit our dataset: ImageNet has 1000 categories with around 1000 or more images each, whereas our dataset is small.
#### For the final layer I use a dense layer with 133 units (num_classes) and a softmax activation.
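The softmax activation turns the raw scores of the last dense layer into probabilities that are non-negative and sum to 1. A numerically stable NumPy sketch with three made-up scores instead of 133 (subtracting the maximum score does not change the result but avoids overflow in `exp`):

```python
import numpy as np

def softmax(logits):
    """Softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.round(3))  # [0.659 0.242 0.099]
```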

```python
input = Input(shape=(224, 224, 3))

# Stem: 7x7/2 convolution, 3x3/2 max pooling, two 1x1 convolutions, 3x3/2 max pooling
x = Conv2D(64, (7, 7), strides=(2, 2), padding='same', activation='relu')(input)
x = BatchNormalization()(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = Conv2D(48, (1, 1), strides=(1, 1), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (1, 1), strides=(1, 1), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)

# Inception 3a and 3b
x = add_module(x, 64, 80, 16, 32, 32)
# x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = add_module(x, 48, 80, 16, 48, 64)
# x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
# x = add_module(x)

# --- Last layers ---
x = AveragePooling2D((7, 7), strides=(1, 1), padding='valid')(x)
x = Dropout(0.5)(x)
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='linear')(x)
Output = Dense(133, activation='softmax')(x)
```

Now let's make the model:

```python
model = Model(inputs=input, outputs=Output)
```

```python
model.summary()
```

```
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 112, 112, 64) 9472        input_2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 112, 112, 64) 256         conv2d_2[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 56, 56, 64)   0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
conv2d_3 (
```
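The `Param #` column can be checked by hand: a `Conv2D` layer has `kernel_h * kernel_w * in_channels * filters` weights plus one bias per filter, and a `BatchNormalization` layer stores four values per channel (gamma and beta, which are trained, plus the moving mean and moving variance, which are not). A quick check against the rows shown above:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Number of parameters in a Keras Conv2D layer."""
    return kernel_h * kernel_w * in_channels * filters + (filters if use_bias else 0)

def batchnorm_params(channels):
    """gamma, beta, moving_mean and moving_variance for each channel."""
    return 4 * channels

print(conv2d_params(7, 7, 3, 64))  # 9472, matches conv2d_2
print(batchnorm_params(64))        # 256, matches batch_normalization_2
```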

#### Training starts
We set up the VM with two GPUs attached to the instance, so we train a parallel copy of the model on both GPUs.

```python
from keras.utils import multi_gpu_model

# Replicate the model on 2 GPUs; each GPU processes part of every batch
parallel_model = multi_gpu_model(model, gpus=2)
```

### Choosing the optimizer is very important for getting good accuracy. Following the paper, I use SGD with momentum, which was used to train GoogLeNet on ImageNet.

```python
parallel_model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=["accuracy"])
```
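SGD with momentum keeps a running velocity built from past gradients, which smooths the updates. A minimal sketch of the classic (non-Nesterov) momentum update on the toy function f(w) = w^2, whose gradient is 2w; it illustrates the idea rather than reproducing the Keras source, and `lr=0.1` is chosen only so the toy example converges quickly (the notebook uses `lr=0.0001`):

```python
def sgd_momentum(grad_fn, w, lr=0.1, momentum=0.9, steps=500):
    """Minimise a 1-D function with classic momentum SGD."""
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(w)  # velocity accumulates past gradients
        w = w + v                           # move along the velocity
    return w

w_final = sgd_momentum(lambda w: 2 * w, w=5.0)
print(abs(w_final) < 1e-3)  # True
```

With `momentum=0.0` this reduces to plain gradient descent.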

#### The first run of 20 epochs took ~145 s per epoch on two Tesla K80 GPUs and ended with a training loss of 3.8295 and a validation loss of 4.2274.
#### We ran only 20 epochs at first to check that the model was working.
#### Running 40 more epochs gave a validation accuracy of 12% and a training accuracy of 20%.
#### The reason for running fewer epochs first is to check whether the model is overfitting.

#### Be careful not to run the training cell below by mistake: it takes a very long time, and you cannot use the notebook for anything else while it runs.
![First-20-Epochs](https://github.com/vishal6557/ADS/blob/master/Screen%20Shot%202018-04-23%20at%204.01.17%20AM.png?raw=true)

```python
parallel_model.fit_generator(
        train_generator,
        steps_per_epoch=6670 // batch_size,
        epochs=20,
        validation_data=validation_generator,
        validation_steps=835 // batch_size)
# Save the weights of the template model (it shares its weights with parallel_model)
model.save_weights('testing_fina1.h5')
```
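`6670 // batch_size` uses floor division, so an epoch of 416 steps covers 416 x 16 = 6656 images, 14 short of the full training set. Rounding up gives enough steps for one full pass, with a smaller last batch. A small sketch of the arithmetic; `steps_for` is a hypothetical helper:

```python
import math

def steps_for(num_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)

print(6670 // 16, steps_for(6670, 16))  # 416 417
print(835 // 16, steps_for(835, 16))    # 52 53
```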

#### Save the model architecture in JSON format

```python
model_json = model.to_json()
with open("final_ model.json", "w") as json_file:
    json_file.write(model_json)
```

#### Load the model from JSON together with its weights

```python
from keras.models import model_from_json

# Rebuild the architecture from JSON, then load the trained weights
with open('final_ model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
loaded_model.load_weights("testing_fina17.h5")
```

#### Don't forget to compile the loaded model before evaluating it, otherwise Keras raises an error

```python
# Optional multi-GPU wrapper; the evaluation below uses loaded_model directly
parallel_model = multi_gpu_model(loaded_model, gpus=2)
loaded_model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics=["accuracy"])
```

### Create a generator for making predictions with the trained model

```python
# Prediction generator; test_datagen only rescales, so no augmentation is applied
generator = test_datagen.flow_from_directory(
        'Dog/train',
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode=None,  # the generator yields only batches of images, no labels
        shuffle=False)    # keep the order of generator.filenames
```

Found 6670 images belonging to 133 classes.


```python
# 800/16 = 50 steps of 16 images, i.e. 800 of the 835 validation images
score = loaded_model.evaluate_generator(validation_generator, 800/16, workers=12)
scores = loaded_model.predict_generator(validation_generator, 800/16, workers=12)
```


```python
# Count Affenpinscher (class 0) images whose predicted probability for class 0 is at most 0.5
correct = 0
for i, n in enumerate(validation_generator.filenames):
    if "Affenpinscher" in n and scores[i][0] <= 0.5:
        correct += 1

print("Correct:", correct, " Total: ", len(validation_generator.filenames))
print("Loss: ", score[0], "Accuracy: ", score[1]*100, "%")
```

```
Correct: 8  Total:  835
Loss:  3.401902123070753 Accuracy:  16.95965176890109 %
```


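#### This counts the validation images of the breed `Affenpinscher` (class 0) whose predicted probability for that class is at most 0.5. Note that `validation_generator` was created with the default `shuffle=True`, so the rows of `scores` are not guaranteed to line up with `validation_generator.filenames`; for a reliable per-breed check, use a generator created with `shuffle=False`.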
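A per-breed accuracy is easier to get right from an array of predicted probabilities and an array of true class indices in the same order (for example, predictions from a generator created with `shuffle=False`, whose `classes` attribute holds the true labels). A NumPy sketch on made-up toy arrays; `per_class_accuracy` is a hypothetical helper:

```python
import numpy as np

def per_class_accuracy(probs, true_labels, class_index):
    """Fraction of images of `class_index` whose arg-max prediction is that class."""
    preds = np.argmax(probs, axis=1)
    mask = true_labels == class_index
    if not mask.any():
        return float('nan')
    return float(np.mean(preds[mask] == class_index))

# Toy example: 4 images, 3 classes (made-up numbers for illustration)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.6, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.1, 0.8, 0.1]])
true_labels = np.array([0, 0, 2, 1])
print(per_class_accuracy(probs, true_labels, 0))             # 0.5
print(float(np.mean(probs.argmax(axis=1) == true_labels)))   # 0.75 (overall accuracy)
```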

# Trying GoogLeNet on CIFAR-10
- Because CIFAR-10 images are small (32 x 32), we use a much smaller network: a single convolution layer followed by two of the Inception modules shown in the figure.

![googlenet](https://qph.fs.quoracdn.net/main-qimg-1593dbc4944be77ade976bbb8e1dc0b2-c)

```python
# Hyperparameters
batch_size = 128
num_classes = 10
epochs = 50
```

```python
from keras.datasets import cifar10
# Load the CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
img_height, img_width, channel = x_train.shape[1], x_train.shape[2], x_train.shape[3]

# Convert the labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
```

```python
input = Input(shape=(img_height, img_width, channel))

Conv2D_1 = Conv2D(64, (3,3), activation='relu', padding='same')(input)
MaxPool2D_1 = MaxPooling2D(pool_size=(2, 2), strides=(2,2))(Conv2D_1)
BatchNorm_1 = BatchNormalization()(MaxPool2D_1)

# Two Inception-style modules with 16 filters in every branch
Module_1 = add_module(BatchNorm_1, 16, 16, 16, 16, 16)
Module_1 = add_module(Module_1, 16, 16, 16, 16, 16)

Output = Flatten()(Module_1)
Output = Dense(num_classes, activation='softmax')(Output)
```

```python
model = Model(inputs=[input], outputs=[Output])
model.summary()
```

```
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_11 (InputLayer)           (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv2d_79 (Conv2D)              (None, 32, 32, 64)   1792        input_11[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_28 (MaxPooling2D) (None, 16, 16, 64)   0           conv2d_79[0][0]                  
__________________________________________________________________________________________________
batch_normalization_71 (BatchNo (None, 16, 16, 64)   256         max_pooling2d_28[0][0]           
__________________________________________________________________________________________________
conv2d_81 
```

```python
from keras.optimizers import RMSprop

parallel_model = multi_gpu_model(model, gpus=2)
rmsprop = RMSprop(lr=0.0001, rho=0.9, epsilon=None, decay=0.0)
parallel_model.compile(loss='categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])
```

```python
# batch_size is not passed here, so Keras uses its default batch size of 32;
# pass batch_size=batch_size to use the 128 defined above
parallel_model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test))
```

```
Train on 50000 samples, validate on 10000 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
 9952/50000 [====>.........................] - ETA: 35s - loss: 0.7262 - acc: 0.7468
```

The training output above stops during epoch 18. I think this happened because I kept the VM running but put my laptop to sleep, so the notebook did not autosave the rest of the output.

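```python
# The template model shares its weights with parallel_model, but it must be
# compiled itself before evaluate() can be called on it
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
```

Accuracy: 69.57%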


###### Challenges we faced in this project, which might help others who want to do a similar project:

- Make sure you have good computing power, with at least two GPUs.

- Run the model in parallel on multiple GPUs so that training and testing take less time.

- Start with a simple architecture and run 30-40 epochs first to check whether the model is overfitting. If it is overfitting, you don't need to break the training.

- Try to save the best weights in every epoch. I know it is tricky with a parallel model, but you can do this by following the links

- Run the Jupyter notebook in the background using nohup; the documentation for that is at this [link](https://hackernoon.com/aws-ec2-part-4-starting-a-jupyter-ipython-notebook-server-on-aws-549d87a55ba9)

- If you want to try a notebook that contains THEANO code, you can do it in 3 easy steps:
    1. First [install](http://deeplearning.net/software/theano/install.html) Theano with GPU support.

    2. Run `nano ~/.keras/keras.json` and change the following:

            {
                "image_dim_ordering": "th",
                "backend": "theano",
                "image_data_format": "channels_first"
            }

    3. Restart the Jupyter notebook.
- My model was overfitting and I was not able to figure out why, so the first thing I tried was image augmentation, then changing the learning rate, then changing the layers, and so on. I guess it is a **trial and error** method. In my case each epoch took around 160-180 s with two Tesla K80 GPUs; it was time-consuming but a good way to learn. You need to be patient.
    
    

The content of this project itself is licensed under the [Creative Commons Attribution 3.0 United States License](http://creativecommons.org/licenses/by/3.0/us/), and the underlying source code used to format and display that content is licensed under the [MIT LICENSE](https://github.com/vishal6557/ADS/blob/master/LICENSE)

## Citations

<a id='google-net'>
[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." NIPS 2012
<br>

<a id='inception-v1-paper'>
[2] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Andrew Rabinovich.
"Going Deeper with Convolutions." CVPR 2015.
<br>

<a id='vgg-paper'>
[3] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015
<br>

<a id='resnet-cvpr'>
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." CVPR 2016.
<br>

<a id='resnet-eccv'>
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Identity Mappings in Deep Residual Networks." ECCV 2016.

## References

[1] [GoogleNet in Keras](http://joelouismarino.github.io/blog_posts/blog_googlenet_keras.html) for understanding of GoogleNet Architecture.

[2] [Keras Documentation](https://keras.io/) for how to use Keras

[3] [Convolution Neural Networks for Visual Recognition](http://cs231n.github.io/convolutional-networks/) for understanding how CNN works

[4] [How convolution neural network Works](https://www.youtube.com/watch?v=FmpDIaiMIeA&t=634s)

[5] [Dog breed classification with Keras](http://machinememos.com/python/keras/artificial%20intelligence/machine%20learning/transfer%20learning/dog%20breed/neural%20networks/convolutional%20neural%20network/tensorflow/image%20classification/imagenet/2017/07/11/dog-breed-image-classification.html)

[6] Keras blog [Image Augmentation](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html)

[7] [ImageDataGenerator methods](https://keras.io/preprocessing/image/#imagedatagenerator-methods)