# Implementing AlexNet

For reference: [Alexnet paper](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)

AlexNet is a convolutional neural network that competed in the **ImageNet Large Scale Visual Recognition Challenge** in 2012. For those who aren’t familiar, this competition can be thought of as the annual Olympics of computer vision, where teams from across the world compete to see who has the best computer vision model for tasks such as classification, localization, and detection. The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up. (Top-5 error is the rate at which, given an image, the correct label is not among the model's five highest-ranked predictions.)    
AlexNet was designed by the SuperVision group, consisting of Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever.
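
To make the top-5 error metric concrete, here is a minimal sketch in plain Python. The function name `top5_error` and the toy scores are illustrative assumptions, not taken from the paper or from any library.

```python
def top5_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 6 classes (hypothetical scores)
scores = [
    [0.10, 0.30, 0.02, 0.20, 0.25, 0.13],  # true label 2 has the lowest score: a miss
    [0.50, 0.10, 0.15, 0.05, 0.12, 0.08],  # true label 0 has the highest score: a hit
]
labels = [2, 0]
print(top5_error(scores, labels))
```

With 1000 ImageNet classes the same function applies unchanged; only the length of each score row grows.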

### Structure of AlexNet: 
The network consists of 5 convolutional layers, interleaved max-pooling layers, dropout layers, and 3 fully connected layers. It was designed for classification over 1000 possible categories.

Input -> Conv1 -> Pool1 -> Conv2 -> Pool2 -> Conv3 -> Conv4 -> Conv5 -> Pool5 -> FC1 -> FC2 -> FC3 -> output 


![image](https://www.researchgate.net/profile/Walid_Aly/publication/312188377/figure/fig4/AS:448996423540740@1484060497977/Figure-7-An-illustration-of-the-architecture-of-AlexNet-CNN-14.ppm)

### Details: 
Layer 1:   
Input image size – 227 x 227 x 3 (the paper states 224 x 224 x 3, but the 55 x 55 output below only works out with 227 and no padding)    
Number of filters – 96    
Filter size – 11 x 11 x 3    
Stride – 4    
Layer 1 output    
(227 - 11)/4 + 1 = 55, so 55 x 55 x 96   
Split across 2 GPUs – so 55 x 55 x 48 for each GPU   

Layer 2 is max pooling followed by convolution     
Input – 55 x 55 x 96     
Max pooling (3 x 3 window, stride 2) – (55 - 3)/2 + 1 = 27, so 27 x 27 x 96     
Number of filters – 256     
Filter size – 5 x 5 x 48     
Layer 2 output     
27 x 27 x 256 (keeping the spatial size at 27 with a 5 x 5 filter requires padding of 2)      
Split across 2 GPUs – so 27 x 27 x 128 for each GPU     
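
The layer sizes above follow from the standard output-size formula for convolution and pooling, `floor((n + 2*padding - kernel) / stride) + 1`. The sketch below checks the arithmetic. The helper name `output_size` is hypothetical, and the padding of 2 for the second convolution is an assumption implied by the 27 x 27 output rather than something stated here. Note that an 11 x 11, stride-4 convolution without padding yields 55 only for a 227-pixel input; a 224-pixel input gives 54.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1


conv1 = output_size(227, kernel=11, stride=4)      # first conv: 11x11, stride 4
pool1 = output_size(conv1, kernel=3, stride=2)     # overlapping max-pool: 3x3, stride 2
conv2 = output_size(pool1, kernel=5, padding=2)    # second conv: 5x5, assumed padding 2
print(conv1, pool1, conv2)

# The same first layer applied to a 224-pixel input does not give 55
print(output_size(224, kernel=11, stride=4))
```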

In [79]:
from __future__ import division, print_function

import os, json
from glob import glob
import numpy as np
from scipy import misc, ndimage
from scipy.ndimage.interpolation import zoom



from keras.utils.data_utils import get_file
from keras import backend as K
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Conv2D, MaxPooling2D, ZeroPadding2D
from keras.layers.pooling import GlobalAveragePooling2D
from keras.optimizers import SGD, RMSprop, Adam
from keras.preprocessing import image

def alex_preprocess(x):
    return x

class Alexnet():
    """The AlexNet ImageNet model"""


    def __init__(self):
        self.FILE_PATH = 'http://files.fast.ai/models/'
        self.WEIGHTS_PATH = "datasets/"

        self.create()
#         self.get_classes()


#     def get_classes(self):
#        fname = 'imagenet_class_index.json'
#        fpath = get_file(fname, self.FILE_PATH+fname, cache_subdir='models')
#        with open(fpath) as f:
#            class_dict = json.load(f)
#        self.classes = [class_dict[str(i)][1] for i in range(len(class_dict))]

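    def predict(self, imgs, details=False):
        # Returns (top probability, class index, class name) for each image.
        # NOTE: self.classes is only set by get_classes(), which is commented out above,
        # so the class-name lookup below fails until that method is re-enabled.
        all_preds = self.model.predict(imgs)
        idxs = np.argmax(all_preds, axis=1)
        preds = [all_preds[i, idxs[i]] for i in range(len(idxs))]
        classes = [self.classes[idx] for idx in idxs]
        return np.array(preds), idxs, classes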


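    def ConvBlock(self, layers, filters, nb_rowcol=3):
        # Adds `layers` x (zero-pad by 1, conv + ReLU), then one 3x3 / stride-2 max-pool.
        # NOTE: only Conv2D is told the data is channels_first. ZeroPadding2D and
        # MaxPooling2D use the backend default (image_data_format in keras.json); if that
        # is channels_last, they pad and pool the wrong axes.
        # NOTE: every conv here uses stride 1, whereas the paper's first conv uses stride 4.
        model = self.model
        for i in range(layers):
            model.add(ZeroPadding2D((1, 1)))
            model.add(Conv2D(filters, (nb_rowcol, nb_rowcol), activation='relu',data_format='channels_first'))
        model.add(MaxPooling2D((3, 3), strides=(2, 2)))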


    def FCBlock(self):
        model = self.model
        model.add(Dense(4096, activation='relu'))
        model.add(Dropout(0.5))


    def create(self):
        model = self.model = Sequential()
        model.add(Lambda(alex_preprocess, input_shape=(3,227,227)))
        
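        # Conv1, Conv2, Conv3 + Conv4, Conv5. Each ConvBlock ends with a max-pool, so this
        # layout also pools after Conv4, which the paper's architecture does not.
        self.ConvBlock(1, 96, 11)
        self.ConvBlock(1, 256, 5)
        self.ConvBlock(2, 384, 3)
        self.ConvBlock(1, 256, 3)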

        model.add(Flatten())
        self.FCBlock()
        self.FCBlock()
        model.add(Dense(1000, activation='softmax'))

        #fname = "alexnet_weights.h5"
        #model.load_weights(self.WEIGHTS_PATH+fname)
        
        return model 
    
    def get_batches(self, path, gen=image.ImageDataGenerator(), shuffle=True, batch_size=8, class_mode='categorical'):
        # target_size matches the 227 x 227 input declared in create()
        return gen.flow_from_directory(path, target_size=(227,227),
                class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)


    def ft(self, num):
        model = self.model
        model.pop()
        for layer in model.layers: layer.trainable=False
        model.add(Dense(num, activation='softmax'))
        self.compile()

    def finetune(self, batches):
        model = self.model
        model.pop()
        for layer in model.layers: layer.trainable=False
        model.add(Dense(batches.nb_class, activation='softmax'))
        self.compile()


    def compile(self, lr=0.001):
        self.model.compile(optimizer=Adam(lr=lr),
                loss='categorical_crossentropy', metrics=['accuracy'])


    def fit_data(self, trn, labels,  val, val_labels,  nb_epoch=1, batch_size=64):
        self.model.fit(trn, labels, nb_epoch=nb_epoch,
                validation_data=(val, val_labels), batch_size=batch_size)


    def fit(self, batches, val_batches, nb_epoch=1):
        self.model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=nb_epoch,
                validation_data=val_batches, nb_val_samples=val_batches.nb_sample)


    def test(self, path, batch_size=8):
        test_batches = self.get_batches(path, shuffle=False, batch_size=batch_size, class_mode=None)
        return test_batches, self.model.predict_generator(test_batches, test_batches.nb_sample)

In [80]:
from cnn_utils import *
# Loading the data (signs)
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

In [81]:
classes

array([0, 1, 2, 3, 4, 5])

In [83]:
# Alexnet() already calls create() in __init__, so reuse the model it built
alexnet = Alexnet().model

In [84]:
alexnet.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lambda_34 (Lambda)           (None, 3, 227, 227)       0         
_________________________________________________________________
zero_padding2d_142 (ZeroPadd (None, 5, 229, 227)       0         
_________________________________________________________________
conv2d_141 (Conv2D)          (None, 96, 219, 217)      58176     
_________________________________________________________________
max_pooling2d_109 (MaxPoolin (None, 47, 109, 217)      0         
_________________________________________________________________
zero_padding2d_143 (ZeroPadd (None, 49, 111, 217)      0         
_________________________________________________________________
conv2d_142 (Conv2D)          (None, 256, 107, 213)     313856    
_________________________________________________________________
max_pooling2d_110 (MaxPoolin (None, 127, 53, 213)      0         
__________
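
The shapes in the summary above, for example `(None, 5, 229, 227)` after the first padding layer, are consistent with the padding and pooling layers running in `channels_last` mode while `Conv2D` runs in `channels_first` mode. The sketch below reproduces the listed shapes and parameter counts in plain Python under that assumption. All helper names are hypothetical, and this is a shape calculation only, not Keras code.

```python
def pad_channels_last(shape, pad=1):
    """Zero-padding in channels_last mode pads the first two axes of the sample."""
    a, b, c = shape
    return (a + 2 * pad, b + 2 * pad, c)


def pool_channels_last(shape, window=3, stride=2):
    """'valid' max-pooling in channels_last mode pools the first two axes."""
    a, b, c = shape
    return ((a - window) // stride + 1, (b - window) // stride + 1, c)


def conv_channels_first(shape, filters, kernel):
    """'valid', stride-1 convolution reading the sample as (channels, rows, cols)."""
    channels, rows, cols = shape
    out = (filters, rows - kernel + 1, cols - kernel + 1)
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return out, params


def trace(shape=(3, 227, 227), blocks=((96, 11), (256, 5))):
    """Return (layer, output shape, parameter count) for each layer of the given blocks."""
    rows = []
    for filters, kernel in blocks:
        shape = pad_channels_last(shape)
        rows.append(("zero_padding2d", shape, 0))
        shape, params = conv_channels_first(shape, filters, kernel)
        rows.append(("conv2d", shape, params))
        shape = pool_channels_last(shape)
        rows.append(("max_pooling2d", shape, 0))
    return rows


for name, shape, params in trace():
    print(name, shape, params)
```

If this reading is right, passing `data_format='channels_first'` to `ZeroPadding2D` and `MaxPooling2D` as well would make all three layer types agree on which axis holds the channels.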

In [85]:
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
alexnet.compile(loss='mse',
              optimizer=sgd,
              metrics=['accuracy'])

history = alexnet.fit_generator(X_train_orig,
                        samples_per_epoch=2000,
                        validation_data=X_test_orig,
                        nb_val_samples=800,
                        nb_epoch=80,
                        verbose=1)



ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
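
This `ValueError` is what NumPy raises when a multi-element array is used where a single True/False value is expected. A likely cause here, offered as an assumption rather than a confirmed diagnosis, is that `fit_generator` expects Python generators that yield `(inputs, targets)` batches, whereas `X_train_orig` and `X_test_orig` are plain arrays and no labels are passed at all. A minimal sketch of such a generator follows, with one-hot targets for the six classes. `one_hot` and `batch_generator` are hypothetical helpers, not part of Keras or `cnn_utils`, and the zero-filled arrays are stand-ins for the real data.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Convert integer labels of any shape to a (N, num_classes) one-hot matrix."""
    labels = np.asarray(labels).reshape(-1).astype(int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


def batch_generator(x, y, batch_size, num_classes, seed=0):
    """Yield shuffled (inputs, one-hot targets) batches forever."""
    rng = np.random.RandomState(seed)
    n = len(x)
    while True:
        order = rng.permutation(n)  # reshuffle at the start of every pass
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], one_hot(y[idx], num_classes)


# Stand-in data: 10 images in channels_first layout, labels in 0..5
x = np.zeros((10, 3, 227, 227), dtype=np.float32)
y = np.arange(10) % 6
gen = batch_generator(x, y, batch_size=4, num_classes=6)
xb, yb = next(gen)
print(xb.shape, yb.shape)
```

The sketch assumes `y` is a 1-D array of integer labels with one entry per image. With one-hot targets, `categorical_crossentropy` (the loss the class's own `compile` method uses) is the usual choice for a softmax output, rather than `mse`.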

### Main Points:

- Trained the network on ImageNet data, which contained over 15 million annotated images from a total of over 22,000 categories.
- Used ReLU for the nonlinearity functions (found to decrease training time, as ReLUs are several times faster than the conventional tanh function).
- Used data augmentation techniques that consisted of image translations, horizontal reflections, and patch extractions.
- Implemented dropout layers in order to combat the problem of overfitting to the training data.
- Trained the model using mini-batch stochastic gradient descent, with specific values for momentum and weight decay.
- Trained on two GTX 580 GPUs for five to six days.
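
The patch-extraction and horizontal-reflection augmentation listed above can be sketched with NumPy as follows. The function name `random_crop_and_flip` is hypothetical. The 256 x 256 source size and 224 x 224 crop size follow the paper's description and are not stated in this write-up.

```python
import numpy as np


def random_crop_and_flip(img, crop=224, rng=None):
    """Return a random crop x crop patch of an H x W x C image, mirrored half the time."""
    if rng is None:
        rng = np.random.RandomState()
    h, w = img.shape[:2]
    top = rng.randint(0, h - crop + 1)    # upper bound is exclusive
    left = rng.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.rand() < 0.5:
        patch = patch[:, ::-1]            # horizontal reflection
    return patch


# Stand-in image: 256 x 256 RGB
img = np.arange(256 * 256 * 3, dtype=np.float32).reshape(256, 256, 3)
patch = random_crop_and_flip(img, rng=np.random.RandomState(0))
print(patch.shape)
```

Because each call picks a new offset and flip, the network rarely sees exactly the same pixels twice, which is how this kind of augmentation helps against overfitting.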

### Why It’s Important:
The neural network developed by Krizhevsky, Sutskever, and Hinton in 2012 was the coming-out party for CNNs in the computer vision community. It was the first time a model had performed so well on the historically difficult ImageNet dataset. By using techniques that are still common today, such as data augmentation and dropout, the paper demonstrated the benefits of CNNs and backed them up with record-breaking performance in the competition.