<img src="https://ucfai.org//course/sp19/conv-nets/banner.jpg">

<div class="col-12">
    <h1> How Computers Can See and Other Ways Machines Can Think </h1>
    <hr>
</div>

<div style="line-height: 2em;">
    <p>by: 
        <strong> Irene Tanner</strong>
        (<a href="https://github.com/irenelt97">@irenelt97</a>)
        <strong> John Muchovej</strong>
        (<a href="https://github.com/ionlights">@ionlights</a>)
     on 2019-02-20</p>
</div>

# Classifying Dog Breeds with a CNN and Transfer Learning
### Ever wondered what breed that cute dog is?
Let's use the power of CNNs and transfer learning to find out!
The link to the slide deck is [here](https://docs.google.com/presentation/d/1DmZ5SEkmaMfS6Q-vheb1X2ICZZPvBlWIi_4KP4yfA3U/edit?usp=sharing).

### Download and Extract the Dataset
These commands download the dataset of dog images to your Colab instance (or current directory) and then unzip it. We also download a script for fetching files from Google Drive links, which is needed later in the notebook to download pre-trained weights for our model.

In [0]:
!wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip

In [0]:
!unzip dogImages.zip > /dev/null

In [0]:
!wget https://raw.githubusercontent.com/circulosmeos/gdown.pl/master/gdown.pl

### Examples of our Data
It is very important to look at the type of data you are dealing with before anything else. Here is a simple function to load in an image and print its dimensions. Let's see a few different images from our training set.

In [0]:
import matplotlib.image as mpimg                
import matplotlib.pyplot as plt                        
%matplotlib inline

def show_img(path):
    img = mpimg.imread(path)
    print("Image Shape: ", img.shape)
    plt.imshow(img)
    plt.grid(False)
    plt.show()

show_img("dogImages/train/003.Airedale_terrier/Airedale_terrier_00164.jpg")
show_img("dogImages/train/027.Bloodhound/Bloodhound_01904.jpg")
show_img("dogImages/train/048.Chihuahua/Chihuahua_03439.jpg")

### What do you notice about each of these images?
The shape of each image is given as (height, width, num_channels). The number of channels in this case is three: one channel each for red, green, and blue.

Notice, though, that each of these images has a different resolution! That is an issue, as the network can only take in a fixed image size as input. When the images are loaded in, they will need to be resized to an input dimension that we set. This is easy to do, but make sure to read up on how resizing works for each library. You usually want to preserve the aspect ratio so you don't stretch or distort the image.
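
To make the aspect-ratio point concrete, here is a minimal sketch of how a "letterbox" resize could be computed by hand. The helper name `fit_within` is made up for this example and is not part of Keras or PIL. Note that, as far as I know, `load_img(..., target_size=...)` as used later in this notebook resizes straight to the target size without preserving the aspect ratio, so treat this as an optional alternative rather than what the notebook does.

```python
def fit_within(size, target):
    """Return the scaled (width, height) and the (x, y) padding offsets needed
    to fit an image of `size` inside a `target` box without distortion."""
    width, height = size
    target_w, target_h = target
    # Use the smaller scale factor so that both sides fit inside the target box.
    scale = min(target_w / width, target_h / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Center the resized image; the leftover border would be filled, e.g. with zeros.
    pad_x, pad_y = (target_w - new_w) // 2, (target_h - new_h) // 2
    return (new_w, new_h), (pad_x, pad_y)


# A hypothetical 640x480 photo squeezed into InceptionV3's 299x299 input.
new_size, padding = fit_within((640, 480), (299, 299))
print("resized to:", new_size, "padding:", padding)
```

The image keeps its 4:3 shape and the unused rows above and below are padded, instead of the dog being squashed into a square.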

### Includes

In [0]:
# for one-hot encoding labels
from keras.utils import np_utils
# for loading in file paths and labels
from sklearn.datasets import load_files  
import numpy as np
from glob import glob

### Load in pretrained InceptionV3 network
For this project, we will use a pretrained model called InceptionV3, trained on the [ImageNet](http://www.image-net.org/) dataset. ImageNet is a huge collection of labeled photographs; the subset commonly used for pretraining has 1000 different classes to predict. This gives us the advantage of starting from a model that already has some learned knowledge, then **fine-tuning** it so it learns the features of our dataset. This is especially good in our case, as ImageNet contains a variety of dog breeds, so the network has seen dogs before. Check out the paper on InceptionV3 at the end of this notebook.

After the workshop, try using other pretrained models that come with Keras. You can check them out [here](https://keras.io/applications/) (InceptionResNetV2 or VGG16/VGG19 might be good models to try next). Make sure to check what the default image sizes for the network are!

In [0]:
from keras.applications.inception_v3 import InceptionV3
#from keras.applications.resnet50 import ResNet50
#from keras.applications.vgg16 import VGG16
#from keras.applications.inception_resnet_v2 import InceptionResNetV2
#from keras.applications.vgg19 import VGG19

# include_top=False means that we only load in the convolutional layers of the network, not the classifier layers
# replace this with a different model if you want to try another model
inception_model = InceptionV3(weights='imagenet', include_top=False)

###  Define functions to load images
Usually, when dealing with image data, you need a lot of disk space and RAM/VRAM to load the images, as each image is quite big. Because of this, it would be inefficient to load them all at once, taking up lots of RAM and VRAM when training. Instead, we can use a **generator** that loads images in batches for us on the fly. Python has its own generators, written with the **yield** keyword, but those cannot be safely shared across multiple workers in Keras. Instead, we define a class for our generator that inherits from Keras' Sequence class.

To do so, the functions `__init__()`, `__len__()`, and `__getitem__(index)` must be defined. `__len__()` returns how many batches are in the generator, and `__getitem__()` defines how one batch of images is loaded and processed before being returned. Read up more on Keras' generator [here](https://keras.io/utils/).

With this, some helper functions are defined for the generator to use. Note that we want an array of images, so the entire image batch will have a shape of (num_images, height, width, num_channels).
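
Before looking at the real generator, here is a Keras-free sketch of just the batching logic that `__len__()` and `__getitem__()` have to implement. The class name `BatchIndexer` and the file names are hypothetical; the real class would inherit from `Sequence` and load images instead of returning paths.

```python
import math


class BatchIndexer:
    """Stand-in that mimics how a Sequence subclass splits data into batches."""

    def __init__(self, items, batch_size):
        self.items = items
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches; the last one may hold fewer than batch_size items.
        return math.ceil(len(self.items) / self.batch_size)

    def __getitem__(self, idx):
        # Slice out the items that belong to batch number idx.
        start = idx * self.batch_size
        return self.items[start:start + self.batch_size]


paths = ["img_{}.jpg".format(i) for i in range(10)]  # hypothetical file names
batches = BatchIndexer(paths, batch_size=4)

print("number of batches:", len(batches))
for idx in range(len(batches)):
    print(idx, batches[idx])
```

Because every batch is addressed by its index, different workers can each be handed a different `idx` without stepping on one another, which is the property a plain `yield` generator lacks.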

In [0]:
from keras.preprocessing import image as image_processor
from keras.utils import Sequence
# needed because of truncated image error when loading train images:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

# Loads in a single image from path and resizes it to img_dim
def load_image_from_path(img_path, img_dim):
    # loads RGB image as PIL.Image.Image type
    img = image_processor.load_img(img_path, target_size=img_dim)
    # convert PIL.Image.Image type to 3D tensor with shape (height, width, 3)
    x = image_processor.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, height, width, 3) and return 4D tensor
    # this extra dimension represents the number of images in this array, which in this case is just 1
    return np.expand_dims(x, axis=0)

# loads in every image listed in img_paths, resized to img_dim
def load_directory(img_paths, img_dim):
    images = [load_image_from_path(img_path, img_dim) for img_path in img_paths]
    # stacks the (1, height, width, 3) arrays along axis 0 into a single batch
    return np.vstack(images)

# loads in paths for all of the data, along with the class label for each image
def load_dataset(path):
    # from sklearn: returns filenames with an integer label 0-132, one per class
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    # one-hot encode labels
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# defines the sequence generator class for the dog images
class Dog_Sequence_Generator(Sequence):
    def __init__(self, image_directory, img_dim, batch_size):
        # load dog files with labels:
        self.dog_files, self.dog_targets = load_dataset(image_directory)
        
        self.img_dim = img_dim
        self.batch_size = batch_size

    def __len__(self):
        # returns number of batches in this generator
        return int(np.ceil(len(self.dog_files) / float(self.batch_size)))

    def __getitem__(self, idx):
        #get image paths for current batch
        batch_images = self.dog_files[idx * self.batch_size : (idx + 1) * self.batch_size]
        #load images into memory
        batch_images = load_directory(batch_images, self.img_dim)
        #get image targets
        batch_targets = self.dog_targets[idx * self.batch_size : (idx + 1) * self.batch_size]

        return batch_images, batch_targets
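
To see what the helpers above produce, here is a small NumPy-only sketch of the two array operations involved: stacking single images into a batch, and one-hot encoding integer labels. The zero-filled arrays and the example labels are placeholders, and `np.eye(...)[labels]` is just one way to get the same kind of result that `to_categorical` returns.

```python
import numpy as np

# Stand-ins for three loaded images, each with shape (1, height, width, 3).
height, width = 299, 299
single_images = [np.expand_dims(np.zeros((height, width, 3)), axis=0)
                 for _ in range(3)]

# np.vstack concatenates along axis 0, giving one batch array.
batch = np.vstack(single_images)
print("one image:", single_images[0].shape)
print("batch:    ", batch.shape)

# One-hot encode three hypothetical integer labels for 133 breeds.
labels = np.array([0, 2, 132])
one_hot = np.eye(133)[labels]
print("targets:  ", one_hot.shape)
```

Each row of `one_hot` contains a single 1 at the position of that image's breed and 0 everywhere else.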

### Create generators for the data
The default input size of InceptionV3 with the ImageNet pretrained weights is (299, 299, 3). The batch size is also defined here; I have chosen 8. If you get OOM (out of memory) errors when training, reduce the batch size.


In [0]:
img_dim = (299, 299)  # for InceptionV3
batch_size = 8

# generators for each type of data
train_gen = Dog_Sequence_Generator('dogImages/train', img_dim, batch_size)
valid_gen = Dog_Sequence_Generator('dogImages/valid', img_dim, batch_size)
test_gen = Dog_Sequence_Generator('dogImages/test', img_dim, batch_size)

# get label names
dog_names = [item[20:-1] for item in sorted(glob("dogImages/train/*/"))]

print("Number of images: training: {}, validation: {}, testing: {}.\n".format(len(glob("dogImages/train/*/*")), len(glob("dogImages/valid/*/*")), 
                                                                                len(glob("dogImages/test/*/*"))))
print("Number of batches: training: {}, validation: {}, testing: {}.".format(len(train_gen), len(valid_gen), len(test_gen)))
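
The slice `item[20:-1]` used for `dog_names` is a bit cryptic, so here is a short sketch of what it does to one folder path. The second approach is only an alternative I am suggesting; it avoids hard-coding the prefix length and should give the same name for folders that follow the `NNN.Breed_name` pattern.

```python
import os

path = "dogImages/train/003.Airedale_terrier/"  # one entry from the glob result

# The notebook's approach: skip the 16 characters of "dogImages/train/" plus
# the 4 characters of "003.", and drop the trailing slash.
by_slicing = path[20:-1]

# Alternative that does not depend on the prefix length.
folder = os.path.basename(os.path.normpath(path))  # "003.Airedale_terrier"
by_splitting = folder.split(".", 1)[1]

print(by_slicing)
print(by_splitting)
```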

### Model:
Now it is time to define the model. We already created the first part of our model, but it does not have a classifier for the images. Here we will add the last few convolutional/pooling layers, and our dense layers for classification.

First off, for Keras, [Convolutional2D](https://keras.io/layers/convolutional/) layers can be created as:
```python
Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', activation=None)
```
With networks dealing with image data, it is a good idea to use ReLU or LeakyReLU activation functions for all of your layers. In deep networks they tend to train faster and more reliably than sigmoid or tanh, largely because they are less prone to vanishing gradients. Read more about them [here](https://cs231n.github.io/neural-networks-1/#actfun).

Padding determines whether to "pad" the edges of images with zeros so the filter can go past the image border while performing the convolution. Usually, filters don't perfectly fit in an image. With padding as "valid", the images are not padded with zeros and the kernel stops before going past the image's edge, so the output is slightly smaller than the input. This can help with reducing noise in the data, as padded zeros are meaningless. However, pixels close to the edges contribute to fewer outputs, so the network may fail to learn certain features there. Padding with "same" pads the edges with zeros so that, at a stride of 1, the output has the same width and height as the input.
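
The effect of the two padding modes on the output size can be worked out with a little arithmetic. This is a sketch of the usual size formulas for one spatial dimension; the function name `conv_output_size` is made up for this example.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "valid":
        # The kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Zeros are added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


for stride in (1, 2):
    for padding in ("valid", "same"):
        size = conv_output_size(299, 3, stride=stride, padding=padding)
        print("stride={}, padding={}: 299 -> {}".format(stride, padding, size))
```

With a 3x3 kernel and a stride of 1, "valid" trims one pixel from each side while "same" keeps the full 299.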

[Pooling](https://keras.io/layers/pooling/) layers are defined as:
```python
MaxPooling2D(pool_size=(2, 2), strides=None, padding='valid')
GlobalAveragePooling2D()
```
Global Average Pooling averages each feature map of the previous convolutional layer over its height and width, leaving one value per channel. For example, if the output of the last convolutional layer has shape (20, 16, 16, 128), meaning 20 images with 128 feature maps of size 16x16, the result has shape (20, 128). For our purpose, this layer will be used to reduce the convolutional output to an input shape that a dense layer can take in.
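
Global average pooling is simple enough to reproduce with NumPy, which makes the shape change easy to check. This is only an illustration on random data, not the Keras layer itself.

```python
import numpy as np

# Pretend output of a convolutional layer: (batch, height, width, channels).
features = np.random.rand(20, 16, 16, 128)

# Average over the two spatial axes, keeping the batch and channel axes.
pooled = features.mean(axis=(1, 2))

print("before pooling:", features.shape)
print("after pooling: ", pooled.shape)
```

Each of the 128 values per image is the mean of one 16x16 feature map.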

Here is something you may not have seen before: Keras' [functional](https://keras.io/getting-started/functional-api-guide/) model API. This provides a bit more flexibility when building models than Sequential does. We need this in order to have our inputs go through the pretrained model created before and output through the classifier network we will build now. That part is taken care of for you here, so we will build the model like we have before with the Sequential API.

In [0]:
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential, Model

output_model = Sequential()

### Put Model Here ###



######################
out = inception_model.output
out = output_model(out)
out = Dense(133, activation='softmax')(out)

inception_model = Model(inputs=inception_model.input, outputs=out)
output_model.summary()
# inception_model.summary()# uncomment to see inception_model



### Model Compilation
For this model, I used RMSprop to train the network. The Adam optimizer you have seen before might give better results, so try that on your own! For the loss function, categorical_crossentropy is used since we are using one-hot encoded labels.

In [0]:
from keras import optimizers as opt 
lr = 1e-3
inception_model.compile(loss='categorical_crossentropy', optimizer=opt.RMSprop(lr=lr), metrics=['accuracy'])
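
To get a feel for what categorical cross-entropy measures with one-hot labels, here is a NumPy sketch of the formula for a single example with three classes. The probabilities are made up, and the clipping constant `eps` is an assumption added only to avoid taking log(0).

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy per example between one-hot targets and probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([[0.0, 1.0, 0.0]])          # the correct class is index 1
confident = np.array([[0.05, 0.90, 0.05]])    # mostly right
unsure = np.array([[0.40, 0.30, 0.30]])       # mostly wrong

confident_loss = categorical_crossentropy(y_true, confident)
unsure_loss = categorical_crossentropy(y_true, unsure)
print("confident prediction loss:", confident_loss[0])
print("unsure prediction loss:   ", unsure_loss[0])
```

Because the label is one-hot, only the probability assigned to the correct class matters, and the loss grows as that probability shrinks.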

### Training
Since we are using Keras generators for our data, our fit function is now fit_generator. It still uses the same parameters as the regular fit function, though. We also define the number of workers, which is the number of parallel workers that process the data in our generator. In Colab, we only have two cores, but increase this number if you are running it on your own computer and have the extra cores to spare.

For callbacks, we have the checkpointer callback for our weights, an EarlyStopping callback to stop the model training early when validation loss does not improve for a defined number of epochs, and a reduce learning rate callback, which reduces the learning rate by a specified factor if the validation loss does not improve after a defined number of epochs. This is why epochs can be set as high as 300: training will end early once the validation loss stops improving.

This model would take a long time to train in Colab, even with the GPU instance. So let's download and load in weights that have already been trained.

In [0]:
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

epochs = 300
num_workers = 2

checkpointer = ModelCheckpoint(filepath='weights.best.transfer.inception.test.hdf5', verbose=1, save_best_only=True)
early_stop = EarlyStopping(patience=6, verbose=1)
reduce_lr =  ReduceLROnPlateau(factor=0.1, patience=2, verbose=1, cooldown=1, min_lr=1e-14)
                           
inception_model.fit_generator(train_gen, validation_data=valid_gen,
                              epochs=epochs, use_multiprocessing=True, 
                              callbacks=[checkpointer, early_stop, reduce_lr], workers=num_workers, verbose=1)
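
The patience logic behind the two callbacks can be sketched in plain Python. This is a simplified model of the idea, not Keras' actual implementation: it ignores options such as `min_delta` and `cooldown`, and the validation losses are invented for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


def reduce_lr_schedule(val_losses, lr, factor, patience, min_lr=0.0):
    """Return the learning rate in effect after each epoch."""
    best, wait, history = float("inf"), 0, []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # shrink, but not below min_lr
                wait = 0
        history.append(lr)
    return history


val_losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]  # made-up validation losses
stop_epoch = early_stopping_epoch(val_losses, patience=2)
lr_history = reduce_lr_schedule(val_losses, lr=1e-3, factor=0.1, patience=2)
print("would stop at epoch index:", stop_epoch)
print("learning rates:", lr_history)
```

In the notebook, the learning-rate patience (2) is shorter than the early-stopping patience (6), so the learning rate gets a few chances to drop before training is abandoned.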

### Load best validation loss weights
This will download the weight file for the model we created from Google Drive.

In [0]:
!chmod +x gdown.pl
!./gdown.pl https://drive.google.com/file/d/1EexGOGyFaqVWshRUaZdXe5y3RFWXQlzr/view weights.inception.soln.hdf5

In [0]:
# comment out the first line and uncomment the second line if loading in your own weights
inception_model.load_weights('weights.inception.soln.hdf5')
#inception_model.load_weights('weights.best.transfer.inception.test.hdf5')

### Test model accuracy
Again, we use evaluate_generator instead of the evaluate function since we have a generator.

In [0]:
metrics = inception_model.evaluate_generator(test_gen)

print('Test loss: {0:.4f}, Test accuracy: {1:.2f}%'.format(metrics[0], metrics[1]*100))
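
The accuracy metric reported above boils down to comparing the most likely predicted class with the true class. Here is a NumPy sketch on four made-up predictions over three classes; the helper name `accuracy` is only for this example.

```python
import numpy as np


def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of examples whose highest-probability class matches the label."""
    true_classes = np.argmax(y_true_one_hot, axis=1)
    pred_classes = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_classes == pred_classes))


y_true = np.eye(3)[[0, 1, 2, 1]]        # one-hot labels for four examples
y_pred = np.array([[0.7, 0.2, 0.1],     # predicts class 0 (correct)
                   [0.1, 0.8, 0.1],     # predicts class 1 (correct)
                   [0.3, 0.4, 0.3],     # predicts class 1 (wrong, label is 2)
                   [0.2, 0.5, 0.3]])    # predicts class 1 (correct)

print("accuracy: {:.2f}%".format(accuracy(y_true, y_pred) * 100))
```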

### Let's see some dogs!

In [0]:
def classify(image_path):
    display_img = mpimg.imread(image_path)
    print("Hello!")
    plt.imshow(display_img)
    plt.grid(False)
    plt.show()
    input_img = load_image_from_path(image_path, img_dim)
    pred_class = dog_names[np.argmax(inception_model.predict(input_img))]
    print("You are a {} dog!! {}/10".format(pred_class, np.random.randint(low=10, high=15, size=None)))

In [0]:
classify("dogImages/test/008.American_staffordshire_terrier/American_staffordshire_terrier_00538.jpg")
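
`classify` only reports the single most likely breed. If you want to see the runners-up as well, something like the sketch below could be adapted. The breed names and probabilities here are placeholders; in the notebook you would pass `dog_names` and the first row of `inception_model.predict(input_img)` instead.

```python
import numpy as np


def top_k(probabilities, names, k=3):
    """Return the k most likely (name, probability) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]  # indices sorted by descending probability
    return [(names[i], float(probabilities[i])) for i in order]


names = ["Airedale_terrier", "Bloodhound", "Chihuahua", "Beagle"]  # hypothetical subset
probs = np.array([0.10, 0.60, 0.05, 0.25])                         # made-up softmax output

for name, prob in top_k(probs, names, k=2):
    print("{}: {:.0f}%".format(name, prob * 100))
```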

## What's next?
If you have an NVIDIA GPU at home, try training this model on it. It should not take too long on something like a 1060 or above. Try different pre-trained models, classifier networks, and optimizers to get better results. Experimenting like this is extremely important if you want to build up an intuition for creating better models.

## References
- [InceptionV3 Paper](https://arxiv.org/abs/1512.00567)
- [Transfer Learning](https://cs231n.github.io/transfer-learning/)