<a href="https://colab.research.google.com/github/riblidezso/wigner_dl_demo/blob/master/imagenet_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using an ImageNet-trained model

----


Here we will show how to load a model trained on the 1.2 million images in ILSVRC, and use this model to make predictions on new images.


---

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [2]:
import keras
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D, Input
from keras.layers import Dense, Dropout, Flatten
from keras.applications.vgg16 import decode_predictions

from PIL import Image

Using TensorFlow backend.


## Define the VGG16 model


- 2nd place in ILSVRC 2014
- the best single model of the competition (the winner used more tricks)
- [arxiv paper](https://arxiv.org/abs/1409.1556)

A few architectural changes compared to LeNet:

* ReLU non-linearity instead of tanh or sigmoid
* move to 3x3 convolutions (and 2 or 3 convolutions per block instead of 1) ('deeper')
* larger images -> repeat blocks multiple times to achieve a large field of view (FOV) for the last conv units ('deeper')
* richer/more data: more filters ('wider' model)
* and a regularization layer: Dropout ([link to original paper](http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf))
    * during training, randomly knock out a fraction of the neurons (set their output to 0)
    * during testing, switch all of them on (and multiply the outputs by the probability of keeping a neuron)
    * the result is something like an 'ensemble' of slightly different networks
    * its popularity has declined, but it is still used in the best "inception" networks
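
The two dropout bullets above can be sketched in a few lines of NumPy. This is only an illustration of the idea from the paper, assuming a drop probability `p_drop`; the names `dropout_train` and `dropout_test` are made up here and are not Keras API. (Keras itself rescales during training instead, so nothing needs to be done at test time.)

```python
import numpy as np

def dropout_train(x, p_drop, rng):
    """Training: zero each activation independently with probability p_drop."""
    mask = rng.random(x.shape) >= p_drop   # True where the unit is kept
    return x * mask

def dropout_test(x, p_drop):
    """Testing: keep every unit and scale by the keep probability (1 - p_drop)."""
    return x * (1.0 - p_drop)

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
train_out = dropout_train(x, 0.5, rng)
test_out = dropout_test(x, 0.5)

# the mean activation at test time matches the training-time expectation
print(train_out.mean(), test_out.mean())
```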


Note the use of the Keras functional API below; a [guide](https://keras.io/getting-started/functional-api-guide/) is available.

In [4]:
def VGG16():
    """
    Return a VGG16 model.
    
    Keras has a built-in VGG16 model which omits the Dropout layers.
    I don't want to omit them, as they are part of the original
    VGG16 model, so I have to define the model myself.
    """
    img_input = Input(shape=(224,224,3),name='input')

    # Block 1
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

    # Block 2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

    # Block 3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

    # Block 4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

    # Block 5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

    # Classification block
    # (in the original model, dropout follows each of the first two
    # fully connected layers; it is inactive at prediction time)
    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    x = Dropout(0.5,name='Dropout1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    x = Dropout(0.5,name='Dropout2')(x)
    x = Dense(1000, activation='softmax', name='predictions')(x)

    vgg16 = Model(inputs=img_input, outputs=x)
    
    return vgg16
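
To make the "large FOV" point concrete, here is a rough sketch that computes the theoretical receptive field of the convolutional stack. The helper `receptive_field` and the hand-written `(kernel, stride)` list are illustrative assumptions; they mirror the block definitions above and are not read from the Keras model.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels) after a stack of layers.

    layers: list of (kernel_size, stride) tuples, in order.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

conv, pool = (3, 1), (2, 2)
# number of 3x3 convolutions in blocks 1..5, each block followed by a 2x2 max-pool
convs_per_block = [2, 2, 3, 3, 3]
vgg16_layers = []
for n in convs_per_block:
    vgg16_layers += [conv] * n + [pool]

print(receptive_field([conv, conv]))       # two stacked 3x3 convs see 5x5 pixels
print(receptive_field(vgg16_layers[:-1]))  # after block5_conv3
print(receptive_field(vgg16_layers))       # after block5_pool
```

Under these assumptions a unit in the last convolution sees a 196x196 patch, which covers most of the 224x224 input.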

[Download the model parameters from the official Keras release](https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5)

The weights were originally published by the authors of the model for Caffe, and later converted to Keras.

In [0]:
!wget https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5

In [0]:
vgg16 = VGG16()  # initialize model
# load weights
vgg16.load_weights('vgg16_weights_tf_dim_ordering_tf_kernels.h5')
vgg16.summary()  # just check it

## Define a prediction wrapper


In [5]:
def predict1(model,img):
    """Predict arbitrary image."""
    # resize the image to the network input size
    # (the aspect ratio is not preserved here;
    # this works well with Instagram images)
    # convert('RGB') guards against grayscale or RGBA input
    img = img.convert('RGB').resize((224,224))
    
    # turn the PIL Image into a numpy array
    im = array(img)
    
    # watch out: the model was trained on BGR images (OpenCV convention),
    # but PIL loads images as RGB
    im = im[...,[2,1,0]]
    
    # centre the channels the same way as during training
    im = im.astype('float64')  # to float!
    im[:,:,0] -= 103.939  # mean of the blue channel in the training set
    im[:,:,1] -= 116.779  # mean of the green channel in the training set
    im[:,:,2] -= 123.68  # mean of the red channel in the training set
    
    im = im.reshape(-1,224,224,3)  # add the 'batch' dimension
    preds = model.predict(im)   # predict
    
    return preds
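
The channel swap and mean subtraction in `predict1` can also be written as a single broadcasted operation. A small sketch on a synthetic array; the helper name `preprocess_bgr` is hypothetical, and the mean values are the ones used in the cell above.

```python
import numpy as np

# per-channel means in B, G, R order, copied from predict1
BGR_MEANS = np.array([103.939, 116.779, 123.68])

def preprocess_bgr(rgb):
    """Convert an (H, W, 3) RGB uint8 array to mean-centred BGR floats."""
    bgr = rgb[..., ::-1].astype('float64')  # reverse the channel axis: RGB -> BGR
    return bgr - BGR_MEANS                  # broadcasts over height and width

# synthetic 2x2 image made of pure red pixels
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 255
out = preprocess_bgr(rgb)
print(out[0, 0])  # blue and green are negative, red is 255 - 123.68
```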

## See some examples

- Watch out with Google image search results: that is exactly how the ImageNet dataset was collected :)
- This one was uploaded to Instagram today (credit to "whogivesafuck"), so it is surely not in the ImageNet training set

In [0]:
!wget https://raw.githubusercontent.com/riblidezso/wigner_dl_demo/master/k.jpg
img = Image.open('k.jpg')
print(img.size)
img

In [0]:
preds = predict1(vgg16,img)
decode_predictions(preds)

Well done!

### Another

- credit to "little_reader93"

In [0]:
!wget https://raw.githubusercontent.com/riblidezso/wigner_dl_demo/master/b.jpg
img = Image.open('b.jpg')
img

In [0]:
preds = predict1(vgg16,img)
decode_predictions(preds)

Also well done!

### What if the object is small?

* credit to madam_pusteblume

In [0]:
!wget https://raw.githubusercontent.com/riblidezso/wigner_dl_demo/master/g.jpg
img = Image.open('g.jpg')
img

In [0]:
img.resize((224,224))

No good answers in top 5 :(

In [0]:
preds = predict1(vgg16,img)
decode_predictions(preds)

In [0]:
decode_predictions(preds,top=20)

 
## Multi crop predictions!

Let's cut the image into crops and predict on each 'crop', and average the predictions.

In [0]:
def predict_mc(model,img, stride=100):
    """Multi crop predict image."""
    # turn the PIL Image into a numpy array
    im = array(img)
    
    # watch out: the model was trained on BGR images (OpenCV convention),
    # but PIL loads images as RGB
    im = im[...,[2,1,0]]
    
    # centre the channels the same way as during training
    # (convert the BGR array; re-reading img here would undo the channel swap)
    im = im.astype('float64')
    im[:,:,0] -= 103.939
    im[:,:,1] -= 116.779
    im[:,:,2] -= 123.68
    
    # crops: slide a 224x224 window over the image
    # (+1 so that the last position that still fits is included)
    ims = []
    for yi in range(0,im.shape[0]-224+1,stride):
        for xi in range(0,im.shape[1]-224+1,stride):
            ims.append( im[yi:yi+224,xi:xi+224,:] )
    
    preds = model.predict(array(ims)) 
    return preds
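
The number of crops, and with it the run time, grows quickly as the stride shrinks. The sketch below only enumerates the crop corners. The helper `crop_origins` is hypothetical and the 600x800 image size is an example value, not the size of the notebook image. The `+ 1` in the ranges means an image of exactly 224 pixels still yields one crop.

```python
def crop_origins(height, width, size=224, stride=100):
    """Top-left (y, x) corners of all size x size crops that fit in the image."""
    return [(y, x)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]

# example image size, chosen for illustration only
h, w = 600, 800
for stride in (100, 50, 10):
    print(stride, len(crop_origins(h, w, stride=stride)))
```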

In [0]:
%%time
preds = predict_mc(vgg16,img, stride=10)

Getting better! There is a similar dog there.

In [0]:
decode_predictions(array([preds.mean(axis=0)]))

## What if the input is not square?


Option 1: you might just resize it
- People try to avoid non-isotropic resizing, although that works reasonably well too!

In [0]:
!wget https://raw.githubusercontent.com/riblidezso/wigner_dl_demo/master/c.jpg
img = Image.open('c.jpg')
img

In [0]:
img.resize((224,224))

Actually works! :) (Pembroke and Cardigan are corgi types!)

In [0]:
preds = predict1(vgg16,img)
decode_predictions(preds)

### Multi crop could also handle this!

In [0]:
%%time
preds = predict_mc(vgg16,img,stride=100)

Corgi order changed, idk :)

In [0]:
decode_predictions(array([preds.mean(axis=0)]))

## Do they work upside down?



In [0]:
im = array(img)
im = flipud(im)
img = Image.fromarray(im)
img

In [0]:
preds = predict1(vgg16,img)
decode_predictions(preds)

Less well!

(Note the lower confidence too!)

## Do they work left right flipped?




In [0]:
img = Image.open('c.jpg')
im = array(img)
im = fliplr(im)
img = Image.fromarray(im)
img

In [0]:
preds = predict1(vgg16,img)
decode_predictions(preds)

Of course they do :) (they were trained to)

### Further tricks:

- Adding horizontal flips
- Multi scale evaluation
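
A sketch of the first trick: average the predictions for a batch and for its mirror image. `predict_fn` stands for any function that maps an `(N, H, W, 3)` batch to `(N, classes)` scores, for example `vgg16.predict`; the helper name `predict_with_flip` and the toy model below are made up for illustration.

```python
import numpy as np

def predict_with_flip(predict_fn, batch):
    """Average predictions over a batch and its left-right mirrored copy."""
    flipped = batch[:, :, ::-1, :]  # reverse the width axis of (N, H, W, C)
    return 0.5 * (predict_fn(batch) + predict_fn(flipped))

def toy_predict(batch):
    """Stand-in 'model': scores are the means of the left and right image halves."""
    half = batch.shape[2] // 2
    left = batch[:, :, :half, :].mean(axis=(1, 2, 3))
    right = batch[:, :, half:, :].mean(axis=(1, 2, 3))
    return np.stack([left, right], axis=1)

batch = np.zeros((1, 4, 4, 3))
batch[:, :, :2, :] = 1.0                       # bright left half
print(toy_predict(batch))                      # depends on the orientation
print(predict_with_flip(toy_predict, batch))   # same for the image and its mirror
```

With a real model the same call would be `predict_with_flip(vgg16.predict, im)` on a preprocessed batch `im`.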

## You might wonder, why use 224x224 images?

---


How large does an object need to be to be recognised? Do you really need 24MP images?

224x224 is probably enough for anything if it fills the image. 

ILSVRC images usually contain fairly large objects which almost fill the image.

---


( Images in ILSVRC have characteristic size of 300-600 pixels. )

And they don't really use 224x224:

During training, the images are first resized to a larger size, and smaller squares are then cropped from the resized image (resize so that the smaller side is 256, then take 224x224 crops).
And that's how you test too.
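
The "resize the smaller side to 256, then crop 224x224" recipe can be sketched as pure geometry. The helper `resize_and_center_crop_box` is hypothetical, and taking the central crop is only one possible choice for testing.

```python
def resize_and_center_crop_box(width, height, short_side=256, crop=224):
    """Return (new_size, box) for an aspect-preserving resize plus a central crop.

    new_size is (width, height) after scaling the shorter side to short_side;
    box is (left, upper, right, lower) of the central crop x crop square.
    """
    scale = short_side / min(width, height)
    new_w = max(short_side, round(width * scale))
    new_h = max(short_side, round(height * scale))
    left = (new_w - crop) // 2
    upper = (new_h - crop) // 2
    return (new_w, new_h), (left, upper, left + crop, upper + crop)

# example: an 800x600 (width x height) image
new_size, box = resize_and_center_crop_box(800, 600)
print(new_size, box)
```

With PIL the result could be applied as `img.resize(new_size).crop(box)`, which keeps the aspect ratio, unlike the plain `img.resize((224,224))`.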

And actually you can use larger inputs: 299 (Inception), 450 (Darknet19), 512 (Baidu).


---

All in all, ImageNet-trained models are good at recognizing objects which are around 100-200 pixels in size.

Try to predict images where the object size is as large as possible, but no larger than 200x200.

---