# Using a pretrained ImageNet network to classify images into one of the 1000 ImageNet classes

In this notebook you will learn how to load a pretrained ImageNet network. You will see how to download images from the web and resize them to the input size of the network. In addition, you will apply the original preprocessing of the VGG16 network. Then you will classify some images into one of the 1000 classes. First you will look at two clear and obvious examples, a dog (affenpinscher) and an elephant (tusker), and then at a quite unusual image of an elephant inside a building, taken at the Smithsonian Museum of Natural History in Washington DC. The pretrained network never saw an elephant in such a setting, because images like this were not part of the ImageNet training dataset. Will the VGG16 network be able to classify the unusual image correctly?



**Content:**
* Load the pretrained VGG16 network that was trained on the 1000 classes of ImageNet
* Download and resize images from URLs
* Define a function to apply the original preprocessing that the VGG team used when they trained the network
* Define a function to undo the original preprocessing (to be able to plot the image afterwards, if necessary)
* Predict the two clear examples of a dog and an elephant and decode the predictions into the corresponding labels
* Predict the unusual image of the elephant in the museum and decode the predictions into the corresponding labels



In [None]:
# !pip install tensorflow==2.1.0

#### Imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from urllib.request import urlopen
from PIL import Image

%matplotlib inline
plt.style.use('default')

print("TF  Version",tf.__version__)


### Loading the pretrained VGG16 network, trained on the large ImageNet dataset
In the next cells you download the pretrained VGG16 network. You specify `include_top=True` because you will use the network for classification and not for feature extraction. Setting `weights='imagenet'` means that you want the pretrained weights and not random weights. When you print the model summary, you can see that the network input size is 224x224x3, so you will need to resize the images to that size. The output has 1000 entries, which correspond to the probabilities for the 1000 classes. In between there are convolution, max-pooling and dense layers.


In [None]:
# The pretrained VGG16 network needs quite a lot of memory.
# Make sure you have enough memory allocated for Docker if you are running this notebook locally.

model_vgg=tf.keras.applications.vgg16.VGG16(include_top=True, weights='imagenet')

In [None]:
model_vgg.summary()

In the next cell you define two functions, one to preprocess the input image and one to undo the preprocessing. The preprocessing is very simple: it subtracts the mean value of every channel, where the channel means were calculated on the ImageNet training dataset. Note that we first need to swap the channel order, because the VGG team used the BGR and not the RGB format.

In [None]:
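def preprocess_input(img):
    # Reorder the channels from RGB to BGR, the format the VGG team used
    x=np.zeros((224,224,3),dtype="float32")
    x[:,:,0]=img[:,:,2]
    x[:,:,1]=img[:,:,1]
    x[:,:,2]=img[:,:,0]
    # Subtract the per-channel means (in BGR order) of the ImageNet training set
    mean = [103.939, 116.779, 123.68]
    x[:,:, 0] = x[:,:, 0]-mean[0]
    x[:,:, 1] = x[:,:, 1]-mean[1]
    x[:,:, 2] = x[:,:, 2]-mean[2]
    return x

def undo_preprocess_input(img):
    # Work on a float32 copy so that the caller's array is not modified in place
    img=np.array(img,dtype="float32")
    # Add the per-channel means (in BGR order) back
    mean = [103.939, 116.779, 123.68]
    img[:,:, 0] = img[:,:, 0]+mean[0]
    img[:,:, 1] = img[:,:, 1]+mean[1]
    img[:,:, 2] = img[:,:, 2]+mean[2]
    # Reorder the channels from BGR back to RGB
    x=np.zeros((224,224,3),dtype="float32")
    x[:,:,0]=img[:,:,2]
    x[:,:,1]=img[:,:,1]
    x[:,:,2]=img[:,:,0]
    return x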
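
The channel swap and the mean subtraction described above can also be written in vectorized form. The following is a minimal sketch, not part of the original notebook: the names `BGR_MEAN`, `preprocess_vectorized` and `undo_preprocess_vectorized` are hypothetical, and a random array stands in for a real image. It checks that undoing the preprocessing recovers the original pixel values.

```python
import numpy as np

# Per-channel means in BGR order, the same values as in the cell above.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")

def preprocess_vectorized(img_rgb):
    """Hypothetical helper: RGB image (H, W, 3) -> mean-centred BGR float32 array."""
    bgr = np.asarray(img_rgb, dtype="float32")[..., ::-1]  # reverse the channel axis
    return bgr - BGR_MEAN                                  # broadcasts over H and W

def undo_preprocess_vectorized(x_bgr):
    """Hypothetical helper: mean-centred BGR array -> RGB float32 array."""
    bgr = np.asarray(x_bgr, dtype="float32") + BGR_MEAN
    return bgr[..., ::-1]

# A random uint8 array stands in for a real 224x224 RGB image.
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

restored = undo_preprocess_vectorized(preprocess_vectorized(demo))
print(np.allclose(restored, demo, atol=1e-3))
```

Both helpers return new arrays, so the input image is left unchanged.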

## Loading two clear images of a dog (affenpinscher) and an elephant (tusker)

In the next few cells you will download two images from URLs and resize them to the input size of the pretrained VGG16 model, which is 224x224x3. You plot them before and after the resizing. As you can see, the images are very clear, so classifying them correctly should be no problem.



In [None]:
img1 = (Image.open(urlopen("https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/Affenpinscher_Molly.jpg")))
plt.imshow(img1)
plt.show()
new_width  = 224
new_height = 224
img1 = img1.resize((new_width, new_height), Image.LANCZOS)  # LANCZOS replaces the ANTIALIAS alias that was removed in Pillow 10
plt.imshow(img1)
plt.show()
img1=np.array(img1)

In [None]:
img2 = (Image.open(urlopen("https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/African_Elephant.jpg")))
plt.imshow(img2)
plt.show()
new_width  = 224
new_height = 224
img2 = img2.resize((new_width, new_height), Image.LANCZOS)  # LANCZOS replaces the ANTIALIAS alias that was removed in Pillow 10
plt.imshow(img2)
plt.show()
img2=np.array(img2)

In [None]:
plt.figure(figsize=(10,10))
plt.subplot(1,2,1)
plt.imshow(img1)
plt.subplot(1,2,2)
plt.imshow(img2)
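
The resize step is the same for every image, so it can be wrapped in a small helper. The sketch below is an illustration only: the function name `to_model_input_array` is hypothetical, and an in-memory dummy image replaces the download so that the example runs offline. The extra `convert("RGB")` call is an assumption about what your own images may need: it guarantees three channels even if an image has an alpha channel or is greyscale.

```python
import numpy as np
from PIL import Image

def to_model_input_array(pil_img, size=(224, 224)):
    """Hypothetical helper: PIL image -> uint8 array of shape (224, 224, 3)."""
    rgb = pil_img.convert("RGB")               # drop alpha / expand greyscale to 3 channels
    resized = rgb.resize(size, Image.LANCZOS)  # size is given as (width, height)
    return np.array(resized)

# Dummy 640x480 RGBA image standing in for a downloaded photo.
demo = Image.new("RGBA", (640, 480), (120, 90, 60, 255))
arr = to_model_input_array(demo)
print(arr.shape, arr.dtype)
```

With a real image you would pass `Image.open(urlopen(url))` instead of the dummy image.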


Now that the images have the right size, let's use the network to predict the labels. Don't forget to preprocess the input images before the prediction.

In [None]:
img1=preprocess_input(img1)
print(img1.shape)
img2=preprocess_input(img2)
print(img2.shape)
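
The model expects a batch of images, so a single 224x224x3 array needs an extra leading axis before it is passed to `predict`. Here is a small sketch of the shapes involved, with zero arrays as stand-ins for preprocessed images. Stacking several images into one batch is shown as an alternative to predicting them one by one.

```python
import numpy as np

# Zero array standing in for one preprocessed image.
single = np.zeros((224, 224, 3), dtype="float32")

# Add a leading batch axis: (224, 224, 3) -> (1, 224, 224, 3)
batch_of_one = np.expand_dims(single, axis=0)

# Several images of the same size can be stacked into one batch instead.
batch_of_two = np.stack([single, single], axis=0)

print(batch_of_one.shape, batch_of_two.shape)
```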

In [None]:
pred1=model_vgg.predict(np.expand_dims(img1,axis=0))
tf.keras.applications.vgg16.decode_predictions(pred1)

In [None]:
pred2=model_vgg.predict(np.expand_dims(img2,axis=0))
tf.keras.applications.vgg16.decode_predictions(pred2)

As you can see, the network has no problem predicting the correct labels: affenpinscher and tusker both appear among the predictions with a high probability.
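
To make the decoding step more concrete, here is a minimal sketch of the top-k selection on a vector of 1000 class probabilities. It is not the Keras implementation: `top_k` is a hypothetical helper, and the probability values are made up for illustration. The class indices 101 (tusker), 385 (Indian elephant) and 386 (African elephant) are the ones used later in this notebook.

```python
import numpy as np

def top_k(probs, k=5):
    """Hypothetical helper: return the k (class_index, probability) pairs with the highest probability."""
    probs = np.asarray(probs)
    idx = np.argsort(probs)[::-1][:k]   # indices sorted by descending probability
    return [(int(i), float(probs[i])) for i in idx]

# Made-up probabilities: a tiny value for every class and three elephant classes on top.
toy = np.full(1000, 1e-4)
toy[101] = 0.60   # tusker
toy[386] = 0.25   # African elephant
toy[385] = 0.05   # Indian elephant
toy = toy / toy.sum()                   # normalize so that the entries sum to 1

top3 = top_k(toy, k=3)
print(top3)
```

`decode_predictions` additionally maps each index to its human-readable class name.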

## Loading and predicting "the elephant in the room" 

Let's see if the VGG16 network can also classify a kind of image that was not part of the training dataset, in this case an elephant inside a museum. Note that there are many other objects in the image and the lighting is not very good either. Let's load, resize, preprocess and predict the image.



In [None]:
### Image by mana5280 on Unsplash, Smithsonian Museum of Natural History, Washington DC

img = (Image.open(urlopen("https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/mana5280-o69yU0jE0Nk-unsplash.jpg")))
plt.imshow(img)
plt.show()
new_width  = 224
new_height = 224
img = img.resize((new_width, new_height), Image.LANCZOS)  # LANCZOS replaces the ANTIALIAS alias that was removed in Pillow 10
plt.imshow(img)
plt.show()
img=np.array(img)

In [None]:
img=preprocess_input(img)
print(img.shape)

In [None]:
pred=model_vgg.predict(np.expand_dims(img,axis=0))
tf.keras.applications.vgg16.decode_predictions(pred)

**You can see that the VGG16 network is not able to predict the elephant in the room (the top prediction is horse cart), even though as a human you have no problem at all seeing the elephant! The problem is that this is a quite unusual image: the ImageNet training dataset contained no elephants indoors, only "normal" images of elephants in the wild.**

**This is a fundamental weakness of deep learning and of machine learning in general. We, as humans, obviously learn differently. Any child who has learned what an elephant looks like would see the elephant in this image.**


#### Optional Exercise:
Read in your own image of an animal in a normal or unusual environment and check the predictions.  
Can you find another "elephant in the room"?

In [None]:
from tensorflow import keras
from IPython.display import Image, display
import matplotlib.pyplot as plt
import matplotlib.cm as cm

#model_vgg = tf.keras.applications.vgg16.VGG16(include_top=True, weights='imagenet')
img_size = (224, 224)
preprocess_input = keras.applications.vgg16.preprocess_input
decode_predictions = keras.applications.vgg16.decode_predictions
last_conv_layer_name = "block5_conv3"
img_path = keras.utils.get_file("tusker.jpg", "https://raw.githubusercontent.com/tensorchiefs/dl_book/master/imgs/mana5280-o69yU0jE0Nk-unsplash.jpg")

In [None]:
def get_img_array(img_path, size):
    img = keras.preprocessing.image.load_img(img_path, target_size=size)    # `img` is a PIL image 
    array = keras.preprocessing.image.img_to_array(img)    # `array` is a float32 Numpy array of shape (224, 224, 3)
    array = np.expand_dims(array, axis=0)    # We add a dimension to transform our array into a "batch" of size (1, 224, 224, 3)
    return array

def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model([model.inputs], [model.get_layer(last_conv_layer_name).output, model.output])
    # Then, we compute the gradient of the top predicted class for our input image with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]
    # This is the gradient of the output neuron (top predicted or chosen) with regard to the output feature map of the last conv layer
    grads = tape.gradient(class_channel, last_conv_layer_output)
    # This is a vector where each entry is the mean intensity of the gradient over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    # We multiply each channel in the feature map array by "how important this channel is" with regard to the top predicted class
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)     # then sum all the channels to obtain the heatmap class activation
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)     # For visualization purpose, we will also normalize the heatmap between 0 & 1
    return heatmap.numpy()
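
The weighting inside `make_gradcam_heatmap` can be followed step by step with plain NumPy. The sketch below uses random arrays as stand-ins for the activations and gradients of the last convolutional layer. The shape 14x14x512 is the output shape of `block5_conv3` for a 224x224 input. It illustrates the arithmetic only and is not a replacement for the function above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the real activations and gradients of block5_conv3.
feature_maps = rng.normal(size=(14, 14, 512)).astype("float32")
grads = rng.normal(size=(14, 14, 512)).astype("float32")

# One weight per channel: the gradient averaged over the spatial positions.
channel_weights = grads.mean(axis=(0, 1))            # shape (512,)

# Weighted sum of the feature maps over the channel axis.
heatmap = feature_maps @ channel_weights             # shape (14, 14)

# Keep only positive evidence and scale to the range [0, 1].
heatmap = np.maximum(heatmap, 0)
heatmap = heatmap / heatmap.max()

print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```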

In [None]:
# Prepare the image from the raw file. `img` from the cells above is already preprocessed,
# so passing it through preprocess_input again would swap the channels back and subtract the means twice.
img_array = preprocess_input(get_img_array(img_path, size=img_size))
model_vgg.layers[-1].activation = None # Remove last layer's softmax
preds = model_vgg.predict(img_array)
print("Predicted:", decode_predictions(preds, top=5)[0]) # Print the top 5 predicted classes
heatmap = make_gradcam_heatmap(img_array, model_vgg, last_conv_layer_name) # Generate class activation heatmap

In [None]:
def save_and_display_gradcam(img_path, heatmap, cam_path="cam.jpg", alpha=0.4):
    img = keras.preprocessing.image.load_img(img_path)    # Load the original image
    img = keras.preprocessing.image.img_to_array(img)
    heatmap = np.uint8(255 * heatmap)    # Rescale heatmap to a range 0-255
    jet = plt.get_cmap("jet")    # Use jet colormap to colorize heatmap (cm.get_cmap was removed in Matplotlib 3.9)
    jet_colors = jet(np.arange(256))[:, :3]    # Use RGB values of the colormap
    jet_heatmap = jet_colors[heatmap]    # Create an image with RGB colorized heatmap
    jet_heatmap = keras.preprocessing.image.array_to_img(jet_heatmap)
    jet_heatmap = jet_heatmap.resize((img.shape[1], img.shape[0]))
    jet_heatmap = keras.preprocessing.image.img_to_array(jet_heatmap)
    superimposed_img = jet_heatmap * alpha + img    # Superimpose the heatmap on original image
    superimposed_img = keras.preprocessing.image.array_to_img(superimposed_img)
    superimposed_img.save(cam_path)    # Save the superimposed image
    display(Image(cam_path))    # Display Grad CAM
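
The colorizing and blending steps in `save_and_display_gradcam` come down to a table lookup followed by a weighted sum. The sketch below reproduces this with toy data. The black-to-red lookup table `lut` is a hypothetical stand-in for the "jet" colormap, and the 2x2 arrays stand in for the heatmap and the photo.

```python
import numpy as np

# Hypothetical 256-entry lookup table: a black-to-red ramp standing in for "jet",
# with one RGB row (values in 0..1) per possible uint8 heatmap value.
lut = np.zeros((256, 3), dtype="float32")
lut[:, 0] = np.linspace(0.0, 1.0, 256)

heatmap = np.array([[0.0, 0.5], [0.75, 1.0]])        # toy 2x2 heatmap in [0, 1]
heatmap_u8 = np.uint8(255 * heatmap)                 # rescale to the range 0-255
colored = lut[heatmap_u8]                            # index lookup -> shape (2, 2, 3)

image = np.full((2, 2, 3), 100.0, dtype="float32")   # toy grey "photo" with values in 0..255
alpha = 0.4
overlay = (colored * 255) * alpha + image            # blend on the 0..255 scale

print(colored.shape, overlay[1, 1])
```

Pixels with a heatmap value of zero keep the original image color, while the strongest activations get the largest color offset.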

In [None]:
heatmap = make_gradcam_heatmap(img_array, model_vgg, last_conv_layer_name, pred_index=603) # index for horse cart
save_and_display_gradcam(img_path, heatmap)

In [None]:
heatmap = make_gradcam_heatmap(img_array, model_vgg, last_conv_layer_name, pred_index=698) # index for palace
save_and_display_gradcam(img_path, heatmap)

In [None]:
heatmap = make_gradcam_heatmap(img_array, model_vgg, last_conv_layer_name, pred_index=101) # index for tusker (Indian elephant: 385, African elephant: 386)
save_and_display_gradcam(img_path, heatmap)