# Exercise on the value of unsupervised constructed features for training a classifier with few labeled examples: SOLUTION

To get unsupervised constructed features for an image, we used a pretrained CNN as a feature extractor: we pushed each image through the pretrained CNN and extracted the activations of its first fully connected layer. As pretrained CNN we use a VGG16 architecture that was trained on ImageNet data and was the runner-up in the ImageNet classification challenge (ILSVRC) 2014.
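The extraction step described above can be sketched with `tf.keras`. This sketch assumes TensorFlow 2.x, unlike the TF 1.6 / Keras 2.1.5 setup of this notebook. It also assumes that the 32x32 CIFAR images are upscaled to VGG16's 224x224 input size and passed through `vgg16.preprocess_input`. The exact preprocessing behind the provided feature file is not documented here. `extract_layer_features` and `vgg16_fc1_features` are hypothetical helper names.

```python
import numpy as np
import tensorflow as tf


def extract_layer_features(model, layer_name, inputs, batch_size=32):
    """Return the activations of the layer `layer_name`, one row per input."""
    # Sub-model that shares the weights of `model` but stops at the requested layer
    extractor = tf.keras.Model(inputs=model.input,
                               outputs=model.get_layer(layer_name).output)
    return extractor.predict(inputs, batch_size=batch_size, verbose=0)


def vgg16_fc1_features(images_uint8):
    """Assumed pipeline: 4096-dim 'fc1' activations of an ImageNet-pretrained VGG16."""
    vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
    # Assumption: upscale the small CIFAR images to the 224x224 input size VGG16 expects
    x = tf.image.resize(tf.cast(images_uint8, tf.float32), (224, 224)).numpy()
    x = tf.keras.applications.vgg16.preprocess_input(x)
    # 'fc1' is the first fully connected layer of the Keras VGG16 model
    return extract_layer_features(vgg, "fc1", x)
```

For the 1000 sampled CIFAR-10 images, `vgg16_fc1_features(x_train)` would return an array of shape (1000, 4096). The first call downloads several hundred MB of pretrained weights. The generic helper works for any Keras model and layer name.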

In this way we obtained unsupervised constructed features for 1000 images of the MNIST data set and 1000 images of the CIFAR10 data set. Both data sets have 10 distinct classes and are balanced, i.e. they contain 100 images per class.

In the last exercise we used 2D plots to check whether the features are good enough to produce visible clusters corresponding to the 10 classes in the data. For the MNIST data it is very easy to separate the 10 digits. Therefore, we only continue with the more challenging CIFAR10 data.
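This section does not say which 2D projection the previous exercise used. As a minimal stand-in, a PCA projection to two dimensions can be computed with NumPy alone. This is an assumption and not necessarily the method used there. `pca_2d` is a hypothetical helper.

```python
import numpy as np


def pca_2d(features):
    """Project an (n_samples, n_features) array onto its first two principal components."""
    centered = features - features.mean(axis=0)
    # The rows of vt are the principal axes, sorted by decreasing explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

This assumes the feature rows are in the same order as the labels. Under that assumption, `z = pca_2d(vgg_features_cifar)` followed by `plt.scatter(z[:, 0], z[:, 1], c=y_train.ravel(), cmap="tab10", s=5)` gives a scatter plot colored by class.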

As a second check on the quality of the feature representation of the CIFAR10 data, we train a classifier once on the pixel features and once on the VGG features. Each time we use only 100 labeled examples (10 per class). If the VGG features are indeed better, we expect the classifier trained on them to outperform the one trained on the pixel features.
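The split used in this exercise draws a fixed number of images per class, so every class is equally represented in each set. The sketch below is a compact, hypothetical helper for such a stratified train/validation/test split. It uses NumPy's `default_rng` instead of the notebook's legacy `np.random.seed`, so it will not reproduce the notebook's exact indices.

```python
import numpy as np


def stratified_indices(labels, n_per_class, rng):
    """Draw n_per_class random indices per class, without replacement."""
    labels = np.asarray(labels).ravel()
    return np.concatenate([
        rng.choice(np.flatnonzero(labels == c), n_per_class, replace=False)
        for c in np.unique(labels)
    ])


def split_train_valid_test(labels, n_train, n_valid, seed=0):
    """Stratified split; returns index arrays into `labels` (test = everything left over)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).ravel()
    train = stratified_indices(labels, n_train, rng)
    rest = np.setdiff1d(np.arange(labels.size), train)
    valid = rest[stratified_indices(labels[rest], n_valid, rng)]
    test = np.setdiff1d(rest, valid)
    return train, valid, test
```

For the 1000 sampled images, `split_train_valid_test(y_train, 10, 10)` yields 100/100/800 indices. The notebook indexes its validation set relative to the remaining 900 images. These indices instead refer to the original array, so the same three index arrays can be applied directly to the raw images and to the VGG feature matrix.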

a) What accuracy would you expect from a classifier that cannot distinguish between the 10 classes and is only guessing?

**Solution: 10%, because with 10 equally frequent classes a random guess is correct with probability 1/10.**
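A quick simulation illustrates the 10% baseline on a balanced test set of 800 images (80 per class), like the one used later in this notebook. It compares uniform random guessing with always predicting the same class.

```python
import numpy as np

rng = np.random.default_rng(0)
y_test = np.repeat(np.arange(10), 80)        # balanced test set: 80 images per class

# Uniform random guessing: each prediction is correct with probability 1/10
y_random = rng.integers(0, 10, size=y_test.size)
acc_random = np.mean(y_random == y_test)

# Always predicting class 0: correct exactly for the 80 images of that class
y_constant = np.zeros_like(y_test)
acc_constant = np.mean(y_constant == y_test)

print(acc_random)    # close to 0.1
print(acc_constant)  # 0.1
```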


b) Go through the code used to set up, train, and evaluate a CNN classifier on the raw pixel features. Discuss your thoughts on the achieved accuracy (e.g. with your neighbor).

**Solution: With about 21% test accuracy, the CNN is better than guessing but still very poor. However, this is not surprising: the image resolution is very low, so even by eye it is quite difficult to distinguish between the classes. Moreover, we have only very few training examples (10 per class), poor features (the raw pixel values), and a model with many parameters (~46k).**
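The model size quoted above can be checked by hand with the usual Keras counting rules:

- a 3x3 convolution has k·k·c_in·c_out weights plus c_out biases;
- a dense layer has n_in·n_out + n_out parameters;
- a BatchNormalization layer has 4 values per channel: trainable γ and β, plus the non-trainable moving mean and variance.

The layer list mirrors the CNN defined in this notebook, and the helper names are illustrative.

```python
def conv2d_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out      # kernel weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out              # weight matrix + biases

def batchnorm_params(channels):
    return 4 * channels                      # gamma, beta, moving mean, moving variance

cnn_layers = [
    conv2d_params(3, 3, 8), batchnorm_params(8),     # 32x32x3 -> 32x32x8
    conv2d_params(3, 8, 8), batchnorm_params(8),     # then 2x2 max pooling -> 16x16x8
    conv2d_params(3, 8, 16), batchnorm_params(16),   # 16x16x8 -> 16x16x16
    conv2d_params(3, 16, 16), batchnorm_params(16),  # then 2x2 max pooling -> 8x8x16
    dense_params(8 * 8 * 16, 40), batchnorm_params(40),
    dense_params(40, 10),
]
total = sum(cnn_layers)
non_trainable = 2 * (8 + 8 + 16 + 16 + 40)          # moving mean/variance of the BN layers
print(total, total - non_trainable)  # 46058 45882
```

The first entries (224, 32, 584, 32) match the rows of the `model.summary()` output for this CNN.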

c) Now we use the unsupervised constructed VGG features. We want to check whether these VGG features are good enough to train a classifier with only a few labeled examples and still get a satisfying performance. For this purpose, please complete the code to set up a fully connected NN (fcNN), then run the provided code to train it and determine its accuracy on the test set. Compare it to the accuracy achieved by a random forest (RF) trained on the same features. Discuss the results (e.g. with your neighbor).

**Solution: For the code completion see below. With almost 60% test accuracy, the fcNN is much better than the CNN trained from scratch on raw pixels (about 21%). This implies that the VGG features are quite good and much more informative than the raw pixel values. The RF does not achieve a better performance: with about 40% it is clearly worse than the fcNN. This is somewhat surprising. With only 100 labeled training examples and ~862k parameters, we would expect the fcNN to overfit strongly, and indeed its training accuracy rises above 90% while the validation accuracy stays around 50%. A possible reason is that VGG16 learned these features as input for its own subsequent fully connected layers, so they may be particularly well suited for an fcNN classifier.**
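The parameter count of the fcNN follows from the layer sizes in its `model.summary()` output, whose total line is cut off in this export. The arithmetic below shows that the network has roughly 8,600 parameters per labeled training example, which is why strong overfitting would be expected.

```python
# Parameter counts per layer, as listed in the model.summary() output of the fcNN
dense_1 = 4096 * 200 + 200    # 819400
bn_1 = 4 * 200                # 800
dense_2 = 200 * 200 + 200     # 40200
dense_out = 200 * 10 + 10     # 2010

total = dense_1 + bn_1 + dense_2 + dense_out
n_train = 100                 # labeled training examples
print(total)                  # 862410
print(total / n_train)        # 8624.1 parameters per training example
```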



### Imports

In [1]:
```python
%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib.image as imgplot
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

import sys
import time

import tensorflow as tf
tf.set_random_seed(1)  # TF 1.x API (this notebook ran on TF 1.6)

import keras
print("Keras {} TF {} Python {}".format(keras.__version__, tf.__version__, sys.version_info))
```
```
Keras 2.1.5 TF 1.6.0 Python sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
Using TensorFlow backend.
```


### CIFAR Data preparation

In [2]:
```python
# download the CIFAR-10 data; only the training split is used in this exercise
from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
del x_test, y_test
```

In [3]:
```python
# for each class label, sample 100 random images and collect their indices
np.random.seed(seed=222)
idx = np.empty(0, dtype=int)
for i in range(len(np.unique(y_train))):
    idx = np.append(idx, np.random.choice(np.where(y_train[:, 0] == i)[0], 100, replace=False))

x_train = x_train[idx]
y_train = y_train[idx]
```

In [4]:
```python
print(x_train.shape)
print(y_train.shape)
print(np.unique(y_train, return_counts=True))
```
```
(1000, 32, 32, 3)
(1000, 1)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100]))
```


In [5]:
```python
# split the 1000 images into train, validation and test sets:
# first sample 10 random images per class for training
np.random.seed(seed=123)
idx_train = np.empty(0, dtype=int)
for i in range(len(np.unique(y_train))):
    idx_train = np.append(idx_train, np.random.choice(np.where(y_train[:, 0] == i)[0], 10, replace=False))

x_train_new = x_train[idx_train]
y_train_new = y_train[idx_train]
```

In [6]:
```python
# the remaining 900 images are candidates for validation and test
x_test_new = np.delete(x_train, idx_train, axis=0)
y_test_new = np.delete(y_train, idx_train, axis=0)
```

In [7]:
```python
# sample 10 random images per class from the remaining 900 for validation
np.random.seed(seed=127)
idx_vaild = np.empty(0, dtype=int)
for i in range(len(np.unique(y_test_new))):
    idx_vaild = np.append(idx_vaild, np.random.choice(np.where(y_test_new[:, 0] == i)[0], 10, replace=False))

x_vaild_new = x_test_new[idx_vaild]
y_valid_new = y_test_new[idx_vaild]
```

In [8]:
```python
# the remaining 800 images form the test set
x_test_new = np.delete(x_test_new, idx_vaild, axis=0)
y_test_new = np.delete(y_test_new, idx_vaild, axis=0)
```

In [9]:
```python
# the arrays already have these shapes; the reshape only makes them explicit
x_train_new = np.reshape(x_train_new, (100, 32, 32, 3))
x_vaild_new = np.reshape(x_vaild_new, (100, 32, 32, 3))
x_test_new = np.reshape(x_test_new, (800, 32, 32, 3))
```

In [10]:
```python
print(np.unique(y_train_new, return_counts=True))
print(np.unique(y_valid_new, return_counts=True))
print(np.unique(y_test_new, return_counts=True))
```
```
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10]))
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10]))
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([80, 80, 80, 80, 80, 80, 80, 80, 80, 80]))
```


In [11]:
```python
from keras.utils.np_utils import to_categorical

y_train_new = to_categorical(y_train_new, 10)
y_valid_new = to_categorical(y_valid_new, 10)
y_test_new = to_categorical(y_test_new, 10)
```

In [12]:
```python
print(x_train_new.shape)
print(y_train_new.shape)

print(x_vaild_new.shape)
print(y_valid_new.shape)

print(x_test_new.shape)
print(y_test_new.shape)
```
```
(100, 32, 32, 3)
(100, 10)
(100, 32, 32, 3)
(100, 10)
(800, 32, 32, 3)
(800, 10)
```
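`to_categorical` turns each integer label into a one-hot row of length 10. `np.argmax(..., axis=1)`, which the evaluation cells use, maps such rows back to integer labels. A NumPy sketch of the same mapping follows; `one_hot` is a hypothetical helper for integer labels 0 to n_classes-1.

```python
import numpy as np


def one_hot(labels, n_classes):
    """One-hot encode integer labels of shape (n,) or (n, 1)."""
    return np.eye(n_classes)[np.asarray(labels).ravel()]


y = np.array([[3], [0], [9]])      # same (n, 1) shape as the CIFAR-10 labels
y_1hot = one_hot(y, 10)
print(y_1hot.shape)                # (3, 10)
print(np.argmax(y_1hot, axis=1))   # [3 0 9]
```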


In [13]:
```python
# center and standardize each pixel/channel using the mean and std of the training set only
X_mean = np.mean(x_train_new, axis=0)
X_std = np.std(x_train_new, axis=0)

# the small constant avoids division by zero
x_train_new = (x_train_new - X_mean) / (X_std + 0.0001)
x_vaild_new = (x_vaild_new - X_mean) / (X_std + 0.0001)
x_test_new = (x_test_new - X_mean) / (X_std + 0.0001)
```

### Setting up the CNN classifier based on raw image data

In [14]:
```python
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, BatchNormalization
from keras.layers import Convolution2D, MaxPooling2D, Flatten
```

In [15]:
```python
# hyperparameters of the NN
batch_size = 10
nb_classes = 10
nb_epoch = 30
img_rows, img_cols = 32, 32
kernel_size = (3, 3)
input_shape = (img_rows, img_cols, 3)
pool_size = (2, 2)
```

In [16]:
```python
model = Sequential()

# block 1: two 3x3 convolutions with 8 filters, then 2x2 max pooling (32x32 -> 16x16)
model.add(Convolution2D(8, kernel_size, padding='same', input_shape=input_shape))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Convolution2D(8, kernel_size, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))

# block 2: two 3x3 convolutions with 16 filters, then 2x2 max pooling (16x16 -> 8x8)
model.add(Convolution2D(16, kernel_size, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Convolution2D(16, kernel_size, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))

# classifier head: dense layer with 40 units, then softmax over the 10 classes
model.add(Flatten())
model.add(Dense(40))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Activation('relu'))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```


In [17]:
```python
model.summary()
```
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_1 (Conv2D)            (None, 32, 32, 8)         224
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 8)         32
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 8)         0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 8)         584
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 8)         32
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 8)         0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 8)         0
__________
```

In [18]:
```python
history = model.fit(x_train_new, y_train_new,
                    batch_size=batch_size,
                    epochs=nb_epoch,
                    verbose=2,
                    validation_data=(x_vaild_new, y_valid_new),
                    shuffle=True)
```
```
Train on 100 samples, validate on 100 samples
Epoch 1/30
 - 2s - loss: 2.6003 - acc: 0.0800 - val_loss: 2.7379 - val_acc: 0.1300
Epoch 2/30
 - 1s - loss: 1.9207 - acc: 0.3100 - val_loss: 2.5829 - val_acc: 0.1400
Epoch 3/30
 - 1s - loss: 1.6643 - acc: 0.5000 - val_loss: 2.4480 - val_acc: 0.1400
Epoch 4/30
 - 1s - loss: 1.4609 - acc: 0.5400 - val_loss: 2.3477 - val_acc: 0.1400
Epoch 5/30
 - 1s - loss: 1.3011 - acc: 0.6300 - val_loss: 2.2482 - val_acc: 0.2300
Epoch 6/30
 - 1s - loss: 1.2309 - acc: 0.6500 - val_loss: 2.2582 - val_acc: 0.2100
Epoch 7/30
 - 1s - loss: 1.0791 - acc: 0.7700 - val_loss: 2.3142 - val_acc: 0.2100
Epoch 8/30
 - 1s - loss: 0.8637 - acc: 0.8200 - val_loss: 2.3450 - val_acc: 0.1900
Epoch 9/30
 - 1s - loss: 0.7310 - acc: 0.8500 - val_loss: 2.3296 - val_acc: 0.2400
Epoch 10/30
 - 1s - loss: 0.8137 - acc: 0.8700 - val_loss: 2.3573 - val_acc: 0.2200
Epoch 11/30
 - 1s - loss: 0.6653 - acc: 0.9200 - val_loss: 2.3427 - val_acc: 0.2100
Epoch 12/30
 - 1s - loss: 0.5666 - acc:
```

### Evaluation of the CNN classifier that was trained on raw image data

In [19]:
```python
from sklearn.metrics import confusion_matrix

pred = model.predict(x_test_new)
y_true = np.argmax(y_test_new, axis=1)   # one-hot -> integer labels
y_pred = np.argmax(pred, axis=1)         # class with the highest softmax output
print(confusion_matrix(y_true, y_pred))
print("Acc = ", np.mean(y_true == y_pred))
```
```
[[27  4  6  4 13  7  0  6  7  6]
 [ 4 13  3  0 19  8  0  7  4 22]
 [ 3  4 17  4 30  6  4  6  2  4]
 [ 0  2  9 10 22 13  6 11  0  7]
 [ 1  1 21  5 31  3  6  7  0  5]
 [ 7  2 10 12 16  7  5 17  1  3]
 [ 2  4 18  5 18 10  4 12  1  6]
 [ 3  4 12  2 22 10  2 21  0  4]
 [20  2  9  1 16  6  1  5  9 11]
 [ 4 16  5  2  7  8  0 10  0 28]]
Acc =  0.20875
```
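The accuracy printed above is the sum of the confusion matrix diagonal divided by the number of test images. The sketch below recomputes it from a hand copy of the printed matrix. It also shows that this CNN predicts class 4 (deer, in the standard CIFAR-10 label order) for 194 of the 800 test images, far more than the 80 deer images in the test set.

```python
import numpy as np

# Confusion matrix of the pixel-based CNN, copied from the output above
# (rows: true class, columns: predicted class)
cm_cnn = np.array([
    [27,  4,  6,  4, 13,  7,  0,  6,  7,  6],
    [ 4, 13,  3,  0, 19,  8,  0,  7,  4, 22],
    [ 3,  4, 17,  4, 30,  6,  4,  6,  2,  4],
    [ 0,  2,  9, 10, 22, 13,  6, 11,  0,  7],
    [ 1,  1, 21,  5, 31,  3,  6,  7,  0,  5],
    [ 7,  2, 10, 12, 16,  7,  5, 17,  1,  3],
    [ 2,  4, 18,  5, 18, 10,  4, 12,  1,  6],
    [ 3,  4, 12,  2, 22, 10,  2, 21,  0,  4],
    [20,  2,  9,  1, 16,  6,  1,  5,  9, 11],
    [ 4, 16,  5,  2,  7,  8,  0, 10,  0, 28],
])

accuracy = np.trace(cm_cnn) / cm_cnn.sum()   # correct predictions / all predictions
predicted_per_class = cm_cnn.sum(axis=0)     # how often each class was predicted
print(accuracy)                              # 0.20875
print(np.argmax(predicted_per_class))        # 4 (deer)
```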


### Getting the VGG features for CIFAR

In [20]:
```python
# download the precomputed VGG16 features (embeddings) of the 1000 CIFAR-10 images, if not yet present
import os
import urllib.request  # the submodule must be imported explicitly in Python 3

if not os.path.isfile('cifar_EMB_1000.npz'):
    urllib.request.urlretrieve(
        "https://www.dropbox.com/s/si287al91c1ls0d/cifar_EMB_1000.npz?dl=1",
        "cifar_EMB_1000.npz")
%ls -hl cifar_EMB_1000.npz
```
```
-rw-r--r-- 1 root root 18M Apr  2 20:48 cifar_EMB_1000.npz
```

In [21]:
```python
Data = np.load("cifar_EMB_1000.npz")
vgg_features_cifar = Data["arr_0"]  # one 4096-dim feature vector per image
```

In [22]:
```python
# apply the same train/valid/test split as for the raw images
# (this relies on the rows being in the same order as the 1000 sampled images)
vgg_features_cifar_train = vgg_features_cifar[idx_train]
vgg_features_cifar_test = np.delete(vgg_features_cifar, idx_train, axis=0)
vgg_features_cifar_valid = vgg_features_cifar_test[idx_vaild]
vgg_features_cifar_test = np.delete(vgg_features_cifar_test, idx_vaild, axis=0)
```

In [23]:
```python
print(vgg_features_cifar_train.shape)
print(vgg_features_cifar_valid.shape)
print(vgg_features_cifar_test.shape)
```
```
(100, 4096)
(100, 4096)
(800, 4096)
```


### Setting up the fully connected NN classifier based on VGG features

In [24]:
```python
model = Sequential()
model.add(Dense(200, input_shape=(4096,)))  # input: one 4096-dim VGG feature vector per image
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Dense(200))  # note: no activation follows this layer

#### we still need to add the last layers to get the predictions on the 10 classes
### your code here

model.add(Dense(nb_classes))
model.add(Activation('softmax'))

####### end of your code ######


model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```

In [25]:
```python
model.summary()
```
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
dense_3 (Dense)              (None, 200)               819400
_________________________________________________________________
batch_normalization_6 (Batch (None, 200)               800
_________________________________________________________________
dropout_2 (Dropout)          (None, 200)               0
_________________________________________________________________
activation_7 (Activation)    (None, 200)               0
_________________________________________________________________
dense_4 (Dense)              (None, 200)               40200
_________________________________________________________________
dense_5 (Dense)              (None, 10)                2010
_________________________________________________________________
activation_8 (Activation)    (None, 10)                0
Total para
```

In [26]:
```python
history = model.fit(vgg_features_cifar_train, y_train_new,
                    batch_size=10,
                    epochs=20,
                    verbose=2,
                    validation_data=(vgg_features_cifar_valid, y_valid_new),
                    shuffle=True)
```
```
Train on 100 samples, validate on 100 samples
Epoch 1/20
 - 1s - loss: 2.3188 - acc: 0.2700 - val_loss: 1.9186 - val_acc: 0.4300
Epoch 2/20
 - 0s - loss: 0.8169 - acc: 0.7100 - val_loss: 1.5315 - val_acc: 0.5500
Epoch 3/20
 - 0s - loss: 0.6511 - acc: 0.8100 - val_loss: 1.7482 - val_acc: 0.5300
Epoch 4/20
 - 0s - loss: 0.3293 - acc: 0.9400 - val_loss: 1.7157 - val_acc: 0.4900
Epoch 5/20
 - 0s - loss: 0.4517 - acc: 0.8600 - val_loss: 1.6540 - val_acc: 0.5300
Epoch 6/20
 - 0s - loss: 0.2674 - acc: 0.9400 - val_loss: 1.6287 - val_acc: 0.4700
Epoch 7/20
 - 0s - loss: 0.3415 - acc: 0.9100 - val_loss: 1.5802 - val_acc: 0.5200
Epoch 8/20
 - 0s - loss: 0.1937 - acc: 0.9600 - val_loss: 1.6792 - val_acc: 0.5300
Epoch 9/20
 - 0s - loss: 0.1627 - acc: 0.9700 - val_loss: 1.5265 - val_acc: 0.5300
Epoch 10/20
 - 0s - loss: 0.1536 - acc: 0.9500 - val_loss: 1.5365 - val_acc: 0.5100
Epoch 11/20
 - 0s - loss: 0.1530 - acc: 0.9600 - val_loss: 1.6413 - val_acc: 0.5000
Epoch 12/20
 - 0s - loss: 0.1489 - acc:
```

### Evaluation of the fcNN classifier that was trained on VGG features

In [27]:
```python
pred = model.predict(vgg_features_cifar_test)

#### we now want to get the confusion matrix for the predictions on the test data
### your code here

y_true = np.argmax(y_test_new, axis=1)
y_pred = np.argmax(pred, axis=1)
print(confusion_matrix(y_true, y_pred))
print("Acc = ", np.mean(y_true == y_pred))

########## end of your code ###############################
```
```
[[40  1  4  1  1  0  1  1 26  5]
 [ 1 54  1  0  0  1  2  0  3 18]
 [ 7  1 40  4 11  5  8  1  2  1]
 [ 3  0  6 27  0 16 16  0  5  7]
 [ 4  0  8  1 32  3  9 13  5  5]
 [ 0  0  2 24  0 44  4  5  0  1]
 [ 4  0 26  2  1  1 46  0  0  0]
 [ 2  0  4  4  6  3  2 54  3  2]
 [ 2  1  3  0  0  1  1  0 67  5]
 [ 1  5  1  0  0  0  0  0  2 71]]
Acc =  0.59375
```
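Besides the overall accuracy, the fcNN's confusion matrix shows which classes remain hard. The sketch below copies the matrix from the output above and assumes the standard CIFAR-10 class order. From the matrix it derives:

- the per-class recall (diagonal entry divided by row sum);
- the largest off-diagonal entry.

```python
import numpy as np

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

# Confusion matrix of the fcNN on VGG features (rows: true class, columns: predicted class)
cm_fcnn = np.array([
    [40,  1,  4,  1,  1,  0,  1,  1, 26,  5],
    [ 1, 54,  1,  0,  0,  1,  2,  0,  3, 18],
    [ 7,  1, 40,  4, 11,  5,  8,  1,  2,  1],
    [ 3,  0,  6, 27,  0, 16, 16,  0,  5,  7],
    [ 4,  0,  8,  1, 32,  3,  9, 13,  5,  5],
    [ 0,  0,  2, 24,  0, 44,  4,  5,  0,  1],
    [ 4,  0, 26,  2,  1,  1, 46,  0,  0,  0],
    [ 2,  0,  4,  4,  6,  3,  2, 54,  3,  2],
    [ 2,  1,  3,  0,  0,  1,  1,  0, 67,  5],
    [ 1,  5,  1,  0,  0,  0,  0,  0,  2, 71],
])

recall = np.diag(cm_fcnn) / cm_fcnn.sum(axis=1)       # fraction of each class found
off_diag = cm_fcnn - np.diag(np.diag(cm_fcnn))        # keep only the mistakes
true_c, pred_c = np.unravel_index(np.argmax(off_diag), off_diag.shape)

print(np.trace(cm_fcnn) / cm_fcnn.sum())              # 0.59375
print(CIFAR10_CLASSES[int(np.argmin(recall))],
      CIFAR10_CLASSES[int(np.argmax(recall))])        # cat truck
print(CIFAR10_CLASSES[true_c], "->", CIFAR10_CLASSES[pred_c])  # airplane -> ship
```

The largest confusion, 26 images, occurs twice: airplanes predicted as ships and frogs predicted as birds. `np.argmax` returns the first of these. Cats and dogs are also often mixed up, with 16 cats predicted as dogs and 24 dogs predicted as cats.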


### Baseline: use VGG feature to train a Random Forest model

In [28]:
```python
from sklearn.ensemble import RandomForestClassifier

# baseline: random forest with default settings (n_estimators=10 in the scikit-learn version used here),
# trained on the same 100 VGG feature vectors
clf = RandomForestClassifier()
clf.fit(vgg_features_cifar_train, np.argmax(y_train_new, axis=1))
```
```
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
```

In [29]:
```python
from sklearn.metrics import confusion_matrix

pred = clf.predict(vgg_features_cifar_test)
y_true = np.argmax(y_test_new, axis=1)
print(confusion_matrix(y_true, pred))
np.mean(pred == y_true)  # test accuracy
```
```
[[39  6  5  2  2  0  3  2 17  4]
 [ 5 54  2  0  0  0  1  1 10  7]
 [11  7 18  3 16  6  8  3  6  2]
 [ 6  5 16 16  6 14  7  5  2  3]
 [ 9  1 12  9 29  2  3 12  2  1]
 [ 3  4 12 13  4 26  7  8  1  2]
 [ 8  2 13  9 10  3 26  1  6  2]
 [ 6  3  8  3 15  3  0 37  2  3]
 [12  8  9  2  5  0  3  0 38  3]
 [ 5 24  2  1  1  0  1  2  8 36]]

0.39875
```
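A rough check helps judge whether the gaps between the three test accuracies exceed the noise expected from a test set of only 800 images. For this purpose a normal-approximation (Wald) confidence interval can be computed. This is a simplified check under stated assumptions. It treats the 800 test predictions as independent Bernoulli trials and ignores the variability from the choice of the 100 training images and from random initialization. `accuracy_ci` is a hypothetical helper.

```python
import math


def accuracy_ci(acc, n, z=1.96):
    """Approximate 95% (Wald) confidence interval for an accuracy measured on n test examples."""
    se = math.sqrt(acc * (1 - acc) / n)   # binomial standard error
    return acc - z * se, acc + z * se


# Test accuracies reported in this notebook (800 test images each)
test_accuracies = {
    "CNN on raw pixels": 0.20875,
    "fcNN on VGG features": 0.59375,
    "RF on VGG features": 0.39875,
}
for name, acc in test_accuracies.items():
    low, high = accuracy_ci(acc, n=800)
    print("{}: {:.3f} (95% CI {:.3f} to {:.3f})".format(name, acc, low, high))
```

With these numbers the three intervals do not overlap, and even the pixel-based CNN stays clearly above the 10% guessing baseline. This supports the ranking fcNN > RF > pixel CNN discussed at the top of this exercise.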