01 Course project - Welcome
---
In this final course project, you will have the chance to demonstrate what you have learned in the previous courses in a concrete machine learning project: **building an image classifier**.

For this project, you will work on the **Swissroads** data set, which contains several hundred images of vehicles found in the EPFL - Lausanne area, including **cars, trucks, vans, bikes, motorcycles and others**.

The goal of this project is to **test the different classifiers** and techniques from the course using high-level features extracted with a **pretrained** convolutional neural network from **TensorFlow Hub** and compare the results with your own **ConvNet** implementation trained from the raw image pixels.

# 02 Image classifier

# Feature extraction 

In this first part of the project, start by extracting a set of **high-level features** for each image in the data set. To achieve this, you can use, for example, the <a href="https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1">Inception v3</a> or <a href="https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/2">MobileNet v2</a> ConvNets, which extract 2048 and 1280 **high-level features** respectively.

Suggestion: consider storing the extracted high-level features, e.g. in npz files, for quickly reloading them into each of the following notebooks.
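As a minimal sketch of the reload step, the snippet below writes a small synthetic archive and reads it back. The helper name `load_features`, the temporary file and the synthetic arrays are assumptions for illustration; the keys `features` and `category` follow the names used by the saving helper in this notebook.

```python
import os
import tempfile

import numpy as np


def load_features(path):
    """Return (features, labels) from an .npz archive written with np.savez."""
    with np.load(path) as npz:
        # Indexing reads each array fully into memory before the file is closed
        return npz["features"], npz["category"]


# Synthetic stand-in for real extracted features (4 images, 1280 features each)
tmp_path = os.path.join(tempfile.mkdtemp(), "demo_features.npz")
demo_features = np.zeros((4, 1280), dtype=np.float32)
demo_labels = np.array([0, 1, 2, 1], dtype=np.int32)
np.savez(tmp_path, features=demo_features, category=demo_labels)

X, y = load_features(tmp_path)
print(X.shape, y.shape)  # (4, 1280) (4,)
```

In a later notebook, the same helper could be pointed at the saved training, validation and test archives instead of the temporary file.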

**Note**: All your models should be **trained** on the training set, and your hyperparameter tuning should be **validated** on the validation set. The test set should only be used for the final comparison, to **test** the accuracy of your models on unseen data.
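The three-way protocol in the note can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, and a small NumPy k-nearest-neighbours classifier stands in for whichever model is being tuned; `knn_predict`, `accuracy` and `make_split` are hypothetical helpers, not part of the project.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Predict labels by majority vote among the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    dists = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


rng = np.random.RandomState(0)


def make_split(n_per_class):
    """Synthetic data: 3 classes, 8 features, class c centred at value c."""
    X = np.concatenate(
        [rng.normal(loc=c, scale=1.0, size=(n_per_class, 8)) for c in range(3)]
    )
    y = np.repeat(np.arange(3), n_per_class)
    return X, y


X_tr, y_tr = make_split(40)
X_va, y_va = make_split(20)
X_te, y_te = make_split(10)

# 1) Fit on the training set, 2) choose k using the validation set only
candidate_ks = [1, 3, 5, 7]
valid_scores = {
    k: accuracy(y_va, knn_predict(X_tr, y_tr, X_va, k)) for k in candidate_ks
}
best_k = max(valid_scores, key=valid_scores.get)

# 3) Use the test set once, with the hyperparameter already fixed
test_acc = accuracy(y_te, knn_predict(X_tr, y_tr, X_te, best_k))
print("best k:", best_k, "test accuracy:", round(test_acc, 3))
```

The point of the sketch is the order of operations: the test split never influences the choice of `k`.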

In [14]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create image generator
train_generator = ImageDataGenerator(rescale = 1/255
                                    #,horizontal_flip = True
                                    #,rotation_range = 5
                                    ,validation_split = None) #0.2 )

valid_generator = ImageDataGenerator(rescale = 1/255 )
test_generator = ImageDataGenerator(rescale = 1/255 )

In [15]:
import os
trainset = train_generator.flow_from_directory( os.path.join("data/swissroads", 'train')
                                               ,batch_size = 32
                                               ,target_size = (224, 224)
                                               ,shuffle = False #as not splitting
                                               )#,subset = 'training' )

validset = valid_generator.flow_from_directory( os.path.join('data/swissroads', 'valid')
                                               ,batch_size = 32
                                               ,target_size = (224, 224)
                                               ,shuffle = False
                                               )#,subset = 'validation' )

testset = test_generator.flow_from_directory( os.path.join('data/swissroads', 'test')
                                             ,batch_size = 32
                                             ,target_size = (224, 224)
                                             ,shuffle = False
                                             )#,subset = 'validation' )

Found 280 images belonging to 6 classes.
Found 139 images belonging to 6 classes.
Found 50 images belonging to 6 classes.


In [16]:
import numpy as np

# Pile up all batches of a directory iterator and return them as whole arrays
def f_get_wholeImgBatch(p_directIter, p_DI_sizeBatch=32):
    # p_DI_sizeBatch is kept for compatibility with existing calls;
    # the iterator already knows its own batch size
    img_batches = []
    lbl_batches = []

    p_directIter.reset()  # start from the first image
    # len(p_directIter) is the number of batches in one pass over the data,
    # so the loop also terminates when the data set size is a multiple of the batch size
    for _ in range(len(p_directIter)):
        batch_imgs, batch_labels = next(p_directIter)
        img_batches.append(batch_imgs)
        lbl_batches.append(batch_labels)

    return np.concatenate(img_batches, axis=0), np.concatenate(lbl_batches, axis=0)

In [17]:
# train data set
img_batches_tr, lbl_batches_tr = f_get_wholeImgBatch(trainset, 32)
targ_tr = np.matmul(lbl_batches_tr, np.array(list(trainset.class_indices.values())) ).astype(np.int32)

# validation data set
img_batches_va, lbl_batches_va = f_get_wholeImgBatch(validset, 32)
targ_va = np.matmul(lbl_batches_va, np.array(list(validset.class_indices.values())) ).astype(np.int32)

# test data set
img_batches_te, lbl_batches_te = f_get_wholeImgBatch(testset, 32)
targ_te = np.matmul(lbl_batches_te, np.array(list(testset.class_indices.values())) ).astype(np.int32)

print("Train", img_batches_tr.shape, img_batches_tr.dtype, targ_tr.shape, targ_tr.dtype)
print("Valid", img_batches_va.shape, img_batches_va.dtype, targ_va.shape, targ_va.dtype)
print("Test", img_batches_te.shape, img_batches_te.dtype, targ_te.shape, targ_te.dtype)

Train (280, 224, 224, 3) float32 (280,) int32
Valid (139, 224, 224, 3) float32 (139,) int32
Test (50, 224, 224, 3) float32 (50,) int32
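The integer targets above come from multiplying the one-hot label matrix by the vector of class indices. As a small check on toy data (the 3-class matrix below is made up for illustration), this gives the same result as taking the position of the 1 in each row with `np.argmax`:

```python
import numpy as np

# Toy one-hot labels for three samples and three classes
one_hot = np.array([[1, 0, 0],
                    [0, 0, 1],
                    [0, 1, 0]], dtype=np.float32)
class_ids = np.arange(one_hot.shape[1])  # [0, 1, 2]

# Each row has a single 1, so the dot product picks out that class index
via_matmul = np.matmul(one_hot, class_ids).astype(np.int32)
via_argmax = np.argmax(one_hot, axis=1).astype(np.int32)

print(via_matmul, via_argmax)  # [0 2 1] [0 2 1]
```

The two agree only when the class indices are exactly 0..n-1, as is the case for `class_indices` from `flow_from_directory`.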


In [18]:
import tensorflow as tf
import tensorflow_hub as hub

In [19]:

def f_create_graph(p_mod_url='https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/2'):
    # Create graph
    img_graph = tf.Graph()

    with img_graph.as_default():
        # Download module
        module_url = p_mod_url
        feature_extractor = hub.Module(module_url)

        # Create input placeholder
        input_imgs = tf.placeholder(dtype=tf.float32, shape=[None, 224, 224, 3])

        # A node with the features
        imgs_features = feature_extractor(input_imgs)

        # Collect initializers
        init_op = tf.group( [tf.global_variables_initializer()
                            ,tf.tables_initializer() ] )

    img_graph.finalize() # Good practice: make the graph "read-only"
    
    return img_graph, input_imgs, imgs_features, init_op

In [20]:
# Run the feature extractor on a set of images and return the features
def f_create_sess(p_img_graph, p_input_imgs, p_imgs_features, p_init_op, p_img_batches):
    # The context manager closes the session and releases its resources
    with tf.Session(graph=p_img_graph) as sess:
        # Initialize variables and tables
        sess.run(p_init_op)

        # Extract features
        features = sess.run(p_imgs_features, feed_dict={p_input_imgs: p_img_batches})

    return features
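The function above feeds all images to the network in a single call, which is fine for a few hundred images. For larger sets, one option is to extract features chunk by chunk and stack the results. The sketch below is framework-agnostic: `extract_in_chunks` and `fake_extractor` are hypothetical names, and the fake extractor only stands in for a real session call.

```python
import numpy as np


def extract_in_chunks(extract_fn, images, chunk_size=32):
    """Apply extract_fn to consecutive slices of images and stack the results."""
    outputs = []
    for start in range(0, len(images), chunk_size):
        outputs.append(extract_fn(images[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)


def fake_extractor(batch):
    """Stand-in for a real feature extractor: mean value per colour channel."""
    return batch.mean(axis=(1, 2))


images = np.ones((70, 8, 8, 3), dtype=np.float32)
features = extract_in_chunks(fake_extractor, images, chunk_size=32)
print(features.shape)  # (70, 3)
```

With a TensorFlow session, `extract_fn` could be a small function that wraps `sess.run` on the feature node, with the session kept open across chunks.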

In [21]:
#Function to save data as a npz file 
def f_save_npzF(p_pathFile, p_feat, p_targ, p_data, p_fnames=None, p_categ_names=None):
    np.savez(p_pathFile, features=p_feat, category=p_targ, data=p_data, filenames=p_fnames, categorynames=p_categ_names)

In [22]:
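# Turn a {class_name: index} dictionary into a list of "index:class_name" strings
def f_dict2array1(p_dict):
    cat_names = []
    # Use the dictionary passed in, not the global trainset
    for k, v in p_dict.items():
        cat_names.append(str(v) + ":" + k)

    return cat_names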

In [23]:
import tensorflow as tf
import tensorflow_hub as hub

#Train: img_batches_tr, lbl_batches_tr

img_graph, input_imgs, imgs_features, init_op = f_create_graph()

feat = f_create_sess(img_graph, input_imgs, imgs_features, init_op, img_batches_tr)
print("var features ",feat.shape)

f_save_npzF("data/data_train.npz", feat, targ_tr, img_batches_tr, np.array(trainset.filenames), f_dict2array1(trainset.class_indices))

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


var features  (280, 1280)


In [24]:
#Validation: img_batches_va, lbl_batches_va

img_graph, input_imgs, imgs_features, init_op = f_create_graph()

feat = f_create_sess(img_graph, input_imgs, imgs_features, init_op, img_batches_va)
print("var features ",feat.shape)

f_save_npzF("data/data_valid.npz", feat, targ_va, img_batches_va, np.array(validset.filenames), f_dict2array1(validset.class_indices))

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


var features  (139, 1280)


In [25]:
#Test: img_batches_te, lbl_batches_te

img_graph, input_imgs, imgs_features, init_op = f_create_graph()

feat = f_create_sess(img_graph, input_imgs, imgs_features, init_op, img_batches_te)
print("var features ",feat.shape)

# Save the test images (img_batches_te) alongside the test features
f_save_npzF("data/data_test.npz", feat, targ_te, img_batches_te, np.array(testset.filenames), f_dict2array1(testset.class_indices))

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


var features  (50, 1280)


In [26]:
# Create the results file with its header row;
# the context manager closes the file even if the write fails
with open("data/results09.csv", "w") as fRes:
    fRes.write('idx,model,test_accuracy\n')
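Later notebooks can then add one row per model to this file. A minimal sketch using the standard `csv` module is shown below; `append_result` is a hypothetical helper, the demo writes to a temporary file rather than `data/results09.csv`, and the model name and accuracy are placeholders, not real results.

```python
import csv
import os
import tempfile


def append_result(path, idx, model, test_accuracy):
    """Append one row to a results CSV that already has a header."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([idx, model, test_accuracy])


# Demo on a temporary file with the same header as the results file
demo_path = os.path.join(tempfile.mkdtemp(), "results_demo.csv")
with open(demo_path, "w", newline="") as f:
    csv.writer(f).writerow(["idx", "model", "test_accuracy"])

# Placeholder values only
append_result(demo_path, 0, "placeholder-model", 0.0)

with open(demo_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Opening the file in append mode keeps the header and earlier rows intact, so each notebook can record its own score independently.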