# Initialization
As a first step, we mount the Google Drive directory. To speed up the overall computation, we then copy the **food** and **distractor** datasets into the Colab runtime and unzip them there. Because the two archives have different directory layouts, for the distractor dataset we first create a "distractor" directory and extract the archive into it.

 

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
!cp '/content/gdrive/MyDrive/[MIRCV]FoodWebSearch/food.zip' .
!unzip -q food.zip
!rm food.zip

In [3]:
!mkdir '/content/distractor'
!cp '/content/gdrive/MyDrive/[MIRCV]FoodWebSearch/mirflickr25k.zip' .
!unzip -q mirflickr25k.zip -d '/content/distractor'
!rm mirflickr25k.zip
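The role of the extra "distractor" directory can be sketched with the standard library alone. A directory-based loader treats every immediate subdirectory of its root as one class, so extracting the archive under a wrapper directory yields a root with a single class folder. This is only a minimal sketch: the tiny archive, its file names, and the helper `extract_under` are hypothetical stand-ins for the real dataset and for the shell commands above.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_under(archive_path, parent_dir):
    """Create parent_dir and extract the archive into it (like mkdir + unzip -d)."""
    parent = Path(parent_dir)
    parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(parent)
    # The immediate subdirectories are what a directory-based loader sees as classes.
    return sorted(p.name for p in parent.iterdir() if p.is_dir())


# Build a tiny stand-in archive with one top-level folder (hypothetical file names).
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "toy.zip"
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("mirflickr/im1.jpg", b"fake")
    archive.writestr("mirflickr/im2.jpg", b"fake")

class_dirs = extract_under(archive_path, workdir / "distractor")
print(class_dirs)
```

With this layout, pointing a loader at the wrapper directory gives one class, which matches the "1 classes" reported for the distractor dataset later in the notebook.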


We define the directory of each dataset and the batch size.

In [12]:
import tensorflow as tf
import numpy as np
from os import listdir
import sklearn
from sklearn import preprocessing

FOOD_DIR = '/content/food-101/images/'
DISTRACTOR_DIR = '/content/distractor/'

BATCH_SIZE = 128

To speed up feature extraction we use the GPU provided by the Colab runtime. The following cell checks that a GPU is visible to TensorFlow.

In [5]:
# check hardware acceleration
device_name = tf.test.gpu_device_name()
print('Found GPU: ' , device_name)

Found GPU:  /device:GPU:0


# Datasets

For each dataset we collect all the files. We set the **shuffle** parameter to "False" to keep the files in the same order as in the original directory, for both datasets. As expected, the food dataset has 101000 files divided into 101 labels, while the distractor dataset has 25000 files in a single class.

In [6]:
food_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    FOOD_DIR,
    seed=123,
    shuffle=False,
    color_mode='rgb', 
    image_size=(224, 224),
    batch_size=BATCH_SIZE)



Found 101000 files belonging to 101 classes.


In [7]:
distractor_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    DISTRACTOR_DIR,
    seed=123,
    shuffle=False, 
    color_mode='rgb', 
    image_size=(224, 224),
    batch_size=BATCH_SIZE)



Found 25000 files belonging to 1 classes.
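As a quick check on what the two loaders will produce, the number of batches follows directly from the file counts and the batch size. The sketch below is plain arithmetic and assumes that the final, partially filled batch is kept rather than dropped; the helper name `batch_layout` is hypothetical.

```python
import math

BATCH_SIZE = 128
FOOD_FILES = 101000        # count reported by the loader above
DISTRACTOR_FILES = 25000   # count reported by the loader above


def batch_layout(n_files, batch_size):
    """Return (number of batches, size of the last batch), keeping the remainder."""
    n_batches = math.ceil(n_files / batch_size)
    last_batch = n_files - (n_batches - 1) * batch_size
    return n_batches, last_batch


food_layout = batch_layout(FOOD_FILES, BATCH_SIZE)
distractor_layout = batch_layout(DISTRACTOR_FILES, BATCH_SIZE)
print(food_layout, distractor_layout)
```

Under that assumption, every file contributes exactly one row to the extracted features, so the row count should equal the file count.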


# Preparing the structures
As the first step of this section, we build a structure that contains the ids of all the images. As the unique identifier we use the original file name of each image in the datasets.

For the food dataset we iterate over all the class directories and append the names of the images to the **descriptors** structure.

The distractor dataset has a single directory to iterate over, so we collect its names in the **dataset_strings** structure and then concatenate the two lists.

At the end, **descriptors** contains the ids of the food dataset followed by those of the distractor dataset, in this specific order. This order must match the order in which the features are extracted, since ids and features are later paired by position.

In [8]:
list_food = []
dataset_strings = []
descriptors = []
# sort the class directories: image_dataset_from_directory orders classes
# alphanumerically, while listdir returns them in arbitrary order
for class_dir in sorted(listdir(FOOD_DIR)):
  if class_dir != '.DS_Store':
    list_food.append(tf.data.Dataset.list_files(FOOD_DIR + class_dir + '/*.jpg', shuffle=False))
for i in list_food:
  dataset_strings.append([f.numpy() for f in i.take(-1)])
for i in dataset_strings:
  for j in i:
    # f.numpy() yields bytes: decode them and keep only the file name
    elem = j.decode('utf-8').split('/')[-1]
    descriptors.append(elem)

# generation of the ids for the distractor dataset
list_distractor = tf.data.Dataset.list_files(DISTRACTOR_DIR + 'mirflickr/*.jpg', shuffle=False)
dataset_strings = [f.numpy().decode('utf-8').split('/')[-1] for f in list_distractor.take(-1)]
descriptors += dataset_strings
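The same id collection can be sketched without TensorFlow, which makes the ordering easy to inspect on a toy directory tree. This is a minimal sketch, assuming the image loader visits class directories and the files inside them in sorted order; the helper `collect_ids` and the class and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def collect_ids(root, pattern="*.jpg"):
    """Return file names under root, with class directories and files both sorted."""
    ids = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        ids.extend(f.name for f in sorted(class_dir.glob(pattern)))
    return ids


# Toy directory tree with hypothetical class and file names.
root = Path(tempfile.mkdtemp())
for class_name, files in {"pizza": ["2.jpg", "1.jpg"], "apple_pie": ["9.jpg"]}.items():
    (root / class_name).mkdir()
    for name in files:
        (root / class_name / name).write_bytes(b"")

toy_ids = collect_ids(root)
print(toy_ids)
```

The ids of "apple_pie" come before those of "pizza" regardless of the order in which the directories were created, which is the property the positional pairing with the features relies on.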



# Extracting features
We use a map function to apply the pre-processing step to all the images in both datasets.
We then call "mobilenetv2.predict()" on each dataset to extract the features, saving them in **food_features** and **distractor_features** respectively. Finally, we normalize both sets of features and concatenate them into the **features_list** structure.

In [9]:
#Pre-Processing

def preprocess(images, labels):
  images = tf.keras.applications.mobilenet_v2.preprocess_input(images)
  return images, labels
  
food_dataset = food_dataset.map(preprocess, deterministic=True)
distractor_dataset = distractor_dataset.map(preprocess, deterministic=True)

In [10]:
#extracting features

mobilenetv2 = tf.keras.applications.MobileNetV2(
    weights='imagenet',
    include_top=False,
    pooling = 'max', 
    input_shape=(224,224,3)
)


# the datasets are already batched, so batch_size is not passed to predict()
food_features = mobilenetv2.predict(food_dataset, verbose=1)
distractor_features = mobilenetv2.predict(distractor_dataset, verbose=1)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5


In [13]:
# normalizing
food_features = sklearn.preprocessing.normalize(food_features)
distractor_features = sklearn.preprocessing.normalize(distractor_features)

In [14]:
#generating features_list
features_list = list(food_features) + list(distractor_features)
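The effect of the normalization step can be illustrated with NumPy on toy data. By default "sklearn.preprocessing.normalize" scales each row to unit L2 norm; the sketch below reproduces that behaviour by hand and shows that, for unit-length rows, a dot product equals the cosine similarity. The helper `l2_normalize` and all values are made up for illustration.

```python
import numpy as np


def l2_normalize(rows, eps=1e-12):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    rows = np.asarray(rows, dtype=np.float64)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    return rows / np.where(norms < eps, 1.0, norms)


# Toy stand-ins for the two feature matrices (values are made up).
toy_food = np.array([[3.0, 4.0], [1.0, 0.0]])
toy_distractor = np.array([[0.0, 2.0]])

# Food rows first, distractor rows second, as in features_list.
toy_features = np.concatenate([l2_normalize(toy_food), l2_normalize(toy_distractor)])

# With unit-length rows, a dot product is the cosine similarity.
similarities = toy_features @ toy_features[0]
print(toy_features.shape, similarities)
```

Normalizing once at extraction time means that any later similarity comparison between descriptors reduces to a plain dot product.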

# Saving as two numpy files
As a final step, we generate two **.npy** files, one for the ids and one for the features.

In [18]:
#save as 2 numpy files
np.save('/content/gdrive/MyDrive/[MIRCV]FoodWebSearch/deployment/mn_id.npy', descriptors)
np.save('/content/gdrive/MyDrive/[MIRCV]FoodWebSearch/deployment/mn_features.npy', features_list)
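Since ids and features live in two separate files, it is worth checking after saving that they still line up row by row. The following is a minimal round-trip sketch on toy data written to a temporary directory; the file names and values are hypothetical and stand in for **descriptors** and **features_list**.

```python
import tempfile
from pathlib import Path

import numpy as np

# Toy ids and features standing in for descriptors and features_list (made-up values).
toy_ids = ["1.jpg", "2.jpg", "im1.jpg"]
toy_features = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.6, 0.8])]

out_dir = Path(tempfile.mkdtemp())
np.save(out_dir / "toy_id.npy", toy_ids)
np.save(out_dir / "toy_features.npy", toy_features)

loaded_ids = np.load(out_dir / "toy_id.npy")
loaded_features = np.load(out_dir / "toy_features.npy")

# Row i of the feature matrix must describe the file named by id i.
assert len(loaded_ids) == loaded_features.shape[0]
print(loaded_ids.shape, loaded_features.shape)
```

A list of equally sized vectors is stored as a single two-dimensional array, so the loaded features have one row per id.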


# Fine-tuned features extraction



As the last step, we extract the features from our fine-tuned model, taking the output of its "dense_hidden" layer as the descriptor.

In [16]:
model = tf.keras.models.load_model('/content/gdrive/MyDrive/[MIRCV]FoodWebSearch/deployment/food_classifier.h5')

model = tf.keras.Model(inputs=model.input, outputs=model.get_layer('dense_hidden').output) #remove classifier from model

# the datasets are already batched, so batch_size is not passed to predict()
food_features_finetuned = model.predict(food_dataset, verbose=1)
distractor_features_finetuned = model.predict(distractor_dataset, verbose=1)
#normalizing
food_features_finetuned = sklearn.preprocessing.normalize(food_features_finetuned)
distractor_features_finetuned = sklearn.preprocessing.normalize(distractor_features_finetuned)

#generating features_list
features_list_finetuned = list(food_features_finetuned) + list(distractor_features_finetuned)



In [17]:
#save as 2 numpy files
np.save('/content/gdrive/MyDrive/[MIRCV]FoodWebSearch/deployment/ft_id.npy', descriptors)
np.save('/content/gdrive/MyDrive/[MIRCV]FoodWebSearch/deployment/ft_features.npy', features_list_finetuned)