<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Accuracy on SOTA datasets 

This notebook looks into reproducing the accuracy reported on datasets that are commonly used to evaluate state-of-the-art (SOTA) approaches.


### Dataset-specific setups:

We follow the literature closely when setting up our experiments. See https://arxiv.org/abs/1511.06452:

- CUB-200-2011 dataset: 
   - All images in `data_rank` are used as both the query set and the gallery set. 
   - The images of the first 100 of the 200 classes are used for training the DNN; the images of the remaining 100 classes are used to compute Recall@K.

- Cars196 dataset: 
   - The images of the first 98 of the 196 classes are used for training the DNN; the images of the remaining 98 classes are used to compute Recall@K.
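
The class-based split described above can be sketched in plain Python. This is only an illustration: the helper `split_classes` and the placeholder class names are hypothetical and not part of `utils_cv`, and the sketch assumes that sorting the class names reproduces the dataset's class order.

```python
def split_classes(class_names, num_train_classes):
    """Split classes so that the training and ranking sets share no class.

    The first `num_train_classes` classes (in sorted order) are used for
    training the DNN, the remaining classes for computing Recall@K.
    """
    ordered = sorted(class_names)
    return ordered[:num_train_classes], ordered[num_train_classes:]


# Hypothetical class names standing in for the 196 classes of Cars196
cars_classes = [f"class_{i:03d}" for i in range(1, 197)]
train_classes, rank_classes = split_classes(cars_classes, num_train_classes=98)

print(len(train_classes), len(rank_classes))   # 98 98
print(set(train_classes) & set(rank_classes))  # set()
```

Because the two class lists are disjoint, Recall@K is measured on classes the DNN never saw during fine-tuning.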

## Initialization

In [1]:
# Ensure edits to libraries are loaded and plotting is shown in the notebook.
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [2]:
# Regular python libraries
import sys
import numpy as np
from pathlib import Path
import random

import scrapbook as sb

# fast.ai
import fastai
from fastai.vision import (
    accuracy,
    cnn_learner,
    DatasetType,
    ImageList,
    imagenet_stats,
    models,
    partial,
)

# Computer Vision repository
sys.path.extend([".", "../.."])  # to access the utils_cv library
from utils_cv.classification.data import Urls
from utils_cv.classification.model import TrainMetricsRecorder
from utils_cv.common.data import unzip_url
from utils_cv.common.gpu import which_processor, db_num_workers
from utils_cv.similarity.data import comparative_set_builder
from utils_cv.similarity.metrics import (
    compute_distances,
    positive_image_ranks,
    recall_at_k,
)
from utils_cv.similarity.model import compute_features, compute_features_learner
from utils_cv.similarity.plot import (
    plot_comparative_set,
    plot_distances,
    plot_ranks_distribution,
    plot_recalls,
)

In [3]:
print(f"Fast.ai version = {fastai.__version__}")
which_processor()

Fast.ai version = 1.0.57
Torch is using GPU: Tesla V100-PCIE-16GB


## Data preparation
We use popular datasets such as CUB-200-2011 or Cars196. These datasets come with a predefined split into a training set (used to refine the DNN) and a test set (used to compute the Recall@K metric). Note that the train and test sets typically do not share any classes.

In [4]:
# Set dataset, model and evaluation parameters
#DATA_FINETUNE_PATH = "/home/pabuehle/Desktop/data/cub_200_2011/train"
#DATA_RANKING_PATH = "/home/pabuehle/Desktop/data/cub_200_2011/test"
DATA_FINETUNE_PATH = "/home/pabuehle/Desktop/data/cars196/train"
DATA_RANKING_PATH = "/home/pabuehle/Desktop/data/cars196/test"
#DATA_FINETUNE_PATH = "C:/Users/pabuehle/Desktop/data/cub_200_2011_subset3/train"
#DATA_RANKING_PATH = "C:/Users/pabuehle/Desktop/data/cub_200_2011_subset3/test"
#DATA_FINETUNE_PATH = "C:/Users/pabuehle/Desktop/data/cub_200_2011/train"
#DATA_RANKING_PATH = "C:/Users/pabuehle/Desktop/data/cub_200_2011/test"

# DNN configuration and learning parameters
EPOCHS_HEAD = 0
EPOCHS_BODY = 0 #12
LEARNING_RATE = 10* 1e-4
BATCH_SIZE = 16
ARCHITECTURE = models.resnet18
IM_SIZE = 224 #300

We can now build our training data object, and split it to get a certain percentage (here 10%) assigned to a validation set.

In [5]:
# Load images into fast.ai's ImageDataBunch object
random.seed(642)
data_finetune = (
    ImageList.from_folder(DATA_FINETUNE_PATH)
    .split_by_rand_pct(valid_pct=0.1, seed=20)
    .label_from_folder()
    .transform(size=IM_SIZE)
    .databunch(bs=BATCH_SIZE, num_workers = db_num_workers())
    .normalize(imagenet_stats)
)

print(f"""FINE-TUNING:
Training set: {len(data_finetune.train_ds.x)} images
Validation set: {len(data_finetune.valid_ds.x)} images\
"""
)

FINE-TUNING:
Training set: 7249 images
Validation set: 805 images
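
To make the random split concrete, here is a minimal sketch of a percentage-based split in plain Python. It is not fast.ai's implementation of `split_by_rand_pct`: the helper `split_by_random_pct` is hypothetical, and the actual assignment of images to the two sets will differ from fast.ai's. It only illustrates how the set sizes follow from the total image count and `valid_pct`.

```python
import random


def split_by_random_pct(items, valid_pct, seed):
    """Randomly assign a fraction `valid_pct` of `items` to a validation set."""
    rng = random.Random(seed)  # local generator so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(valid_pct * len(shuffled))
    return shuffled[cut:], shuffled[:cut]  # (train, valid)


# 8054 placeholder items, matching the total number of images reported above
items = list(range(8054))
train, valid = split_by_random_pct(items, valid_pct=0.1, seed=20)
print(len(train), len(valid))  # 7249 805
```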


## Classification model fine-tuning

We begin by retrieving a [ResNet18](https://arxiv.org/pdf/1512.03385.pdf) CNN, pre-trained on ImageNet, from fast.ai's library and fine-tune it on our training set. We use the same training parameters and the same approach as in our [classification notebooks](https://github.com/microsoft/ComputerVision/tree/master/classification/notebooks): we first train only the (new) last layer, and then the full DNN.

Note that we train the DNN here on an image classification task but will later use it as a featurizer for image similarity.

In [6]:
learn = cnn_learner(
    data_finetune,
    ARCHITECTURE,
    metrics=[accuracy],
    #callback_fns=[partial(TrainMetricsRecorder, show_graph=True)],
    ps=0 #Leave dropout at zero. Higher values tend to perform significantly worse
)

In [7]:
import math
import torch
import torch.nn as nn
from fastai.layers import FlattenedLoss
from IPython.core.debugger import set_trace

NUMBER_CLASSES = len(data_finetune.classes)
FEATURE_DIM = learn.model[1][-1].in_features
TEMPERATURE = 0.05
#FC_WEIGHT = learn.model[1][6].weight

print(NUMBER_CLASSES, FEATURE_DIM, TEMPERATURE)


class NormSoftmaxLoss(nn.Module):
    """
    L2 normalize weights and apply temperature scaling on logits.
    https://github.com/azgo14/classification_metric_learning/blob/master/metric_learning/modules/losses.py#L7
    """

    def __init__(self, dim=FEATURE_DIM, num_instances=NUMBER_CLASSES, temperature=TEMPERATURE):
        super(NormSoftmaxLoss, self).__init__()
        #self.weight = FC_WEIGHT 
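        # Create the weight directly on the GPU so that it stays a registered nn.Parameter
        # (moving an nn.Parameter with .cuda() returns a plain, non-leaf tensor instead).
        self.weight = nn.Parameter(torch.empty(num_instances, dim, device="cuda"))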
        # Initialization from nn.Linear (https://github.com/pytorch/pytorch/blob/v1.0.0/torch/nn/modules/linear.py#L129)
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        self.temperature = temperature
        self.loss_fn = nn.CrossEntropyLoss()
 
    def forward(self, embeddings, instance_targets):
        norm_emb = nn.functional.normalize(embeddings, dim=1)
        norm_weight = nn.functional.normalize(self.weight, dim=1)
        prediction_logits = nn.functional.linear(norm_emb, norm_weight)
        loss = self.loss_fn(prediction_logits / self.temperature, instance_targets)
        return loss

98 512 0.05
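
To illustrate what the loss above computes, here is a minimal sketch for a single sample in plain Python, without PyTorch. The function names `l2_normalize` and `norm_softmax_loss` and the toy 2-D values are assumptions made for illustration only. The logits are cosine similarities between the embedding and each class weight, divided by the temperature, and then passed to a standard cross-entropy.

```python
import math


def l2_normalize(vec):
    """Scale `vec` to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def norm_softmax_loss(embedding, class_weights, target, temperature=0.05):
    """Cross-entropy over temperature-scaled cosine similarities (one sample)."""
    emb = l2_normalize(embedding)
    # One logit per class: cosine similarity between embedding and class weight
    logits = [
        sum(e * w for e, w in zip(emb, l2_normalize(weight))) / temperature
        for weight in class_weights
    ]
    # Numerically stable log-sum-exp
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[target]


# Toy example: 2-D embedding, 3 classes
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
embedding = [2.0, 0.1]  # points mostly towards class 0

loss_correct = norm_softmax_loss(embedding, weights, target=0)
loss_wrong = norm_softmax_loss(embedding, weights, target=2)
print(loss_correct < loss_wrong)  # True
```

Since both the embedding and the weights are normalized, rescaling the embedding leaves the loss unchanged; only the direction of the feature vector matters.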


In [8]:
learn.loss_func = FlattenedLoss(NormSoftmaxLoss)

In [9]:
# This line achieves two things:
# (1) Remove last fc layer so that input to loss function is the DNN feature, e.g. 512 dimensional.
# (2) Instead, add normalization which is referred to in the paper in section X.X

learn.model[1][6] = nn.LayerNorm(FEATURE_DIM, elementwise_affine=False)
#learn.model[1][6] = nn.Identity()
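
As a rough illustration of what a layer normalization without learnable scale and shift does to a feature vector, consider the following plain-Python sketch. The helper `layer_norm` is hypothetical; it is meant to mirror the computation of `nn.LayerNorm(..., elementwise_affine=False)`, which uses the biased variance and a default `eps` of 1e-5.

```python
import math


def layer_norm(features, eps=1e-5):
    """Normalize a feature vector to zero mean and (almost) unit variance."""
    mean = sum(features) / len(features)
    # Biased variance (divide by N), as used by torch.nn.LayerNorm
    var = sum((f - mean) ** 2 for f in features) / len(features)
    return [(f - mean) / math.sqrt(var + eps) for f in features]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
print(sum(normalized))                 # close to 0
print(sum(v * v for v in normalized))  # close to the vector length, here 4
```

After this normalization every feature vector has approximately the same L2 norm (the square root of its dimension), so its effect resembles an L2-normalization applied after mean-centering.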

### Run DNN training

In [10]:
# Train the last layer using a larger rate since most of the DNN is fixed.
learn.fit_one_cycle(EPOCHS_HEAD, 10* LEARNING_RATE) #, callbacks = [MyCallback()])

epoch,train_loss,valid_loss,accuracy,time


  warn("Your generator is empty.")


Let's now unfreeze all the layers and fine-tune the full model.


In [11]:
learn.unfreeze()
learn.fit_one_cycle(EPOCHS_BODY, LEARNING_RATE)

epoch,train_loss,valid_loss,accuracy,time


## Feature extraction

We now load the ranking set and extract the DNN features for each image.

In [12]:
# Load images into fast.ai's ImageDataBunch object
data_rank = (
    ImageList.from_folder(DATA_RANKING_PATH)
    .split_none()
    .label_from_folder()
    .transform(size=IM_SIZE)
    .databunch(bs=BATCH_SIZE, num_workers = db_num_workers())
    .normalize(imagenet_stats)
)

print(f"""RANKING:
Training set: {len(data_rank.train_ds.x)} images
Validation set: {len(data_rank.valid_ds.x)} images\
"""
)

# Display example images
#data_rank.show_batch(rows=3, figsize=(6, 6))

RANKING:
Training set: 8131 images
Validation set: 1 images


In [13]:
learn.model

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  

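The following line selects the layer whose output we use as the image representation (i.e. a 512-dimensional floating-point vector) after running an image through the model. Since the original last fully connected layer was replaced above, this is now the last layer of the model.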

In [14]:
# Use last layer as image representation
embedding_layer = learn.model[1][-1] 
print(embedding_layer)

LayerNorm((512,), eps=1e-05, elementwise_affine=False)


In [None]:
# Compute DNN features for all images in the ranking set
dnn_features = compute_features_learner(data_rank, DatasetType.Train, learn, embedding_layer)

## Image Retrieval Example
The cell below shows how to find and display the most similar images in the ranking set for a given query image (which we also select from the ranking set). This example is similar to the one shown in the [00_webcam.ipynb](https://github.com/microsoft/ComputerVision/tree/master/similarity/notebooks/00_webcam.ipynb) notebook. 

In [None]:
# Get the DNN feature for the query image
query_im_path =  str(data_rank.train_ds.items[1])
query_feature = dnn_features[query_im_path]
print(f"Query image path: {query_im_path}")
print(f"Query feature dimension: {len(query_feature)}")
assert len(query_feature) == 512

# Compute the distances between the query and all reference images
distances = compute_distances(query_feature, dnn_features)
plot_distances(distances, num_rows=1, num_cols=7, figsize=(15,5))
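
The retrieval step amounts to sorting the gallery images by their distance to the query feature. The plain-Python sketch below illustrates this on toy data; it is not the `utils_cv` implementation, and all function names, image paths and feature values are hypothetical. It also shows that the L2 distance between L2-normalized vectors produces the same ranking as the cosine distance.

```python
import math


def l2_normalize(vec):
    """Scale `vec` to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery, distance_fn):
    """Return gallery keys sorted from most to least similar to `query`."""
    return sorted(gallery, key=lambda name: distance_fn(query, gallery[name]))


# Hypothetical 2-D features keyed by image path
gallery = {"a.jpg": [1.0, 0.1], "b.jpg": [0.5, 0.5], "c.jpg": [-1.0, 0.2]}
query = [3.0, 0.0]

by_l2 = rank_gallery(
    query, gallery, lambda q, g: l2_distance(l2_normalize(q), l2_normalize(g))
)
by_cosine = rank_gallery(query, gallery, cosine_distance)
print(by_l2)      # ['a.jpg', 'b.jpg', 'c.jpg']
print(by_cosine)  # ['a.jpg', 'b.jpg', 'c.jpg']
```

The two rankings agree because, for unit vectors, the squared L2 distance equals two times the cosine distance.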

## Quantitative evaluation

Our approach for quantitative evaluation is as follows:
- We extract the features of each query image in the query set, and search for the K most similar images in the gallery set. 
- If at least one of the K retrieved images has the same label as the query image, the query counts as a hit. Recall@K is the fraction of queries that are hits.
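
The evaluation procedure described above can be sketched as follows. This is a simplified illustration on toy labels: the helper `recall_at_k_sketch` is hypothetical and is not the `recall_at_k` function imported from `utils_cv`. It assumes that the gallery labels of each query are already sorted by increasing distance and that the query image itself has been removed from its own result list.

```python
def recall_at_k_sketch(query_labels, ranked_gallery_labels, k):
    """Fraction of queries with at least one correct label among the top-k results.

    `ranked_gallery_labels[i]` lists the gallery labels for query `i`, sorted
    from most to least similar, with the query image itself already removed.
    """
    hits = sum(
        1
        for label, ranked in zip(query_labels, ranked_gallery_labels)
        if label in ranked[:k]
    )
    return hits / len(query_labels)


# Toy example with three queries
query_labels = ["car", "bird", "car"]
ranked = [
    ["car", "bird", "bird"],   # first correct result at rank 1
    ["car", "bird", "car"],    # first correct result at rank 2
    ["bird", "bird", "bird"],  # no correct result
]
recall_1 = recall_at_k_sketch(query_labels, ranked, k=1)  # 1 of 3 queries is a hit
recall_2 = recall_at_k_sketch(query_labels, ranked, k=2)  # 2 of 3 queries are hits
print(recall_1, recall_2)
```

Recall@K can only stay equal or grow as K increases, since a hit within the top K results is also a hit within the top K+1.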

In [None]:
#init
rank1_count = 0
labels = data_rank.train_ds.y
#im_paths = [str(s) for s in list(data_rank.train_ds.items)]
im_paths = data_rank.train_ds.items
assert len(labels) == len(im_paths) == len(dnn_features)

QUERY_STEP = 11  # only use every nth image to speed up accuracy computation
query_indices = range(len(im_paths))[::QUERY_STEP]

# Loop over the selected images and use each as the query image once
for query_count, query_index in enumerate(query_indices):
    # Periodically print the running Recall@1 over the queries processed so far
    if query_count % 100 == 1:
        acc = 100.0 * rank1_count / query_count
        print(query_index, len(im_paths), acc)

    # Get query information
    query_im_path =  str(im_paths[query_index])
    query_feature = dnn_features[query_im_path]
    
    # Compute distance to all images in the gallery set
    # This L2-normalizes all embeddings before computing the L2 distance.
    # Note that the ranking is identical to setting `method = "cosine"` in the function below.
    # (See: https://en.wikipedia.org/wiki/Cosine_similarity)
    distances = compute_distances(query_feature, dnn_features) #, method = "cosine")

    # Find the smallest distance
    minDist = float('inf')
    minDistIndex = None
    for index, distance in enumerate(distances):
        if index == query_index: #ignore the query image itself
            continue

        if distance[1] < minDist:
            minDist = distance[1]
            minDistIndex = index

    # Compute rank
    if labels[query_index] == labels[minDistIndex]:
        rank1_count += 1
        
print(rank1_count, len(query_indices))

In [None]:
print("Recall@1 accuracy = {:2.2f}".format(100.0 * rank1_count / len(query_indices)))

In [None]:
# Log some outputs using scrapbook which are used during testing to verify correct notebook execution
#sb.glue("median_rank", median_rank)
#sb.glue("random_rank", random_rank)

# Appendix
### Example of how to use callbacks

In [None]:
# from fastai.callback import Callback
# from IPython.core.debugger import set_trace          
              
# class MyCallback(Callback):
#     "Skeleton callback that lists the available hooks for recording values, dynamically changing learner params, etc."
#     _order=0
#     def on_train_begin(self, **kwargs)->None:
#         "To initialize constants in the callback."
#         pass
#     def on_epoch_begin(self, **kwargs)->None:  
#         "At the beginning of each epoch."
#         pass
    
    
#     def on_batch_begin(self, **kwargs)->None:
#         "Set HP before the output and loss are computed."
#         pass
    
    
#     def on_loss_begin(self, last_output, **kwargs)->None:
#         "Called after forward pass but before loss has been computed."
#         #last_input == RGB image normalized
#         #x = nn.functional.normalize(x, dim=1)
#         #print(kwargs)
#         #print(last_output)
#         #set_trace()
#         #last_output = 100*last_output
#         pass
              
#     def on_backward_begin(self, **kwargs)->None:
#         "Called after the forward pass and the loss has been computed, but before backprop."
#         pass
#     def on_backward_end(self, **kwargs)->None:
#         "Called after backprop but before optimizer step. Useful for true weight decay in AdamW."
#         pass
#     def on_step_end(self, **kwargs)->None:
#         "Called after the step of the optimizer but before the gradients are zeroed."
#         pass
#     def on_batch_end(self, **kwargs)->None:
#         "Called at the end of the batch."
#         pass
#     def on_epoch_end(self, **kwargs)->None:
#         "Called at the end of an epoch."
#         pass
#     def on_train_end(self, **kwargs)->None:
#         "Useful for cleaning up things and saving files/models."
#         pass

### Example of how to add L2-normalization

In [None]:
# class MyL2Norm(torch.nn.Module):
#     def __init__(self):
#         super(MyL2Norm, self).__init__()

#     def forward(self, input):
#         return torch.nn.functional.normalize(input, dim=1)

# modules = []
# modules.append(learn.model[1][0])
# modules.append(learn.model[1][1])
# modules.append(learn.model[1][2])
# modules.append(learn.model[1][3])
# modules.append(learn.model[1][4])
# modules.append(learn.model[1][5])
# modules.append(MyL2Norm())
# modules.append(learn.model[1][6])
# learn.model[1] = torch.nn.Sequential(*modules)
# learn.model[1]

### HACK TO DEACTIVATE BIAS IN LAST FC LAYER

Hack: edit the function `bn_drop_lin()` in `/anaconda/envs/cv/lib/python3.6/site-packages/fastai/layers.py`:
- Replace
        layers.append(nn.Linear(n_in, n_out))
- With: 
        if n_in != 512:
            layers.append(nn.Linear(n_in, n_out))
        else:
            print("WARNING: SETTING BIAS TO FALSE IN bn_drop_lin")
            layers.append(nn.Linear(n_in, n_out, bias = False))