# Demo for RFIW-2020 (Task III): Search & Retrieval (missing children)
## Overview
This basic demo benchmarks Track III, going from face encodings (assumed already extracted) to ranked lists formatted for submission to [codalab](https://competitions.codalab.org/competitions/22152).

This demo uses [pandas](https://pandas.pydata.org) for handling data lists.

For more about the challenge that inspired this benchmark, see the webpage for Recognizing Families In the Wild ([RFIW](https://web.northeastern.edu/smilelab/rfiw2020/)).

We evaluate a set of probes (i.e., search subjects) against a gallery (i.e., all other search subjects). For a given probe, we score every gallery instance, rank by score, and, ideally, place the probe's family members in the top $K$, where $K$ is the number of true relatives of the probe in the gallery. Note that $K$ varies from probe to probe; thus, the challenge specification requires every gallery subject to appear in every ranked list. Say the gallery has $N$ subjects and there are $M$ probes (hence, $M$ families): probe $p_1$ has $K_1$ relatives in the gallery, $p_2$ has $K_2$, ..., and $p_M$ has $K_M$.
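As a worked example of what each ranked list is scored on, the standard average precision (AP) of one list averages precision@r over the $K$ ranks r at which a true relative appears; placing all $K$ relatives at the top gives AP = 1. A minimal NumPy sketch, where the function name is hypothetical and the official scorer may differ in details such as tie handling:

```python
import numpy as np


def average_precision(is_relative):
    """AP of one probe's ranked list.

    is_relative[r] is True when the gallery subject at rank r + 1 is a true
    relative of the probe; the list covers the whole gallery, so all K
    relatives appear somewhere in it.
    """
    is_relative = np.asarray(is_relative, dtype=bool)
    hit_ranks = np.flatnonzero(is_relative) + 1  # 1-based ranks of the K relatives
    if hit_ranks.size == 0:
        return 0.0
    # precision@r at the i-th relative's rank r is i / r
    precisions = np.arange(1, hit_ranks.size + 1) / hit_ranks
    return float(precisions.mean())


# Gallery of N = 6 subjects; the probe's K = 2 relatives are ranked 1st and 3rd
print(average_precision([True, False, True, False, False, False]))  # (1/1 + 2/3) / 2 ≈ 0.833
```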

The workflow is as follows:
- Load all features into a dictionary.
- Generate M ranked lists, i.e., one per probe.
- For each of the M lists:
  - Calculate the AP for the probe w.r.t. its ranked list.
- Once all M AP scores are collected, take their mean to yield the final reported score (i.e., mAP).
- Generate CMC curves.
- Visualize edge cases: those with hard negatives, hard positives, or exceptionally easy family samples.
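The mAP and CMC steps above can be sketched as follows, assuming an M x N probe-by-gallery similarity matrix (higher means more similar) and one family label per probe and per gallery subject. The function names are hypothetical, CMC(k) is taken here as the share of probes with at least one relative in the top k, and the notebook's own evaluation (run later via `utils.py`, not shown) may compute things differently:

```python
import numpy as np


def average_precision(is_relative):
    # AP of one ranked list: mean of precision@r over the ranks r holding a relative
    is_relative = np.asarray(is_relative, dtype=bool)
    hit_ranks = np.flatnonzero(is_relative) + 1
    if hit_ranks.size == 0:
        return 0.0
    return float(np.mean(np.arange(1, hit_ranks.size + 1) / hit_ranks))


def evaluate(scores, probe_labels, gallery_labels, max_rank=5):
    """mAP and CMC(1..max_rank) for an (M x N) probe-by-gallery similarity matrix."""
    scores = np.asarray(scores, dtype=float)
    probe_labels = np.asarray(probe_labels).ravel()
    gallery_labels = np.asarray(gallery_labels).ravel()
    # Rank the whole gallery for every probe, highest similarity first
    order = np.argsort(-scores, axis=1, kind="stable")
    # matches[i, r] is True when the subject at rank r + 1 is a relative of probe i
    matches = gallery_labels[order] == probe_labels[:, None]
    m_ap = float(np.mean([average_precision(row) for row in matches]))
    # CMC(k): share of probes whose first relative appears within the top k
    has_hit = matches.any(axis=1)
    first_hit = np.argmax(matches, axis=1)  # 0-based rank of the first relative
    cmc = np.array([np.mean(has_hit & (first_hit < k)) for k in range(1, max_rank + 1)])
    return m_ap, cmc


# Toy case: 2 probes (families 1 and 2) against a gallery of 4 subjects
scores = [[0.9, 0.1, 0.8, 0.2],
          [0.7, 0.6, 0.1, 0.5]]
m_ap, cmc = evaluate(scores, probe_labels=[1, 2], gallery_labels=[1, 2, 1, 2], max_rank=4)
print(m_ap, cmc)  # mAP = (1 + 7/12) / 2 ≈ 0.792; CMC = 0.5, 1.0, 1.0, 1.0
```

Probe 1 ranks both relatives first (AP = 1), while probe 2's relatives land at ranks 2 and 3 (AP = 7/12), which is why CMC(1) is 0.5 but CMC(2) already reaches 1.0.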


In [3]:
!pip install tqdm


In [4]:
import json
import os
import pickle
from pathlib import Path
from typing import Iterable

import numpy as np
import pandas as pd
from scipy.spatial import distance
from sklearn.metrics import accuracy_score
from torch.utils.data import Dataset
from tqdm.auto import tqdm

In [None]:
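from types import SimpleNamespace

path_features = Path('../../data/fiwdb/features').resolve()
path_probe_list = Path('/media/yuyin/10THD1/Kinship/fiw-mm/data/lists/test/probes.json').resolve()
path_gallery_set = Path('/media/yuyin/10THD1/Kinship/fiw-mm/data/lists/test/gallery.json').resolve()
batchsize = 256
save_name = 'Rank-k_mAP'
# Settings read by later cells (opt.*); test_feature_dir -> path_features is an assumed mapping
opt = SimpleNamespace(test_list_g=path_gallery_set, test_list_p=path_probe_list,
                      test_feature_dir=path_features, batchsize=batchsize, save_name=save_name)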

In [6]:
class Rfiw2020TestSet(Dataset):
    ######################################################################
    # Data
    # ---------
    def __init__(self, x):
        if x == 'gallery':
            with open(opt.test_list_g) as file:
                self.imgs = json.load(file)
        else:
            self.imgs = []
            with open(opt.test_list_p) as file:
                probes = json.load(file)
                for _, family_member_ind in probes.items():
                    self.imgs.append(family_member_ind)

    def __len__(self):
        return len(self.imgs)



def get_gallery_feature_and_id(img_path):
    ######################################################################
    # Load feature
    # ---------
    feat_path = "/media/yuyin/10THD1/Kinship/fiw-mm/data/lists/test/gallery_features.npy"
    feat_matrix = np.load(feat_path)  # .npy is a binary format; np.loadtxt cannot parse it

    assert feat_matrix.shape[0] == len(img_path)
    labels = np.zeros((feat_matrix.shape[0], 1))  # size (21951, 1)
    for i, path in enumerate(img_path):
        labels[i] = int(path.split('/')[0].split('F')[1])

    return feat_matrix, labels


def get_probe_feature_and_id(img_path):
    ######################################################################
    # size of probe img_path: 190
    # ---------
    labels = []
    features = []
    for path in img_path:
        label = int(path.split('/')[0].split('F')[1])
        feat_path_per_probe = os.path.join(opt.test_feature_dir, path, "encodings.pkl")
        with open(feat_path_per_probe, 'rb') as f:
            feat = pickle.load(f)
            for _, feats_per_probe in feat.items():
                features.append(feats_per_probe)
                labels.append(label)

    return np.asarray(features), np.asarray(labels).reshape(-1,1)

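Both loaders turn a path into a family label with `int(path.split('/')[0].split('F')[1])`. Pulled out as a standalone helper (the name `family_id` and the example path are hypothetical, assuming FIW-style `F<number>/MID<number>/...` paths), it behaves like this:

```python
def family_id(img_path: str) -> int:
    """Family label from an FIW-style path such as 'F0123/MID2/...'."""
    family_dir = img_path.split('/')[0]   # 'F0123'
    return int(family_dir.split('F')[1])  # '0123' -> 123


print(family_id('F0123/MID2/P01234_face1.jpg'))  # 123
```

Leading zeros are dropped by `int`, so `F0123` and the label `123` refer to the same family.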

In [7]:

######################################################################
# Testing
# ---------
# Load data
image_datasets = {x: Rfiw2020TestSet(x) for x in ['gallery', 'query']}

print('-------test-----------')



In [8]:
# Load features
gallery_feature, gallery_label = get_gallery_feature_and_id(
    image_datasets['gallery'].imgs)
print("gallery size:", gallery_feature.shape, gallery_label.shape)



In [9]:
query_feature, query_label = get_probe_feature_and_id(
    image_datasets['query'].imgs)
# expected shapes: (4540, 512) (4540, 1)
print("query size:", query_feature.shape, query_label.shape)



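`query_feature` holds one row per face encoding (4540 rows for 190 probes, per the comments above), so a per-probe ranked list needs the rows of each probe combined. Since each family contributes exactly one probe, the family label in `query_label` also identifies the probe. The sketch below assumes mean-pooling of a probe's encodings and cosine similarity via `scipy.spatial.distance.cdist` (already imported); `rank_gallery` is a hypothetical name, and `utils.py` may score probes differently:

```python
import numpy as np
from scipy.spatial import distance


def rank_gallery(query_feature, query_label, gallery_feature):
    """Return (probe_ids, order); order[i] lists gallery indices for probe i, best first."""
    query_feature = np.asarray(query_feature, dtype=float)
    query_label = np.asarray(query_label).ravel()
    probe_ids = np.unique(query_label)
    # Assumption: mean-pool all face encodings of one probe into a single template
    templates = np.stack([query_feature[query_label == p].mean(axis=0) for p in probe_ids])
    # Cosine similarity = 1 - cosine distance (cdist returns distances)
    sims = 1.0 - distance.cdist(templates, np.asarray(gallery_feature, dtype=float), metric="cosine")
    # Sort the full gallery so every gallery subject appears in every ranked list
    order = np.argsort(-sims, axis=1, kind="stable")
    return probe_ids, order


# Toy data (separate names, so the notebook's real features are not overwritten)
toy_query = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]  # probe 1 has two faces, probe 2 one
toy_query_label = [[1], [1], [2]]
toy_gallery = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
probe_ids, order = rank_gallery(toy_query, toy_query_label, toy_gallery)
print(probe_ids, order.tolist())  # [1 2] [[1, 2, 0], [0, 2, 1]]
```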

In [None]:
# Save result
print('-->Save features to gallery_probe_features.npy')
result = {'gallery_f': gallery_feature, 'gallery_label': gallery_label,
          'query_f'  : query_feature, 'query_label': query_label}

np.save("gallery_probe_features.npy", result)


# Run the evaluation script (utils.py) and append its output to the result file
result_path = './%s_result.txt' % opt.save_name
os.system('python utils.py | tee -a %s' % result_path)
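Whatever reads `gallery_probe_features.npy` back has to undo the dict wrapping: `np.save` stores a dict as a 0-d object array, so `np.load` needs `allow_pickle=True` and `.item()` to recover it. A minimal round trip with toy arrays in a temporary directory (how `utils.py` actually loads the file is not shown, so this is an illustration, not its code):

```python
import os
import tempfile

import numpy as np

# Toy stand-in for the dict saved above (same keys, tiny arrays)
toy_result = {'gallery_f': np.zeros((3, 2)), 'gallery_label': np.array([[1], [2], [1]]),
              'query_f': np.ones((2, 2)), 'query_label': np.array([[1], [2]])}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'gallery_probe_features.npy')
    # np.save pickles the dict into a 0-d object array
    np.save(path, toy_result)
    # Loading therefore requires allow_pickle=True; .item() unwraps the dict
    loaded = np.load(path, allow_pickle=True).item()

print(sorted(loaded))           # ['gallery_f', 'gallery_label', 'query_f', 'query_label']
print(loaded['query_f'].shape)  # (2, 2)
```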