# Compare feature pairs from the Balanced Faces in the Wild (BFW) dataset.

Load the table in `data/bfw-datatable.pkl`, load the features of every face it lists, score each pair, and store the scores in the datatable. The scored table can then be written back to `data/bfw-datatable.pkl`, overwriting it.

## Add project code to PYTHONPATH, if not already there
Check that `path_package` points to the _code_ directory on your system.

In [1]:
import pathlib
import sys

path_package = '../'
if path_package not in sys.path:
    sys.path.append(path_package)
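
The check above compares `'../'` to `sys.path` as a literal string, so the same directory could still be appended twice under another spelling (for example, its absolute path). A minimal sketch that resolves the path before comparing; `add_to_path` is a hypothetical helper, not part of `facebias`.

```python
import sys
import pathlib


def add_to_path(path_package: str) -> str:
    """Resolve path_package to an absolute path and append it to sys.path once."""
    resolved = str(pathlib.Path(path_package).resolve())
    # compare resolved forms so '../' and its absolute spelling count as the same entry
    existing = {str(pathlib.Path(p).resolve()) for p in sys.path if p}
    if resolved not in existing:
        sys.path.append(resolved)
    return resolved


add_to_path('../')
```

Appending the resolved path also keeps imports working if the notebook later changes its working directory.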

In [2]:
%matplotlib inline
import numpy as np
# Load our custom tools for loading and processing the data
from facebias.iotools import load_bfw_datatable, save_bfw_datatable, load_features_from_image_list

scorefun = np.dot  # function used to compare (score) pairs of features

dir_data = '../../data/'
dir_features = f'{dir_data}features/senet50/'
f_datatable = f'{dir_data}bfw-datatable.pkl'
overwrite_pickle = False

## Load the data

Read in the data as a pandas.DataFrame and show the first few rows.

In [3]:
data = load_bfw_datatable(f_datatable, cols=['p1', 'p2'])
data.head()

|   | p1 | p2 |
|---|----|----|
| 0 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0043_01.jpg |
| 1 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0120_01.jpg |
| 2 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0122_02.jpg |
| 3 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0188_01.jpg |
| 4 | asian_females/n000009/0010_01.jpg | asian_females/n000009/0205_01.jpg |
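
Each entry in `p1` and `p2` appears to follow the layout `<subgroup>/<subject_id>/<image>.jpg` (e.g., `asian_females/n000009/0010_01.jpg`). Assuming that layout holds for every row, a minimal sketch that splits the paths into columns and flags whether a pair shows the same subject. The helper `split_pair_paths` and the columns `a1`, `id1`, `a2`, `id2`, `same` are hypothetical; the real datatable may already store such labels. The second toy row is a made-up negative pair for illustration.

```python
import pandas as pd


def split_pair_paths(data: pd.DataFrame) -> pd.DataFrame:
    """Add subgroup/subject columns parsed from paths like 'subgroup/subject/image.jpg'.

    Assumes every path in p1 and p2 has exactly three '/'-separated parts.
    """
    out = data.copy()
    for col in ('p1', 'p2'):
        parts = out[col].str.split('/', expand=True)  # columns 0, 1, 2
        suffix = col[-1]  # '1' or '2'
        out[f'a{suffix}'] = parts[0]   # subgroup folder, e.g. 'asian_females'
        out[f'id{suffix}'] = parts[1]  # subject folder, e.g. 'n000009'
    # a pair is "genuine" when both images come from the same subject folder
    out['same'] = out['id1'] == out['id2']
    return out


pairs = pd.DataFrame({
    'p1': ['asian_females/n000009/0010_01.jpg', 'asian_females/n000009/0010_01.jpg'],
    'p2': ['asian_females/n000009/0043_01.jpg', 'asian_females/n000011/0001_01.jpg'],
})
print(split_pair_paths(pairs)[['a1', 'id1', 'id2', 'same']])
```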


## Load features and generate scores
First check whether scores were already calculated for each pair; if not, load the features and calculate the scores.

In [4]:
# create li_images: a sorted list of all unique face images in either column
li_images = list(np.unique(data.p1.to_list() + data.p2.to_list()))

# read features into a dictionary keyed by image filepath, with the face encodings as values
features = load_features_from_image_list(li_images, dir_features, ext_feat='npy')
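
`load_features_from_image_list` comes from the project's `facebias.iotools`, and its internals are not shown here. A minimal sketch of what such a loader could look like, assuming (hypothetically) that each image `x.jpg` has a matching NumPy file `x.npy` under `dir_features` at the same relative path; `load_features_sketch` is a made-up name for illustration.

```python
import pathlib
import tempfile

import numpy as np


def load_features_sketch(image_list, dir_features, ext_feat='npy'):
    """Return {relative image path: feature vector}, reading one file per image.

    Hypothetical layout: dir_features/<subgroup>/<subject>/<image>.<ext_feat>
    """
    features = {}
    for image in image_list:
        # swap the image extension (e.g. '.jpg') for the feature extension
        f_feat = pathlib.Path(dir_features) / pathlib.Path(image).with_suffix(f'.{ext_feat}')
        features[image] = np.load(f_feat)
    return features


# usage with a throwaway directory and a toy 4-D vector standing in for a real encoding
with tempfile.TemporaryDirectory() as tmp:
    f = pathlib.Path(tmp) / 'asian_females' / 'n000009' / '0010_01.npy'
    f.parent.mkdir(parents=True)
    np.save(f, np.ones(4) / 2.0)
    feats = load_features_sketch(['asian_females/n000009/0010_01.jpg'], tmp)

print(feats['asian_females/n000009/0010_01.jpg'])
```

Keying the dictionary by the same relative paths stored in `p1` and `p2` makes scoring a pair a plain lookup.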

In [5]:
# score all feature pairs: the features are L2-normalized, so their dot product equals the cosine similarity
data['score'] = data.apply(lambda x: scorefun(features[x.p1], features[x.p2].T), axis=1)
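
Scoring with `np.dot` relies on a standard identity: cosine similarity is $\cos\theta = \frac{a \cdot b}{\lVert a\rVert \, \lVert b\rVert}$, so when both vectors have unit L2 norm it reduces to $a \cdot b$. A small check of that identity, plus a vectorized alternative to the row-wise `DataFrame.apply`. It assumes each feature is a 1-D NumPy vector of the same length; `l2_normalize`, `cosine`, and `score_pairs` are hypothetical helpers.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity computed from the definition, without pre-normalizing."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def score_pairs(features: dict, p1: list, p2: list) -> np.ndarray:
    """Dot-product score for each (p1[i], p2[i]) pair, computed in one vectorized step."""
    A = np.stack([features[k] for k in p1])  # shape (n_pairs, dim)
    B = np.stack([features[k] for k in p2])
    return np.einsum('ij,ij->i', A, B)  # row-wise dot products


rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
# after normalization, the plain dot product matches cosine similarity
print(np.isclose(np.dot(l2_normalize(a), l2_normalize(b)), cosine(a, b)))

toy = {'x.jpg': l2_normalize(a), 'y.jpg': l2_normalize(b)}
print(score_pairs(toy, ['x.jpg', 'x.jpg'], ['y.jpg', 'x.jpg']))
```

Under the same assumptions, `data['score'] = score_pairs(features, data.p1.to_list(), data.p2.to_list())` would replace the per-row `apply`, which tends to be slow on large pair lists.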

In [14]:
if overwrite_pickle or not pathlib.Path(f_datatable).exists():
    save_bfw_datatable(data, fpath=f_datatable)
else:
    print('Datatable already exists; not overwriting by default (set overwrite_pickle=True to save the scores).')