# **Large-Scale Kinship Recognition Data Challenge: Kinship Verification STARTER NOTEBOOK**

We provide framework code to get you started on the competition. The notebook is broken up into three main sections. 
1. Data Loading & Visualizing
2. Data Generator & Model Building
3. Training & Testing Model

We have done most of the heavy lifting by making the data readily accessible through Google Drive. We also provide a data loader and an end-to-end baseline model that predicts a binary label (0 or 1) indicating whether two faces share a kinship relation.

**WARNING: IF YOU HAVE NOT DONE SO**

Change to GPU:

Runtime --> Change Runtime Type --> GPU

Mount to Google Drive

In [7]:
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


Install Libraries

In [2]:
%%capture
!pip install keras_vggface
!pip install keras_applications

In [3]:
from collections import defaultdict
from glob import glob
from random import choice, sample

import tensorflow as tf
import keras
import cv2
import numpy as np
import pandas as pd
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import Input, Dense, GlobalMaxPool2D, GlobalAvgPool2D, Concatenate, Multiply, Dropout, Subtract
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from keras_vggface.utils import preprocess_input
from keras_vggface.vggface import VGGFace

train_ds.csv (loaded below) contains pairs of image paths, p1 and p2, with a relationship label for each pair.

train-faces contains the images for training itself.

In [8]:
# Modify paths as per your method of saving them
train_file_path = "/gdrive/MyDrive/Kinship Recognition Starter/train_ds.csv"
#!ls /gdrive/MyDrive/
train_folders_path = "/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/"
# All images belonging to families F09** will be used to create the validation set while training the model
# For final submission, you can add these to the training data as well
val_famillies = "F09"

In [9]:
all_images = glob(train_folders_path + "*/*/*.jpg") #all images

train_images = [x for x in all_images if val_famillies not in x] #all images except for F09*
val_images = [x for x in all_images if val_famillies in x] #all images that are F09*

ppl = {x.split("/")[-3] + "/" + x.split("/")[-2] for x in all_images} #set of family/member keys for all images (a set keeps the membership checks below fast)

train_person_to_images_map = defaultdict(list)
for x in train_images:
    train_person_to_images_map[x.split("/")[-3] + "/" + x.split("/")[-2]].append(x) #add a training person to map

val_person_to_images_map = defaultdict(list)
for x in val_images:
    val_person_to_images_map[x.split("/")[-3] + "/" + x.split("/")[-2]].append(x) #add a validation person to map
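The map keys above come from the last two directory components of each image path. Here is a minimal sketch of that parsing on a few paths. The helper name `person_key` is hypothetical, the paths are invented, and it assumes the `<family>/<member>/<image>.jpg` layout used in this dataset.

```python
from collections import defaultdict

def person_key(path):
    """Return 'family/member' from a path ending in family/member/image.jpg."""
    parts = path.split("/")
    return parts[-3] + "/" + parts[-2]

# Invented paths that follow the family/member/image layout.
paths = [
    "/data/train-faces/F0330/MID1/P03496_face0.jpg",
    "/data/train-faces/F0330/MID1/P03500_face2.jpg",
    "/data/train-faces/F0901/MID2/P00001_face0.jpg",
]

person_to_images = defaultdict(list)
for p in paths:
    person_to_images[person_key(p)].append(p)

for key, images in person_to_images.items():
    print(key, len(images))
```

Grouping by person this way makes it easy to look up every image of one family member, which is useful if you later want to sample your own pairs.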

In [15]:
all_images

['/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03496_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03500_face2.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03497_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03501_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03492_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03499_face5.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03494_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03493_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03495_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03498_face0.jpg',
 '/gdrive/MyDrive/Kinship Recognition Starter/train/train-fa

In [17]:
relationships = pd.read_csv(train_file_path)
relationships = list(zip(relationships.p1.values, relationships.p2.values, relationships.relationship.values))
relationships = [(x[0],x[1],x[2]) for x in relationships if x[0][:10] in ppl and x[1][:10] in ppl]
print(relationships[0])

train = [x for x in relationships if val_famillies not in x[0]]
print(train[0])
val = [x for x in relationships if val_famillies in x[0]]

('F0123/MID1/P01276_face0.jpg', 'F0644/MID2/P06777_face5.jpg', 0)
('F0123/MID1/P01276_face0.jpg', 'F0644/MID2/P06777_face5.jpg', 0)
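The family-based split can be tried on a toy table before running it on the full CSV. The sketch below assumes the CSV has the columns `p1`, `p2`, and `relationship` read in the cell above; the rows are invented for illustration. The cell above checks only the first path of each pair with a substring test, so `startswith` is shown here as a slightly stricter alternative, not as the notebook's method.

```python
import pandas as pd

# Invented rows that mimic the p1 / p2 / relationship layout.
df = pd.DataFrame({
    "p1": ["F0123/MID1/a.jpg", "F0901/MID1/b.jpg", "F0456/MID3/c.jpg"],
    "p2": ["F0644/MID2/d.jpg", "F0901/MID2/e.jpg", "F0456/MID1/f.jpg"],
    "relationship": [0, 1, 1],
})

pairs = list(zip(df.p1.values, df.p2.values, df.relationship.values))

val_prefix = "F09"
# Hold out every pair whose first image belongs to a family F09**.
val_pairs = [t for t in pairs if t[0].startswith(val_prefix)]
train_pairs = [t for t in pairs if not t[0].startswith(val_prefix)]

print(len(train_pairs), len(val_pairs))
```

Because only the first path is tested, a pair whose second image comes from an F09** family would still land in the training list; whether that matters depends on how the pairs in the CSV were built.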


In [18]:
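from keras.preprocessing import image

def read_img(path):
    # Load the image resized to 224x224, the input size the model expects
    img = image.load_img(path, target_size=(224, 224))
    # Convert to a float array before preprocessing
    img = np.array(img).astype(np.float64)
    # version=2 is the keras_vggface preprocessing for the ResNet-50 weights
    return preprocess_input(img, version=2)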

Define a data generator. It yields batches of examples for the model to train on. Each batch holds two arrays of images, one for each face in the pair, and an array of the labels associated with the pairs.

In [19]:
def gen(list_tuples, person_to_images_map, batch_size=16):
    # person_to_images_map is not used by this baseline; it is kept so that
    # the calls to gen() below still work.
    while True:
        # Draw batch_size distinct (p1, p2, relationship) tuples from list_tuples
        batch_tuples = sample(list_tuples, batch_size)

        # The label is the third element of each tuple (the relationship column)
        labels = [tup[2] for tup in batch_tuples]

        X1 = np.array([read_img(train_folders_path + x[0]) for x in batch_tuples])
        X2 = np.array([read_img(train_folders_path + x[1]) for x in batch_tuples])

        yield [X1, X2], np.array(labels)
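To check what the generator yields without touching Drive, the same pattern can be run on toy data. This is a sketch under stated assumptions: `fake_read_img` is a hypothetical stand-in for `read_img` that returns a blank image, and `toy_pairs` is invented.

```python
from random import sample, seed
import numpy as np

def fake_read_img(path):
    # Stand-in for read_img: returns a blank 224x224 RGB array.
    return np.zeros((224, 224, 3), dtype=np.float32)

def pair_batches(list_tuples, batch_size=16):
    """Yield ([X1, X2], labels) forever, sampling distinct pairs for each batch."""
    while True:
        batch = sample(list_tuples, batch_size)
        X1 = np.array([fake_read_img(p1) for p1, _, _ in batch])
        X2 = np.array([fake_read_img(p2) for _, p2, _ in batch])
        labels = np.array([label for _, _, label in batch])
        yield [X1, X2], labels

seed(0)
toy_pairs = [(f"a{i}.jpg", f"b{i}.jpg", i % 2) for i in range(40)]
(X1, X2), y = next(pair_batches(toy_pairs, batch_size=16))
print(X1.shape, X2.shape, y.shape)
```

Note that `random.sample` raises a `ValueError` if `batch_size` exceeds the number of available tuples, so keep the batch size no larger than the smaller of the training and validation lists.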

Here is a Siamese-style baseline: both faces pass through the same pre-trained ResNet-50 (VGGFace) network, which we fine-tune via transfer learning. The two embeddings are then combined and fed to a small classifier. This model achieves the baseline score, and the goal is to improve on it. Published work explores different architectures, BatchNormalization, and many other techniques for improving how well a model recognizes kinship between two faces.

In [20]:
def baseline_model():
    input_1 = Input(shape=(224, 224, 3))
    input_2 = Input(shape=(224, 224, 3))

    base_model = VGGFace(model='resnet50', include_top=False)

    # Explicitly mark every layer except the last three as trainable (fine-tuning)
    for x in base_model.layers[:-3]:
        x.trainable = True

    # Run both faces through the same base network (shared weights)
    x1 = base_model(input_1)
    x2 = base_model(input_2)

    # Pool each feature map two ways (global max and global average) and concatenate
    x1 = Concatenate(axis=-1)([GlobalMaxPool2D()(x1), GlobalAvgPool2D()(x1)])
    x2 = Concatenate(axis=-1)([GlobalMaxPool2D()(x2), GlobalAvgPool2D()(x2)])

    x3 = Subtract()([x1, x2]) #element-wise difference x1 - x2
    x3 = Multiply()([x3, x3]) #squared difference

    x = Multiply()([x1, x2]) #element-wise product of x1 and x2
    x = Concatenate(axis=-1)([x, x3]) #concatenate the product with the squared difference

    x = Dense(100, activation="relu")(x) #100-unit hidden layer
    x = Dropout(0.05)(x) #light regularization
    out = Dense(1, activation="sigmoid")(x) #single output: probability that the pair is related

    model = Model([input_1, input_2], out)

    model.compile(loss="binary_crossentropy", metrics=['acc'], optimizer=Adam(0.00001))

    model.summary()

    return model
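The way the two embeddings are merged can be illustrated in plain NumPy. This is a minimal sketch with tiny invented vectors (the real embeddings are far larger); `combine_features` is a hypothetical helper that mirrors the Subtract, Multiply, and Concatenate layers above.

```python
import numpy as np

def combine_features(f1, f2):
    """Concatenate the element-wise product with the squared difference."""
    product = f1 * f2
    squared_diff = (f1 - f2) ** 2
    return np.concatenate([product, squared_diff], axis=-1)

rng = np.random.default_rng(0)
# Batch of 2 pairs with 4-dimensional embeddings, for illustration only.
f1 = rng.normal(size=(2, 4))
f2 = rng.normal(size=(2, 4))

merged = combine_features(f1, f2)
print(merged.shape)  # (2, 8)
# Swapping the two faces leaves the merged features unchanged.
print(np.allclose(merged, combine_features(f2, f1)))  # True
```

Both terms are symmetric in the two inputs, so the classifier sees the same features regardless of which face is passed first.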

ModelCheckpoint saves the best weights to your Drive whenever the validation accuracy improves, so that you can come back to them. ReduceLROnPlateau reduces the learning rate when a monitored metric has stopped improving, in this case the validation accuracy.
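The plateau rule can be sketched in plain Python. This is a simplified approximation of the idea, not the Keras implementation: it ignores the callback's `min_delta`, `cooldown`, and `min_lr` options, and the function name and metric history are made up.

```python
def simulate_reduce_on_plateau(metric_history, lr, factor=0.1, patience=20):
    """Return the learning rate in effect after each epoch (mode='max')."""
    best = float("-inf")
    wait = 0
    lrs = []
    for value in metric_history:
        if value > best:
            # Metric improved: remember it and reset the patience counter.
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Stalled for `patience` epochs in a row: shrink the learning rate.
                lr *= factor
                wait = 0
        lrs.append(lr)
    return lrs

history = [0.60, 0.62, 0.62, 0.61, 0.62, 0.63]
print(simulate_reduce_on_plateau(history, lr=1e-5, factor=0.1, patience=2))
```

Under this simplified rule, with patience=20 and only 25 epochs, the learning rate can drop at most once, and only if the validation accuracy stalls for 20 consecutive epochs. A smaller patience may be worth trying.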

In [24]:
file_path = "/gdrive/MyDrive/vgg_face.h5"

checkpoint = ModelCheckpoint(file_path, monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=True, mode='max')

reduce_on_plateau = ReduceLROnPlateau(monitor="val_acc", mode="max", factor=0.1, patience=20, verbose=1)

callbacks_list = [checkpoint, reduce_on_plateau]

model = baseline_model()

Downloading data from https://github.com/rcmalli/keras-vggface/releases/download/v2.0/rcmalli_vggface_tf_notop_resnet50.h5
The following Variables were used a Lambda layer's call (tf.nn.convolution), but
are not present in its tracked objects:
  <tf.Variable 'conv1/7x7_s2/kernel:0' shape=(7, 7, 3, 64) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
The following Variables were used a Lambda layer's call (tf.compat.v1.nn.fused_batch_norm), but
are not present in its tracked objects:
  <tf.Variable 'conv1/7x7_s2/bn/gamma:0' shape=(64,) dtype=float32>
  <tf.Variable 'conv1/7x7_s2/bn/beta:0' shape=(64,) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
The following Var

In [22]:
# train: list of 3-tuples (path to face 1, path to face 2, relationship label)
# e.g. ('F0123/MID1/P01276_face0.jpg', 'F0644/MID2/P06777_face5.jpg', 0)
print(train[0])

# train_person_to_images_map: dict mapping "family/member" to the list of image paths for that person
print(train_person_to_images_map['F0330/MID1'])


('F0123/MID1/P01276_face0.jpg', 'F0644/MID2/P06777_face5.jpg', 0)
['/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03496_face0.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03500_face2.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03497_face0.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03501_face0.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03492_face0.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03499_face5.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03494_face0.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03493_face0.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03495_face0.jpg', '/gdrive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03498_face0.jpg']


In [25]:
model.fit(gen(train, train_person_to_images_map, batch_size=16), use_multiprocessing=False,
                validation_data=gen(val, val_person_to_images_map, batch_size=16), epochs=25, verbose=1,
                workers=1, callbacks=callbacks_list, steps_per_epoch=100, validation_steps=50)

Epoch 1/25

Epoch 00001: val_acc improved from -inf to 0.54625, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 2/25

Epoch 00002: val_acc improved from 0.54625 to 0.61500, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 3/25

Epoch 00003: val_acc improved from 0.61500 to 0.62000, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 4/25

Epoch 00004: val_acc improved from 0.62000 to 0.63250, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 5/25

Epoch 00005: val_acc improved from 0.63250 to 0.64375, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 6/25

Epoch 00006: val_acc improved from 0.64375 to 0.64875, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 7/25

Epoch 00007: val_acc improved from 0.64875 to 0.66250, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 8/25

Epoch 00008: val_acc improved from 0.66250 to 0.66875, saving model to /gdrive/MyDrive/vgg_face.h5
Epoch 9/25

Epoch 00009: val_acc did not improve from 0.66875
Epoch 10/25

Epoch 00010: val_acc improved from 0.668

<tensorflow.python.keras.callbacks.History at 0x7ff8dd774950>

In [41]:
# Modify paths as per your need
test_path = "/gdrive/MyDrive/Kinship Recognition Starter/test/"

#model = baseline_model()
#model.load_weights("/gdrive/MyDrive/vgg_face.h5")

submission = pd.read_csv('/gdrive/MyDrive/Kinship Recognition Starter/test_ds.csv')
predictions = []

for i in range(0, len(submission.p1.values), 32):
    X1 = submission.p1.values[i:i+32]
    X1 = np.array([read_img(test_path + x) for x in X1])

    X2 = submission.p2.values[i:i+32]
    X2 = np.array([read_img(test_path + x) for x in X2])

    pred = model.predict([X1, X2]).ravel().tolist()
    predictions += pred
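The batching pattern in the loop above can be checked in isolation. In this sketch, `predict_in_batches` and `dummy_predict` are hypothetical names, and `dummy_predict` merely stands in for `model.predict` by returning one constant score per pair.

```python
import numpy as np

def predict_in_batches(predict_fn, left, right, batch_size=32):
    """Apply predict_fn to aligned slices of left/right and concatenate the results."""
    outputs = []
    for start in range(0, len(left), batch_size):
        stop = start + batch_size
        batch_pred = predict_fn(left[start:stop], right[start:stop])
        outputs.extend(np.ravel(batch_pred).tolist())
    return outputs

# Hypothetical stand-in for model.predict: one score per pair, shape (n, 1).
def dummy_predict(a, b):
    return np.full((len(a), 1), 0.5)

left = np.arange(70)
right = np.arange(70)
scores = predict_in_batches(dummy_predict, left, right)
print(len(scores))  # 70: batches of 32, 32, and 6
```

Slicing past the end of an array is safe in Python, so the final, smaller batch needs no special handling and the predictions stay aligned with the rows of the test CSV.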


The final predictions need to be rounded to binary labels, e.g., 0.01 rounds to 0 and 0.78 rounds to 1. The DataFrame .round() method is sufficient, as shown below.

In [42]:
d = {'index': np.arange(len(predictions)), 'label': predictions} # one row per prediction
submissionfile = pd.DataFrame(data=d)
submissionfile = submissionfile.round()

In [44]:
submissionfile.astype("int64").to_csv("/gdrive/MyDrive/tta2117.csv", index=False)
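An explicit threshold is an alternative to rounding. The sketch below uses made-up scores in place of the model's predictions and keeps the `index` and `label` column names from the cell above. One difference worth knowing: `.round()` rounds halves to the nearest even value, so a score of exactly 0.5 becomes 0, whereas the `>= 0.5` threshold here maps it to 1.

```python
import numpy as np
import pandas as pd

# Made-up sigmoid outputs standing in for the model's predictions.
predictions = [0.01, 0.78, 0.5, 0.49, 0.93]

probs = np.asarray(predictions)
labels = (probs >= 0.5).astype("int64")  # explicit threshold; 0.5 maps to 1

submission = pd.DataFrame({"index": np.arange(len(labels)), "label": labels})
print(submission["label"].tolist())  # [0, 1, 1, 0, 1]
```

Making the threshold explicit also lets you tune it on the validation families if 0.5 turns out not to be the best cut-off.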

At this point, download the CSV and submit it on Kaggle to score your predictions.

