# Face detection and recognition inference pipeline

The following example illustrates how to use the `facenet_pytorch` Python package to perform face detection and recognition on an image dataset using an Inception Resnet V1 pretrained on the VGGFace2 dataset.

The following PyTorch features are used:
* Datasets
* Dataloaders
* GPU/CPU processing

In [1]:
from facenet_pytorch import MTCNN, InceptionResnetV1
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
import numpy as np
import pandas as pd
import os

workers = 0 if os.name == 'nt' else 4

#### Determine if an NVIDIA GPU is available

In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Running on device: {}'.format(device))

Running on device: cpu


#### Define MTCNN module

Default parameters are shown for illustration; they can be omitted. Note that, since MTCNN is a collection of neural nets and other code, the device must be passed to the constructor as shown below so that objects can be copied to it internally when needed.

See `help(MTCNN)` for more details.

In [3]:
mtcnn = MTCNN(
    image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    device=device
)

#### Define Inception Resnet V1 module

Set `classify=True` to use the pretrained classifier. For this example, we instead use the model to output embeddings (CNN features). Note that for inference, it is important to set the model to `eval` mode.

See `help(InceptionResnetV1)` for more details.

In [4]:
resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)

#### Define a dataset and data loader

We add the `idx_to_class` attribute to the dataset to enable easy recoding of label indices to identity names later on.

In [5]:
def collate_fn(x):
    return x[0]

dataset = datasets.ImageFolder('../data/test_images')
dataset.idx_to_class = {i:c for c, i in dataset.class_to_idx.items()}
loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)
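To make the roles of `collate_fn` and `idx_to_class` concrete, the sketch below mimics what the `DataLoader` passes to `collate_fn` when the default `batch_size=1` is used. It also shows how the inverted mapping recovers a name from a label index. The class names and the string standing in for a PIL image are made up for illustration, and `ImageFolder` itself is not used.

```python
# Hypothetical class names; ImageFolder derives the real ones from sub-folder names.
class_to_idx = {"person_a": 0, "person_b": 1}

# Invert the mapping so that a label index can be turned back into a name.
idx_to_class = {i: c for c, i in class_to_idx.items()}


def collate_fn(batch):
    # With the default batch_size=1, `batch` is a list holding a single
    # (image, label) sample; return that sample unchanged.
    return batch[0]


# Stand-in for one batch: a placeholder string replaces the PIL image.
batch = [("<PIL image>", 1)]
image, label = collate_fn(batch)
name = idx_to_class[label]
print(name)
```

A custom `collate_fn` is needed because the dataset has no transform and therefore yields PIL images, which the default collate function cannot stack into a tensor.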

#### Perform MTCNN facial detection

Iterate through the DataLoader object and detect faces and associated detection probabilities for each image. The `MTCNN` forward method returns images cropped to the detected face, if a face was detected. By default, only a single detected face is returned; to have `MTCNN` return all detected faces, set `keep_all=True` when creating the `MTCNN` object above.

To obtain bounding boxes rather than cropped face images, you can instead call the lower-level `mtcnn.detect()` function. See `help(mtcnn.detect)` for details.
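As a rough sketch of working with bounding boxes, the helper below post-processes output of the kind `mtcnn.detect()` returns. It assumes `boxes` is a sequence of `[x1, y1, x2, y2]` rows, or `None` when no face is found, with a matching sequence of probabilities. The function name `summarize_detections` and the `min_prob` cutoff are illustrative and not part of `facenet_pytorch`; check `help(mtcnn.detect)` for the exact return format in your installed version.

```python
def summarize_detections(boxes, probs, min_prob=0.9):
    """Return box corners and sizes for detections at or above min_prob.

    Assumes `boxes` holds [x1, y1, x2, y2] rows and `probs` the matching
    confidences, or that `boxes` is None when no face was found.
    """
    if boxes is None:
        return []
    results = []
    for (x1, y1, x2, y2), prob in zip(boxes, probs):
        if prob < min_prob:
            continue  # skip low-confidence detections
        results.append({
            "box": (float(x1), float(y1), float(x2), float(y2)),
            "width": float(x2 - x1),
            "height": float(y2 - y1),
            "prob": float(prob),
        })
    return results


# Made-up values standing in for `boxes, probs = mtcnn.detect(img)`.
example_boxes = [[10.0, 20.0, 110.0, 140.0], [5.0, 5.0, 25.0, 30.0]]
example_probs = [0.99, 0.62]
faces = summarize_detections(example_boxes, example_probs)
print(faces)
```

Only the first made-up detection passes the 0.9 cutoff here; the second is dropped as low confidence.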

In [6]:
aligned = []
names = []
for x, y in loader:
    x_aligned, prob = mtcnn(x, return_prob=True)
    if x_aligned is not None:
        print('Face detected with probability: {:8f}'.format(prob))
        aligned.append(x_aligned)
        names.append(dataset.idx_to_class[y])

Face detected with probability: 0.888615
Face detected with probability: 0.891250
Face detected with probability: 0.993290
Face detected with probability: 0.889978
Face detected with probability: 0.813680
Face detected with probability: 0.961996
Face detected with probability: 0.837536
Face detected with probability: 0.908983
Face detected with probability: 0.954109
Face detected with probability: 0.850405
Face detected with probability: 0.874452
Face detected with probability: 0.711612
Face detected with probability: 0.984390
Face detected with probability: 0.955520
Face detected with probability: 0.765988


#### Calculate image embeddings

MTCNN returns face images that are all the same size, enabling easy batch processing with the Resnet recognition module. Here, since we only have a few images, we build a single batch and perform inference on it.

For real datasets, the code should be modified to control the batch sizes passed to the Resnet, particularly when processing on a GPU. For repeated testing, it is best to separate face detection (using MTCNN) from embedding or classification (using InceptionResnetV1), so that cropped faces or bounding boxes are calculated once and the detected faces saved for future use.

In [7]:
aligned = torch.stack(aligned).to(device)
embeddings = resnet(aligned).detach().cpu()
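For larger datasets, one way to control the batch size is to feed the stacked faces to the model in fixed-size chunks. The following is a minimal sketch under stated assumptions: `embed_in_batches` is a hypothetical helper, not part of `facenet_pytorch`, and a small linear layer stands in for `InceptionResnetV1` so that the example runs without pretrained weights.

```python
import torch
from torch import nn


def embed_in_batches(model, faces, batch_size=32, device="cpu"):
    """Run `model` over `faces` in chunks and return one CPU tensor."""
    model = model.eval().to(device)
    outputs = []
    with torch.no_grad():  # inference only, so skip gradient tracking
        for start in range(0, len(faces), batch_size):
            batch = faces[start:start + batch_size].to(device)
            outputs.append(model(batch).cpu())
    return torch.cat(outputs)


# Stand-in model and data: a single linear layer over flattened
# 3x160x160 "faces" replaces the pretrained network.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 160 * 160, 8))
toy_faces = torch.randn(10, 3, 160, 160)
toy_embeddings = embed_in_batches(toy_model, toy_faces, batch_size=4)
print(toy_embeddings.shape)
```

With the real model, the call would look like `embed_in_batches(resnet, torch.stack(aligned), batch_size=32, device=device)`. Because the model is in `eval` mode, splitting the input into chunks should not change the resulting embeddings. An empty `faces` tensor would need separate handling, since `torch.cat` rejects an empty list.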

#### Print distance matrix for classes

In [8]:
dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]
print(pd.DataFrame(dists, columns=names, index=names))

                         Akada Iamong  Apiwat Rattanaphan  Apiwit Buachan  \
Akada Iamong                 0.000000            0.643272        0.650846   
Apiwat Rattanaphan           0.643272            0.000000        0.808083   
Apiwit Buachan               0.650846            0.808083        0.000000   
Ekapoj Suthiwong             0.503320            0.802009        0.741291   
Nattawat Thakhamho           0.632061            0.859346        0.866283   
Nattawat Thakhamho           0.684282            0.898018        0.987381   
Panupong Sitthiprom          0.847214            0.933031        0.734187   
Panupong Sitthiprom          0.605375            0.706237        0.906060   
Peerakarn Phraphinyokul      0.621634            0.821482        0.607766   
Phacharadanai Rossoda        0.585243            0.721139        0.572181   
Phurin Rueannimit            0.705584            0.722743        0.725098   
Sittikorn Thongdeenok        0.742823            0.835879        0.619184   
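The nested list comprehension used for the distance matrix is easy to read but computes each pair in Python. As an alternative sketch, `torch.cdist` computes the same pairwise Euclidean distances in one call. The points below are made-up 2-D values, not real embeddings, chosen so the distances are easy to check by hand.

```python
import torch

# Made-up 2-D "embeddings" forming a 3-4-5 right triangle.
points = torch.tensor([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])

# Pairwise Euclidean (L2) distances between all rows in a single call.
dist_matrix = torch.cdist(points, points)
print(dist_matrix)
```

For the embeddings in this example, `torch.cdist(embeddings, embeddings)` could be passed to `pd.DataFrame(..., columns=names, index=names)` after conversion with `.numpy()`. For large inputs, `torch.cdist` may use a matrix-multiplication shortcut, so diagonal entries can differ very slightly from zero.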

#### Run the full pipeline in one cell and save the embeddings

In [2]:
from facenet_pytorch import MTCNN, InceptionResnetV1
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
import numpy as np
import pandas as pd
import os

# Set device for computation
workers = 0 if os.name == 'nt' else 4  # Adjust workers for OS
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Running on device: {}'.format(device))

# Initialize MTCNN for face detection
mtcnn = MTCNN(
    image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    device=device
)

# Initialize InceptionResnetV1 for face recognition
resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device)

# Custom collate function for the dataloader
def collate_fn(x):
    return x[0]

# Load dataset from image folder
dataset = datasets.ImageFolder('../data/test_images')
dataset.idx_to_class = {i: c for c, i in dataset.class_to_idx.items()}

# Create a DataLoader
loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)

aligned = []
names = []

# Process each image in the DataLoader
for x, y in loader:
    x_aligned, prob = mtcnn(x, return_prob=True)  # Detect and align faces
    if x_aligned is not None:
        print('Face detected with probability: {:.8f}'.format(prob))
        aligned.append(x_aligned)
        names.append(dataset.idx_to_class[y])

# Stack aligned faces and move to device
if len(aligned) > 0:
    aligned = torch.stack(aligned).to(device)

    # Generate face embeddings
    embeddings = resnet(aligned).detach().cpu()

    # Save embeddings to a .pt file
    torch.save(embeddings, 'embeddings.pt')
    print("Embeddings saved to 'embeddings.pt'")

    # Compute pairwise distances between embeddings
    dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]

    # Print the distance matrix as a DataFrame
    print(pd.DataFrame(dists, columns=names, index=names))
else:
    print("No faces detected.")


Running on device: cpu
Face detected with probability: 0.88861471
Face detected with probability: 0.89125007
Face detected with probability: 0.99328953
Face detected with probability: 0.88997757
Face detected with probability: 0.81368047
Face detected with probability: 0.96199566
Face detected with probability: 0.83753639
Face detected with probability: 0.90898311
Face detected with probability: 0.95410937
Face detected with probability: 0.85040462
Face detected with probability: 0.87445188
Face detected with probability: 0.71161187
Face detected with probability: 0.98439008
Face detected with probability: 0.95551974
Face detected with probability: 0.76598847
Embeddings saved to 'embeddings.pt'
                         Akada Iamong  Apiwat Rattanaphan  Apiwit Buachan  \
Akada Iamong                 0.000000            0.643272        0.650846   
Apiwat Rattanaphan           0.643272            0.000000        0.808083   
Apiwit Buachan               0.650846            0.808083        