# Face Detection and Landmark Estimation Workflow

## NVIDIA Data Loading Library (DALI) 

DALI is a collection of highly optimized building blocks and an execution engine that accelerates the data pipeline for computer vision and audio deep learning applications.

Input and augmentation pipelines provided by deep learning frameworks typically fit into one of two categories:

- fast but inflexible: written in C++, they are exposed as a single monolithic Python object with a very specific set and ordering of operations
- slow but flexible: a set of building blocks, written in either C++ or Python, that can be composed into arbitrary data pipelines, which end up being slow. One of the biggest overheads for this type of pipeline is the Global Interpreter Lock (GIL) in Python, which forces developers to use multiprocessing and complicates the design of efficient input pipelines.

DALI stands out by providing both performance and flexibility for accelerating different data pipelines. It achieves this by exposing optimized building blocks that are executed by a simple and efficient engine, and by enabling operations to be offloaded to the GPU (thus enabling scaling to multi-GPU systems).

It is a single library that can be easily integrated into different deep learning training and inference applications.

DALI offers ease of use and flexibility across GPU-enabled systems with direct framework plugins, multiple input data formats, and configurable graphs. It can speed up deep learning workflows that are bottlenecked on I/O pipelines by limited CPU cycles. Typically, systems with a high GPU-to-CPU ratio are constrained by the host CPU, which leaves the available GPU compute under-utilized. DALI significantly accelerates input processing on such dense GPU configurations, improving overall throughput.

___
## FaceDetect Model

### Model Overview <a class="anchor" name="model_overview"></a>

The model described in this card detects one or more faces in a given image or video. Compared to the FaceirNet model, this model gives better results on RGB images and smaller faces.

### Model Architecture <a class="anchor" name="model_architecture"></a>

The model is based on the NVIDIA DetectNet_v2 detector with ResNet18 as a feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid over the input image. The GridBox system divides an input image into a grid, and each grid cell predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class.
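
To make the (xc, yc, w, h) parameterization concrete, the following minimal sketch converts a center/size box into pixel corner coordinates. The function name `to_corners` and the assumption that all four values are normalized to [0, 1] by the image size are illustrative only; the actual DetectNet_v2 decoding may scale and offset its outputs differently.

```python
def to_corners(xc, yc, w, h, image_width, image_height):
    """Convert a normalized center/size box to pixel corners (x1, y1, x2, y2)."""
    x1 = (xc - w / 2) * image_width
    y1 = (yc - h / 2) * image_height
    x2 = (xc + w / 2) * image_width
    y2 = (yc + h / 2) * image_height
    # Clamp to the image so boxes near the border stay valid.
    x1, x2 = max(0.0, x1), min(float(image_width), x2)
    y1, y2 = max(0.0, y1), min(float(image_height), y2)
    return x1, y1, x2, y2


# A box centered in a 4000x3000 image, a quarter as wide and half as tall.
box = to_corners(0.5, 0.5, 0.25, 0.5, image_width=4000, image_height=3000)
```

Corner coordinates of this kind are what a clustering step and downstream cropping typically operate on.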

The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm such as DBSCAN or NMS to produce the final bounding-box coordinates and category labels.
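
As an illustration of the post-processing step, here is a minimal sketch of greedy non-maximum suppression (NMS) in plain Python. The helper names `iou` and `nms`, the corner box format, and the 0.5 threshold are assumptions chosen for this example; they are not the clustering settings used by the deployed model.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    # Visit boxes from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping detections and one separate detection.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The second box overlaps the first almost completely and has a lower score, so only the first and third boxes survive.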

___
## Facial Landmark Estimator (FPENet) Model Card

### Model Overview <a class="anchor" name="model_overview"></a>

The FPENet model described in this card is a facial keypoint estimator network that predicts the (x, y) location of keypoints for a given input face image. FPENet is generally used in conjunction with a face detector, and its output is commonly used for face alignment, head pose estimation, emotion detection, eye blink detection, and gaze estimation, among others.

This model predicts 68, 80, or 104 keypoints for a given face: Chin: 1-17, Eyebrows: 18-27, Nose: 28-36, Eyes: 37-48, Mouth: 49-61, Inner Lips: 62-68, Pupil: 69-76, Ears: 77-80, additional eye landmarks: 81-104. It can also handle a visible or occluded flag for each keypoint. An example of the keypoints is shown as follows:
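
The index ranges listed above can also be written down as a lookup table. The sketch below is one way to do that; the names `KEYPOINT_GROUPS`, `group_of`, and `slice_group` are hypothetical helpers for this example, and the ranges are copied from the text (1-based, inclusive).

```python
# 1-based, inclusive index ranges as listed in the model card text.
KEYPOINT_GROUPS = {
    "chin": (1, 17),
    "eyebrows": (18, 27),
    "nose": (28, 36),
    "eyes": (37, 48),
    "mouth": (49, 61),
    "inner_lips": (62, 68),
    "pupil": (69, 76),
    "ears": (77, 80),
    "additional_eye": (81, 104),
}


def group_of(index):
    """Return the group name for a 1-based keypoint index."""
    for name, (first, last) in KEYPOINT_GROUPS.items():
        if first <= index <= last:
            return name
    raise ValueError(f"keypoint index out of range: {index}")


def slice_group(keypoints, name):
    """Select one group's points from a 0-based sequence of keypoints."""
    first, last = KEYPOINT_GROUPS[name]
    return keypoints[first - 1:last]
```

For example, `slice_group(points, "eyes")` would return the 12 eye keypoints from a 68-point prediction, which is the kind of subset a blink or alignment routine might start from.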


### Model Architecture <a class="anchor" name="model_architecture"></a>

This is a classification model with a [Recombinator network](https://openaccess.thecvf.com/content_cvpr_2016/papers/Honari_Recombinator_Networks_Learning_CVPR_2016_paper.pdf) backbone. Recombinator networks are a family of CNN architectures suited to fine-grained, pixel-level predictions (as opposed to image-level predictions such as classification). The model recombines the layer inputs so that convolutional layers in the finer branches receive inputs from both coarse and fine layers.
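
The recombination idea can be sketched without any deep learning framework: a coarse feature map is upsampled to the finer resolution and stacked with the fine map, so the next layer sees both. The toy code below uses nested lists and nearest-neighbour upsampling; `upsample2x` and `recombine` are hypothetical names, and this is a simplified illustration rather than FPENet's actual layers.

```python
def upsample2x(feature_map):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def recombine(fine, coarse):
    """Stack a fine map with an upsampled coarse map as two channels."""
    up = upsample2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("coarse map must be half the size of the fine map")
    return [fine, up]


fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse = [[0.5, 1.5], [2.5, 3.5]]
stacked = recombine(fine, coarse)
```

A convolution applied to `stacked` would then combine fine spatial detail with the coarser context at every position.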

In [1]:
import json
import math
from numpy import int32, float32, array
from nvidia.dali.plugin.pytorch import DALIGenericIterator, LastBatchPolicy
import os
from tqdm.auto import tqdm
import tritonclient.grpc as grpcclient

from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

from preprocessing import coco_pipeline, facedetect_pipeline, get_face_rotation
from schema import COCOModel, Migrator
from utils import index_directory
from viz import show

%matplotlib inline
%config InlineBackend.figure_format='retina'

In [2]:
import psutil


CORE_COUNT = psutil.cpu_count(logical=False)
GLOBAL_SEED = 42

@pipeline_def(num_threads=CORE_COUNT, seed=GLOBAL_SEED, device_id=None, batch_size=32, enable_conditionals=True)
def get_facenet_pipeline(files, check_shape_only=True, device="cpu"):
    # Read encoded files in listing order so outputs line up with `files`.
    raw_image_tensor, _ = fn.readers.file(
        files=files, random_shuffle=False, name="FacenetEnsemble",
        dont_use_mmap=True, prefetch_queue_depth=4, read_ahead=True,
    )
    # Decode each image to a single grayscale channel.
    image_tensor = fn.decoders.image(raw_image_tensor, output_type=types.GRAY, device=device, use_fast_idct=True)
    if check_shape_only:
        # Return only the shape of each decoded image, not the pixels.
        shape_tensor = fn.shapes(image_tensor)
        return shape_tensor
    else:
        return image_tensor

In [3]:
try:
    with open("symetrical_batches.json", "r") as f:
        image_batches = json.load(f)
    print("Using existing batches json.")
    
except Exception as e:
    print(e)
    root_dir = "/volume1/brandon/pictures"
    formats = (".jpg", ".jpeg")
    filenames = index_directory(root_dir, formats=formats, random_order=False)
    print("{:,} files.".format(len(filenames)))
    
    
    train_data = DALIGenericIterator(
        [get_facenet_pipeline(files=filenames, check_shape_only=True)],
        ["shape_tensor"],
        reader_name='FacenetEnsemble',
        last_batch_policy=LastBatchPolicy.PARTIAL
    )

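    # Group file names by decoded image shape. The reader is not shuffled,
    # so the i-th shape produced corresponds to filenames[i].
    i = 0
    image_batches = {}
    for data in train_data:
        # One row of shape values per image. No squeeze() here: it would
        # collapse a final batch that contains a single image.
        shapes = data[0]["shape_tensor"]
        for shape in shapes:
            image_batches.setdefault(str(shape), []).append(filenames[i])
            i += 1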
    with open("symetrical_batches.json", "w") as f:
        json.dump(image_batches, f, indent=1)

[Errno 2] No such file or directory: 'symetrical_batches.json'
22,484 files.


Invalid SOS parameters for sequential JPEG

In [4]:
key = sorted(image_batches, key=lambda k: len(image_batches[k]), reverse=True)[0]
print(f"shape {key} has {len(image_batches[key])} images")

shape tensor([3000, 4000,    1]) has 4168 images
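
The grouping performed in the two cells above can be illustrated on its own, without DALI. This minimal sketch assumes shapes are already available as plain tuples; `group_by_shape` and the sample file names are hypothetical and used for illustration only.

```python
from collections import defaultdict


def group_by_shape(named_shapes):
    """Map each (height, width, channels) shape to the files that have it."""
    groups = defaultdict(list)
    for filename, shape in named_shapes:
        groups[tuple(shape)].append(filename)
    return dict(groups)


# Hypothetical file names and shapes, for illustration only.
samples = [
    ("a.jpg", (3000, 4000, 1)),
    ("b.jpg", (1080, 1920, 1)),
    ("c.jpg", (3000, 4000, 1)),
]
groups = group_by_shape(samples)
# The most common shape is the one with the longest file list.
largest = max(groups, key=lambda shape: len(groups[shape]))
```

Tuple keys are convenient in memory, but `json.dump` does not accept them as dictionary keys, so they would need to be converted to strings before saving, which is what the notebook does with `str(shape)`.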


In [5]:
filenames[i]

IndexError: list index out of range

In [None]:
url = os.getenv("TRITON_SERVER_URL")
client = grpcclient.InferenceServerClient(url=url, verbose=False)
facedetect_outputs = [
    grpcclient.InferRequestedOutput("true_boxes"),
    grpcclient.InferRequestedOutput("true_proba"),
    grpcclient.InferRequestedOutput("true_image_size"),
    ]


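# Run face detection on each group of same-shaped images.
# `batch_filenames` avoids overwriting the global `filenames` list.
for shape_key, batch_filenames in image_batches.items():
    facenet_data = DALIGenericIterator(
        # check_shape_only=False so the pipeline returns decoded images.
        [get_facenet_pipeline(files=batch_filenames, check_shape_only=False)],
        ["image_tensor"],
        reader_name="FacenetEnsemble",
        # Avoid padding the last batch with repeated samples.
        last_batch_policy=LastBatchPolicy.PARTIAL,
    )

    for i, input_image in enumerate(facenet_data):
        # The key must match the output name given to DALIGenericIterator.
        input_image_data = input_image[0]["image_tensor"].numpy()
        facedetect_ensemble_inputs = [
            grpcclient.InferInput("input_image_data", input_image_data.shape, "UINT8"),
        ]
        facedetect_ensemble_inputs[0].set_data_from_numpy(input_image_data)
        facedetect_infer_result = client.infer(
            "facenet_ensemble",
            facedetect_ensemble_inputs,
            model_version="1",
            outputs=facedetect_outputs,
        )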


In [None]:
1/0

___

In [None]:
# `run_infer` is not defined in the cells above; set it before running this cell.
if run_infer:
    url = os.getenv("TRITON_SERVER_URL")
    client = grpcclient.InferenceServerClient(url=url, verbose=False)
    facedetect_outputs = [
        grpcclient.InferRequestedOutput("true_boxes"),
        grpcclient.InferRequestedOutput("true_proba"),
        grpcclient.InferRequestedOutput("true_image_size"),
    ]

    fpenet_outputs = [
        grpcclient.InferRequestedOutput("conv_keypoints_m80"),
        grpcclient.InferRequestedOutput("softargmax"),
        grpcclient.InferRequestedOutput("softargmax:1"),
    ]

    coco_annotations_file = "coco/instances.json"
    face_category_id = 1
    data = {
        "images": [],
        "annotations": [],
        "categories": [
            {"supercategory": "Face", "id": face_category_id, "name": "Face"}
        ],
    }

    facedetect_0_pipe = facedetect_pipeline(
        filenames=filenames,
        device_id=0,
        shard_id=0,
        num_shards=1,
        batch_size=32,
    )
    facedetect_0_pipe.build()

    facedetect_shard_pipelines = [facedetect_0_pipe]

    loader = DALIGenericIterator(
        facedetect_shard_pipelines,
        ["shapes", "images", "encoded"],
        reader_name="Encoder",
        last_batch_policy=LastBatchPolicy.PARTIAL,
    )

    # Number of batches the iterator will yield (`max_batch_size` was undefined here).
    loader_len = len(loader)
    pbar = tqdm(
        total=loader_len,
        desc="Calculating image encoding",
    )

    fidx = 0
    # `batch` rather than `b`, which the inner bounding-box loop reuses.
    for n, batch in enumerate(loader):
        for shard in batch:
            shapes = shard["shapes"]
            images = shard["images"]
            encoded = shard["encoded"]
            np_images = images.cpu().numpy()
            np_shapes = shapes.cpu().numpy()
            np_encoded = encoded.cpu().numpy()
            facedetect_inputs = [
                grpcclient.InferInput("input_1", np_images.shape, "FP32"),
                grpcclient.InferInput("true_image_size", np_shapes.shape, "INT64"),
            ]
            facedetect_inputs[0].set_data_from_numpy(np_images)
            facedetect_inputs[1].set_data_from_numpy(np_shapes)
            facedetect_infer_result = client.infer(
                "facenet_resized",
                facedetect_inputs,
                model_version="1",
                outputs=facedetect_outputs,
            )
            true_boxes = facedetect_infer_result.as_numpy(facedetect_outputs[0].name())
            true_proba = facedetect_infer_result.as_numpy(facedetect_outputs[1].name())

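            for idx, mb in enumerate(shapes):
                # Boxes may arrive as a bytes string holding a Python expression.
                # eval() executes that text, so only use it with a trusted server.
                try:
                    bbox = [i.tolist() for i in eval(true_boxes[idx].decode())]
                except Exception:
                    # Otherwise fall back to the raw array contents.
                    bbox = true_boxes[idx].tolist()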

                data["images"].append(
                    {
                        "id": fidx,
                        "height": int(mb[0]),
                        "width": int(mb[1]),
                        "channels": int(mb[2]),
                        "file_name": filenames[fidx],
                    }
                )

                for b in bbox:
                    b = array((eval(b[0]), eval(b[1]), eval(b[2]), eval(b[3])))
                    np_b = array(b).astype(int32).reshape(1, -1)
                    fpenet_inputs = [
                        grpcclient.InferInput(
                            "raw_image_data", np_encoded.shape, "UINT8"
                        ),
                        grpcclient.InferInput("true_boxes", np_b.shape, "INT32"),
                    ]
                    fpenet_inputs[0].set_data_from_numpy(np_encoded)
                    fpenet_inputs[1].set_data_from_numpy(np_b)

                    fpenet_infer_result = client.infer(
                        "fpenet_ensemble",
                        fpenet_inputs,
                        model_version="1",
                        outputs=fpenet_outputs,
                    )
                    segmentation = fpenet_infer_result.as_numpy(
                        fpenet_outputs[1].name()
                    )
                    rotation, center = get_face_rotation(segmentation[0])
                    data["annotations"].append(
                        {
                            "image_id": fidx,
                            "bbox": b.tolist(),
                            "rotation": rotation,
                            "center": center.tolist(),
                            "category_id": face_category_id,
                            "segmentation": segmentation[0].tolist(),
                        }
                    )

                fidx += 1

        pbar.update()

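    # Create the output directory if needed before writing the annotations.
    os.makedirs(os.path.dirname(coco_annotations_file), exist_ok=True)
    with open(coco_annotations_file, "w") as f:
        json.dump(data, f)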

    # Migrator().run()
    # model = COCOModel(
    #     images=data["images"],
    #     annotations=data["annotations"],
    #     categories=data["categories"],
    # )
    # _ = model.save()
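
The dictionary written by the cell above follows a COCO-style layout with `images`, `annotations`, and `categories` lists. As a minimal sketch, the check below verifies that every annotation points at a known image and category before the file is used elsewhere. `check_coco` and the sample values are hypothetical; they mirror the keys used in the notebook but are not part of it.

```python
import json


def check_coco(data):
    """Return a list of problems found in a COCO-style dictionary."""
    problems = []
    image_ids = {image["id"] for image in data["images"]}
    category_ids = {category["id"] for category in data["categories"]}
    for n, annotation in enumerate(data["annotations"]):
        if annotation["image_id"] not in image_ids:
            problems.append(f"annotation {n}: unknown image_id {annotation['image_id']}")
        if annotation["category_id"] not in category_ids:
            problems.append(f"annotation {n}: unknown category_id {annotation['category_id']}")
        if len(annotation["bbox"]) != 4:
            problems.append(f"annotation {n}: bbox must have 4 values")
    return problems


# Hypothetical values shaped like the dictionary built in the notebook.
data = {
    "images": [{"id": 0, "height": 3000, "width": 4000, "channels": 1, "file_name": "a.jpg"}],
    "annotations": [
        {"image_id": 0, "bbox": [10, 20, 110, 220], "category_id": 1},
        {"image_id": 7, "bbox": [0, 0, 5, 5], "category_id": 1},
    ],
    "categories": [{"supercategory": "Face", "id": 1, "name": "Face"}],
}
problems = check_coco(data)
# The structure uses only JSON-compatible types, so it survives a round trip.
roundtrip = json.loads(json.dumps(data))
```

Here the second annotation refers to an image id that does not exist, so it is the only problem reported.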

___

In [None]:
coco_annotations_file = "coco/instances.json"
with open(coco_annotations_file, "r") as f:
    data = json.load(f)

In [None]:
coco_pipe = coco_pipeline(
    coco_annotations_file=coco_annotations_file,
    batch_size=4,
    device_id=0,
)
coco_pipe.build()
outputs = coco_pipe.run()

In [None]:
show(outputs, dpi=300)