## MTCNN face detection

A more recent approach is the MTCNN face detection method. MTCNN stands for Multi-task Cascaded Convolutional Networks and, as the name suggests, is based on convolutional neural networks (CNNs). Like Viola-Jones, the algorithm has a cascaded structure and can therefore discard non-face regions at an early stage, which makes it suitable for real-time face detection. MTCNN jointly addresses three tasks, presented here as three steps:
1. Face Detection
2. Facial Landmark Detection
3. Face Classification

In the following, we give a brief overview of each step.

### MTCNN Steps

#### Step 1 - Face Detection

In the first step, MTCNN detects candidate face regions in the input image. It uses a cascade of convolutional networks to filter out regions that are unlikely to contain a face and to focus on regions with a higher probability of containing one.
The cascade comprises several stages, each built from a different CNN. At each stage, the output of the CNN is used to narrow down the set of candidate regions.
The result of this step is a set of bounding boxes that represent the potential face regions in the image.
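
The filtering idea behind a cascade can be sketched in a few lines. This is only an illustration of the principle, not the MTCNN networks themselves: the scoring functions `size_score` and `squareness_score` and their thresholds are hypothetical stand-ins for the CNN stages.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Stage = Tuple[Callable[[Box], float], float]  # (scoring function, threshold)


def run_cascade(candidates: Sequence[Box], stages: Sequence[Stage]) -> List[Box]:
    """Pass the candidates through every stage, keeping only those that score high enough."""
    survivors = list(candidates)
    for score_fn, threshold in stages:
        survivors = [box for box in survivors if score_fn(box) >= threshold]
    return survivors


def size_score(box: Box) -> float:
    # Hypothetical first stage: reject regions with a side shorter than 12 pixels.
    return 1.0 if min(box[2], box[3]) >= 12 else 0.0


def squareness_score(box: Box) -> float:
    # Hypothetical second stage: prefer regions that are roughly as wide as they are tall.
    return min(box[2], box[3]) / max(box[2], box[3])


candidates = [(0, 0, 8, 8), (10, 10, 40, 100), (50, 50, 60, 70)]
stages = [(size_score, 0.5), (squareness_score, 0.6)]
remaining = run_cascade(candidates, stages)
print(remaining)
```

Here the first candidate is dropped by the first stage and the second by the second stage, so only one region is left at the end.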

#### Step 2 - Facial Landmark Detection

Once the potential face regions are identified, the second step of MTCNN locates facial keypoints within each bounding box.
Facial keypoints are specific points on the face, such as the eyes, nose and mouth. These landmarks are critical for tasks such as face alignment and recognition.
The network at this step is designed to regress the coordinates of these facial features for each detected face.
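
To illustrate why keypoints matter for alignment, the following sketch estimates how far a face is tilted from the line between the two eyes. The dictionary keys `left_eye` and `right_eye` and the pixel coordinates are assumptions chosen for this example, not values taken from the detector.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def eye_roll_angle(keypoints: Dict[str, Point]) -> float:
    """Return the in-plane tilt of a face in degrees, measured along the eye line."""
    left_x, left_y = keypoints["left_eye"]
    right_x, right_y = keypoints["right_eye"]
    return math.degrees(math.atan2(right_y - left_y, right_x - left_x))


# Hypothetical keypoints: one level face and one tilted face.
level_face = {"left_eye": (30, 40), "right_eye": (70, 40)}
tilted_face = {"left_eye": (30, 40), "right_eye": (70, 80)}

print(eye_roll_angle(level_face))
print(eye_roll_angle(tilted_face))
```

Rotating the image by the negative of this angle would bring the eyes onto a horizontal line, which is a common first step of face alignment.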

#### Step 3 - Face Classification

The third step of MTCNN classifies each bounding box as face or non-face. This step helps to eliminate false positives and improves the accuracy of the overall face detection system.
A classifier is trained to distinguish between faces and non-faces by extracting features from the candidate regions.
The result of this step is a refined set of bounding boxes that are more likely to contain actual faces, together with their facial keypoints.
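
A face/non-face decision of this kind can be sketched as a two-class softmax followed by a threshold. The logits, the field names (`box`, `logits`) and the threshold of 0.9 are assumptions made for this illustration; the real classifier is a trained network.

```python
import math
from typing import Dict, List


def face_probability(non_face_logit: float, face_logit: float) -> float:
    """Two-class softmax: turn a pair of raw scores into the probability of 'face'."""
    largest = max(non_face_logit, face_logit)  # subtract the maximum for numerical stability
    exp_non_face = math.exp(non_face_logit - largest)
    exp_face = math.exp(face_logit - largest)
    return exp_face / (exp_non_face + exp_face)


def keep_likely_faces(candidates: List[Dict], min_probability: float = 0.9) -> List[Dict]:
    """Drop every candidate whose face probability is below min_probability."""
    return [c for c in candidates if face_probability(*c["logits"]) >= min_probability]


# Hypothetical candidates: logits are given as (non-face score, face score).
candidates = [
    {"box": (50, 50, 60, 70), "logits": (-2.0, 3.0)},
    {"box": (200, 10, 30, 30), "logits": (1.0, 0.5)},
]
faces = keep_likely_faces(candidates)
print([c["box"] for c in faces])
```

Only the first candidate passes the threshold, so the second box would be discarded as a false positive.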

## Practical Application

Our Python implementation of MTCNN uses the following code in the backend:

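In [5]:
import io
from contextlib import redirect_stdout

import cv2
import numpy
from PIL import Image

# mtcnn_detector is assumed to be the pre-trained detector instance created elsewhere in the backend.


def highlight_face_mtcnn(img: Image.Image):
    # Convert the PIL image to a numpy array so that OpenCV can draw on it
    img = numpy.array(img)
    # Suppress the detector's console output
    with io.StringIO() as dummy_stdout:
        with redirect_stdout(dummy_stdout):
            faces = mtcnn_detector.detect_faces(img)

    # Stays at 100 if no face is found, otherwise holds the confidence of the last face
    confidence = 100

    for face in faces:
        confidence = round(face['confidence'] * 100, 3)
        x, y, w, h = face['box']
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 6)

    return Image.fromarray(img), len(faces), confidence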

This implementation of MTCNN face detection processes a given PIL Image object. It converts the image to a numpy array and uses a pre-trained MTCNN detector to detect faces in it. The pre-trained detector is already provided by the library, so we do not need to train or create our own. Detected faces are outlined with pink rectangles. The function returns the annotated image, the number of detected faces and the confidence (in percent) of the last detected face.
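
The function above reports the confidence of whichever face was processed last. One possible alternative, sketched here as an assumption rather than as part of our backend, is to report the highest confidence and to return 0 when no face is found. The helper `summarize_faces` is hypothetical; it only relies on the `confidence` key used in the code above, and the sample values are made up.

```python
from typing import Dict, List, Tuple


def summarize_faces(faces: List[Dict], min_confidence: float = 0.0) -> Tuple[int, float]:
    """Return (number of faces, highest confidence in percent) for a list of detections."""
    kept = [face for face in faces if face["confidence"] >= min_confidence]
    if not kept:
        return 0, 0.0
    best = max(face["confidence"] for face in kept)
    return len(kept), round(best * 100, 3)


# Made-up detections in the same shape as the entries used above.
sample = [
    {"box": [10, 20, 50, 60], "confidence": 0.9987},
    {"box": [200, 40, 45, 55], "confidence": 0.7312},
]

print(summarize_faces(sample))
print(summarize_faces(sample, min_confidence=0.9))
print(summarize_faces([]))
```

With this variant, an image without faces is no longer reported with a confidence of 100.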

Here is an example output of the algorithm:

![Example of a detected face](images/detected-faces-examples/detected_face_mtcnn.png)

## SSD face detection

TODO!

## Practical Application

Our Python implementation of SSD uses the following code in the backend:

In [None]:
import cv2
import numpy
from PIL import Image

# ssd_detector is assumed to be the pre-trained network loaded elsewhere in the backend.


def highlight_face_ssd(img: Image.Image):
    # OpenCV works with BGR channel order
    img = cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)
    # The network receives a 300x300 copy; the boxes are drawn on the full-size image
    resized_bgr_image = cv2.resize(img, (300, 300))
    image_blob = cv2.dnn.blobFromImage(image=resized_bgr_image)
    ssd_detector.setInput(image_blob)
    detections = ssd_detector.forward()

    confidence = 100
    number_of_faces = 0
    height, width = img.shape[:2]

    # Only show detections with more than 80% confidence
    for row in detections[0][0]:
        if row[2] > 0.80:
            confidence = round(float(row[2]) * 100, 3)
            number_of_faces += 1
            # Box coordinates are relative to the image size; scale them to pixels
            x1, y1 = int(row[3] * width), int(row[4] * height)
            x2, y2 = int(row[5] * width), int(row[6] * height)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 255), 6)

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return Image.fromarray(img_rgb), number_of_faces, confidence

This implementation of SSD face detection processes a given PIL Image object. It converts the image to BGR format and resizes a copy to 300x300 pixels as input for the network. Using a pre-trained SSD detector, it detects faces in the image. The pre-trained detector is already provided by the library, so we do not need to train or create our own. The detector returns a list of candidate detections, each with a confidence score, so we keep only those with more than 80% confidence. Detected faces are outlined with yellow rectangles on the full-size image, which is returned together with the number of faces and the confidence (in percent) of the last accepted detection.
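
The filtering and scaling of the detector output can be isolated into a small helper, which makes it easy to check without loading a model. This is a sketch under assumptions: `detections_to_boxes` is a hypothetical name, the sample rows are made up, and the first two entries of each row are treated as unused because the code above only reads indices 2 to 6.

```python
from typing import List, Tuple

PixelBox = Tuple[int, int, int, int, float]  # (x1, y1, x2, y2, confidence)


def detections_to_boxes(detections, image_width: int, image_height: int,
                        min_confidence: float = 0.8) -> List[PixelBox]:
    """Keep rows above min_confidence and scale their relative corners to pixels."""
    boxes = []
    for row in detections[0][0]:
        confidence = float(row[2])
        if confidence <= min_confidence:
            continue
        x1 = int(row[3] * image_width)
        y1 = int(row[4] * image_height)
        x2 = int(row[5] * image_width)
        y2 = int(row[6] * image_height)
        boxes.append((x1, y1, x2, y2, confidence))
    return boxes


# Made-up output with the same nesting as in the code above: detections[0][0] is the list of rows.
sample = [[[
    [0.0, 1.0, 0.95, 0.25, 0.25, 0.5, 0.75],
    [0.0, 1.0, 0.10, 0.0, 0.0, 0.1, 0.1],
]]]

boxes = detections_to_boxes(sample, image_width=640, image_height=480)
print(boxes)
```

Because the corners are relative to the image size, the boxes can be scaled to the original image even though the network only saw the 300x300 copy.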

Here is an example output of the algorithm:

![Example of a detected face](images/detected-faces-examples/detected_face_ssd.png)