# Face Detection

This section introduces several widely used face detection algorithms. Afterwards, it investigates filters that prevent the algorithms from detecting faces, filters that increase the false positive rate of face detection, and artistic filters.

## Face Detection Algorithms

### Viola-Jones

The Viola-Jones face detection algorithm is a widely used and efficient method for detecting faces in images.
It was proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features."

The Viola-Jones algorithm employs a machine learning approach, specifically a variant of the AdaBoost algorithm, to train a cascade of classifiers for face detection. The training process involves selecting a set of Haar-like features, which are simple rectangular patterns that can be computed quickly. These features capture local intensity variations in the image.

In the following, we will give a brief overview of the steps in Viola-Jones.

#### Step 1 - Selecting Haar-like features

Haar-like features are essential building blocks in the Viola-Jones face detection algorithm,
capturing distinctive patterns in faces. These features are rectangular and can take various forms,
such as edges, lines, or rectangles with different orientations.

For example, a Haar-like feature might capture the contrast between the eyes and the nose. The choice
of these features is crucial as they serve as the basis for distinguishing between positive (faces) and
negative (non-faces) examples during the training phase.

Here's a simple example image illustrating a Haar-like feature capturing the vertical contrast
between the left and right sides of a face:

![Haar-like Feature Example](images/haar-like-features.png)

#### Step 2 - Creating an integral image

To efficiently compute Haar-like features, the Viola-Jones algorithm uses an integral image. The integral
image is a transformed version of the original image, where each pixel represents the cumulative sum of
all pixels above and to the left of it.

![Integral Image Example](images/integral-image.png)

The integral image enables rapid calculation of the sum of pixel values within any rectangular region,
which is essential for evaluating Haar-like features in constant time.
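The constant-time rectangle sum can be illustrated with a minimal NumPy sketch. The function names `integral_image`, `rect_sum` and `two_rect_feature` are chosen for this illustration only and are not part of our backend. As a simplifying assumption, the integral image is padded with an extra zero row and column, so that `ii[y, x]` holds the sum of all pixels strictly above and to the left of `(x, y)` and no border checks are needed.

```python
import numpy as np


def integral_image(gray: np.ndarray) -> np.ndarray:
    """Integral image padded with a zero row and column: ii[y, x] == gray[:y, :x].sum()."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle with top-left pixel (x, y), using four lookups."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])


def two_rect_feature(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: sum of the left half minus sum of the right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


image = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12],
                  [13, 14, 15, 16]], dtype=np.uint8)
ii = integral_image(image)

print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 4, 4))  # 60 - 76 = -16
```

Regardless of the rectangle size, `rect_sum` performs exactly four array lookups, which is why a Haar-like feature built from two or three rectangles costs only a handful of operations.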

#### Step 3 - Running AdaBoost training

AdaBoost is a boosting algorithm that combines many weak classifiers, each only slightly better than
guessing, into a single robust and accurate classifier. In the Viola-Jones face detection method, the
weak classifiers are decision stumps based on Haar-like features.

The AdaBoost training process involves iteratively selecting the best weak classifiers while assigning
higher weights to misclassified examples from the previous iteration. This iterative process continues
until a predefined number of weak classifiers are trained.

Consider an example image dataset with positive examples (faces) and negative examples (non-faces).
During AdaBoost training, the algorithm learns to focus on the features that effectively discriminate
between the two classes, building a strong classifier that is adept at face detection.
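The reweighting idea can be sketched with a textbook version of discrete AdaBoost on decision stumps. This is an illustration under simplifying assumptions, not the training code behind the pre-trained cascade: the feature values are given as a plain matrix instead of being computed from Haar-like features, labels are +1 (face) and -1 (non-face), and the names `train_adaboost` and `predict` are made up for this example.

```python
import numpy as np


def stump_predict(column: np.ndarray, threshold: float, polarity: int) -> np.ndarray:
    """Decision stump: +1 on one side of the threshold, -1 on the other."""
    return np.where(polarity * column >= polarity * threshold, 1, -1)


def train_adaboost(features: np.ndarray, labels: np.ndarray, rounds: int):
    """Return a list of weak classifiers as (feature_index, threshold, polarity, alpha)."""
    n_samples, n_features = features.shape
    weights = np.full(n_samples, 1.0 / n_samples)
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error
        for j in range(n_features):
            for threshold in np.unique(features[:, j]):
                for polarity in (1, -1):
                    pred = stump_predict(features[:, j], threshold, polarity)
                    error = weights[pred != labels].sum()
                    if best is None or error < best[0]:
                        best = (error, j, threshold, polarity, pred)
        error, j, threshold, polarity, pred = best
        error = np.clip(error, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - error) / error)
        # Misclassified samples gain weight, correctly classified ones lose weight
        weights *= np.exp(-alpha * labels * pred)
        weights /= weights.sum()
        ensemble.append((j, threshold, polarity, alpha))
    return ensemble


def predict(ensemble, features: np.ndarray) -> np.ndarray:
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(features.shape[0])
    for j, threshold, polarity, alpha in ensemble:
        score += alpha * stump_predict(features[:, j], threshold, polarity)
    return np.where(score >= 0, 1, -1)


# Toy data: feature 0 is uninformative, feature 1 separates the classes
features = np.array([[5, 0.1], [1, 0.2], [4, 0.3], [2, 0.7], [6, 0.8], [3, 0.9]])
labels = np.array([-1, -1, -1, 1, 1, 1])
ensemble = train_adaboost(features, labels, rounds=3)
print(ensemble[0][0])  # index of the feature chosen in the first round
```

On this toy data, the first round selects feature 1, the only feature that discriminates between the two classes, which mirrors how AdaBoost picks out the most useful Haar-like features.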

#### Step 4 - Creating classifier cascades

The trained AdaBoost classifier is organized into a cascade of stages in the Viola-Jones algorithm.
Each stage consists of multiple weak classifiers applied sequentially. The cascade structure allows
for the rapid rejection of non-face regions, contributing to the algorithm's efficiency.

![Classifier Cascade Example](images/cascade-classifier.png)

The cascade of classifiers is constructed in such a way that a region of the image must pass every
stage to be considered a potential face region. If a region fails at any stage, it is rejected
immediately and the remaining stages are skipped, saving computational resources. This cascade structure
enhances the Viola-Jones algorithm's speed, making it well-suited for real-time face detection applications.
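The early-rejection logic can be sketched in a few lines. The stages below are hypothetical stand-ins (simple intensity checks on a list of pixel values) rather than trained boosted classifiers, and `evaluate_cascade` is a name made up for this illustration.

```python
from typing import Callable, Sequence, Tuple

Stage = Callable[[Sequence[float]], bool]


def evaluate_cascade(window: Sequence[float], stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (is_face_candidate, number_of_stages_evaluated)."""
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, index  # rejected: later stages are never evaluated
    return True, len(stages)


# Hypothetical stages, ordered from cheap and permissive to more selective
stages = [
    lambda w: sum(w) / len(w) > 50,   # bright enough on average
    lambda w: max(w) - min(w) > 30,   # enough contrast
    lambda w: w[0] < w[-1],           # intensity increases across the window
]

print(evaluate_cascade([10, 20, 30], stages))   # (False, 1)
print(evaluate_cascade([80, 80, 80], stages))   # (False, 2)
print(evaluate_cascade([60, 70, 100], stages))  # (True, 3)
```

Since most windows in a typical image contain no face, most of them are discarded after one or two cheap stages, and only a few windows ever reach the later, more expensive stages.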

#### Practical Applications
Our Python implementation of Viola-Jones uses the following code in the backend:


In [9]:
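import cv2
import numpy
from PIL import Image

# Assumption: viola_jones_detector is a pre-trained frontal-face Haar cascade
# (a cv2.CascadeClassifier) that the backend loads once at module level.

def highlight_face_viola_jones(img: Image.Image):
    img = cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = viola_jones_detector.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))

    # Outline every detected face in blue (BGR colour order)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 6)

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # detectMultiScale returns no confidence score, hence the '?' placeholder
    return Image.fromarray(img_rgb), len(faces), '?'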


This implementation of Viola-Jones face detection processes a given PIL Image object. It converts the image to BGR format, then to grayscale. Using a pre-trained Haar cascade classifier for frontal faces, it detects faces in the grayscale image. This pre-trained classifier is already provided by the library, so we do not need to train or create our own. Detected faces are outlined with blue rectangles, and the modified image is converted back to RGB format before being returned. The algorithm provides a visual representation of the input image with highlighted face regions.

Here is an example output of the algorithm:

![Example of a detected face](images/detected-faces-examples/detected_face_viola_jones.png)

### HOG and SVM
Another approach to face detection combines a Histogram of Oriented Gradients (HOG) with a support vector machine (SVM) as the classifier.
HOG is a feature descriptor commonly used in image processing that was published by Dalal and Triggs. The algorithm typically consists of the following steps:
1. Image Preprocessing
2. Calculate Gradient
3. Create Histogram of Oriented Gradients
4. Normalise Histogram Vectors

In the following, we will give a brief overview of the steps in HOG.

#### Step 1 - Image Preprocessing

HOG (Histogram of Oriented Gradients) is computed only on the part of the image that is relevant for examining a particular subject (a face in our case), so this part of the image is cropped first. For the subsequent calculation of gradients in blocks, the cropped image is resized to a width-to-height ratio of 1:2. Dalal and Triggs chose 64x128 pixels because it provided enough information for pedestrian detection, which was the primary focus of their publication. The image is then divided into blocks of 8x8 pixels (called cells in the original publication), as features are extracted from blocks of pixels rather than individual pixels. Graphically, one can envision this as an 8x16 grid of 8x8 blocks drawn on the image.

#### Step 2 - Calculate Gradient

Since edges represent the boundaries between regions in the image with a significant change in intensity, they are essential to determine the contours of an image. The contours of an image often suffice to classify the objects in the image (a face in our case). To determine the edges, the gradient vector is used because it indicates the direction of the greatest local change in intensity, and its magnitude represents the extent of the change. The gradient vector of a 2-dimensional image is mathematically the partial derivative in the x and y directions. Since the colors of images in computers are represented by discrete color values and are not continuous as in reality, the change in the x and y directions is calculated as follows:
Let $I$ be a function that takes as input the x and y positions of a pixel in the image and outputs the intensity (between 0 and 255). Then, the partial derivative in the x-direction and hence the gradient component $G_x$ is calculated as follows: $G_x = I(x+1, y) - I(x-1, y)$. Similarly, the partial derivative in the y-direction and hence the gradient component $G_y$ is calculated as follows: $G_y = I(x, y+1) - I(x, y-1)$. Thus, the changes in intensity are calculated by considering the horizontal and vertical neighbors of a pixel.
Consider the following image as an example for the pixel with intensity 60, where only 4 out of the 64 values of the 8x8 block are displayed:

![Alt text](images/Hog8x8Grid.png)

For the pixel 60 on the image, the gradient in x-direction will be:
$$G_x = I(x+1, y) - I(x-1, y) = 70 - 40 = 30$$

and for the y-direction:
$$G_y = I(x, y+1) - I(x, y-1) = 70 - 20 = 50$$

Using the gradient in x and y direction, the magnitude and direction of the gradient vector will be calculated using:
$$\text{magnitude} = \sqrt{G_x^2 + G_y^2}$$
$$\text{direction} = \arctan\left(\frac{G_y}{G_x}\right)$$

Here, it should be noted that arctan has a range of values from -90 to 90 degrees, which does not cover a full circle of 360 degrees. In practice, the function arctan2 is often used, which has a range of values from -180 degrees to 180 degrees, thus allowing a bijective mapping to 0-360 degrees. For the example, the calculation looks as follows:
$$\sqrt{G_x^2 + G_y^2} = \sqrt{30^2 + 50^2} \approx 58.31 $$

and the direction would be:
$$\arctan\left(\frac{50}{30}\right) \approx 59.04^\circ $$

This calculation is performed for each pixel in the 8x8 grid, resulting in an 8x8 matrix for the magnitude and an 8x8 matrix for the direction of the gradient vectors. Pixels on the border lack some neighbors and are a special case that needs to be addressed (e.g., by using padding). If the image has colors, the calculation is performed for each color channel of a pixel, and the gradient vector with the greatest magnitude is selected from the color channels. The direction of the selected vector is then assigned to the 8x8 direction matrix, and the magnitude of the selected vector is assigned to the 8x8 magnitude matrix for this pixel.
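The gradient computation for a grayscale image can be sketched with NumPy as follows. This is a minimal illustration, not the code of the library we use. The function name `gradients` is made up, the border is handled by replicating the edge pixels (one possible padding choice), and the direction is already reduced to the interval from 0 to 180 degrees. The corner values of the example patch are placeholders, since they do not influence the center pixel.

```python
import numpy as np


def gradients(gray: np.ndarray):
    """Central-difference gradients with magnitude and unsigned direction in degrees."""
    img = gray.astype(np.float64)
    padded = np.pad(img, 1, mode="edge")          # replicate border pixels
    gx = padded[1:-1, 2:] - padded[1:-1, :-2]     # I(x+1, y) - I(x-1, y)
    gy = padded[2:, 1:-1] - padded[:-2, 1:-1]     # I(x, y+1) - I(x, y-1)
    magnitude = np.hypot(gx, gy)
    # arctan2 covers -180..180 degrees; the modulo folds it into 0..180
    direction = np.degrees(np.arctan2(gy, gx)) % 180.0
    return gx, gy, magnitude, direction


# The pixel with intensity 60 and its four neighbors from the example above
patch = np.array([[0, 20, 0],
                  [40, 60, 70],
                  [0, 70, 0]], dtype=np.uint8)
gx, gy, magnitude, direction = gradients(patch)

print(gx[1, 1], gy[1, 1])               # 30.0 50.0
print(round(float(magnitude[1, 1]), 2))  # 58.31
print(round(float(direction[1, 1]), 2))  # 59.04
```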

#### Step 3 - Create Histogram of Oriented Gradients

The next step involves creating histograms from the 8x8 matrices of magnitude and direction for all 8x8 blocks obtained in Step 2.

![Alt text](images/HOGDia.png) 

On the x-axis are the various directions of the gradient vectors of the respective pixels within an 8x8 block, and on the y-axis is the sum of the magnitude of the gradient vectors for each direction. Usually, only directions between 0-180 degrees are considered, and anything beyond is reduced to this interval due to the symmetry of the gradient. The symmetry of the gradient implies that a strong change in intensity within the range of 180-360 degrees only differs in sign from a strong change in intensity within the range of 0-180 degrees. This means that angles greater than 180 degrees can be brought into the interval between 0-180 degrees by subtracting 180 degrees beforehand without losing important information. The 180 degrees are divided into 9 different bins (0, 20, 40, 60, 80, 100, 120, 140, 160) on the x-axis, and the calculation of the magnitude for these bins is as follows:

Case 1) Precise Allocation Possible
If precise allocation into a bin is possible (e.g., if a pixel has a magnitude of 50 and a direction of 20 degrees), then 50 is added to the sum of the 20-degree bin.

Case 2) Precise Allocation Not Possible
If precise allocation into a bin is not possible (e.g., if a pixel has a magnitude of 50 and a direction of 30 degrees), then the proximity of the direction to the two bins between which it lies (here 20 and 40) is taken as a weight (here, $\frac{1}{2}$ each, as 30 is exactly between 20 and 40). The weight is multiplied by the magnitude and added to the sum of the respective bin. In this example, $\frac{1}{2} \cdot 50 = 25$ is added to the sum of both the 20-degree and 40-degree bins.

Case 3) Angle Between 160 and 180 Degrees
This case works like Case 2), except that there is no 180-degree bin. The weights are still calculated from the proximity of the direction to 160 and 180 degrees, but the result of $\text{weight for 180} \cdot \text{magnitude of the vector}$ is added to the sum of the 0-degree bin, because 180 degrees is equivalent to 0 degrees due to symmetry. The result of $\text{weight for 160} \cdot \text{magnitude of the vector}$ is added to the sum of the 160-degree bin, as in Case 2).

When performing this calculation for each pixel of the 8x8 block, the resulting output is the histogram. This histogram can be transformed into a 9x1 vector containing the weighted sum of magnitudes as entries. For an image with dimensions of 64x128, divided into an 8x16 grid of 8x8 blocks, there would then be $8 \cdot 16 = 128$ such 9x1 vectors.
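The three cases can be handled by one interpolation rule, as the following minimal sketch shows. The function name `cell_histogram` and its parameters are chosen for this illustration. The wrap-around from the 160-degree bin to the 0-degree bin is implemented with a modulo on the bin index.

```python
import numpy as np


def cell_histogram(magnitude, direction, n_bins: int = 9, bin_width: float = 20.0) -> np.ndarray:
    """Histogram of one 8x8 block: each pixel votes for its two neighboring bins."""
    hist = np.zeros(n_bins)
    for mag, ang in zip(np.ravel(magnitude), np.ravel(direction)):
        ang = ang % 180.0                        # unsigned direction
        lower = int(ang // bin_width)            # index of the bin at or below the angle
        upper_weight = (ang - lower * bin_width) / bin_width
        hist[lower % n_bins] += mag * (1.0 - upper_weight)
        hist[(lower + 1) % n_bins] += mag * upper_weight   # bin after 160 wraps to 0
    return hist


print(cell_histogram([50], [20]))    # Case 1: 50 in the 20-degree bin
print(cell_histogram([50], [30]))    # Case 2: 25 each in the 20- and 40-degree bins
print(cell_histogram([50], [170]))   # Case 3: 25 each in the 160- and 0-degree bins
```

For a full 8x8 block, the magnitude and direction matrices from Step 2 are passed directly, and every pixel contributes to at most two bins.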

#### Step 4 - Normalise Histogram Vectors

The gradient of an image is sensitive to the overall illumination of the image. When darkening the image (e.g., by halving the intensity values), the length of the gradient vector also halves, resulting in the values in the histogram being halved as well. However, a face should not have different features with half the intensity, which is why the vector needs to be normalized. For normalization, Dalal and Triggs tested various methods. A typical method frequently used for HOG nowadays constructs a 16x16 block from four 8x8 blocks and combines the information into a 36x1 vector (four 9x1 vectors). This vector with 36 entries ($v_1$ to $v_{36}$) is normalized using the L2-norm:

$$\text{magnitude} = \sqrt{v_1^2 + v_2^2 + \dots + v_{36}^2}$$
$$\text{normalised vector} = \left[\frac{v_1}{\text{magnitude}}, \frac{v_2}{\text{magnitude}}, \dots, \frac{v_{36}}{\text{magnitude}}\right]$$

To extract information from the entire image with dimensions of 64x128, divided into an 8x16 grid of 8x8 blocks, the 16x16 block is first placed at the top left of the image. Then, the block is moved from left to right through the entire row with a step size of one 8x8 block (8 pixels), so that neighboring positions overlap. Once a row is completed, the process continues with the next row, iterating until the block has traversed the entire image (similar to a sliding window). The block takes 7 positions per row and 15 positions vertically, resulting in $7 \cdot 15 = 105$ computations that each yield a 36x1 vector. Thus, a total of $7 \cdot 15 \cdot 36 = 3780$ entries are obtained, which are concatenated into a 3780x1 vector and then passed on to a classifier (e.g., a Support Vector Machine (SVM)). Before passing it to an SVM, the dimensionality of this vector may need to be reduced (to prevent overfitting), for instance with PCA (Principal Component Analysis). However, this will not be explained in this article.
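The block normalisation and the resulting vector length can be checked with a short sketch. It assumes that the 9-bin histograms of all 8x8 blocks are already available as an array of shape (16, 8, 9). Random values stand in for real histograms here. The function name `hog_descriptor` and the small constant `eps`, which only guards against division by zero, are additions for this illustration.

```python
import numpy as np


def hog_descriptor(cell_histograms: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """L2-normalise every 2x2 group of histograms and concatenate the results."""
    rows, cols, _ = cell_histograms.shape
    blocks = []
    for r in range(rows - 1):          # 15 vertical positions for 16 rows
        for c in range(cols - 1):      # 7 horizontal positions for 8 columns
            block = cell_histograms[r:r + 2, c:c + 2].ravel()   # 4 * 9 = 36 values
            blocks.append(block / np.sqrt(np.sum(block ** 2) + eps ** 2))
    return np.concatenate(blocks)


rng = np.random.default_rng(0)
cells = rng.random((16, 8, 9))          # placeholder histograms for a 64x128 image
descriptor = hog_descriptor(cells)

print(descriptor.shape)                                       # (3780,)
print(np.allclose(descriptor, hog_descriptor(0.5 * cells)))   # True
```

The second line of output illustrates the purpose of the normalisation: halving all histogram values, as a darker image would, leaves the descriptor practically unchanged.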

#### SVM with HOG features

The resulting vector from the HOG algorithm, potentially reduced using PCA, is often fed to an SVM. The SVM tries to find a hyperplane that best separates the datapoints of different classes in a high-dimensional space. On a basic level, the datapoints can be classified into images that contain a face (positive samples) and images that don't contain a face (negative samples). The HOG features extracted from negative and positive samples can then be used to train the SVM so that it learns to distinguish between images that contain faces and ones that don't. Additionally, a trained SVM can be applied in a sliding-window fashion, analysing one small part of the image of a predefined size at a time to determine whether this part contains a face. This makes it possible not only to classify images with faces correctly, but also to locate the faces in the image.
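The sliding-window idea can be sketched independently of any particular SVM library. In the sketch below, `score_fn` is a hypothetical stand-in for "extract HOG features from the patch and evaluate the SVM decision function". In the toy example it is replaced by a simple brightness score so that the code runs on its own. The names `sliding_windows` and `detect` are also made up for this illustration.

```python
from typing import Callable, Iterator, List, Tuple

import numpy as np


def sliding_windows(image: np.ndarray, window: Tuple[int, int],
                    step: int) -> Iterator[Tuple[int, int, np.ndarray]]:
    """Yield (x, y, patch) for every window position; window is (height, width)."""
    win_h, win_w = window
    for y in range(0, image.shape[0] - win_h + 1, step):
        for x in range(0, image.shape[1] - win_w + 1, step):
            yield x, y, image[y:y + win_h, x:x + win_w]


def detect(image: np.ndarray, score_fn: Callable[[np.ndarray], float],
           window: Tuple[int, int] = (128, 64), step: int = 8,
           threshold: float = 0.0) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, width, height) for all windows scored above the threshold."""
    return [(x, y, window[1], window[0])
            for x, y, patch in sliding_windows(image, window, step)
            if score_fn(patch) > threshold]


# Toy example: an 8x8 image with one bright 4x4 square standing in for a face
image = np.zeros((8, 8))
image[2:6, 4:8] = 255
detections = detect(image, score_fn=lambda patch: patch.mean() - 200,
                    window=(4, 4), step=2)
print(detections)  # [(4, 2, 4, 4)]
```

A real detector would additionally scan several scaled versions of the image and merge overlapping detections. Both steps are omitted here.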

#### Practical Application

Our Python implementation of HOG-SVM uses the following code in the backend:

In [None]:
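import cv2
import numpy
from PIL import Image

# Assumption: hog_svm_detector is a pre-trained HOG + SVM face detector created
# once at module level (the rectangle API matches dlib's frontal face detector).

def highlight_face_hog_svm(img: Image.Image):
    img = cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)
    gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = hog_svm_detector(gray_image)

    # Outline every detected face in turquoise (BGR colour order)
    for face in faces:
        x, y, w, h = face.left(), face.top(), face.width(), face.height()
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 6)

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # The detector call returns no confidence score, hence the '?' placeholder
    return Image.fromarray(img_rgb), len(faces), '?'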

This implementation of Hog-SVM face detection processes a given PIL Image object. It converts the image to BGR format, then to grayscale. Using a pre-trained hog_svm_detector, it detects faces in the grayscale image. This pre-trained detector is already provided by the library, so we do not need to train or create our own. Detected faces are outlined with turquoise rectangles, and the modified image is converted back to RGB format before being returned. 

Here is an example output of the algorithm:

![Example of a detected face](images/detected-faces-examples/detected_face_hog_svm.png)

### MTCNN

A more recent approach is the MTCNN face detection method. MTCNN stands for Multi-task Cascaded Convolutional Networks and, as the name suggests, is based on convolutional neural networks (CNNs). Like Viola-Jones, the algorithm has a cascaded structure and can therefore exclude non-face regions at an early stage, which makes the method suitable for real-time face detection. Conceptually, MTCNN can be described in 3 steps:
1. Face Detection
2. Facial Landmark Detection
3. Face Classification

In the following, we will give a brief overview of the steps in MTCNN.

#### Step 1 - Face Detection

In the first step, the MTCNN recognizes potential candidate faces in the input image. It uses a cascade of convolutional networks to filter out regions that are unlikely to contain a face and focuses on regions with a higher probability of containing a face.
The cascade structure comprises several stages consisting of different CNNs. At each stage, the network limits the number of eligible regions by the result of the CNN.
The end result of this step is a series of bounding boxes that represent the potential face regions in the image.

#### Step 2 - Facial Landmark Detection
Once the potential facial regions are identified, the second step of MTCNN is responsible for locating facial keypoints within each bounding box.
Facial keypoints are specific points on the face, such as the eyes, nose and mouth. These landmarks are critical for tasks such as facial alignment and detection.
The network at this step is designed to regress the coordinates of these facial features for each recognized face.

#### Step 3 - Face Classification

The third step of MTCNN deals with the classification of each bounding box as face or non-face. This step helps to eliminate false positives and improves the accuracy of the overall face detection system.
A classifier is trained to distinguish between faces and non-faces by extracting features from the candidate regions. 
The result of this step is a refined set of bounding boxes, with the corresponding face keypoints, which are more likely to contain actual faces.

#### Practical Application

Our Python implementation of MTCNN uses the following code in the backend:

In [None]:
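import io
from contextlib import redirect_stdout

import cv2
import numpy
from PIL import Image

# Assumption: mtcnn_detector is a pre-trained MTCNN instance created once at module level.

def highlight_face_mtcnn(img: Image.Image):
    img = numpy.array(img)
    # Suppress everything the detector prints to stdout
    with io.StringIO() as dummy_stdout:
        with redirect_stdout(dummy_stdout):
            faces = mtcnn_detector.detect_faces(img)

    # Default if no face is found; otherwise the confidence of the last face is reported
    confidence = 100

    # Outline every detected face in pink (the array is still in RGB order)
    for face in faces:
        confidence = round(face['confidence'] * 100, 3)
        x, y, w, h = face['box'][0], face['box'][1], face['box'][2], face['box'][3]
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 6)

    return Image.fromarray(img), len(faces), confidence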

This implementation of MTCNN face detection processes a given PIL Image object. It converts the image to a numpy array. Using a pre-trained MTCNN detector, it detects faces in the image. This pre-trained detector is already provided by the library, so we do not need to train or create our own. Detected faces are outlined with pink rectangles, and the modified image is returned together with the number of detected faces and the confidence of the last detected face.

Here is an example output of the algorithm:

![Example of a detected face](images/detected-faces-examples/detected_face_mtcnn.png)

### SSD

Another approach that has been used more recently is the SSD face detection method. SSD stands for Single-Shot Multibox Detector. It was designed for general object detection, but can also be used for face detection. Just like MTCNN, SSD is based on a CNN that is used for feature extraction. The ability to perform all steps of face detection in a single pass makes this method suitable for real-time face detection. The result of SSD is multiple bounding boxes with potential faces that need to be evaluated against a confidence threshold.

#### Practical Application

Our Python implementation of SSD uses the following code in the backend:

In [None]:
import cv2
import numpy
from PIL import Image

# Assumption: ssd_detector is a pre-trained SSD face detection network loaded once
# at module level through OpenCV's dnn module.

def highlight_face_ssd(img: Image.Image):
    img = cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)
    # The network input is 300x300; the resized image is still in BGR order
    resized_bgr_image = cv2.resize(img, (300, 300))
    image_blob = cv2.dnn.blobFromImage(image=resized_bgr_image)
    ssd_detector.setInput(image_blob)
    detections = ssd_detector.forward()

    confidence = 100
    number_of_faces = 0
    height, width = img.shape[:2]

    # row[2] is the confidence, row[3:7] are x1, y1, x2, y2 relative to the image size;
    # only show detections over 80% certainty
    for row in detections[0][0]:
        if row[2] > 0.80:
            confidence = round(row[2] * 100, 3)
            number_of_faces += 1
            x1, y1 = int(row[3] * width), int(row[4] * height)
            x2, y2 = int(row[5] * width), int(row[6] * height)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 255), 6)

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return Image.fromarray(img_rgb), number_of_faces, confidence

This implementation of SSD face detection processes a given PIL Image object. It converts the image to BGR format and resizes it to 300x300 pixels. Using a pre-trained SSD detector, it detects faces in the image. This pre-trained detector is already provided by the library, so we do not need to train or create our own. The detector returns a result for each candidate box, so we have to filter out results below the confidence threshold. Detected faces are outlined with yellow rectangles, and the modified image is then returned.

Here is an example output of the algorithm:

![Example of a detected face](images/detected-faces-examples/detected_face_ssd.png)

## Filter

This part of the documentation introduces the filters and investigates how well they prevent HOG and SVM from detecting faces. HOG and SVM was chosen because we currently determine the keypoints of the faces with this approach. The images used in this section stem from the Labeled Faces in the Wild (LFW) dataset. The dataset contains more than 13,000 images of 5,749 people. However, we focused exclusively on individuals for whom at least 100 facial images were available, which leaves 1140 images. Without any modifications, the algorithm detects 1099/1140 faces.

### Cow Face

This filter applies a cow pattern overlay to the facial area of the image. It calculates the bounding box for the face using facial keypoints and resizes the cow pattern to fit this area. The pattern is then applied over the face with a specified level of transparency (alpha_of_cow_pattern), creating a cow-patterned effect on the facial area.

In [None]:
from PIL import Image

# calculate_face_shape_landmarks_box_positions and apply_filter_on_faces are backend helpers not shown here.

def apply_cow_pattern(image: Image.Image, keypoints, alpha_of_cow_pattern: int = 85) -> Image.Image:
    foreground = Image.open('../backend/filters/cow_pattern.png').convert("RGBA")
    foreground_parts = Image.new('RGBA', image.size)
    for (box, face_keypoints, face_shape_landmarks, _) in keypoints:
        # Bounding box of the face shape landmarks
        (minX, maxX), (minY, maxY), (width, height) = calculate_face_shape_landmarks_box_positions(face_shape_landmarks)
        # Stretch the pattern to the size of the face and place it on the overlay
        new_foreground_part = foreground.resize((width, height), resample=Image.LANCZOS)
        foreground_parts.paste(new_foreground_part, (minX, minY), new_foreground_part)
    # Use one uniform transparency value (0-255) for the whole overlay
    foreground_parts.putalpha(alpha_of_cow_pattern)
    image = apply_filter_on_faces(image, keypoints, foreground_parts)
    return image

The idea behind the Cow Face Filter is that it falsifies the magnitude and direction of the gradient vectors in the face, which HOG uses to extract features. The pattern has strong intensity changes (black and white) that create gradient vectors with a significantly larger magnitude than the vectors of the original face. We used different alpha values for the pattern to see how it affects the detection. An example with an alpha value of 45 can be seen below:

![Cow Mask with Alpha of 45](images/CowMaskwithAlphaof45.png)

Using the LFW dataset with different alpha values yields the following result:

![Cow Mask Modification](images/CowMaskModification.png)

This approach does work for preventing face detection, although it also significantly alters the facial features.
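The effect on the gradients can be illustrated with synthetic data. The sketch below is a thought experiment rather than part of our evaluation: a smooth intensity ramp stands in for a face, a checkerboard stands in for the cow pattern, and the overlay is assumed to be blended as $(1 - a) \cdot \text{face} + a \cdot \text{pattern}$ with $a = \text{alpha} / 255$. The names `blend` and `mean_gradient_magnitude` are made up for this illustration.

```python
import numpy as np


def mean_gradient_magnitude(gray: np.ndarray) -> float:
    """Average length of the gradient vectors of a grayscale image."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return float(np.hypot(gx, gy).mean())


def blend(face: np.ndarray, pattern: np.ndarray, alpha: int) -> np.ndarray:
    """Alpha-blend the pattern over the face; alpha ranges from 0 to 255."""
    a = alpha / 255.0
    return (1.0 - a) * face + a * pattern


size = 32
# Stand-in for a face: smooth ramp whose intensity grows by 1 per pixel
face = np.tile(np.linspace(100.0, 131.0, size), (size, 1))
# Stand-in for the cow pattern: black and white checkerboard with 4x4 squares
yy, xx = np.indices((size, size))
pattern = np.where(((yy // 4) + (xx // 4)) % 2 == 0, 0.0, 255.0)

results = {alpha: mean_gradient_magnitude(blend(face, pattern, alpha))
           for alpha in (0, 45, 85)}
for alpha, value in results.items():
    print(alpha, round(value, 2))
```

Even at moderate alpha values, the edges of the pattern dominate the gradient field of the smooth ramp, which is the behaviour the filter relies on to disturb the HOG features.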

### Salt and Pepper Filter

This filter generates a 'salt and pepper' noise effect and applies it over the facial area. The noise is created by randomly assigning black and white pixels in equal proportions and then resizing this noise pattern to fit the face. The pattern is applied with a specified alpha value (alpha_of_salt_n_pepper), overlaying the face with this distinctive noise effect.

In [None]:
import numpy as np
from PIL import Image

# calculate_face_shape_landmarks_box_positions and apply_filter_on_faces are backend helpers not shown here.

def apply_salt_n_pepper(image: Image.Image, keypoints, alpha_of_salt_n_pepper: int = 90) -> Image.Image:
    foreground_parts = Image.new('RGBA', image.size)
    for (box, face_keypoints, face_shape_landmarks, _) in keypoints:
        (minX, maxX), (minY, maxY), (width, height) = calculate_face_shape_landmarks_box_positions(face_shape_landmarks)
        # Half of the pixels are white (salt), the other half black (pepper), in random order
        pixels = np.zeros(width * height, dtype=np.uint8)
        pixels[:width * height // 2] = 255  # Set first half to white (value 255)
        np.random.shuffle(pixels)
        # Copy the values to three colour channels and reshape them to the face box
        rgb_box = np.stack((pixels, pixels, pixels), axis=-1)
        rgb_box_reshaped = np.reshape(rgb_box, (height, width, 3))
        rgb_box_image = Image.fromarray(rgb_box_reshaped)
        rgb_box_image.putalpha(255)
        foreground_parts.paste(rgb_box_image, (minX, minY), rgb_box_image)
    # Use one uniform transparency value (0-255) for the whole overlay
    foreground_parts.putalpha(alpha_of_salt_n_pepper)
    image = apply_filter_on_faces(image, keypoints, foreground_parts)
    return image

The Salt and Pepper Filter, which has an additional alpha value for the transparency of the salt and pepper pattern, was based on the same idea as the Cow Face Filter. An example with an alpha value of 45 can be seen below:

![Salt and Pepper with Alpha of 45](images/SaltandPepperwithalphaof45.png)

Using the LFW dataset with different alpha values yields the following result:

![Salt and Pepper Modification](images/SaltandPepperModification.png)

Since the salt and pepper pattern is very similar to the cow pattern, the resulting diagram is almost identical.

### Pixelate

This filter creates a pixelation effect by initially reducing the image's resolution and then scaling it back to its original size. The initial downscaling reduces detail, creating larger 'blocks' of color, and the subsequent upscaling maintains this blocky appearance. The degree of pixelation is dictated by the pixel_size parameter, with larger values producing more pronounced pixelation. The effect can be applied to the whole image or focused on a specific area, like the face, as determined by keypoints.

In [None]:
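from PIL import Image

def apply_pixelate(image: Image.Image, keypoints, only_face=True, pixel_size=10) -> Image.Image:
    # Downscale (to at least 1x1 pixel), then upscale without interpolation to get blocks
    small_size = (max(1, image.size[0] // pixel_size), max(1, image.size[1] // pixel_size))
    small = image.resize(small_size, Image.NEAREST)
    modified_image = small.resize(image.size, Image.NEAREST)
    if only_face:
        # swap_images_at_face_position is a backend helper not shown here
        return swap_images_at_face_position(image, keypoints, modified_image)
    return modified_image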

The Pixelate filter pixelates the image. On the website, it can be applied only to the face or to the entire image. Since the LFW dataset only contains faces, the filter was applied to the entire image. An example with a pixel size of 2 can be seen below:

![Pixelate with pixel size of 2](images/Pixelate.png)

Using the LFW dataset with different pixel size values yields the following result:

![Pixelate Modification](images/PixelateModification.png)

For low pixel size values, the number of detected faces is high and the face is still recognizable. With increasing pixel size values, the face becomes barely recognizable and the number of detected faces declines.


### Blur

The function takes a NumPy array representing an image as input. It converts the array to a PIL Image, applies a Box Blur filter with a radius of 10 to the image using the filter() method, and then converts the modified PIL Image back to a NumPy array before returning the result.

In [10]:
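import numpy as np
from PIL import Image, ImageFilter

def apply_blur(image: np.ndarray) -> np.ndarray:
    # Round trip through PIL to use its box blur filter
    pil_image = Image.fromarray(image)
    pil_image = pil_image.filter(ImageFilter.BoxBlur(10))
    return np.array(pil_image)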


The Blur Filter applies a box blur effect to the image. Similarly to the Pixelate filter, it can be applied only to the face or to the entire image. Since the LFW dataset only contains faces, the filter was applied to the entire image. An example with a radius of 1 can be seen below:

![Box Blur with radius of 1](images/BoxBlur.png)

Using the LFW dataset with a radius of 1 yields the following result:

![Box Blur Modification](images/BoxBlurModification.png)

The algorithm still detects 1090/1140 faces after the modification.

### Sunglasses

This filter overlays sunglasses onto the face in an image. It identifies the positions of the left and right eyes using facial keypoints and calculates the angle between them to rotate the sunglasses image accordingly. The distance between the eyes is used to dynamically scale the sunglasses' size, ensuring they fit the face proportionally. The sunglasses image is resized and rotated before being superimposed onto the face, creating the appearance of the subject wearing sunglasses.

In [None]:
import math
from PIL import Image

def apply_sunglasses(image: Image.Image, keypoints, scale_factor: float = 2.5) -> Image.Image:
    # RGBA is needed because the overlay also serves as its own paste mask
    foreground = Image.open('filters/sunglasses.png').convert("RGBA")
    foreground_width_to_height_ratio = foreground.size[0] / foreground.size[1]
    for (box, face_keypoints, face_shape_landmarks, _) in keypoints:
        left_eye = face_keypoints['left_eye']
        right_eye = face_keypoints['right_eye']
        # Tilt of the line between the eyes; dy is negated because image y grows downwards
        dx = right_eye[0] - left_eye[0]
        dy = right_eye[1] - left_eye[1]
        angle_radians = math.atan2(-dy, dx)
        angle_degrees = math.degrees(angle_radians)
        eye_distance = math.dist((right_eye[0], right_eye[1]), (left_eye[0], left_eye[1]))
        # Resize into a new variable so that every face starts from the original overlay
        resized_overlay = foreground.resize(size=(
            int(scale_factor * eye_distance), int(scale_factor * eye_distance / foreground_width_to_height_ratio)))
        rotated_overlay = resized_overlay.rotate(angle_degrees, expand=True)
        left_part = (scale_factor - 1) / 2
        left_upper_sunglasses = (int(left_eye[0] - eye_distance * left_part),
                                 int(left_eye[1] - eye_distance * left_part / foreground_width_to_height_ratio))
        # Shift the paste position upwards to compensate for the enlarged canvas after rotation
        left_upper_paste = (left_upper_sunglasses[0], int(left_upper_sunglasses[1] - math.fabs(
            math.cos(math.radians(90 - angle_degrees)) * scale_factor * eye_distance)))
        image.paste(rotated_overlay, left_upper_paste, rotated_overlay)
    return image

A more artistic approach is the application of sunglasses on the eyes of a given face:

![Apply Sunglasses](images/SunglassesonFace.png)

Applying the filter to the LFW dataset yields the following result:

![Apply Sunglasses on Face](images/SunglassesonFaceModification.png)

Since this approach covers only a small area of the face and is rather artistic, the result is not surprising to us. 1000 of the 1140 faces were still detected after the modification.

### Medicine Mask

This filter overlays a medical-style mask image onto the face. It locates the position of the mouth and nose using facial keypoints and adjusts the size and rotation of the mask image to align with these features. The mask is placed to cover the lower half of the face, resembling the appearance of wearing a medical mask.

In [None]:
import math
from PIL import Image

def apply_medicine_mask(image: Image.Image, keypoints) -> Image.Image:
    foreground = Image.open('filters/medicine_mask.png').convert("RGBA")
    foreground_width_to_height_ratio = foreground.size[0] / foreground.size[1]
    for (box, face_keypoints, face_shape_landmarks, _) in keypoints:
        left_mouth = face_keypoints['left_mouth']
        right_mouth = face_keypoints['right_mouth']
        # Tilt of the mouth line; dy is negated because image y grows downwards
        dx = right_mouth[0] - left_mouth[0]
        dy = right_mouth[1] - left_mouth[1]
        angle_radians = math.atan2(-dy, dx)
        angle_degrees = math.degrees(angle_radians)
        face_width = box[2]
        # Resize into a new variable so that every face starts from the original overlay
        resized_overlay = foreground.resize(size=(face_width, int(face_width / foreground_width_to_height_ratio)))
        rotated_overlay = resized_overlay.rotate(angle_degrees, expand=True)
        # The mask starts at the left edge of the face box, at the height of the nose
        left_upper_face_mask = (box[0], face_keypoints['nose'][1])
        left_upper_paste = (left_upper_face_mask[0], int(left_upper_face_mask[1] - math.fabs(
            math.cos(math.radians(90 - angle_degrees)) * face_width)))
        image.paste(rotated_overlay, left_upper_paste, rotated_overlay)
    return image

Another artistic approach is the application of a medicine mask to the face:

![Medicine Mask Example](images/MedicineMaskExample.png)

Applying the filter to the LFW dataset yields the following result:

![Medicine Mask Example](images/MedicineMaskModification.png)

This approach covers more area of the face. It leads to a considerable change in comparison to the Sunglasses approach: only 120 of the 1140 faces were detected.

### Hiding Faces

This filter randomly places multiple small face mask images over the image. It ensures that these masks do not overlap with the facial areas identified by the keypoints. Each mask is resized according to specified dimensions (`face_mask_width` and `face_mask_height`) and applied with a certain level of transparency (`alpha_of_masks`). This creates a scattered mask effect across the image, avoiding the actual facial areas.

In [None]:
from PIL import Image

# apply_alpha_to_transparent_image, find_face_rectangles_mtcnn and
# find_free_coordinates_outside_of_rectangles are backend helpers not shown here.

def apply_hide_with_masks(img: Image.Image, keypoints, number_of_masks: int = 40,
                          face_mask_width: int = 75, face_mask_height: int = 75,
                          alpha_of_masks: int = 45) -> Image.Image:
    foreground = Image.open('../backend/filters/whole_face_mask.png').convert('RGBA')
    foreground_alpha = apply_alpha_to_transparent_image(foreground, alpha_of_masks)
    # All masks have the same size, so the overlay is resized only once
    resized_foreground = foreground_alpha.resize((face_mask_width, face_mask_height), resample=Image.LANCZOS)
    face_and_mask_coordinates = find_face_rectangles_mtcnn(keypoints)
    mask_positions = find_free_coordinates_outside_of_rectangles(img, number_of_masks, face_mask_width,
                                                                 face_mask_height, face_and_mask_coordinates)
    # Paste one semi-transparent mask at every free position
    for mask_position in mask_positions:
        img.paste(resized_foreground, (mask_position[0], mask_position[1]), resized_foreground)

    return img

The goal of this approach is to add artifacts to the image that classifiers erroneously classify as faces. An example would be adding face masks to the image and making them barely visible with a low alpha value:

![Hide with Mask Example](images/HideWithMaskExample.png)

This approach only slightly modifies the face (for low alpha values), and it increases the false positives of the classifiers.
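The placement of the masks depends on finding positions that do not overlap with a face. The backend helper for this is not shown in this document, so the following is only a hypothetical sketch of how such a search could work with rejection sampling. The name `sample_free_positions`, its parameters, and the choice to also keep the masks from overlapping each other are assumptions made for this illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two axis-aligned rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def sample_free_positions(image_size: Tuple[int, int], count: int,
                          mask_size: Tuple[int, int], occupied: Sequence[Rect],
                          max_attempts: int = 1000,
                          seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Draw random top-left corners until `count` non-overlapping masks are placed."""
    rng = random.Random(seed)
    img_w, img_h = image_size
    mask_w, mask_h = mask_size
    if mask_w > img_w or mask_h > img_h:
        return []  # the mask does not fit into the image at all
    taken = list(occupied)
    positions: List[Tuple[int, int]] = []
    for _ in range(max_attempts):
        if len(positions) == count:
            break
        candidate = (rng.randint(0, img_w - mask_w), rng.randint(0, img_h - mask_h),
                     mask_w, mask_h)
        # Reject candidates that touch a face or an already placed mask
        if not any(overlaps(candidate, rect) for rect in taken):
            taken.append(candidate)
            positions.append((candidate[0], candidate[1]))
    return positions


face = (150, 100, 100, 100)
positions = sample_free_positions((400, 300), count=5, mask_size=(40, 40),
                                  occupied=[face], seed=0)
print(positions)
```

The `max_attempts` limit keeps the search from running forever when the image is too crowded to place all requested masks. In that case, fewer positions are returned.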