# Face Mask Detection System: A Deep Learning Approach

## Project Overview
In the context of global health crises, automated systems for monitoring public health compliance are vital. This project implements a real-time **Face Mask Detection System** using Computer Vision and Deep Learning.

### System Architecture
The pipeline consists of two distinct stages:
1.  **Face Localization**: We use the **Viola-Jones Algorithm** (Haar Cascades) to rapidly identify face regions within an image or video stream. This method is chosen for its high computational efficiency, suitable for CPU-based real-time processing.
2.  **Mask Classification**: Detected face regions are passed to a **MobileNetV2** Convolutional Neural Network (CNN). This model is effectively a binary classifier distinguishing between 'Mask' and 'No Mask' states.

### Key Technical Concepts
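-   **Haar-like Features**: Rectangular contrast patterns, computed as differences between the pixel sums of adjacent regions, used to detect structures such as edges and lines in a face.
-   **Integral Images**: A summed-area table that allows the pixel sum of any rectangle, and therefore any Haar-like feature, to be computed in constant time $O(1)$.
-   **AdaBoost**: A boosting meta-algorithm that selects the most discriminative features from a large candidate set and combines them into a strong classifier.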
-   **Depthwise Separable Convolutions**: A key component of MobileNetV2 that reduces the number of parameters and computation cost compared to standard convolutions.

## 1. Environment Setup & Library Imports

We rely on three core libraries:
-   **OpenCV (`cv2`)**: An open-source computer vision library. We use it for accessing the camera feed, image manipulation (resizing, color conversion), and utilizing the pre-trained Haar Cascades.
-   **TensorFlow/Keras**: The deep learning backend. Keras provides the high-level API to load and run our pre-trained `.h5` model.
-   **NumPy**: The backbone of numerical computing in Python. Images are loaded as multi-dimensional NumPy arrays (Height x Width x Channels), allowing us to perform fast matrix operations.

In [1]:
import cv2
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
import numpy as np

## 2. Loading and Visualizing Test Data

Before running the full video pipeline, we load a static sample image (`masked.jpeg`) to verify our data loading and basic processing.

### Understanding Image Data
-   **Color Spaces**: OpenCV loads images in **BGR** (Blue-Green-Red) format by default, distinct from the standard **RGB** often used in other libraries. This is why you often see `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` in visualization code.
-   **Array Structure**: The loaded image is a NumPy array. Its shape `(Height, Width, 3)` tells us the resolution and the 3 color channels.
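
As a small illustration of both points, the sketch below builds a tiny synthetic array (an assumption, standing in for a loaded image) and swaps its channel order with plain NumPy slicing. Reversing the last axis has the same effect on channel order as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np

# A tiny 1x2 "image" in OpenCV's BGR order: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
rgb = bgr[:, :, ::-1]

height, width, channels = bgr.shape
print(height, width, channels)   # 1 2 3
print(rgb[0, 0].tolist())        # [0, 0, 255]  (the blue pixel, now in RGB order)
print(rgb[0, 1].tolist())        # [255, 0, 0]  (the red pixel, now in RGB order)
```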

In [17]:
img = cv2.imread("masked.jpeg")

In [23]:
img

array([[[ 30,  26,  15],
        [ 32,  28,  17],
        [ 31,  28,  20],
        ...,
        [ 54,  52,  42],
        [ 55,  53,  43],
        [ 55,  53,  43]],

       [[ 30,  26,  15],
        [ 30,  28,  17],
        [ 31,  29,  19],
        ...,
        [ 54,  52,  42],
        [ 55,  53,  43],
        [ 55,  53,  43]],

       [[ 28,  27,  13],
        [ 30,  28,  17],
        [ 31,  29,  19],
        ...,
        [ 54,  52,  42],
        [ 55,  53,  43],
        [ 55,  53,  43]],

       ...,

       [[ 32,  32,  20],
        [ 25,  25,  13],
        [ 29,  28,  18],
        ...,
        [ 79, 146, 209],
        [ 82, 146, 210],
        [ 82, 146, 210]],

       [[ 39,  39,  27],
        [ 27,  27,  15],
        [ 28,  27,  17],
        ...,
        [ 82, 149, 212],
        [ 84, 148, 212],
        [ 85, 149, 213]],

       [[ 46,  46,  34],
        [ 30,  30,  18],
        [ 26,  25,  15],
        ...,
        [ 84, 151, 214],
        [ 86, 150, 214],
        [ 87, 151, 215]]

## 3. Face Detection: The Viola-Jones Algorithm

We use a **Haar Cascade Classifier** for face detection. This is a machine learning-based approach in which a cascade of classifiers is trained on many positive (face) and negative (non-face) images.

### How it Works (The Math)
1.  **Haar Features**: The algorithm looks for specific features (like the bridge of the nose being lighter than the eyes). These are effectively contrast checks calculated on rectangular regions.
2.  **Integral Image**: To evaluate these features quickly, the algorithm uses an 'Integral Image' representation, in which each entry stores the sum of all pixels above and to the left of it. The sum of pixels in any rectangle can then be computed with just 4 array lookups, regardless of the rectangle's size.
3.  **Cascading**: Instead of running a complex detector on every window, it runs a series of simple ones. If the first simple check fails (e.g., "Is there something looking like an eye pair?"), the region is discarded immediately. This **Attentional Cascade** allows processing the majority of the image (background) extremely quickly.
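
The integral-image step can be sketched in a few lines of NumPy. This is a minimal illustration, not OpenCV's internal implementation: the helper names (`integral_image`, `rect_sum`, `two_rect_feature`) and the 4x4 test array are assumptions made for the example.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table, padded with a leading row and column of zeros."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y): 4 lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Hypothetical two-rectangle Haar-like feature: top half minus bottom half."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return top - bottom

# Synthetic patch: bright upper half, dark lower half.
gray = np.array([[10, 10, 10, 10],
                 [10, 10, 10, 10],
                 [ 1,  1,  1,  1],
                 [ 1,  1,  1,  1]], dtype=np.uint8)
ii = integral_image(gray)

print(rect_sum(ii, 1, 1, 2, 2))          # 22, same as gray[1:3, 1:3].sum()
print(two_rect_feature(ii, 0, 0, 4, 4))  # 72 (80 in the top half minus 8 below)
```

Once the table is built, every rectangle costs the same four lookups, which is what makes evaluating many features per window affordable.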

In [18]:
face_model = cv2.CascadeClassifier("haarcascade_frontalface_alt2.xml")

In [24]:
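# Detect faces; each detection is an (x, y, w, h) bounding box.
faces = face_model.detectMultiScale(img, minNeighbors=2)
print(faces)
for (x, y, w, h) in faces:
    # cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 3)
    print("hai")  # debug marker: printed once per detected face
    # Crop the face; NumPy indexes rows (y) first, then columns (x).
    face_region = img[y:y+h, x:x+w]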


[[100  44  65  65]]
hai


In [28]:
cv2.imshow("Face", face_region)
cv2.waitKey(1)

-1

In [29]:
cv2.destroyAllWindows()

## 4. The Classification Model

We load our pre-trained model `mask_recog.h5`, which is built on MobileNetV2.

### MobileNetV2 Architecture Highlights
-   **Goal**: Efficient inference on mobile/embedded devices.
-   **Depthwise Separable Convolutions**: MobileNetV2 splits a standard convolution into two separate layers:
    1.  **Depthwise Convolution**: Applies a single filter per input channel.
    2.  **Pointwise Convolution**: A 1x1 convolution to create a linear combination of the output of the depthwise layer.
-   **Impact**: This drastically reduces the number of parameters and computation (FLOPs) with minimal loss in accuracy.
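
The parameter savings can be checked with simple arithmetic. The sketch below counts weights (ignoring biases) for a single layer; the layer sizes (3x3 kernel, 32 input channels, 64 output channels) are illustrative assumptions, not values read from `mask_recog.h5`.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per channel) plus pointwise (1x1) pair."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64  # illustrative layer sizes
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)

print(standard)                        # 18432
print(separable)                       # 2336
print(round(standard / separable, 2))  # 7.89
```

For this example layer, the separable version needs roughly one eighth of the weights; the ratio works out to $1/C_{out} + 1/k^2$.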

In [30]:
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt2.xml')
model = load_model('mask_recog.h5')



## 5. Building the Inference Function

The `face_mask_detector` function encapsulates the entire pipeline.

### Step-by-Step Logic
1.  **Grayscale**: We convert the frame to grayscale for the Haar Cascade (faces are structural, color often doesn't matter for detection).
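2.  **Detection**: `detectMultiScale` finds faces. The `minNeighbors` parameter trades quantity for quality: higher values require more overlapping candidate detections before a face is accepted, giving fewer but more reliable results.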
3.  **ROI Extraction**: We slice the array `frame[y:y+h, x:x+w]` to get the face.
4.  **Preprocessing**: The model expects specific input:
    -   **RGB**: Convert BGR to RGB.
    -   **Resize**: Resize to `(224, 224)` pixels.
    -   **Batching**: Reshape to `(1, 224, 224, 3)`.
    -   **Scaling**: `preprocess_input` rescales pixel values from `[0, 255]` to `[-1, 1]` to match training conditions.
5.  **Inference**: `model.predict()` returns two scores per face, which the code reads as `(mask, without_mask)`.
6.  **Annotation**: We draw the box and label based on the prediction.
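
The batching and scaling steps can be illustrated without a camera or a model. The sketch below is a NumPy-only stand-in: `scale_like_mobilenet_v2` is a hypothetical helper that mirrors the `[-1, 1]` scaling applied by MobileNetV2's `preprocess_input`, and the synthetic `face_rgb` array is an assumption standing in for a face crop that has already been converted to RGB and resized.

```python
import numpy as np

def scale_like_mobilenet_v2(rgb_batch):
    """Hypothetical stand-in for preprocess_input: maps [0, 255] to [-1, 1]."""
    return rgb_batch.astype(np.float32) / 127.5 - 1.0

# Synthetic 224x224 RGB "face": top half black, bottom half white.
face_rgb = np.full((224, 224, 3), 255, dtype=np.uint8)
face_rgb[:112] = 0

# Add the batch dimension: (224, 224, 3) -> (1, 224, 224, 3).
batch = np.expand_dims(face_rgb, axis=0)
scaled = scale_like_mobilenet_v2(batch)

print(batch.shape)                               # (1, 224, 224, 3)
print(float(scaled.min()), float(scaled.max()))  # -1.0 1.0
```

`np.expand_dims` is used here instead of `reshape` because it does not require hard-coding the image size; both produce the same array for a `(224, 224, 3)` input.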

In [36]:
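def face_mask_detector(frame):
    # Haar cascades operate on intensity, so detect on a grayscale copy.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, minNeighbors=2)
    for (x, y, w, h) in faces:
        # Crop the face from the color frame and prepare it as model input.
        face_frame = frame[y:y+h, x:x+w]
        face_frame = cv2.cvtColor(face_frame, cv2.COLOR_BGR2RGB)
        face_frame = cv2.resize(face_frame, (224, 224))
        print(face_frame.shape)  # (224, 224, 3)
        face_frame = face_frame.reshape(1, 224, 224, 3)  # add batch dimension
        print(face_frame.shape)  # (1, 224, 224, 3)
        face_frame = preprocess_input(face_frame)
        # The model outputs two scores per face, read as (mask, without_mask).
        pred = model.predict(face_frame)[0]
        mask, without_mask = pred
        label = 'Mask' if mask > without_mask else 'No Mask'
        # Draw the label and bounding box on the original frame (in place).
        cv2.putText(frame, label, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return frame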
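
The function above draws every box in green regardless of the prediction. One possible refinement, sketched below, is to pick the color from the label. `label_and_color` is a hypothetical helper, and it assumes the same `(mask, without_mask)` output order that `face_mask_detector` uses; colors are given in OpenCV's BGR order.

```python
def label_and_color(pred):
    """Map a (mask, without_mask) score pair to a label and a BGR box color."""
    mask, without_mask = pred
    if mask > without_mask:
        return "Mask", (0, 255, 0)   # green in BGR
    return "No Mask", (0, 0, 255)    # red in BGR

print(label_and_color((0.9, 0.1)))   # ('Mask', (0, 255, 0))
print(label_and_color((0.2, 0.8)))   # ('No Mask', (0, 0, 255))
```

The returned color could then be passed to `cv2.putText` and `cv2.rectangle` in place of the fixed `(0, 255, 0)`.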

## 6. Verification on Static Images

Before capturing video, we verify the function on a single static image (`unmasked.jpeg`).
We expect to see the output image with a bounding box labeled "No Mask".

In [33]:
input_image = cv2.imread('unmasked.jpeg')
output=face_mask_detector(input_image)
cv2.imshow("OUTPUT",output)
cv2.waitKey(0)
cv2.destroyAllWindows()

(224, 224, 3)
(1, 224, 224, 3)
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step


## 7. Real-Time Video Processing Loop

Finally, we connect to the webcam (`cv2.VideoCapture(0)`) and run the pipeline in a loop.

### The Loop
1.  **Capture**: Read a frame from the camera.
2.  **Process**: Pass it to `face_mask_detector`.
3.  **Display**: Show the annotated frame using `cv2.imshow`.
4.  **Exit Condition**: We listen for the 'q' key to break the loop and release resources (`release()`, `destroyAllWindows()`).

In [38]:
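video = cv2.VideoCapture(0)
while True:
    suc, img = video.read()
    if not suc:
        # Stop if the camera returns no frame (e.g. it was disconnected).
        break
    img = cv2.flip(img, 1)  # mirror the frame horizontally

    img1 = face_mask_detector(img)

    cv2.imshow("web Cam", img1)
    # Quit when the user presses 'q'.
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video.release()
cv2.destroyAllWindows()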

        


(224, 224, 3)
(1, 224, 224, 3)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
(224, 224, 3)
(1, 224, 224, 3)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 77ms/step
(224, 224, 3)
(1, 224, 224, 3)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step
...