> # **Facial Landmark Detection project**

# **Dataset link**
https://www.kaggle.com/code/janvichokshi/facial-landmark-detection-tensorflow/input

**Project Outline**:

Facial Keypoint Detection from Video Frames

1. **Import Required Libraries**

   Import essential libraries for:
   - Data Handling: NumPy, Pandas
   - Visualization: Matplotlib
   - Image Processing: OpenCV
   - Deep Learning: TensorFlow / Keras

2. **Load and Prepare Dataset**
   - Load metadata from a CSV file containing video IDs and facial keypoint annotations.
   - Use glob to locate and list all .npz video files.
   - Map video IDs to file paths and filter out entries with missing video data.

3. **Data Preprocessing**
   - Extract the first frame from each video and resize it to a uniform resolution (e.g., 90x90 pixels).
   - Extract corresponding 2D facial keypoints and scale them to match the resized images.
   - Convert both image and keypoint data into NumPy arrays suitable for model training.

4. **Data Visualization**
   - Display sample frames with facial landmarks overlaid to verify keypoint alignment.
   - Compare image-keypoint samples before and after normalization as a sanity check.

5. **Model Architecture Design**

   Build a Convolutional Neural Network (CNN) using the Keras Sequential API. The architecture includes:
   - Multiple Conv2D layers with ReLU activation
   - BatchNormalization and Dropout for regularization
   - A final output layer that predicts normalized facial keypoints

6. **Model Training**

   Compile the model using:
   - Loss Function: Mean Squared Error (MSE)
   - Optimizer: Adam

   Train the model on the preprocessed images and keypoints over several epochs with appropriate batch size and validation.

7. **Model Saving and Reusability**
   - Save the trained model to disk for future inference.
   - Implement functionality to load the saved model for evaluation or prediction.

8. **Model Evaluation and Testing**
   - Create a test set by sampling and processing new frames.
   - Use the trained model to predict facial landmarks on test images.
   - Overlay predicted keypoints on images for visual assessment of model accuracy.



# **Importing libraries**

***Description: Import essential libraries for data handling, visualization, and image processing.***

In [None]:
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib
import glob
# from sklearn import cluster
import cv2

# **Load dataset**
**Description: Load CSV file into a DataFrame and display the first few rows.**

In [None]:
videoDF = pd.read_csv('../input/youtube-faces-with-facial-keypoints/youtube_faces_with_keypoints_full.csv')
videoDF.head()

# **Dataset summary**
**Description: Print the total number of videos and unique individuals in the dataset.**

In [None]:
print('Number of Videos is %d' %(videoDF.shape[0]))
print('Number of Unique Individuals is %d' %(len(videoDF['personName'].unique())))

# **Map video IDs to file paths**
**Description: Collect .npz file paths, map them to video IDs, and filter the dataset to only include videos with available files. Prints updated video and individual counts.**

In [None]:
# create a dictionary that maps videoIDs to full file paths
npzFilesFullPath = glob.glob('../input/youtube-faces-with-facial-keypoints/youtube_faces_with_keypoints_full_1/youtube_faces_with_keypoints_full_1/*.npz')
npzFilesFullPath=np.append(npzFilesFullPath,glob.glob('../input/youtube-faces-with-facial-keypoints/youtube_faces_with_keypoints_full_2/youtube_faces_with_keypoints_full_2/*.npz'))
npzFilesFullPath=np.append(npzFilesFullPath,glob.glob('../input/youtube-faces-with-facial-keypoints/youtube_faces_with_keypoints_full_3/youtube_faces_with_keypoints_full_3/*.npz'))
npzFilesFullPath=np.append(npzFilesFullPath,glob.glob('../input/youtube-faces-with-facial-keypoints/youtube_faces_with_keypoints_full_4/youtube_faces_with_keypoints_full_4/*.npz'))

print(npzFilesFullPath[0])
videoIDs = [x.split('/')[-1].split('.')[0] for x in npzFilesFullPath]
fullPaths = {}
for videoID, fullPath in zip(videoIDs, npzFilesFullPath):
    fullPaths[videoID] = fullPath

videoDF = videoDF.loc[videoDF.loc[:,'videoID'].isin(fullPaths.keys()),:].reset_index(drop=True)
print('Number of Videos is %d' %(videoDF.shape[0]))
print('Number of Unique Individuals is %d' %(len(videoDF['personName'].unique())))
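
The mapping above derives each video ID by splitting the path on `/`, which assumes POSIX-style paths. A minimal sketch of a more portable variant using `os.path` is shown here; the helper name `build_id_to_path` and the sample file names are hypothetical and not part of the original notebook. Note that `os.path.splitext` strips only the last extension, which differs from `split('.')[0]` if a file name contains more than one dot.

```python
import os

def build_id_to_path(paths):
    """Map each file's base name (without its last extension) to its full path."""
    return {os.path.splitext(os.path.basename(p))[0]: p for p in paths}

# Hypothetical file names, for illustration only
example_paths = ["data/part_1/video_a.npz", "data/part_2/video_b.npz"]
id_to_path = build_id_to_path(example_paths)
print(id_to_path["video_a"])
```

The resulting dictionary can be used the same way as `fullPaths`, for example with `DataFrame.isin` to filter rows whose video file is present.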

# **Preview filtered dataset**
**Description: Display the first few rows of the dataset after filtering based on available video files.**

In [None]:
videoDF.head()

# **Display sample video frames with 2D keypoints**
**Description: Randomly select a few videos and display selected frames with 2D facial keypoints overlaid for visualization.**

In [None]:
# show several frames from each video and overlay 2D keypoints
np.random.seed(0)
numVideos = 4
framesToShowFromVideo = np.array([0.1,0.3,0.6,0.9])
numFramesPerVideo = len(framesToShowFromVideo)

# select a random subset of 'numVideos' from the available videos
randVideoIDs = videoDF.loc[np.random.choice(videoDF.index,size=numVideos,replace=False),'videoID']
# print(listOfAllConnectedPoints.shape)
fig, axArray = plt.subplots(nrows=numVideos,ncols=numFramesPerVideo,figsize=(14,18))
for i, videoID in enumerate(randVideoIDs):
    # load video
    videoFile = np.load(fullPaths[videoID])
    colorImages = videoFile['colorImages']
    boundingBox = videoFile['boundingBox']
    landmarks2D = videoFile['landmarks2D']

    selectedFrames = (framesToShowFromVideo*(colorImages.shape[3]-1)).astype(int)
    for j, frameInd in enumerate(selectedFrames):
        axArray[i][j].imshow(colorImages[:,:,:,frameInd])
        axArray[i][j].scatter(x=landmarks2D[:,0,frameInd],y=landmarks2D[:,1,frameInd],s=5,c='b')
        axArray[i][j].set_title('"%s" (t=%d)' %(videoID,frameInd), fontsize=12)
        axArray[i][j].set_axis_off()
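
The cell above picks frames at fixed fractional positions along each video. The sketch below isolates that index calculation so it can be checked on its own; `select_frame_indices` is a hypothetical helper name, and the frame counts are made up for illustration.

```python
import numpy as np

def select_frame_indices(num_frames, fractions):
    """Convert fractional positions in [0, 1] to integer frame indices."""
    fractions = np.asarray(fractions, dtype=float)
    # Scale by the last valid index, then truncate toward zero as the notebook does
    return (fractions * (num_frames - 1)).astype(int)

indices = select_frame_indices(11, [0.0, 0.5, 1.0])
print(indices)
```

Scaling by `num_frames - 1` rather than `num_frames` keeps a fraction of 1.0 on the last valid frame instead of one past the end.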

# Preparing Dataset

# **Count video entries**
**Description: Display the total number of video entries in the DataFrame**

In [None]:
len(videoDF['videoID'])

# **Extract first frame per video**
**Description: Extract and resize the first frame of each video to a fixed size, normalizing pixel values. Tracks progress during processing.**

In [None]:
# select the first frame of each video
images=[]
img_size=90
for i, videoID in enumerate(videoDF['videoID']):
    if (i%500)==0:
        print(i, 'images processed')
    videoFile = np.load(fullPaths[videoID])
    colorImages = videoFile['colorImages']
    # resize to img_size x img_size and divide by 255 to normalize pixel values
    images.append(cv2.resize(colorImages[:,:,:,0],(img_size,img_size))/255)

# **Convert image list to array**
**Description: Convert the list of resized video frames into a NumPy array for further processing or model input.**

In [None]:
images=np.array(images)

# **Extract and scale keypoints**
**Description: Extract facial keypoints from the first frame of each video and scale them to match the resized image dimensions. Progress is printed periodically.**

In [None]:
key_pts=[]
for i, videoID in enumerate(videoDF['videoID']):
    if (i%500)==0:
        print(i)
    videoFile = np.load(fullPaths[videoID])
    org_h,org_w=videoFile['colorImages'][:,:,:,0].shape[:2]
    scale_h,scale_w=img_size/org_h,img_size/org_w
    landmarks = videoFile['landmarks2D']
    keyPts=landmarks[:,:,0]
    keyPts[:,0]=keyPts[:,0]*scale_w
    keyPts[:,1]=keyPts[:,1]*scale_h
    key_pts.append(keyPts)
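
Because each frame is resized to a square of side `img_size`, the x and y coordinates need separate scale factors whenever the original frame is not square. A minimal NumPy sketch of that step follows; `scale_keypoints` is a hypothetical helper, and the example coordinates and frame size are invented for illustration.

```python
import numpy as np

def scale_keypoints(keypoints, orig_hw, new_size):
    """Rescale (x, y) keypoints from an orig_h x orig_w image to new_size x new_size."""
    orig_h, orig_w = orig_hw
    scaled = np.array(keypoints, dtype=np.float64)  # copy; the input is left unchanged
    scaled[:, 0] *= new_size / orig_w  # x scales with the width ratio
    scaled[:, 1] *= new_size / orig_h  # y scales with the height ratio
    return scaled

pts = np.array([[100.0, 50.0], [200.0, 150.0]])
scaled_pts = scale_keypoints(pts, orig_hw=(300, 200), new_size=90)
print(scaled_pts)
```

Working on a float copy avoids modifying the source array in place and avoids truncation if the annotations happen to be stored as integers.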

# **Convert keypoints list to array**
**Description: Convert the list of 2D facial keypoints into a NumPy array for structured analysis or modeling.**

In [None]:
keypts=np.array(key_pts)

# **Visualize sample images with keypoints**
**Description: Display a few sample images with their corresponding facial keypoints overlaid for visual verification.**

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=4,figsize=(15,15))
for i in range(4):
    ax[i].imshow(images[i])
    ax[i].scatter(keypts[i,:,0],keypts[i,:,1],s=5)

# **Prepare keypoints for training**
**Description: Reshape and normalize keypoints data for model input, scaling coordinates relative to image size.** 

In [None]:
# img_size=90
y_data=keypts.reshape(keypts.shape[0],-1)
y_train = np.reshape( y_data , ( -1 , 1 , 1 , 136 ))/img_size

# **Visualize normalized keypoints on images**
**Description: Plot sample images with denormalized facial keypoints overlaid to verify keypoint scaling and alignment.**

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=4,figsize=(15,15))
for i in range(4):
    ax[i].imshow(images[i])
    x=np.reshape(y_train[i,:,:,np.arange(0,136,2)],(68))*img_size
    y=np.reshape(y_train[i,:,:,np.arange(1,136,2)],(68))*img_size
    ax[i].scatter(x,y,s=5)
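
The normalization and its inverse can be verified numerically as well as visually. The sketch below uses synthetic keypoints (an assumption, standing in for the real annotations) to show that flattening to 136 interleaved values, dividing by `img_size`, and then reading x from even positions and y from odd positions recovers the original pixel coordinates.

```python
import numpy as np

img_size = 90
num_points = 68

# Synthetic keypoints in pixel units, shape (N, 68, 2); a stand-in for real annotations
rng = np.random.default_rng(0)
keypts_px = rng.uniform(0, img_size, size=(4, num_points, 2))

# Flatten to (N, 136) with x and y interleaved, reshape to (N, 1, 1, 136), normalize
flat_px = keypts_px.reshape(len(keypts_px), -1)
targets = flat_px.reshape(-1, 1, 1, 2 * num_points) / img_size

# Recover pixel coordinates: even positions hold x, odd positions hold y
recovered = targets.reshape(len(targets), -1) * img_size
x_px, y_px = recovered[:, 0::2], recovered[:, 1::2]
print(targets.shape, x_px.shape, y_px.shape)
```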

# **Check keypoints array shape**
***Description: Display the shape of the prepared keypoints array used for training.***

In [None]:
y_train.shape

# Model (CNN)

# **Import TensorFlow and Keras**
***Description: Import TensorFlow and its Keras API for building and training deep learning models.***

In [None]:
import tensorflow
from tensorflow import keras

# **Build and compile CNN model**
***Description: Define a deep convolutional neural network with multiple Conv2D, BatchNormalization, and Dropout layers. Compile it using mean squared error loss and Adam optimizer. Display the model summary.***

In [None]:
model_layers=[
    keras.layers.Conv2D( 256, input_shape=( img_size , img_size , 3 ) , kernel_size=( 5 , 5 ) , strides=1 , activation='relu',name="input_layer"),
    keras.layers.Conv2D( 256 , kernel_size=( 5 , 5 ) , strides=1 , activation='relu' ),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    
    keras.layers.Conv2D( 256, kernel_size=( 5 , 5 ) , strides=1 , activation='relu' ),
    keras.layers.Conv2D( 256, kernel_size=( 5 , 5 ) , strides=1 , activation='relu' ),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    
    keras.layers.Conv2D( 200, kernel_size=( 5 , 5 ) , strides=2 , activation='relu'),
    keras.layers.Conv2D( 200 , kernel_size=( 5 , 5 ) , strides= 1, activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    
    keras.layers.Conv2D( 200, kernel_size=( 5 , 5 ) , strides=1 , activation='relu'),
    keras.layers.Conv2D( 200 , kernel_size=( 5 , 5 ) , strides=1 , activation='relu' ),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    
    keras.layers.Conv2D( 170, kernel_size=( 3 , 3 ) , strides=1 , activation='relu' ),
    keras.layers.Conv2D( 170, kernel_size=( 3 , 3 ) , strides=1 , activation='relu' ),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    
    keras.layers.Conv2D( 136, kernel_size=( 3 , 3 ) , strides=1 , activation='relu'),
    keras.layers.Conv2D( 136, kernel_size=( 3 , 3 ) , strides=2 , activation='relu'),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    
    keras.layers.Conv2D( 136, kernel_size=( 3 , 3 ) , strides=2 , activation='relu'),
    keras.layers.Conv2D( 136 , kernel_size=( 3 , 3 ) , strides=1 , activation='sigmoid'),

    
]
model=keras.Sequential(model_layers)
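# 'learning_rate' replaces the deprecated 'lr' argument, which recent Keras versions reject
model.compile( loss= keras.losses.mean_squared_error , optimizer= keras.optimizers.Adam( learning_rate=0.001 ) )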
model.summary()
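
The network has no Dense or Flatten layer, so it relies on the convolutions alone to shrink the 90x90 input down to a 1x1 map with 136 channels, matching the `(1, 1, 136)` target shape. The following sketch checks that arithmetic in plain Python, assuming Keras's default `padding='valid'` for every `Conv2D` layer; `conv_output_size` is a hypothetical helper written for this check.

```python
def conv_output_size(size, kernel, stride):
    """Output length along one axis for a convolution without padding."""
    return (size - kernel) // stride + 1

# (kernel, stride) for each of the 14 Conv2D layers, in the order they are defined
layers = (
    [(5, 1)] * 4
    + [(5, 2)]
    + [(5, 1)] * 3
    + [(3, 1)] * 3
    + [(3, 2), (3, 2), (3, 1)]
)

size = 90  # input height and width
sizes = []
for kernel, stride in layers:
    size = conv_output_size(size, kernel, stride)
    sizes.append(size)
print(sizes)
```

If `img_size` or any kernel or stride is changed, the final spatial size will generally no longer be 1, and the target array would need a different shape.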

# **Train model**
***Description: Train the model on image and keypoint data for 3 epochs with a batch size of 32.***

In [None]:
train=model.fit(images,y_train,epochs=3,batch_size=32)
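
The training call above fits on all samples and holds nothing out for validation. One possible way to create a held-out set is sketched below with NumPy only; `split_train_val`, its default values, and the zero-filled demo arrays are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def split_train_val(images, targets, val_fraction=0.1, seed=0):
    """Shuffle sample indices and split arrays into training and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_val = int(len(images) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (images[train_idx], targets[train_idx]), (images[val_idx], targets[val_idx])

# Synthetic stand-ins with the same shapes as the notebook's arrays
images_demo = np.zeros((20, 90, 90, 3), dtype=np.float32)
targets_demo = np.zeros((20, 1, 1, 136), dtype=np.float32)
(train_x, train_y), (val_x, val_y) = split_train_val(images_demo, targets_demo, val_fraction=0.2)
print(train_x.shape, val_x.shape)
```

The held-out pair could then be passed to `model.fit` through its `validation_data` argument. Since the dataset contains several videos per person, splitting by `personName` instead of by video may give a stricter estimate of generalization.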

# **Load pre-trained model**
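***Description: Load a saved Keras model from the file model.pb for further use or evaluation. No earlier cell saves the trained model, so model.pb must already exist in the working directory.***

In [None]:
# model.pb is not written by any cell above; it must be provided separately
m=keras.models.load_model('model.pb')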

# Preparing test dataset

# **Prepare test images and bounding boxes**
***Description: Extract and resize the last frame from the first 30 videos, scale bounding boxes accordingly, and store them for testing.***

In [None]:
test_images=[]
test_bbox=[]
for i, videoID in enumerate(videoDF['videoID']):
    if i==30:
        break
    videoFile = np.load(fullPaths[videoID])
    colorImages = videoFile['colorImages']
    bbox=videoFile['boundingBox']
    org_h,org_w=colorImages[:,:,:,0].shape[:2]
    scale_h,scale_w=img_size/org_h,img_size/org_w
    
    box=bbox[:,:,-1]
    box[:,0]=box[:,0]*scale_w
    box[:,1]=box[:,1]*scale_h
    test_bbox.append(box)
    test_images.append(cv2.resize(colorImages[:,:,:,-1],(img_size,img_size))/255)


# **Convert test bounding boxes to array**
***Description: Convert the list of scaled bounding boxes into a NumPy array and display its shape.***


In [None]:
test_bbox=np.array(test_bbox)
test_bbox.shape

# **Visualize test images with bounding boxes**
***Description: Display sample test images with their corresponding bounding boxes overlaid for verification.***

In [None]:
test_images=np.array(test_images)
fig,ax=plt.subplots(nrows=1,ncols=4,figsize=(15,15))
for i in range(4):
    ax[i].imshow(test_images[i])
    ax[i].scatter(test_bbox[i,:,0],test_bbox[i,:,1],s=5)

# **Visualize model predictions on test images**
***Description: Predict facial keypoints on test images and plot the results overlaid on the images for visual evaluation.***

In [None]:
fig,ax = plt.subplots(6,5,figsize=(20,20))

for i in range(30):  # start at 0 so all 30 test images fill the 6x5 grid
    r=i//5
    c=i%5
    sample_image = test_images[i]
    pred = m.predict( test_images[ i : i +1  ] ) 
    x=np.reshape(pred[:,:,:,np.arange(0,136,2)],(68))*img_size
    y=np.reshape(pred[:,:,:,np.arange(1,136,2)],(68))*img_size
    ax[r][c].imshow(sample_image)
    ax[r][c].scatter( x,y, c='yellow',s=6)
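
The overlay plots give a qualitative check only. If ground-truth keypoints for the test frames were also extracted (for example from `landmarks2D`, which the cells above do not do for the test set), a simple numeric summary could be computed as sketched here. `mean_keypoint_error_px` is a hypothetical helper, and the demo arrays are synthetic.

```python
import numpy as np

def mean_keypoint_error_px(pred, target, img_size=90):
    """Mean Euclidean distance in pixels between predicted and true keypoints.

    Both inputs hold normalized, interleaved (x, y) values, e.g. shape (N, 1, 1, 136).
    """
    pred_xy = np.reshape(pred, (len(pred), -1, 2)) * img_size
    target_xy = np.reshape(target, (len(target), -1, 2)) * img_size
    return float(np.linalg.norm(pred_xy - target_xy, axis=-1).mean())

# Synthetic check: shift every x coordinate by 3 pixels and leave y unchanged
target_demo = np.full((2, 1, 1, 136), 0.5)
pred_demo = target_demo.copy()
pred_demo[..., 0::2] += 3 / 90
error_px = mean_keypoint_error_px(pred_demo, target_demo)
print(error_px)
```

In the synthetic check every point is displaced by 3 pixels along x, so the reported mean error should come out close to 3.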

# **Final Summary**


This project presents a deep learning-based approach for facial landmark detection using TensorFlow and Keras. The objective is to accurately predict 2D facial keypoints from video frames, leveraging a large-scale, video-based facial dataset.

The workflow begins with loading metadata and corresponding .npz video files. From each video, the first frame is extracted, resized to a uniform resolution, and paired with scaled ground truth facial keypoints. These preprocessed image-keypoint pairs form the input to the model.

A Convolutional Neural Network (CNN) is designed using the Keras Sequential API. The architecture comprises 14 convolutional layers, all but the last activated with ReLU, along with batch normalization and dropout layers to reduce overfitting. The final layer uses a sigmoid activation and outputs 136 values: the normalized (x, y) coordinates of 68 facial landmarks.

The pipeline also includes data visualization, allowing for verification of the keypoint annotations and model predictions. The model is trained using the mean squared error loss and the Adam optimizer, and is evaluated by overlaying predicted landmarks on test images for qualitative assessment.

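This notebook provides a baseline framework for facial landmark localization that can support downstream applications such as facial recognition, emotion detection, AR filters, and human-computer interaction. Its evaluation is qualitative only, and the test frames come from videos that were also used for training, so accuracy on unseen faces is not measured here.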