# **Probability and Random Processes: Assignment 2**
This assignment is based on two algorithms:
1. Viola-Jones algorithm for face detection
2. Method of eigenfaces for face recognition

To set up the data and the supporting functions, run the cell below to import the libraries and clone the GitHub repository.

In [1]:
### IMPORTING THE LIBRARIES ###

from PIL import Image
import cv2
import numpy as np
import os
import shutil
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
from tqdm import tqdm

!git clone --quiet https://github.com/uditvyas/Face_Recognition.git

# Part 1: Viola Jones Algorithm for Face Detection (Training Haar Classifier)

For this part, we extract both the positive and the negative images from the cloned repository and then create the info files that list them. The following code block does both.

**NOTE**: Run the next code cell **ONLY if** you wish to reproduce the training of the Haar classifier (training takes approx. 2 hours). Otherwise, **the classifier file is already available in the cloned GitHub repository**.
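
As a rough sketch of the two info-file formats consumed by the OpenCV cascade tools: each line of the positives file holds the image path, the number of objects, and one `x y width height` box per object, while each line of the negatives file holds only a path. The helper names `positive_line` and `negative_line` are hypothetical, and the 62 by 47 box simply mirrors the value used in the cell below.

```python
def positive_line(path, boxes):
    """Format one positives entry: path, object count, then x y w h per box."""
    coords = " ".join("{} {} {} {}".format(x, y, w, h) for (x, y, w, h) in boxes)
    return "{} {} {}\n".format(path, len(boxes), coords)


def negative_line(path):
    """Format one negatives entry: just the image path."""
    return "{}\n".format(path)


# Illustrative paths only; no files are read or written here
example_pos = positive_line("pos/0.jpg", [(0, 0, 62, 47)])
example_neg = negative_line("neg/0.jpg")
print(example_pos, end="")
print(example_neg, end="")
```

Building each line in one place makes it easier to keep the box size consistent with the actual image size.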

In [None]:
# Checking if already exists
if os.path.exists('/content/pos'):
    shutil.rmtree('/content/pos')
if os.path.exists('/content/neg'):
    shutil.rmtree('/content/neg')

# Unzipping the data files
!unzip -q /content/Face_Recognition/negative.zip
!unzip -q /content/Face_Recognition/positive.zip

# Creating the data information files
def create_pos_n_neg():
    for file_type in ['neg','pos']:
        
        for img in os.listdir(file_type):

            if file_type == 'pos':
                line = file_type+'/'+img+' 1 0 0 62 47\n'
                with open('pos.lst','a') as f:
                    f.write(line)
            elif file_type == 'neg':
                line = file_type+'/'+img+'\n'
                with open('neg.dat','a') as f:
                    f.write(line)
create_pos_n_neg()

if os.path.exists('data'):
    shutil.rmtree('data')
os.mkdir('data')

# The package below needs to be installed first when running on a local machine
# !apt-get install libopencv-dev

# Creating the Vector File
!opencv_createsamples -info pos.lst -num 3240 -w 100 -h 100 -vec positives.vec
# Training the Haar Classifier
!opencv_traincascade -data data -vec positives.vec -bg neg.dat -numPos 1500 -numNeg 750 -numStages 4 -w 100 -h 100

# Part 2: Detecting faces using the Haar Classifier

The code block below extracts the images from the zip file.

After this, the OpenCV `CascadeClassifier` function uses the Haar classifier (.xml file), which was trained in the code block above.

All the images are then passed through the face detector and cropped to the classifier's first detection. The cropped images are appended to the final dataset of images and corresponding labels, which are then ready to be classified using the eigenfaces algorithm.
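
The cropping step can be sketched in isolation as follows. This is a minimal illustration, assuming detections arrive in the `(x, y, width, height)` order that `detectMultiScale` returns; the helper name `crop_first_detection` and the toy image are hypothetical.

```python
import numpy as np


def crop_first_detection(gray, detections):
    """Crop the first (x, y, w, h) box from a grayscale image, or return None."""
    if len(detections) == 0:
        return None
    x, y, w, h = detections[0]
    # Rows are indexed by y and height, columns by x and width
    return gray[y:y + h, x:x + w]


gray = np.zeros((100, 120), dtype=np.uint8)   # toy image: 100 rows, 120 columns
detections = np.array([[10, 20, 30, 40]])     # one box: x=10, y=20, w=30, h=40
crop = crop_first_detection(gray, detections)
print(crop.shape)
```

Returning `None` when nothing is detected makes the "skip this image" case explicit for the caller.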

In [None]:
### EXTRACTING THE IMAGES ###

### MIT FACE DATASET ###
if os.path.exists('/content/pos'):
    shutil.rmtree('/content/pos')

print("Downloading Data....")
!unzip -q /content/Face_Recognition/positive.zip

################################################################################

images = []
labels = []

h = 100
w = 100

face_cascade = cv2.CascadeClassifier('/content/Face_Recognition/haarcascade_frontalface_default.xml')

for name in tqdm(os.listdir('/content/pos')):
    img = cv2.imread("/content/pos/"+name,cv2.IMREAD_GRAYSCALE)
    faces = face_cascade.detectMultiScale(img,1.3,5)
    if len(faces) > 0:
        # detectMultiScale returns boxes as (x, y, width, height); keep the first one
        x,y,fw,fh = faces[0]
        img = img[y:y+fh,x:x+fw]
        img = cv2.resize(img,(50,50))
        img = img.flatten()
        images.append(img)
        label = int(name.split(".")[0])//324
        labels.append(label)
images = np.array(images)
labels = np.array(labels)

print("\nFaces Detected in Images Array: {}".format(images.shape))
print("Labels of Detected Images: {}".format(labels.shape))

## THE DATASET IS READY FOR RECOGNITION

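**NOTE**: The program assumes that the Haar classifier is 100% accurate, i.e. that the first detection in each image is the face. This assumption is not robust: images with no detection are silently dropped, and false detections enter the dataset unchecked. However, evaluating the accuracy of the Haar classifier is beyond the scope of this assignment.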

# Part 3: Classification of Detected Faces using Eigenfaces

From the faces detected in the step above, we split the images and labels into training and testing sets.
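
Because the split is random, the measured accuracy changes from run to run. A minimal sketch of a reproducible, class-balanced split, assuming toy data and an arbitrary seed (the `toy_` names are hypothetical; `random_state` and `stratify` are standard `train_test_split` parameters):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 classes with 10 samples each, 4 features per sample
toy_X = np.arange(100 * 4).reshape(100, 4)
toy_y = np.repeat(np.arange(10), 10)

# random_state fixes the shuffle; stratify keeps class proportions in both parts
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.20, random_state=0, stratify=toy_y
)
print(toy_X_train.shape, toy_X_test.shape)
print(np.bincount(toy_y_test))
```

Stratifying matters most when some identities have only a few detected faces, since an unstratified split could leave them out of the training set.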

In [None]:
### SPLITTING THE IMAGES INTO TRAIN AND TEST DATASETS

X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.20)

print("X_train dimension: {}".format(X_train.shape))
print("y_train dimension: {}".format(y_train.shape))
print("X_test dimension: {}".format(X_test.shape))
print("y_test dimension: {}".format(y_test.shape))

The code block below does the following:

1. Find the mean face
2. Normalise the training data by subtracting the mean face
3. Find the covariance matrix
4. Find the eigenvalues and eigenvectors
5. Sort the eigenvectors in order of decreasing eigenvalues
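
The five steps above can be sketched on synthetic data as follows. This is an illustration under stated assumptions rather than the notebook's exact computation: it uses the plain Gram matrix of the centred images instead of `np.cov`, and `np.linalg.eigh`, which is intended for symmetric matrices and returns real eigenvalues. The array `faces` is random toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((6, 16))        # 6 toy "images" with 16 pixels each

mean_face = faces.mean(axis=0)                          # step 1
centred = faces - mean_face                             # step 2
small_cov = centred @ centred.T / (len(faces) - 1)      # step 3: 6 x 6 matrix over images

# Step 4: eigh returns real eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(small_cov)

# Step 5: reorder so the largest eigenvalue comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
print(eigenvalues.shape, eigenvectors.shape)
```

Sorting with an index array keeps each eigenvalue paired with its eigenvector column without building a list of tuples.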

In [None]:
### CALCULATING MEAN FACE AND IMPLEMENTING PRINCIPAL COMPONENT ANALYSIS

mean_face = np.mean(X_train,axis=0)
print("Mean Face Dimensions:{}".format(mean_face.shape))

normalised_X_train = X_train - mean_face
print("Normalised Faces Dimensions: {}".format(normalised_X_train.shape))

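# cov_matrix = (1/normalised_X_train.shape[0])*np.cov(normalised_X_train)
# np.cov treats each row as a variable, so with one image per row this is the
# small (images x images) matrix, not the (pixels x pixels) covariance matrix
cov_matrix = np.cov(normalised_X_train)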
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
print("Cov Matrix Dimensions: {}".format(cov_matrix.shape))
print("Eigenvalues of Cov Dimensions: {}".format(eigenvalues.shape))
print("Eigenvectors of Cov Dimensions: {}".format(eigenvectors.shape))

eig_pairs = [(eigenvalues[index], eigenvectors[:,index]) for index in range(len(eigenvalues))]

# Sort the eigen pairs in descending order of eigenvalue
# (sorting on the eigenvalue only avoids comparing arrays when two values tie)
eig_pairs.sort(key=lambda pair: pair[0], reverse=True)
sorted_values  = np.array([eig_pairs[index][0] for index in range(len(eigenvalues))])
sorted_vectors = np.array([eig_pairs[index][1] for index in range(len(eigenvalues))])

print("Sorted Eigenvalues Dimensions: {}".format(sorted_values.shape))
print("Sorted Eigenvectors Dimensions: {}".format(sorted_vectors.shape))

Once the eigenvalues and eigenvectors are sorted, we plot the cumulative fraction of variance explained by the eigenvalues to decide how many eigenvectors to retain after Principal Component Analysis (PCA).
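
Instead of reading the number off a plot, the choice can also be made numerically. The sketch below is one possible rule, assuming a target fraction of explained variance; the function name `components_for_variance`, the target value, and the toy eigenvalues are all hypothetical.

```python
import numpy as np


def components_for_variance(sorted_eigenvalues, target=0.95):
    """Smallest K whose leading eigenvalues explain at least `target` of the variance."""
    ratio = np.cumsum(sorted_eigenvalues) / np.sum(sorted_eigenvalues)
    k = int(np.searchsorted(ratio, target)) + 1
    # Guard against rounding pushing the index past the last component
    return min(k, len(sorted_eigenvalues))


toy_values = np.array([5.0, 3.0, 1.0, 1.0])   # already sorted in descending order
k = components_for_variance(toy_values, target=0.85)
print(k)
```

Here the cumulative fractions are 0.5, 0.8, 0.9 and 1.0, so three components are the fewest that reach 0.85.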

In [None]:
### DECIDING THE NUMBER OF PRINCIPAL COMPONENTS

# Cumulative fraction of the total variance explained by the leading eigenvalues
cumulative_sum = np.cumsum(sorted_values)/np.sum(sorted_values)

x_axis = range(1,len(sorted_values)+1)
plt.figure(figsize=(10,10))
plt.scatter(x_axis, cumulative_sum)
plt.xlabel("Number of Principal Components")
plt.ylabel("Cumulative Variance")
plt.show()

Based on the graph, choose an appropriate value of K. Then complete PCA by keeping only the first K eigenvectors and mapping them back to pixel space. At the end of this block, we have the final reduced eigenvectors from the data.
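
The mapping back to pixel space relies on a standard identity: if `v` is an eigenvector of `A @ A.T` with eigenvalue `lam`, then `A.T @ v` is an eigenvector of `A.T @ A` with the same eigenvalue. A small numerical check on random toy data (the matrix `A` and all names here are assumptions for illustration, and the unit normalisation is an optional extra):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 12))     # 5 toy images with 12 pixels each
A -= A.mean(axis=0)                  # centre the images on their mean

values, vectors = np.linalg.eigh(A @ A.T)   # small 5 x 5 eigenproblem
v, lam = vectors[:, -1], values[-1]         # leading pair (eigh sorts ascending)

u = A.T @ v                                 # candidate eigenface in pixel space
residual = np.linalg.norm(A.T @ (A @ u) - lam * u)

u_unit = u / np.linalg.norm(u)              # unit-length eigenface
weights = A @ u_unit                        # one weight per training image
print(residual, weights.shape)
```

Solving the small problem first is what keeps the computation cheap when there are far fewer images than pixels, or a similar number of each.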

In [23]:
### REDUCING THE DATA ###
K = 300

reduced_data = np.array(sorted_vectors[:K]).transpose()
print("Reduced Data Dimensions: {}".format(reduced_data.shape))

final_eigen_vectors = np.dot(normalised_X_train.transpose(),reduced_data)
print("Final Eigen Vectors Dimensions: {}".format(final_eigen_vectors.shape))

# Finding weights for each training image
train_weights = np.array([np.dot(i,final_eigen_vectors) for i in normalised_X_train])
print("Train Weights Dimensions: {}".format(train_weights.shape))

final_eigen_vectors = np.transpose(final_eigen_vectors)

### FINAL EIGENVECTORS ARE READY ###

Reduced Data Dimensions: (2538, 300)
Final Eigen Vectors Dimensions: (2500, 300)
Train Weights Dimensions: (2538, 300)


Below is a helper function used to calculate the accuracy of the eigenfaces approach. It returns 1 when the predicted label matches the actual label and 0 otherwise.

In [24]:
def display_compare(test_img,predicted_index,test_index):
    # Faces were resized to 50x50 before flattening, so restore that shape
    test_img = np.reshape(test_img,(50,50))
    actual_label = y_test[test_index]
    # print("Actual Label: {}".format(actual_label))
    # cv2_imshow(test_img)

    predicted_image = np.reshape(X_train[predicted_index],(50,50))
    predicted_label = y_train[predicted_index]
    # print("Predicted Label: {}".format(predicted_label))
    # cv2_imshow(predicted_image)

    if predicted_label == actual_label:
        return 1
    else:
        return 0   

# Testing the algorithm for accuracy

In [26]:
correct = 0
for i in tqdm(range(len(X_test))):
    img = X_test[i]
    norm_img = img - mean_face
    weights = np.array([np.dot(norm_img,final_eigen_vectors[k]) for k in range(final_eigen_vectors.shape[0])])
    weight_error = train_weights - weights
    norms = np.linalg.norm(weight_error,axis = 1)
    predicted_index = np.argmin(norms)
    correct = correct + display_compare(img,predicted_index,i)

print(correct)
accuracy = correct*100/len(X_test)
print("Accuracy: {:.2f}%".format(accuracy))

100%|██████████| 635/635 [00:10<00:00, 62.41it/s]

557
Accuracy: 87.72%
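
The nearest-neighbour matching in weight space can also be written without a Python loop. The sketch below is a minimal vectorised variant on toy data, not the code that produced the result above; it expands the squared distance as ||a||² + ||b||² - 2a·b so that no three-dimensional array is built. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def nearest_neighbour_labels(train_weights, train_labels, test_weights):
    """Label each test vector with the label of its closest training vector."""
    # Squared Euclidean distances, shape (n_test, n_train)
    sq_dist = (
        np.sum(test_weights ** 2, axis=1)[:, None]
        + np.sum(train_weights ** 2, axis=1)[None, :]
        - 2.0 * test_weights @ train_weights.T
    )
    return train_labels[np.argmin(sq_dist, axis=1)]


toy_train = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_labels = np.array([0, 1])
toy_test = np.array([[1.0, -1.0], [9.0, 12.0], [0.5, 0.5]])
toy_truth = np.array([0, 1, 0])

predicted = nearest_neighbour_labels(toy_train, toy_labels, toy_test)
toy_accuracy = np.mean(predicted == toy_truth) * 100
print(predicted, toy_accuracy)
```

Minimising the squared distance picks the same neighbour as minimising the distance itself, so the square root can be skipped.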





In [None]:
import os
import shutil
import cv2
import numpy as np
import urllib.request

def prepare_pos_images():
    if not os.path.exists("pos"):
        os.mkdir("pos")
    else:
        print("Pos Exists. Deleting Directory")
        shutil.rmtree("pos")
        os.mkdir("pos")

    img_num = 0
    for f in os.listdir('training-synthetic'):
        try:
            print(f)
            img = cv2.imread('training-synthetic/'+f,cv2.IMREAD_GRAYSCALE)
            resized_img = cv2.resize(img,(100,100))
            cv2.imwrite("pos/"+str(img_num)+".jpg",resized_img)
            img_num += 1
        except Exception as e:
            print(str(e))

# prepare_pos_images()

def prepare_neg_images():
    if not os.path.exists("neg"):
        os.mkdir("neg")
    # else:
    #     print("Neg Exists. Deleting Directory")
    #     shutil.rmtree("neg")
    #     os.mkdir("neg")

    # neg_link = 'http://image-net.org/api/text/imagenet.synset.geturls?wnid=n03183080'
    # neg_link = 'http://image-net.org/api/text/imagenet.synset.geturls?wnid=n03563967'
    # neg_link = 'http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=n04576211'
    neg_link = 'http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=n01905661'


    urls = urllib.request.urlopen(neg_link).read().decode()

    img_num = 1520

    # Destination folder for the downloaded negative images
    neg_dir = '/content/drive/My Drive/:p Sem ki naiya hai Ram ke bharose!!/ES331: Probability and Random Processes/Assignment2_Udit/neg/'

    for i in urls.split("\n"):
        try:
            print(i)
            urllib.request.urlretrieve(i,neg_dir+str(img_num)+".jpg")
            img = cv2.imread(neg_dir+str(img_num)+".jpg",cv2.IMREAD_GRAYSCALE)
            resized_img = cv2.resize(img,(100,100))
            cv2.imwrite(neg_dir+str(img_num)+".jpg",resized_img)
            img_num+=1
        except Exception as e:
            print(str(e))
prepare_neg_images()

In [None]:

# y_pred = clf.predict(X_test_pca)
# print(classification_report(y_test, y_pred, target_names=target_names))

# # Visualization
# def plot_gallery(images, titles, h, w, rows=3, cols=4):
#     plt.figure(figsize = (10,10))
#     for i in range(rows * cols):
#         plt.subplot(rows, cols, i + 1)
#         plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
#         plt.title(titles[i])
#         plt.xticks(())
#         plt.yticks(())
 
# def titles(y_pred, y_test, target_names):
#     for i in range(y_pred.shape[0]):
#         pred_name = target_names[y_pred[i]].split(' ')[-1]
#         true_name = target_names[y_test[i]].split(' ')[-1]
#         yield 'predicted: {0}\ntrue: {1}'.format(pred_name, true_name)
 
# prediction_titles = list(titles(y_pred, y_test, target_names))
# plot_gallery(X_test, prediction_titles, h, w)

# eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
# plot_gallery(eigenfaces, eigenface_titles, h, w)
# plt.show()

In [None]:
### EXTRACTING THE IMAGES IN A USABLE FORMAT ###
### YALE FACE DATASET ###

!wget https://vismod.media.mit.edu/vismod/classes/mas622-00/datasets/YALE.tar.gz
!tar -xvf  'YALE.tar.gz'

################################################################################

if os.path.exists('faces'):
    shutil.rmtree('faces')

images = []
labels = []
os.mkdir('faces')

for name in os.listdir('YALE/faces'):
    if '.pgm' not in name:
        img = Image.open('YALE/faces/'+name)
        # PIL reports size as (width, height); shrink both by a factor of 3
        w = int(img.size[0]/3)
        h = int(img.size[1]/3)
        img = img.resize((w,h))
        img.save('faces/'+name+'.jpg')
        img = cv2.imread('faces/'+name+'.jpg',cv2.IMREAD_GRAYSCALE)
        images.append(img.flatten())
        # The last two characters before the first '.' are the subject number
        labels.append(int(name.split('.')[0][-2:]))
images = np.array(images)
labels = np.array(labels)
print("Images Array: {}".format(images.shape))
print("Labels Array: {}".format(labels.shape))

## THE DATASET IS READY FOR RECOGNITION