<div align='center'><img src='images/ou.jpeg' /></div>

# Facial Recognition Systems
Project proposal for data science workshop  
Yossi Cohen: 022446819

## Motivation
Face recognition is the latest trend when it comes to user authentication.  
Facebook has developed the ability to recognize your friends in your photographs.  
iPhone X uses Face ID to authenticate users.  
Baidu is using face recognition instead of ID cards to allow their employees to enter their offices.  
And the list goes on...  

My aim in this project is to study the theory behind Face Recognition and implement a simplified version of a face recognition system in Python.  

## References

- [FaceNet: A Unified Embedding for Face Recognition and Clustering](http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf)
  - Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google. 

- [DeepFace: Closing the Gap to Human-Level Performance in Face Verification](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf) 
  - Yaniv Taigman, Ming Yang and Marc’Aurelio Ranzato - Facebook AI Research
  - Lior Wolf - Tel Aviv University, Israel 

## Background

Face recognition problems commonly fall into two categories:
- **Face Verification** - is this the claimed person? 
  - Examples:
    - At some airports, you can pass through customs by letting a system scan your passport and then verifying that you (the person carrying the passport) are the correct person.
    - A mobile phone that unlocks using your face is also using face verification.
  - This is a $1:1$ matching problem.
- **Face Recognition** - who is this person?
  - Example:
    - Baidu employees entering the office without needing to otherwise identify themselves.
  - This is a $1:K$ matching problem.

**Face Recognition** is really a series of several related problems:  

1. First, find all the faces in a picture.
2. Second, for each face, be able to identify the person even if the face is turned at an unusual angle or is poorly lit.
3. Third, pick out unique features of the face that can be used to tell it apart from other people.
4. Finally, compare the unique features of that face to all the known people in the database, to determine the person’s name.


Before getting into the implementation, I want to discuss [FaceNet](http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf), the network that will be used in this work.

### FaceNet
FaceNet is a neural network that learns a mapping from face images to a compact [Euclidean space](https://en.wikipedia.org/wiki/Euclidean_space) where distances correspond to a measure of face similarity. Hence, the more similar two face images are, the smaller the distance between them.  

FaceNet encodes a face image into a vector of 128 numbers, called an embedding.  
By comparing two such vectors, you can then determine if two pictures are of the same person.  

The embedding is a generic representation for anybody's face.  
Unlike other face representations, this embedding has the nice property that a larger distance  
between two face embeddings means that the faces are likely not of the same person.   
This property makes clustering, similarity detection, and classification tasks easier than with other  
face recognition techniques, where the Euclidean distance between features is not meaningful.
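
To make the distance property concrete, here is a minimal sketch of 1:1 verification on top of such embeddings. The random unit vectors stand in for real FaceNet outputs, and the function names and the threshold of `1.0` are illustrative assumptions; in practice the threshold would be tuned on validation data.

```python
import numpy as np

def embedding_distance(emb_a, emb_b):
    """Squared Euclidean (L2) distance between two embedding vectors."""
    diff = np.asarray(emb_a, dtype=float) - np.asarray(emb_b, dtype=float)
    return float(np.dot(diff, diff))

def is_same_person(emb_a, emb_b, threshold=1.0):
    """Verification: accept the pair if the embeddings are close enough.

    The default threshold is an assumption chosen for this toy example.
    """
    return embedding_distance(emb_a, emb_b) < threshold

# Placeholder data: random unit vectors instead of real face embeddings.
rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
anchor /= np.linalg.norm(anchor)

# A slightly perturbed copy plays the role of "same person, different photo".
same = anchor + 0.01 * rng.normal(size=128)
same /= np.linalg.norm(same)

# An independent random vector plays the role of "a different person".
other = rng.normal(size=128)
other /= np.linalg.norm(other)

print(is_same_person(anchor, same))
print(is_same_person(anchor, other))
```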

### Triplet Loss
FaceNet is trained with a loss function called **Triplet Loss**.  
Triplet Loss minimises the distance between an anchor and a positive (two images of the same identity)  
and maximises the distance between the anchor and a negative (an image of a different identity).  

<p>
<font size=6>
    $$Loss = \sum_{i=1}^{N} \max\left(0, \lVert f_i^a - f_i^p \rVert_2^2 - \lVert f_i^a - f_i^n \rVert_2^2 + \alpha\right)$$
</font>
</p>

- $f_i^a$ is the output encoding of the anchor in the $i$-th triplet
- $f_i^p$ is the output encoding of the positive in the $i$-th triplet
- $f_i^n$ is the output encoding of the negative in the $i$-th triplet
- $N$ is the number of triplets
- $\alpha$ is a margin enforced between positive and negative pairs; it keeps the network from trivially satisfying the loss by mapping every image to the same point, where $f_i^a - f_i^p = f_i^a - f_i^n = 0$

<img src='images/triplet-loss.png' />
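
The formula above can be sketched in a few lines of NumPy. This is an illustration of the loss on embeddings that have already been computed, not FaceNet's training code; the function name, the toy 2-dimensional embeddings, and the default margin are assumptions made for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over triplets of max(0, ||a - p||^2 - ||a - n||^2 + alpha).

    Each argument is an array of shape (N, d): N triplets of d-dimensional
    embeddings. alpha is the margin; 0.2 is an illustrative default.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative
    return float(np.sum(np.maximum(0.0, pos_dist - neg_dist + alpha)))

# Two toy triplets with 2-dimensional embeddings (made-up numbers).
anchors = np.array([[0.0, 0.0], [0.0, 0.0]])
positives = np.array([[0.1, 0.0], [1.0, 0.0]])
negatives = np.array([[1.0, 0.0], [0.1, 0.0]])

loss = triplet_loss(anchors, positives, negatives)
print(loss)
```

The first triplet already satisfies the margin (the positive is much closer than the negative), so it contributes nothing. The second triplet violates it and contributes $1 - 0.01 + 0.2 = 1.19$.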

### Siamese Networks
FaceNet builds on the idea of a Siamese network.  
A Siamese network is a neural network architecture that learns to tell whether two inputs are similar or different.  
In this project, the inputs are images of faces.  

Siamese networks consist of two identical neural networks that share exactly the same weights.  
First, each network takes one of the two input images as input.  
Then, the outputs of the last layer of each network are passed to a function that determines whether the images contain the same identity.

In FaceNet, this is done by calculating the distance between the two outputs.

<img src='images/siamese-networks.jpeg' />
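
The weight sharing can be illustrated with a toy example. The single linear layer below is only a stand-in for the deep network, and the sizes, names, and random inputs are assumptions; the point is that both inputs pass through the same function with the same weights before their outputs are compared.

```python
import numpy as np

rng = np.random.default_rng(42)

# One set of weights, shared by both branches of the Siamese pair.
# Toy sizes: 64-dimensional input, 128-dimensional embedding.
W = rng.normal(scale=0.1, size=(64, 128))

def embed(x, weights=W):
    """Toy stand-in for the deep network: one linear layer and a tanh."""
    return np.tanh(x @ weights)

def siamese_distance(x1, x2):
    """Run both inputs through the same network and compare the outputs."""
    e1 = embed(x1)  # branch 1
    e2 = embed(x2)  # branch 2, identical weights
    return float(np.sum((e1 - e2) ** 2))

# Random vectors stand in for two preprocessed face images.
x1 = rng.normal(size=64)
x2 = rng.normal(size=64)
print(siamese_distance(x1, x2))
```

Because the two branches are the same function, the comparison is symmetric and an image always has distance zero to itself.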

## Project data
- [LFW dataset (Labeled Faces in the Wild)](http://vis-www.cs.umass.edu/lfw/)
- Pre-trained models
- My family pictures folder

## Approach and Project Outline

In this project, I will implement the following steps of Facial Recognition.  
Along the way I will study and present the theory behind the various implementation parts.

1. Detect all faces in a picture (using pre-trained models from [dlib](http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html))

2. Transform the face for the neural network (using [dlib's real-time pose estimation](http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html) with OpenCV's [affine transformation](https://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/warp_affine/warp_affine.html) to try to make the eyes and bottom lip appear in the same location on each image).

3. Use a deep neural network to embed the face on a 128-dimensional unit hypersphere.  

4. Use these encodings to perform face verification.  

5. Apply clustering or classification techniques to compare the unique features of that face to all the known people in the database, to determine the person’s name.  
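
Step 3 places each embedding on the unit hypersphere, which amounts to dividing the vector by its L2 norm. A minimal sketch, assuming the raw network outputs are available as a NumPy array (the random data and the function name are placeholders):

```python
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit L2 norm so it lies on the unit hypersphere.

    eps guards against division by zero for an all-zero vector.
    """
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.maximum(norms, eps)

# Placeholder for raw 128-dimensional network outputs of five faces.
raw = np.random.default_rng(1).normal(size=(5, 128))
unit = l2_normalize(raw)

print(np.linalg.norm(unit, axis=1))
```

One consequence of the normalisation is that the squared distance between two embeddings equals $2 - 2\cos\theta$, where $\theta$ is the angle between them, so it always lies between 0 and 4. This makes a single distance threshold easier to interpret.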

<img src='images/face-recognition-steps.jpg' />

## Implementation

We need to build a pipeline where we solve each step of face recognition separately and pass the result of the current step to the next step.  

### Step 1: Finding all the Faces
The first step in our pipeline is face detection: we have to locate the faces in a photograph before we can try to tell them apart.
<br>This step finds the areas of the image that we pass on to the next step in our pipeline.

<br>We’re going to use a method published in 2005 called [Histogram of Oriented Gradients](http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf), or HOG for short.
<img src='images/face-detection.png' />

<font style="background-color:yellow;">**TODO**</font>: Explain details and theoretical background...
<img src='images/HOG.png' />
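
As a rough illustration of the core idea behind HOG, the sketch below builds the orientation histogram for a single cell of a grayscale image: each pixel votes for the direction of its gradient, weighted by the gradient magnitude. It is a simplified sketch rather than the detector used in this project. It leaves out the vote interpolation and block normalisation of the full method, and the function name and the 8x8 test patches are assumptions.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one grayscale cell."""
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)        # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)      # gradient strength per pixel
    # Unsigned orientation in degrees, folded into [0, 180).
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(
        angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude
    )
    return hist

# An 8x8 patch with a vertical edge: dark left half, bright right half.
vertical_edge = np.zeros((8, 8))
vertical_edge[:, 4:] = 1.0

hist_vertical = cell_orientation_histogram(vertical_edge)
hist_horizontal = cell_orientation_histogram(vertical_edge.T)
print(hist_vertical)
print(hist_horizontal)
```

For the vertical edge all of the gradient energy lands in the first bin (orientations near 0°). Transposing the patch gives a horizontal edge, and the energy moves to the bin that contains 90°.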


### Step 2: Posing and Projecting Faces
<font style="background-color:yellow;">**TODO**</font>: explain landmarks...
<img src='images/landmarks.png' />
<img src='images/landmarks-2.png' />
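
The outline mentions an affine transformation that moves the eyes and bottom lip to the same location in every image. Three point pairs determine such a transformation exactly, so it can be found by solving a small linear system. The sketch below uses plain NumPy; the landmark and template coordinates are made-up values, and the function names are hypothetical. As far as I know, OpenCV's `cv2.getAffineTransform` computes the same 2x3 matrix from three point pairs.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Return the 2x3 matrix M with dst = M @ [x, y, 1] for three point pairs."""
    src = np.asarray(src_pts, dtype=float)   # shape (3, 2)
    dst = np.asarray(dst_pts, dtype=float)   # shape (3, 2)
    A = np.hstack([src, np.ones((3, 1))])    # homogeneous coordinates, (3, 3)
    # Solve A @ M.T = dst; the points must not lie on one line.
    return np.linalg.solve(A, dst).T         # shape (2, 3)

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical detected landmarks: left eye, right eye, bottom lip (pixels).
detected = [(30.0, 40.0), (70.0, 45.0), (52.0, 90.0)]
# Hypothetical template positions that every aligned face should share.
template = [(24.0, 30.0), (72.0, 30.0), (48.0, 80.0)]

M = estimate_affine(detected, template)
print(apply_affine(M, detected))
```

Applying `M` to the detected landmarks reproduces the template positions; the same matrix would then be used to warp the whole face image.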

### Step 3: Encoding Faces

<font style="background-color:yellow;">**TODO**</font>: Explain encoding...

### Step 4: Apply clustering/classification to find the person’s name from the encoding
<img src='images/compare-encoding.jpeg' />
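
One simple way to turn encodings into a name is a nearest-neighbour search over the known people, which is the $1:K$ matching problem from the Background section. The sketch below is only one possible approach and not necessarily the classifier this project will end up using; the database contents, the names, and the threshold are placeholders.

```python
import numpy as np

def identify(query, database, threshold=1.0):
    """Return the name of the closest known embedding.

    Returns None when no known embedding is within the threshold, so that
    unknown faces are rejected. The threshold value is an assumption.
    """
    best_name, best_dist = None, float("inf")
    for name, emb in database.items():
        dist = float(np.sum((query - emb) ** 2))  # squared L2 distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

def random_unit_vector(rng, dim=128):
    """Placeholder for a real face embedding."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(7)
# Hypothetical database of known people and their embeddings.
database = {"alice": random_unit_vector(rng), "bob": random_unit_vector(rng)}

# A new photo of "alice": her stored embedding plus a little noise.
query = database["alice"] + 0.01 * rng.normal(size=128)
query /= np.linalg.norm(query)

stranger = random_unit_vector(rng)

print(identify(query, database))
print(identify(stranger, database))
```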