# Documentation of the project Interaction by Facial Expressions
**Group members:** Lea Krawczyk, Thi Mai Linh Nguyen, Sebastian Vittinghoff

# Eye Detection with Dlib (06.12.2023)

## Changing the goal
Our original goal was to use facial expressions as well as gestures to navigate our application. After receiving some feedback, we decided to focus only on facial expressions and make our application hands-free. This helped us narrow our focus. 
We began researching what can be achieved using only the face and whether there are any standards for using facial expressions for navigation.

## Research results
We started by researching standards for hands-free applications in general. We did not find a universal guideline for navigation by gestures or facial expressions, but we did find a few examples of hands-free applications.  
The first idea we came across, because its author is a professor at our university, was [hands free coding via voice control by Wolfram Wingerath](https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/642/2021-ix-wingerath-handsfree-coding.pdf). The article states that although voice control, or rather a dictation function, is available on most of our devices, it is often dismissed as a special "fun" feature, even though it is essential to a group of people.

However, since we are focusing on image processing of facial expressions and gestures, the idea was not really applicable to our project. We came across another hands-free coding approach by Charlie Gerard. She developed an [extension for JetBrains IDEs which makes use of gaze detection](https://www.youtube.com/watch?v=0ISXpNJ5iNs). Even though her use case differs quite a lot from ours, we added gaze detection to our set of navigation commands, since it seemed to be tried and tested and worked really well in her demonstration.  

Another approach we came across revolves around eye tracking. Dr. Oualid S. wrote an [article on LinkedIn](https://www.linkedin.com/pulse/navigate-your-screen-blink-eye-oualid-soula) showing how he controls the mouse using eye tracking. What makes this technique special is that the eye tracking is done without any external devices. He himself states that it is not the most precise or responsive way to navigate the computer. Since we were looking for high accuracy in our project, we did not use anything from his approach, but it was still interesting to see that this is possible without external devices. 

## Tests with Dlib models 
We started out by getting familiar with Dlib and learning how to use the landmark points to detect certain actions. For that we used the [shape_predictor_68_face_landmarks](https://github.com/davisking/dlib-models/blob/master/shape_predictor_68_face_landmarks.dat.bz2) model by davisking to help us detect the eyes of a face. For the face detection itself, we used the haarcascade_frontalface_default model from OpenCV. The first thing we built was a blink counter, using a [tutorial provided by Adrian Rosebrock](https://pyimagesearch.com/2017/04/24/eye-blink-detection-opencv-python-dlib/) and the article [Eye Aspect Ratio(EAR) and Drowsiness detector using Dlib](https://medium.com/analytics-vidhya/eye-aspect-ratio-ear-and-drowsiness-detector-using-dlib-a0b2c292d706) by Dhruv Pandey. 
Since we essentially copied the code from the tutorial and tried to understand it, we do not reproduce it here. The output we achieved was this:
<br/>
<div>
    <img src="./assets/blink-counter.gif" alt="blink counter output" width="640">
</div>

## Eye Detection
Dlib detects 68 landmarks in a face, with landmarks 37-42 belonging to the right eye and 43-48 to the left eye. Each eye is therefore represented by 6 landmark points, which are put into a formula that calculates the Eye Aspect Ratio (EAR). 
The EAR is based on the Euclidean distances between the upper and lower landmark points of the eye, relative to the width of the eye. Writing $p_i$ for the position of landmark $i$, we calculate for the right eye: 
<p>$$\text{EAR} = \frac{\|p_{38} - p_{42}\| + \|p_{39} - p_{41}\|}{2\,\|p_{37} - p_{40}\|}$$</p>

In words: the Euclidean distances between the two vertical point pairs of the eye are added together and divided by twice the Euclidean distance between the horizontal points of the eye. How this formula works is also illustrated in this image:
<br/>
<div>
    <img src="./assets/eye-aspect-ratio.png" alt="EAR visually explained" width="640">
</div>

The image illustrates that the colored vertical lines approach zero as the eye closes. The value of the EAR is therefore higher when the eye is open and lower when it is closed. The idea is to pick a threshold value and check it in an if-condition to determine whether an eye is closed or not.
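
To make the formula concrete, the following minimal sketch evaluates the EAR on two hand-made sets of six points. The coordinates are invented for illustration and are not real Dlib output, only the Python standard library is used, and the threshold of 0.18 is just an example value.

```python
import math


def eye_aspect_ratio(eye):
    """Compute the EAR for six (x, y) points ordered like landmarks 37..42."""
    a = math.dist(eye[1], eye[5])  # vertical pair 38 / 42
    b = math.dist(eye[2], eye[4])  # vertical pair 39 / 41
    c = math.dist(eye[0], eye[3])  # horizontal pair 37 / 40
    return (a + b) / (2.0 * c)


# Invented coordinates: a wide-open eye and an almost closed eye
open_eye = [(0, 0), (2, -2), (4, -2), (6, 0), (4, 2), (2, 2)]
closed_eye = [(0, 0), (2, -0.3), (4, -0.3), (6, 0), (4, 0.3), (2, 0.3)]

EXAMPLE_THRESH = 0.18  # example value, needs tuning per camera and user

ear_open = eye_aspect_ratio(open_eye)
ear_closed = eye_aspect_ratio(closed_eye)

for name, ear in (("open", ear_open), ("closed", ear_closed)):
    state = "closed" if ear < EXAMPLE_THRESH else "open"
    print(f"{name} sample: EAR = {ear:.3f} -> classified as {state}")
```

Because the vertical distances shrink while the horizontal distance stays the same, the second sample yields a much smaller ratio and falls below the threshold.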

In [None]:
!pip install dlib
!pip install imutils
!pip install scipy

import cv2
import dlib
from imutils import face_utils
from scipy.spatial import distance as dist


def eye_aspect_ratio(eye):
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    C = dist.euclidean(eye[0], eye[3])
    ear = (A + B) / (2.0 * C)
    return ear


EYE_AR_THRESH = 0.18
EYE_AR_CONSEC_FRAMES = 15

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("./models/shape_predictor_68_face_landmarks.dat")

(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]

vs = cv2.VideoCapture(1)

while True:
    ret, frame = vs.read()
    if not ret:
        # Stop when the camera delivers no frame
        break
    # Mirror the frame horizontally
    frame = cv2.flip(frame, 1)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    rects = detector(gray, 0)

    for rect in rects:

        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        (x, y, w, h) = face_utils.rect_to_bb(rect)
        for (x, y) in shape:
            cv2.circle(frame, (x, y), 3, (0, 0, 255), -1)

        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)

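        # Sum of both EARs (not their average), only used for the "both closed" check
        ear = (leftEAR + rightEAR)

        # The frame was mirrored above, so dlib's "right_eye" region is the user's left eye
        if (rightEAR < EYE_AR_THRESH) & (leftEAR > rightEAR):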
            cv2.putText(frame, "Left eye closed", (10, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        elif (leftEAR < EYE_AR_THRESH) & (rightEAR > leftEAR):
            cv2.putText(frame, "Right eye closed", (200, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        elif ear < EYE_AR_THRESH:
            cv2.putText(frame, "Eye: {}".format("both closed"), (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    if key == ord("q"):
        break

vs.release()
cv2.destroyAllWindows()

The results we got from this were not particularly stable: the detection of which eye was closed was frequently wrong. We therefore tried different thresholds:

In [None]:
!pip install dlib
!pip install imutils
!pip install scipy

import cv2
import dlib
from imutils import face_utils
from scipy.spatial import distance as dist


def eye_aspect_ratio(eye):
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    C = dist.euclidean(eye[0], eye[3])
    ear = (A + B) / (2.0 * C)
    return ear


EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 15

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("./models/shape_predictor_68_face_landmarks.dat")

(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]

vs = cv2.VideoCapture(1)

while True:
    ret, frame = vs.read()
    if not ret:
        # Stop when the camera delivers no frame
        break
    # Mirror the frame horizontally
    frame = cv2.flip(frame, 1)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    rects = detector(gray, 0)

    for rect in rects:

        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        (x, y, w, h) = face_utils.rect_to_bb(rect)
        for (x, y) in shape:
            cv2.circle(frame, (x, y), 3, (0, 0, 255), -1)

        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)

        ear = (leftEAR + rightEAR)

        if (rightEAR < EYE_AR_THRESH) & (leftEAR > rightEAR):
            cv2.putText(frame, "Left eye closed", (10, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        elif (leftEAR < EYE_AR_THRESH) & (rightEAR > leftEAR):
            cv2.putText(frame, "Right eye closed", (200, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        elif ear < EYE_AR_THRESH:
            cv2.putText(frame, "Eye: {}".format("both closed"), (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    if key == ord("q"):
        break

vs.release()
cv2.destroyAllWindows()

We also tried different thresholds for the two eyes:

In [None]:
!pip install dlib
!pip install imutils
!pip install scipy

import cv2
import dlib
from imutils import face_utils
from scipy.spatial import distance as dist


def eye_aspect_ratio(eye):
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    C = dist.euclidean(eye[0], eye[3])
    ear = (A + B) / (2.0 * C)
    return ear


EYE_AR_THRESH = 0.35
EYE_AR_CONSEC_FRAMES = 15

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("./models/shape_predictor_68_face_landmarks.dat")

(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]

vs = cv2.VideoCapture(1)

while True:
    ret, frame = vs.read()
    if not ret:
        # Stop when the camera delivers no frame
        break
    # Mirror the frame horizontally
    frame = cv2.flip(frame, 1)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    rects = detector(gray, 0)

    for rect in rects:

        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        (x, y, w, h) = face_utils.rect_to_bb(rect)
        for (x, y) in shape:
            cv2.circle(frame, (x, y), 3, (0, 0, 255), -1)

        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)

        ear = (leftEAR + rightEAR)

        if (rightEAR < 0.23) & (leftEAR > rightEAR):
            cv2.putText(frame, "Left eye closed", (10, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        elif (leftEAR < 0.27) & (rightEAR > leftEAR):
            cv2.putText(frame, "Right eye closed", (200, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

        elif ear < EYE_AR_THRESH:
            cv2.putText(frame, "Eye: {}".format("both closed"), (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    if key == ord("q"):
        break

vs.release()
cv2.destroyAllWindows()
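
The per-eye comparison from the loop above can also be isolated into a small function, which makes it easier to try threshold pairs without a webcam. This is only a sketch under a few assumptions: the function name `classify_eyes` and its return labels are hypothetical, the default thresholds 0.27 and 0.23 are taken from the loop, and, unlike the loop, the sketch checks for both eyes being closed first. The labels refer to dlib's `left_eye` and `right_eye` regions. The loop above prints the opposite side on screen, presumably because the frame is mirrored with `cv2.flip`.

```python
def classify_eyes(left_ear, right_ear, left_thresh=0.27, right_thresh=0.23):
    """Classify the eye state from two EAR values.

    Returns "both", "left_eye", "right_eye" or "none", naming the dlib
    region(s) whose EAR is below its threshold (hypothetical labels).
    """
    left_closed = left_ear < left_thresh
    right_closed = right_ear < right_thresh

    # Check the combined case first so it is not shadowed by a single-eye case
    if left_closed and right_closed:
        return "both"
    if left_closed:
        return "left_eye"
    if right_closed:
        return "right_eye"
    return "none"


# Made-up EAR values for illustration
samples = [(0.30, 0.30), (0.30, 0.20), (0.20, 0.30), (0.10, 0.10)]
results = [classify_eyes(left, right) for left, right in samples]
print(results)
```

Keeping the decision in a pure function like this would also allow recorded EAR values to be replayed against several threshold pairs.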

Using a separate threshold for each eye seemed to work better, but it was still not accurate enough. The accuracy depended heavily on the ambient light and the distance to the webcam. In the demo one can also see that the landmark points sometimes stick to the eyelid and do not move when an eye is closed, or that the eye is no longer detected at its correct position at all. This can also be seen in the following image: 
<br/>
<div>
    <img src="./assets/eye-detection-dlib.png" alt="eye detection inaccuracies using dlib" width="640"/>
</div>
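
One possible way to reduce frame-to-frame flicker is to require the EAR to stay below the threshold for several consecutive frames before reporting a closed eye, which is what the so far unused constant `EYE_AR_CONSEC_FRAMES` in our code suggests. The sketch below assumes this approach. The class name `ClosedEyeDebouncer` is hypothetical, the readings are made up, and the technique would not fix misplaced landmarks.

```python
class ClosedEyeDebouncer:
    """Report a closed eye only after enough consecutive low-EAR frames."""

    def __init__(self, threshold=0.18, consec_frames=15):
        self.threshold = threshold
        self.consec_frames = consec_frames
        self.counter = 0  # number of consecutive frames below the threshold

    def update(self, ear):
        """Feed one EAR value per frame and return the debounced state."""
        if ear < self.threshold:
            self.counter += 1
        else:
            self.counter = 0
        return self.counter >= self.consec_frames


# Made-up EAR readings: one single-frame dip, then a longer closure
debouncer = ClosedEyeDebouncer(threshold=0.18, consec_frames=3)
readings = [0.30, 0.10, 0.31, 0.10, 0.11, 0.09, 0.12, 0.30]
states = [debouncer.update(ear) for ear in readings]
print(states)
```

The single low reading in the second frame is ignored, while the run of low readings later on is reported as closed from its third frame onward.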