Improving Gaze Detection #124

vladmandic · 2021-05-25T16:33:38Z

vladmandic
May 25, 2021
Maintainer

In short, Human way of detemining gaze direction is very very similar to what you wrote in your email

Find bounding box of a face, crop it
Use cropped face to to run mesh model
Use same cropped face to run iris model
Combine results (mesh returns 468 keypoints, iris returns 5 keypoints per eye, so total is now 478)
Then in gesture.ts, there is a simple math (this is a small part of it):

  const rightIrisCenterX = Math.abs(res[i].mesh[33][0] - res[i].annotations.rightEyeIris[0][0]) / res[i].annotations.rightEyeIris[0][0];
  const leftIrisCenterX = Math.abs(res[i].mesh[263][0] - res[i].annotations.leftEyeIris[0][0]) / res[i].annotations.leftEyeIris[0][0];
  if (leftIrisCenterX > 0.033 || rightIrisCenterX > 0.033) center = false;
  if (leftIrisCenterX > 0.033) gestures.push({ iris: i, gesture: 'looking right' });
  if (rightIrisCenterX > 0.033) gestures.push({ iris: i, gesture: 'looking left' });

(this is just looking left/right, math for looking up/down is separate)

Where res[i].mesh[33][0] and res[i].mesh[263][0] are x coordinates outside of points of each eye
(you can see point indexes in https://github.com/vladmandic/human/blob/main/assets/facemesh.png)

So if a ratio is more than 3% to left, it will say looking <direction> (i put 3% as empiric value)

I tested the gaze estimation of the Human in very different situations (light conditions, positions and distances) and the performance drops dramatically. I mean the function works fine only in the predefined position, the position where the x and y location of the webcam is exactly in the front of the face. other than that it makes wrong estimation for gaze in every frame (this is actually the problem1).

Can it be improved? Of course. Feel free to suggest improvements

Couple of ideas from the top of my mind:

It should look at how confident is iris detection to start with
For example, if face is turned slightly to the right, it should only look at left eye results and not both eyes since right eye is clearly less visible
It should look at diagonal distance to eye edge, not x distance only since head can be titled slightly
After performing face detection, it would be benefitial to normalize brightness of cropped face before passing it to mesh detection and iris detection
It could even be converted to black & white and then reconstructed as RBG to avoid coloring effects
I've done a lot of work on that when analyzing face descriptors, that can be re-used for face mesh and iris analysis as well
Additionally image preprocessing can help a lot
Btw, sometimes blurring is better than sharpening as it reduces jitter returned by argmax functions when analyzing heatmaps
Human already has functionality to do that, its a question of playing and finding best values
Haven't tried with iris model, but likely doing a tighter crop around detected face before passing it to iris model
and then reconstructing coordinates aftward would result in better precision of iris model
Or even more precise, do a double-pass: BlazeFace detects faces very loosely and then I crop the face pass it to mesh and iris model
It could be done so first pass of mesh model is used to find extreme points of the face,
then re-crop the face and pass it again and now with more precision again to mesh and iris model and take those keypoints as a result instead
Alternatively, replace BlazeFace (it's blazing fast and small, but not accurate) in the pipeline with a more precise model so initial crop is more precise
I'm actively looking at alternatives at the moment

All this would be welcome contributions for Human

Also, I took a quick look at https://github.com/david-wb/gaze-estimation
It's a PyTorch model with pretrained checkpoint - and with several reasons why I wouldn't integrate it in Human:

It's very large for small thing it does
30MB. I could probably strip and quantize it down to ~12MB,
but that would still be double the size of the second largest model I'm using
In my other projects, I use very large models (over 1GB stripped),
but one of key design goals for Human is to be fully portable which means keeping overall size to minimum
It's not in what I'd consider a finished stage
Output layers are fetched from model internals by name, etc
Alkso, key operation which analyzes heatmap returned by model is done in Python,
not in the model itself: for example, softargmax https://github.com/david-wb/gaze-estimation/blob/master/util/softargmax.py
I can handle all that by freezing model and doing a static signature definition and implenting external functions in JS,
but for me it's not worth it since this is only a small item in Human

CC: @lghasemzadeh

vladmandic · 2021-05-30T21:18:44Z

vladmandic
May 30, 2021
Maintainer Author

I just added code to calculate gaze vector:

human/src/face.ts

Line 15 in e0374f0

const calculateGaze = (mesh): { angle: number, strength: number } => {

Results are definitely more useful than simple gaze recognition such as "looking center" although same precision issues remain

See sample image where you can see gaze direction as vector

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving Gaze Detection #124

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment

{{title}}

Select a reply

Improving Gaze Detection #124

vladmandic May 25, 2021 Maintainer

Replies: 1 comment

vladmandic May 30, 2021 Maintainer Author

vladmandic
May 25, 2021
Maintainer

vladmandic
May 30, 2021
Maintainer Author