End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context (a.k.a. Multi Clue Gaze)
Yiran Guan*, Zhuoguang Chen*, Wenzheng Zeng†, Zhiguo Cao, Yang Xiao†
Huazhong University of Science and Technology
*: equal contribution, †: corresponding author
Our demo is inspired by gaze360-demo and yolov5-crowdhuman. We estimate the gaze of each person in a video and visualize it. See MCGaze_demo for more details.
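As a rough illustration of what such a visualization does, here is a minimal sketch of drawing a gaze arrow on a frame with OpenCV (the function name and the pitch/yaw-to-2D projection are assumptions for illustration, not the demo's actual code):

```python
import cv2
import numpy as np

def draw_gaze_arrow(frame, eye_center, pitch, yaw, length=100, color=(0, 0, 255)):
    """Draw a 2D arrow for a gaze direction given as (pitch, yaw) in radians.

    The pitch/yaw-to-2D projection used here is an assumption for
    illustration; the actual demo may parameterize gaze differently.
    """
    dx = -length * np.cos(pitch) * np.sin(yaw)
    dy = -length * np.sin(pitch)
    x0, y0 = int(eye_center[0]), int(eye_center[1])
    cv2.arrowedLine(frame, (x0, y0), (int(x0 + dx), int(y0 + dy)),
                    color, thickness=2, tipLength=0.2)
    return frame
```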
This repository contains the official implementation of the paper "End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context".
We propose to facilitate video gaze estimation by capturing the spatial-temporal interaction context among head, face, and eye in an end-to-end learning manner, which has not been well explored before. Experiments on the challenging Gaze360 dataset verify the superiority of our approach.
In our work, we evaluate our model under two different dataset settings (the Gaze360 setting and the L2CS setting, i.e., only samples with a detectable face are considered) for a fair comparison with previous methods.
You can download the model checkpoints from the links in the table below.
Setting | Backbone | MAE-Front180 (°) | Weight |
---|---|---|---|
Gaze360-setting | R-50 | 10.74 | Google Drive |
L2CS-setting | R-50 | 9.81 | Google Drive |
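MAE-Front180 denotes the mean angular error (in degrees) on the frontal-180° subset of Gaze360. For reference, a generic sketch of how the mean angular error between predicted and ground-truth 3D gaze vectors is commonly computed (not necessarily the exact evaluation code of this repository):

```python
import numpy as np

def mean_angular_error(pred, gt):
    """Mean angular error in degrees between predicted and ground-truth
    3D gaze vectors of shape (N, 3)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos_sim = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_sim)).mean())
```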
- Create a new conda environment:
  conda create -n MCGaze python=3.9
  conda activate MCGaze
- Install PyTorch (1.7.1 is recommended):
  pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
- Install MMDetection:
  - Install MMCV-full first (1.4.8 is recommended):
    pip install mmcv-full==1.4.8 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.1/index.html
  - Then install MCGaze in editable mode:
    cd MCGaze
    pip install -v -e .
If you encounter difficulties during use, please open a new issue or contact us.
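As a quick sanity check after installation, you can verify the installed versions and CUDA availability (a minimal sketch):

```python
import torch
import mmcv
import mmdet

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv-full:", mmcv.__version__)
print("mmdet:", mmdet.__version__)
```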
- Download the Gaze360 dataset from its official website.
- Download train.txt and test.txt from Gaze360's official GitHub repo.
- Use our code to reorganize the file structure (modify the paths in the script first):
  python tools/gaze360_img_reorganize.py
- Download the COCO-format annotations from this annotations link, and put them into the corresponding folders.

The correct hierarchy of the MCGaze/data folder is shown below:
└── data
|
├── gaze360
| ├── train_rawframes
| | ├── 1
| | | ├── 00000.png
| | | ├── 00001.png
| | | └── ...
| | ├── 2
| | └── ...
| |
| ├── test_rawframes
| | ├── 1
| | | ├── 00000.png
| | | ├── 00001.png
| | | └── ...
| |
| ├── train.json
| └── test.json
|
├── l2cs
| ├── train_rawframes
| | ├── 1
| | | ├── 00000.png
| | | └── ...
| | ├── 2
| | └── ...
| |
| ├── test_rawframes
| ├── train.json
| └── test.json
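Optionally, a small script to verify that the data folder matches the hierarchy above and to peek at the annotations (a minimal sketch; it assumes standard COCO-style "images" and "annotations" keys, which may differ from the actual annotation format):

```python
import json
import os

DATA_ROOT = "data"  # adjust if your data folder lives elsewhere

for setting in ("gaze360", "l2cs"):
    for split in ("train", "test"):
        frames_dir = os.path.join(DATA_ROOT, setting, f"{split}_rawframes")
        ann_file = os.path.join(DATA_ROOT, setting, f"{split}.json")
        print(f"{setting}/{split}: rawframes exist: {os.path.isdir(frames_dir)}")
        if os.path.isfile(ann_file):
            with open(ann_file) as f:
                ann = json.load(f)
            # Key names follow the usual COCO convention; adjust if needed.
            print(f"  images: {len(ann.get('images', []))}, "
                  f"annotations: {len(ann.get('annotations', []))}")
        else:
            print(f"  missing annotation file: {ann_file}")
```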
- Run the commands below for inference and evaluation in the different settings.
If you want to evaluate the model without training it yourself, download our checkpoints (we recommend creating a new folder "ckpts" and putting the files in it).
Also remember to check that the file paths in the shell scripts are correct.
bash tools/test_gaze360.sh
bash tools/test_l2cs.sh
- Run the commands below to begin training in different settings.
bash tools/train_gaze360.sh
bash tools/train_l2cs.sh
This code is inspired by MPEblink, TeViT, and MMDetection. Thanks for their great contributions to the computer vision community.
If MCGaze is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
@article{guan2023end,
title={End-to-End Video Gaze Estimation via Capturing Head-Face-Eye Spatial-Temporal Interaction Context},
author={Guan, Yiran and Chen, Zhuoguang and Zeng, Wenzheng and Cao, Zhiguo and Xiao, Yang},
journal={IEEE Signal Processing Letters},
volume={30},
pages={1687--1691},
year={2023},
publisher={IEEE}
}