End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context (a.k.a. Multi Clue Gaze)
Yiran Guan*, Zhuoguang Chen*, Wenzheng Zeng†, Zhiguo Cao, Yang Xiao†
Huazhong University of Science and Technology
*: equal contribution, †: corresponding author
Our demo is inspired by gaze360-demo and yolov5-crowdhuman. We estimate the gaze of each person in a video and visualize it. See MCGaze_demo for more details.
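As a rough illustration of what such a visualization does, here is a minimal sketch of drawing a gaze arrow on a frame with OpenCV (the function name and the pitch/yaw-to-2D projection are assumptions for illustration, not the demo's actual code):

```python
import cv2
import numpy as np

def draw_gaze_arrow(frame, eye_center, pitch, yaw, length=100, color=(0, 0, 255)):
    """Draw a 2D arrow for a gaze direction given as (pitch, yaw) in radians.

    The pitch/yaw-to-2D projection used here is an assumption for
    illustration; the actual demo may parameterize gaze differently.
    """
    dx = -length * np.cos(pitch) * np.sin(yaw)
    dy = -length * np.sin(pitch)
    x0, y0 = int(eye_center[0]), int(eye_center[1])
    cv2.arrowedLine(frame, (x0, y0), (int(x0 + dx), int(y0 + dy)),
                    color, thickness=2, tipLength=0.2)
    return frame
```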
This repository contains the official implementation of the paper "End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context".
We propose to facilitate video gaze estimation by capturing the spatial-temporal interaction context among head, face, and eye in an end-to-end learning manner, which has not been well explored before. Experiments on the challenging Gaze360 dataset verify the superiority of our approach.
In our work, we evaluate our model under two different dataset settings (the Gaze360 setting and the L2CS setting, i.e., only samples with a detectable face are considered) for a fair comparison with previous methods.
You can download the model checkpoints from the links in the table below.
Setting | Backbone | MAE-Front180 (°) | Weight |
---|---|---|---|
Gaze360-setting | R-50 | 10.74 | Google Drive |
L2CS-setting | R-50 | 9.81 | Google Drive |
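MAE-Front180 denotes the mean angular error (in degrees) on the frontal-180° subset of Gaze360. For reference, a generic sketch of how the mean angular error between predicted and ground-truth 3D gaze vectors is commonly computed (not necessarily the exact evaluation code of this repository):

```python
import numpy as np

def mean_angular_error(pred, gt):
    """Mean angular error in degrees between predicted and ground-truth
    3D gaze vectors of shape (N, 3)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos_sim = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_sim)).mean())
```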
- Create a new conda environment:
  conda create -n MCGaze python=3.9
  conda activate MCGaze
- Install PyTorch (1.7.1 is recommended):
  pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
- Install MMDetection:
  - Install MMCV-full first (1.4.8 is recommended):
    pip install mmcv-full==1.4.8 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.1/index.html
  - Then install MCGaze in editable mode:
    cd MCGaze
    pip install -v -e .
If you encounter difficulties during use, please open a new issue or contact us.
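As a quick sanity check after installation, you can verify the installed versions and CUDA availability (a minimal sketch):

```python
import torch
import mmcv
import mmdet

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv-full:", mmcv.__version__)
print("mmdet:", mmdet.__version__)
```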
- Download the Gaze360 dataset from its official website.
- Download train.txt and test.txt from Gaze360's official GitHub repo.
- Use our code to reorganize the file structure (modify the paths in the script first):
  python tools/gaze360_img_reorganize.py
- Download the COCO-format annotations from this annotations link, and put them into the corresponding folders.

The correct hierarchy of the MCGaze/data folder is shown below:
└── data
|
├── gaze360
| ├── train_rawframes
| | ├── 1
| | | ├── 00000.png
| | | ├── 00001.png
| | | └── ...
| | ├── 2
| | └── ...
| |
| ├── test_rawframes
| | ├── 1
| | | ├── 00000.png
| | | ├── 00001.png
| | | └── ...
| |
| ├── train.json
| └── test.json
|
├── l2cs
| ├── train_rawframes
| | ├── 1
| | | ├── 00000.png
| | | └── ...
| | ├── 2
| | └── ...
| |
| ├── test_rawframes
| ├── train.json
| └── test.json
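Optionally, a small script to verify that the data folder matches the hierarchy above and to peek at the annotations (a minimal sketch; it assumes standard COCO-style "images" and "annotations" keys, which may differ from the actual annotation format):

```python
import json
import os

DATA_ROOT = "data"  # adjust if your data folder lives elsewhere

for setting in ("gaze360", "l2cs"):
    for split in ("train", "test"):
        frames_dir = os.path.join(DATA_ROOT, setting, f"{split}_rawframes")
        ann_file = os.path.join(DATA_ROOT, setting, f"{split}.json")
        print(f"{setting}/{split}: rawframes exist: {os.path.isdir(frames_dir)}")
        if os.path.isfile(ann_file):
            with open(ann_file) as f:
                ann = json.load(f)
            # Key names follow the usual COCO convention; adjust if needed.
            print(f"  images: {len(ann.get('images', []))}, "
                  f"annotations: {len(ann.get('annotations', []))}")
        else:
            print(f"  missing annotation file: {ann_file}")
```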
- Run the commands below for inference and evaluation in the different settings.
If you want to evaluate the model without training it yourself, download our checkpoints (we recommend creating a new folder "ckpts" and putting the files in it).
Also remember to check that the file paths in the shell scripts are correct.
bash tools/test_gaze360.sh
bash tools/test_l2cs.sh
- Run the commands below to begin training in different settings.
bash tools/train_gaze360.sh
bash tools/train_l2cs.sh
This code is inspired by MPEblink, TeViT, and MMDetection. Thanks for their great contributions to the computer vision community.
If MCGaze is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
@article{guan2023end,
title={End-to-End Video Gaze Estimation via Capturing Head-Face-Eye Spatial-Temporal Interaction Context},
author={Guan, Yiran and Chen, Zhuoguang and Zeng, Wenzheng and Cao, Zhiguo and Xiao, Yang},
journal={IEEE Signal Processing Letters},
volume={30},
pages={1687--1691},
year={2023},
publisher={IEEE}
}