[arXiv] · [Project page] · [Online demo] · [Hugging Face paper page]

Jiawei Qin¹, Xucong Zhang², Yusuke Sugano¹

¹The University of Tokyo, ²Delft University of Technology
This repository contains the official PyTorch implementation of both the MAE pre-training and UniGaze.
- ✅ Release pre-trained MAE checkpoints (B, L, H) and gaze estimation training code.
- ✅ Release UniGaze models for inference.
- ✅ Code for predicting gaze from videos
- ✅ (2025 June 08 updated) Release the MAE pre-training code.
- ✅ (2025 August 25 updated) Online demo is available.
- ✅ Easier pip installation.
- ✅ (2026 March updated) Release the gaze dataset normalization code.
You can install UniGaze with pip:

```shell
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install timm==0.3.2
pip install unigaze
pip install -r requirements.txt
```

You can find UniGaze on the PyPI page: https://pypi.org/project/unigaze/
| Model name | Backbone | Training Data |
|---|---|---|
| `unigaze_b16_joint` | UniGaze-B | Joint Datasets |
| `unigaze_l16_joint` | UniGaze-L | Joint Datasets |
| `unigaze_h14_joint` | UniGaze-H | Joint Datasets |
| `unigaze_h14_cross_X` | UniGaze-H | ETH-XGaze |
```python
import unigaze

model = unigaze.load("unigaze_h14_joint", device="cuda")  # downloads weights from HF on first use
```

To predict gaze direction from videos, use the following script:
```shell
projdir=<...>/UniGaze/unigaze
cd ${projdir}

python predict_gaze_video.py \
    --model_name "unigaze_h14_joint" \
    -i ./input_video
```

For MAE pre-training, please refer to MAE Pre-Training.
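Gaze predictions in normalized-camera pipelines are typically expressed as a (pitch, yaw) angle pair. Assuming that convention (an assumption, not confirmed by this README), a minimal sketch of converting such a pair to a 3D unit gaze vector:

```python
import math

def pitchyaw_to_vector(pitch: float, yaw: float):
    """Convert a (pitch, yaw) gaze angle pair, in radians, to a unit 3D vector.

    Uses the common normalized-camera convention, where (0, 0)
    points straight along the -z axis (into the camera).
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

# Looking straight ahead maps to the -z axis.
print(pitchyaw_to_vector(0.0, 0.0))
```

The resulting vector is useful for rendering a gaze arrow on video frames or for comparing predictions against 3D ground truth.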
For detailed training instructions, please refer to UniGaze Training.
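Trained gaze estimators are typically evaluated by the angular error between predicted and ground-truth 3D gaze vectors. A minimal, generic implementation of this metric (a sketch, not code from this repository):

```python
import math

def angular_error_deg(g1, g2):
    """Angle in degrees between two 3D gaze vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    # Clamp to [-1, 1] to guard acos against floating-point drift.
    cos_angle = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_angle))

print(angular_error_deg((0, 0, -1), (0, 0, -1)))  # identical vectors: 0.0
print(angular_error_deg((1, 0, 0), (0, 1, 0)))    # orthogonal vectors: 90.0
```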
- You can refer to load_mae.ipynb for instructions on loading the model and integrating it into your own codebase.
- If you want to load the MAE backbone, use the `custom_pretrained_path` argument.
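Conceptually, loading the backbone only means dropping the gaze head's entries from the checkpoint's state dict before loading it. A minimal sketch with a toy dictionary (the key names are illustrative assumptions, not the repo's exact keys):

```python
# Toy checkpoint state dict; key names are illustrative assumptions.
checkpoint = {
    "patch_embed.proj.weight": "...",
    "blocks.0.attn.qkv.weight": "...",
    "gaze_fc.weight": "...",
    "gaze_fc.bias": "...",
}

# Keep every backbone weight, drop the gaze head.
backbone_state = {k: v for k, v in checkpoint.items() if not k.startswith("gaze_fc")}
print(sorted(backbone_state))
```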
```python
# Loading MAE-backbone only - this will not load the gaze_fc head.
# MAE_Gaze is defined in this repo; see load_mae.ipynb for the exact import.
mae_h14 = MAE_Gaze(model_type='vit_h_14', custom_pretrained_path='checkpoints/mae_h14/mae_h14_checkpoint-299.pth')
```

If you find our work useful for your research, please consider citing:
```bibtex
@article{qin2025unigaze,
  title={UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training},
  author={Qin, Jiawei and Zhang, Xucong and Sugano, Yusuke},
  journal={IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2025}
}
```
We also acknowledge the excellent work on MAE.
This model is licensed under the ModelGo Attribution-NonCommercial-ResponsibleAI License, Version 2.0 (MG-NC-RAI-2.0); you may use this model only in compliance with the License. You may obtain a copy of the License at
https://github.com/Xtra-Computing/ModelGo/blob/main/MGL/V2/MG-BY-NC-RAI/LICENSE
A comprehensive introduction to the ModelGo license can be found here: https://www.modelgo.li/
Our method also works for different "faces":
If you have any questions, feel free to contact Jiawei Qin at jqin@iis.u-tokyo.ac.jp.

