
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang. CVPR 2024

Description

This repository provides the official code for the paper What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation, accepted at CVPR 2024. Our contributions include:

  • We provide IVGaze, a dataset collected in vehicles containing 44k images of 125 subjects.
  • We propose the gaze pyramid transformer (GazePTR), which leverages transformer-based multi-level feature integration.
  • We introduce the dual-stream gaze pyramid transformer (GazeDPTR). It applies a perspective transformation that rotates a virtual camera to normalize images, and uses the camera pose to fuse the normalized and original images for accurate gaze estimation.

Please visit our project page for details. The dataset is available on this page.

Requirement

  1. Install PyTorch and torchvision. The code is written in Python 3.8 and was tested with PyTorch 1.13.1 and CUDA 11.6 on an NVIDIA GeForce RTX 3090. This environment is recommended but not mandatory; feel free to run the code in your preferred environment.
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
  2. Install the other packages.
pip install opencv-python PyYAML easydict warmup_scheduler

If you have any issues due to missing packages, please report them. I will update the requirements. Thank you for your cooperation.

Training

Step 1: Choose the model file.

We provide three model files: GazePTR.py, GazeDPTR.py, and GazeDPTR_v2.py. (Pretrained weights will be released as soon as possible.)

| # | Name | Description | Input | Output | Accuracy | Pretrained Weights |
|---|------|-------------|-------|--------|----------|--------------------|
| 1 | GazePTR | Leverages multi-level features. | Normalized images | Gaze direction | 7.04° | Link |
| 2 | GazeDPTR | Integrates features from two images. | Normalized + original images | Gaze direction | 6.71° | Link |
| 3 | GazeDPTR_V2 | Adds a differential projection for gaze zone prediction. | Normalized + original images | Gaze direction + gaze zone | 6.71° / 81.8% | Link |

Please choose one model file and copy it to model.py, e.g.,

cp GazeDPTR.py model.py

Step 2: Modify the config file

Please modify config/train/config_iv.yaml according to your environment settings.

  • The save attribute specifies the save path; models are stored at os.path.join({save.metapath}, {save.folder}). Each saved model is named Iter_{epoch}_{save.model_name}.pt.
  • The data attribute indicates the dataset path. Update the image and label fields to match your dataset location.
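As an illustration, the relevant parts of config/train/config_iv.yaml might look like the sketch below. The keys follow the attributes described above; the values and exact nesting are assumptions, so adapt them to the actual file shipped with the repository.

```yaml
save:
  metapath: /path/to/checkpoints   # root directory for saved models
  folder: GazeDPTR                 # models land in {metapath}/{folder}
  model_name: ivgaze.pt            # saved as Iter_{epoch}_ivgaze.pt
data:
  image: /path/to/IVGaze/images    # dataset image directory
  label: /path/to/IVGaze/labels    # dataset label files
```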

Step 3: Training models

Run the following command to initiate training. The argument 3 indicates that it will automatically perform three-fold cross-validation:

python trainer/leave.py config/train/config_iv.yaml 3

Once the training is complete, you will find the weights saved at os.path.join({save.metapath}, {save.folder}). Within the checkpoint directory, you will find three folders named train1.txt, train2.txt, and train3.txt, corresponding to the three-fold cross-validation. Each folder contains the respective trained model.
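Based on the description above, the resulting checkpoint layout should look roughly like this (a sketch; exact names depend on your config values):

```
{save.metapath}/{save.folder}/
└── checkpoint/
    ├── train1.txt/
    │   └── Iter_{epoch}_{save.model_name}.pt
    ├── train2.txt/
    └── train3.txt/
```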

Testing

Run the following command for testing.

python tester/leave.py config/train/config_iv.yaml config/test/config_iv.yaml 3

Similarly:

  • Update the image and label in config/test/config_iv.yaml based on your dataset location.
  • The savename attribute specifies the folder to save prediction results, which will be stored at os.path.join({save.metapath}, {save.folder}) as defined in config/train/config_iv.yaml.
  • The script tester/leave.py also outputs gaze zone prediction results. Remove that part of the code if you do not require gaze zone prediction.

Evaluation

We provide the evaluation.py script to assess the accuracy of gaze direction estimation. Run the following command:

python evaluation.py {PATH}

Replace {PATH} with the path of {savename} as configured in your settings.
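The repository's evaluation.py is not reproduced here, but the angular error reported above (e.g., 6.71°) is typically computed by converting (pitch, yaw) gaze angles to 3D unit vectors and measuring the angle between prediction and ground truth. The sketch below is illustrative only; the angle convention and function names are assumptions, not the repository's actual code.

```python
import numpy as np

def pitchyaw_to_vector(pitchyaw):
    """Convert an (N, 2) array of (pitch, yaw) radians to (N, 3) unit gaze vectors.

    Assumes the common gaze-estimation convention where (0, 0) looks
    along the negative z-axis; verify against the dataset's labels.
    """
    pitch, yaw = pitchyaw[:, 0], pitchyaw[:, 1]
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.stack([x, y, z], axis=1)

def mean_angular_error(pred, gt):
    """Mean angular error in degrees between predicted and ground-truth gaze."""
    a = pitchyaw_to_vector(pred)
    b = pitchyaw_to_vector(gt)
    # Clip the dot product to avoid NaNs from floating-point drift.
    cos_sim = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_sim)).mean())
```

For example, a prediction 90° of yaw away from the ground truth yields a 90° angular error, and identical angles yield 0°.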

Contact

Please send email to y.cheng.2@bham.ac.uk if you have any questions.
