
Learning Quality-aware Dynamic Memory for Video Object Segmentation

ECCV 2022 | Paper

Abstract

Previous memory-based methods mainly focus on better matching between the current frame and the memory frames, without explicitly paying attention to the quality of the memory. As a result, frames with poor segmentation masks are prone to be memorized, which leads to error accumulation and further degrades segmentation performance. In addition, the linear growth of the memory bank with the number of processed frames limits the ability of these models to handle long videos. To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) that evaluates the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames and thus prevent error accumulation. We then combine segmentation quality with temporal consistency to dynamically update the memory bank, so that the model can handle videos of arbitrary length.
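
As a rough illustration of the idea (not the repository's actual implementation; the class, threshold, and scoring rule below are assumptions for exposition), quality-aware memorization can be sketched as follows: a frame enters the memory bank only if its predicted quality is high enough, and eviction weighs both quality and temporal recency.

# Conceptual sketch of quality-aware dynamic memory (illustrative only; the
# real logic lives in the repository's inference engines and uses different names).
from dataclasses import dataclass
from typing import List

@dataclass
class MemoryFrame:
    frame_idx: int
    features: object      # key/value features used for matching
    quality: float        # predicted segmentation quality in [0, 1]

class QualityAwareMemory:
    def __init__(self, max_size=20, quality_threshold=0.8, recency_weight=0.5):
        self.max_size = max_size
        self.quality_threshold = quality_threshold
        self.recency_weight = recency_weight
        self.bank: List[MemoryFrame] = []

    def maybe_add(self, frame: MemoryFrame, current_idx: int):
        # Skip poorly segmented frames to avoid error accumulation.
        if frame.quality < self.quality_threshold:
            return
        if len(self.bank) >= self.max_size:
            self._evict(current_idx)
        self.bank.append(frame)

    def _evict(self, current_idx: int):
        # Combine quality with temporal consistency: old, low-quality frames
        # are dropped first, keeping the memory size bounded.
        def keep_score(m: MemoryFrame) -> float:
            recency = 1.0 / (1.0 + current_idx - m.frame_idx)
            return m.quality + self.recency_weight * recency
        self.bank.remove(min(self.bank, key=keep_score))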

Framework

Visualization Results

Long Video Comparison

(a) shows the results of retaining only the most recent memory frames, and (b) shows the results of applying our updating strategy.

Results (S012)

Dataset      Split     J&F   J     F
DAVIS 2016   val       92.0  90.7  93.2
DAVIS 2017   val       85.6  82.5  88.6
DAVIS 2017   test-dev  81.9  78.1  85.4

Dataset        Split       Overall Score  J-Seen  F-Seen  J-Unseen  F-Unseen
YouTubeVOS 18  validation  83.8           82.7    87.5    78.4      86.4

Pretrained Model

Please download the pretrained s012 model here.

Requirements

The following packages are used in this project.

For installing PyTorch and torchvision, please refer to the official guideline.

The other packages can be installed with pip install -r requirements.txt.

Data Preparation

Please refer to MiVOS to prepare the datasets and put all datasets in /data.

Code Structure

├── data/: training and test datasets
│   ├── static
│   ├── DAVIS
│   ├── YouTube
│   ├── BL30K
├── datasets/: transforms and dataloaders for the training and test datasets
├── model/: network code and training engine (model.py)
├── saves/: checkpoints obtained from training
├── scripts/: functions used to process the datasets
├── util/: configuration (hyper_para.py) and utilities
├── train.py
├── inference_core.py: test engine for DAVIS
├── inference_core_yv.py: test engine for YouTubeVOS
├── eval_*.py
├── requirements.txt

If the predicted quality score is always 0 during the pre-training stage, change the ReLU activation after the FC layer in the QAM to a sigmoid, which resolves the problem. The corresponding code is on lines 174 and 175 of model/modules.py.
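
As a sketch of that change (the names and shapes below are illustrative assumptions, not the exact code in model/modules.py), the score branch of the QAM looks roughly like this, with the final activation switched from ReLU to sigmoid:

# Hypothetical quality-score head. The point is the final activation on the
# score branch: ReLU can saturate at 0 during pre-training, while sigmoid
# keeps the predicted score in (0, 1).
import torch
import torch.nn as nn

class QualityHead(nn.Module):
    def __init__(self, in_dim=256, use_sigmoid=True):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, 256)
        self.fc2 = nn.Linear(256, 1)
        self.use_sigmoid = use_sigmoid

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        score = self.fc2(x)
        # Use sigmoid here if the predicted score collapses to 0 with ReLU.
        return torch.sigmoid(score) if self.use_sigmoid else torch.relu(score)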

Training

For pretraining:

To train on the static image datasets, use the following command:

CUDA_VISIBLE_DEVICES=[GPU_ids] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=GPU_num train.py --id [save_name] --stage 0

For example, if we use 2 GPUs for training and 's0-QDMN' as the checkpoint name, the command is:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=2 train.py --id s0-QDMN --stage 0

For main training:

To train on DAVIS and YouTube, use this command:

CUDA_VISIBLE_DEVICES=[GPU_ids] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=GPU_num train.py --id [save_name] --stage 2 --load_network path_to_pretrained_ckpt

Similarly, if using 2 GPUs, the command is:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=2 train.py --id s03-QDMN --stage 2 --load_network saves/s0-QDMN/**.pth

Resume training

If you want to resume interrupted training, run the command with --load_model pointing to the *_checkpoint.pth file, for example:

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=2 train.py --id s0-QDMN --stage 0 --load_model saves/s0-QDMN/s0-QDMN_checkpoint.pth

Inference

Run one of the following scripts to perform inference on the corresponding dataset; an example command is shown after the list.

  • eval_davis_2016.py is used for the DAVIS 2016 val set.
  • eval_davis.py is used for the DAVIS 2017 val and test-dev sets (controlled by --split).
  • eval_youtube.py is used for the YouTubeVOS 2018/19 val and test sets.
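
For example, evaluating on the DAVIS 2017 val set could look like the command below. Only --split is documented above; the --model and --output flag names follow the STCN-style scripts this code builds on and are assumptions here, so check the argument parser in eval_davis.py before use.

CUDA_VISIBLE_DEVICES=0 python eval_davis.py --model saves/[ckpt_name]/[ckpt_file].pth --output [output_path] --split val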

Evaluation

For the evaluation metrics on the DAVIS 2016/2017 val sets, we refer to the repository DAVIS_val. For the DAVIS 2017 test-dev set, you can get the metric results by submitting masks to the CodaLab website DAVIS_test. For the YouTubeVOS 2019 val set, please submit your results to YouTube19. For the YouTubeVOS 2018 val set, please submit to YouTube18.

Citation

If you find this work useful for your research, please cite:

@inproceedings{liu2022learning,
  title={Learning quality-aware dynamic memory for video object segmentation},
  author={Liu, Yong and Yu, Ran and Yin, Fei and Zhao, Xinyuan and Zhao, Wei and Xia, Weihao and Yang, Yujiu},
  booktitle={ECCV},
  pages={468--486},
  year={2022}
}

Acknowledgement

Code in this repository is built upon several public repositories. Thanks to STCN, MiVOS, and Mask Scoring RCNN for sharing their code.
