This is an official implementation of "Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering" (IEEE TCSVT).
The training and testing experiments are conducted using PyTorch 1.8.1 on a single NVIDIA TITAN RTX GPU with 24 GB of memory.
- python 3.8
- pytorch 1.8.1
- torchvision 0.9.1
```bash
conda create -n ClusterNet python=3.8
conda activate ClusterNet
conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=10.2 -c pytorch
```
Other minor Python modules can be installed by running
```bash
pip install opencv-python einops
```
- DAVIS16: We perform online clustering and evaluation on the validation set. Note that you should download DAVIS17 (Unsupervised 480p) to fit the code.
- FBMS: This dataset contains videos of multiple moving objects, providing test cases for multiple object segmentation.
- SegTrackV2: Each sequence contains 1-6 moving objects.
Following the evaluation protocol in CIS, we combine multiple objects into a single foreground and use the region similarity as the evaluation metric.
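Merging the per-object annotations into a single binary foreground mask (as the protocol above describes) can be sketched as follows; this is an illustrative helper, not the repository's exact code, and assumes integer ID masks with 0 as background:

```python
import numpy as np

def merge_objects(annotation):
    """Collapse a multi-object ID mask (0 = background, 1..N = object IDs)
    into a single binary foreground mask."""
    return (np.asarray(annotation) > 0).astype(np.uint8)
```

The merged mask can then be compared directly against a binary prediction when computing the region similarity.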
- Path configuration: the dataset path is set via `--data_dir` in `main.py`:

```python
parser.add_argument('--data_dir', default=None, type=str, help='dataset root dir')
```
- The datasets directory structure will be as follows:

```
|--DAVIS2017
|  |--Annotations_unsupervised
|  |  |--480p
|  |--ImageSets
|  |  |--2016
|  |--Flows_gap_1_${flow_method}
|  |--Full-Resolution
|--FBMS
|  |--Annotations_Binary
|  |--Flows_gap_1_${flow_method}
|--SegTrackv2
|  |--Annotations_Binary
|  |--Flows_gap_1_${flow_method}
```
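Given this layout, the flow directory for a dataset can be resolved from `--data_dir` and the chosen flow method. A minimal sketch (the function name and `gap` parameter are illustrative, not the exact variables used in `main.py`):

```python
import os

def flow_dir(data_dir, dataset, flow_method, gap=1):
    """Build the optical-flow directory path,
    e.g. <data_dir>/DAVIS2017/Flows_gap_1_RAFT."""
    return os.path.join(data_dir, dataset, f'Flows_gap_{gap}_{flow_method}')
```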
- The optical flow is estimated using PWC-Net, RAFT, and FlowFormer. In the datasets directory, the variable `flow_method` is `PWC`, `RAFT`, or `FlowFormer`, respectively.
- The flows are resized to the size of the original image (same as Motion Grouping), with each input frame having a size of $480\times854$ for DAVIS16 and $480\times640$ for FBMS and SegTrackV2. We convert the optical flow to a 3-channel image with the standard optical-flow visualization, normalize it to $[-1, 1]$, and use only the previous frames for optical flow estimation in the online setting.
To train the ClusterNet model on a GPU, you can use:

```bash
bash scripts/main.sh
```

In the `main.sh` file, first activate your Python environment and set `gpu_id` and `data_dir`. Then set the hyperparameters `batch_size`, `n_clusters`, and `threshold` to 16, 30, and 0.1, respectively.
The model files and checkpoints will be saved in `./checkpoints/${exp_id}`. The `.pth` files suffixed with `_${sequence_name}` store the network weights that initialize our autoencoder for training on DAVIS16 via the optical-flow reconstruction loss.
The segmentation results will be saved in `./results/${exp_id}`. The evaluation criterion is the mean region similarity $\mathcal{J}$.
| Optical flow prediction | Method | Mean |
| --- | --- | --- |
| PWC-Net | MG | 63.7 |
| PWC-Net | ClusterNet | 67.9 (+4.2) |
| RAFT | MG | 68.3 |
| RAFT | ClusterNet | 72.0 (+3.7) |
| FlowFormer | MG | 70.3 |
| FlowFormer | ClusterNet | 75.4 (+5.1) |
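The mean region similarity (Jaccard index) used for these numbers can be computed per frame and averaged over a sequence. A minimal NumPy sketch, assuming binary prediction and ground-truth masks:

```python
import numpy as np

def mean_region_similarity(preds, gts):
    """Mean Jaccard index (intersection over union) between binary
    predicted masks and ground-truth masks, averaged over frames."""
    scores = []
    for pred, gt in zip(preds, gts):
        pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        scores.append(1.0 if union == 0 else inter / union)  # empty masks count as a perfect match
    return float(np.mean(scores))
```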
If you find our work useful in your research, please consider citing our paper:
```bibtex
@ARTICLE{ClusterNet,
  author={Xi, Lin and Chen, Weihai and Wu, Xingming and Liu, Zhong and Li, Zhengguo},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  title={Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering},
  year={2023}
}
```
If you have any questions, please feel free to contact Lin Xi (xilin1991@buaa.edu.cn).
This project would not have been possible without relying on some awesome repos: Motion Grouping, PWCNet, RAFT and FlowFormer. We thank the original authors for their excellent work.