Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering (TCSVT2023)

xilin1991/ClusterNet

ClusterNet

This is an official implementation of "Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering" (IEEE TCSVT).

Prerequisites

The training and testing experiments were conducted using PyTorch 1.8.1 on a single NVIDIA TITAN RTX GPU with 24 GB of memory.

  • python 3.8
  • pytorch 1.8.1
  • torchvision 0.9.1
conda create -n ClusterNet python=3.8
conda activate ClusterNet
conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=10.2 -c pytorch

Other minor Python modules can be installed by running

pip install opencv-python einops
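
After installation, it can help to confirm that the pinned versions are the ones actually in the environment. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the packages are registered under the distribution names `torch` and `torchvision`.

```python
from importlib import metadata

# Versions pinned in the Prerequisites section above.
EXPECTED = {"torch": "1.8.1", "torchvision": "0.9.1"}


def check_versions(expected=EXPECTED):
    """Return a dict of {package: problem} for missing or mismatched packages.

    Hypothetical helper for illustration; an empty dict means everything matches.
    """
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Builds may carry a local suffix such as "+cu102"; compare the base version.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_versions()
    print(issues if issues else "Environment matches the pinned versions.")
```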

Datasets

  • DAVIS16: We perform online clustering and evaluation on the validation set. However, the code expects the DAVIS17 layout, so please download DAVIS17 (Unsupervised 480p).
  • FBMS: This dataset contains videos of multiple moving objects, providing test cases for multiple object segmentation.
  • SegTrackV2: Each sequence contains 1-6 moving objects.

Following the evaluation protocol in CIS, we combine multiple objects into a single foreground and use the region similarity $\mathcal{J}$ to measure segmentation performance on FBMS and SegTrackV2. Binary Mask: [FBMS][SegTrackV2]
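
To make the protocol concrete, here is a minimal sketch of merging a multi-object label mask into one foreground and computing the region similarity $\mathcal{J}$ (intersection over union). The function names are hypothetical and not taken from the repository. Returning 1.0 when both masks are empty is an assumed convention (it matches the DAVIS evaluation toolkit).

```python
import numpy as np


def to_binary_foreground(label_mask):
    """Merge all object labels (> 0) into a single boolean foreground mask."""
    return np.asarray(label_mask) > 0


def region_similarity(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


# Ground truth with two objects (labels 1 and 2), merged into one foreground.
gt_labels = np.array([[0, 1, 2],
                      [0, 0, 2]])
prediction = np.array([[0, 1, 1],
                       [0, 1, 0]])
j = region_similarity(prediction, to_binary_foreground(gt_labels))
```

In this toy case the intersection covers 2 pixels and the union covers 4, so `j` is 0.5.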

  • Path configuration: The dataset root is set with the --data_dir argument in main.py.
parser.add_argument('--data_dir', default=None, type=str, help='dataset root dir')
  • The dataset directory structure should be as follows:
|--DAVIS2017
|   |--Annotations_unsupervised
|   |   |--480p
|   |--ImageSets
|   |   |--2016
|   |--Flows_gap_1_${flow_method}
|       |--Full-Resolution
|--FBMS
|   |--Annotations_Binary
|   |--Flows_gap_1_${flow_method}
|--SegTrackv2
    |--Annotations_Binary
    |--Flows_gap_1_${flow_method}
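
A layout like the one above can be checked before training. The sketch below is illustrative only: `expected_dirs` and `missing_dirs` are hypothetical helpers, and they cover just the folders listed in the tree, not any additional folders the code may read.

```python
from pathlib import Path


def expected_dirs(data_dir, flow_method):
    """Build the directories from the tree above for a given flow method.

    flow_method is one of "PWC", "RAFT", or "FlowFormer".
    """
    root = Path(data_dir)
    flows = f"Flows_gap_1_{flow_method}"
    return [
        root / "DAVIS2017" / "Annotations_unsupervised" / "480p",
        root / "DAVIS2017" / "ImageSets" / "2016",
        root / "DAVIS2017" / flows / "Full-Resolution",
        root / "FBMS" / "Annotations_Binary",
        root / "FBMS" / flows,
        root / "SegTrackv2" / "Annotations_Binary",
        root / "SegTrackv2" / flows,
    ]


def missing_dirs(data_dir, flow_method):
    """Return the expected directories that do not exist under data_dir."""
    return [p for p in expected_dirs(data_dir, flow_method) if not p.is_dir()]
```

For example, `missing_dirs("/path/to/datasets", "RAFT")` returns an empty list when every listed folder is present.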

Precompute optical flow

  • The optical flow is estimated with PWCNet, RAFT, or FlowFormer. In the dataset directory names, the variable flow_method is set to PWC, RAFT, or FlowFormer, respectively.

  • The flows are resized to the original image size (as in Motion Grouping): each input frame is $480\times854$ for DAVIS16 and $480\times640$ for FBMS and SegTrackV2. We convert the optical flow to 3-channel images using the standard optical flow visualization and normalize them to $[-1, 1]$. In the online setting, only previous frames are used for optical flow estimation.
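
As a rough illustration of the conversion step, the sketch below maps a 2-channel flow field to a 3-channel image and rescales it to $[-1, 1]$. It is an assumption-laden stand-in: it uses a plain HSV encoding (direction as hue, magnitude as saturation) rather than the exact color wheel the repository uses, and the function name `flow_to_normalized_rgb` is hypothetical.

```python
import numpy as np


def flow_to_normalized_rgb(flow, max_magnitude=None):
    """Encode an (H, W, 2) flow field as an (H, W, 3) image in [-1, 1].

    Simplified HSV encoding (an assumption, not the repository's code):
    hue = flow direction, saturation = relative magnitude, value = 1.
    """
    flow = np.asarray(flow, dtype=np.float32)
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.sqrt(u ** 2 + v ** 2)
    if max_magnitude is None:
        max_magnitude = float(magnitude.max())
    # Guard against division by zero for an all-zero flow field.
    saturation = np.clip(magnitude / max(max_magnitude, 1e-6), 0.0, 1.0)
    hue = (np.arctan2(v, u) + np.pi) / (2.0 * np.pi)  # direction mapped to [0, 1]

    # HSV -> RGB with value fixed at 1: f(n) = 1 - s * max(0, min(k, 4 - k, 1)).
    channels = []
    for n in (5, 3, 1):  # R, G, B
        k = (n + hue * 6.0) % 6.0
        channels.append(1.0 - saturation * np.clip(np.minimum(k, 4.0 - k), 0.0, 1.0))
    rgb = np.stack(channels, axis=-1)  # values in [0, 1]

    # Rescale from [0, 1] to [-1, 1].
    return (rgb * 2.0 - 1.0).astype(np.float32)
```

With this encoding, zero motion maps to white (all channels at 1 after normalization), and the output keeps the spatial size of the input flow.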

Train & Inference

To train the ClusterNet model on a GPU, you can use:

bash scripts/main.sh

In the main.sh file, first activate your Python environment and set gpu_id and data_dir. Then set the hyperparameters batch_size, n_clusters, and threshold to 16, 30, and 0.1, respectively.

Outputs

The model files and checkpoints will be saved in ./checkpoints/${exp_id}.

.pth files whose names contain _${sequence_name} store the network weights used to initialize our autoencoder, which is trained on DAVIS16 with the optical flow reconstruction loss.

The segmentation results will be saved in ./results/${exp_id}. The evaluation criterion is the mean region similarity $\mathcal{J}$.

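| Optical flow prediction | Method     | Mean $\mathcal{J}\uparrow$ |
|-------------------------|------------|----------------------------|
| PWC-Net                 | MG         | 63.7                       |
| PWC-Net                 | ClusterNet | 67.9 (+4.2)                |
| RAFT                    | MG         | 68.3                       |
| RAFT                    | ClusterNet | 72.0 (+3.7)                |
| FlowFormer              | MG         | 70.3                       |
| FlowFormer              | ClusterNet | 75.4 (+5.1)                |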
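
The gains in parentheses are the differences between ClusterNet and MG under the same optical flow method. The short sketch below recomputes them from the reported numbers; the variable and function names are illustrative only.

```python
# Mean J reported above as (MG, ClusterNet) for each optical flow method.
RESULTS = {
    "PWC-Net": (63.7, 67.9),
    "RAFT": (68.3, 72.0),
    "FlowFormer": (70.3, 75.4),
}


def gains(results):
    """Improvement of ClusterNet over MG, rounded to one decimal place."""
    return {flow: round(ours - mg, 1) for flow, (mg, ours) in results.items()}


for flow, gain in gains(RESULTS).items():
    print(f"{flow}: +{gain}")
```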

Citation

If you find our work useful in your research, please consider citing our paper!

@ARTICLE{ClusterNet,
  author={Xi, Lin and Chen, Weihai and Wu, Xingming and Liu, Zhong and Li, Zhengguo},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering}, 
  year={2023}
}  

Contact

If you have any questions, please feel free to contact Lin Xi (xilin1991@buaa.edu.cn).

Acknowledgement

This project would not have been possible without relying on some awesome repos: Motion Grouping, PWCNet, RAFT and FlowFormer. We thank the original authors for their excellent work.
