Open-Vocabulary Segmentation with Semantic-Assisted Calibration [CVPR 2024]

Yong Liu*, Sule Bai*, Guanbin Li, Yitong Wang, Yansong Tang (*equal contribution)

This repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration".

Paper

📖 Pipeline & Results

If you find any bugs caused by our carelessness in organizing the code, feel free to contact us and point them out!

Installation

Please see the installation guide (INSTALL.md).

Data Preparation

Please follow the instructions of ov-seg to prepare the training and test data. The data should be organized as follows:

$DETECTRON2_DATASETS/
  coco/                 # COCOStuff-171
  ADEChallengeData2016/ # ADE20K-150
  ADE20K_2021_17_01/    # ADE20K-847
  VOCdevkit/
    VOC2012/            # PASCALVOC-20
    VOC2010/            # PASCALContext-59, PASCALContext-459
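
The layout above can be checked with a short script before launching training or evaluation. This is only a sketch and not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and it checks the top-level folders only, not the files that ov-seg's preparation scripts generate inside them.

```python
import os
from pathlib import Path

# Directory names copied from the layout above, relative to $DETECTRON2_DATASETS.
EXPECTED_DIRS = [
    "coco",                  # COCOStuff-171
    "ADEChallengeData2016",  # ADE20K-150
    "ADE20K_2021_17_01",     # ADE20K-847
    "VOCdevkit/VOC2012",     # PASCALVOC-20
    "VOCdevkit/VOC2010",     # PASCALContext-59, PASCALContext-459
]


def missing_dataset_dirs(root):
    """Return the expected dataset directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Detectron2 falls back to ./datasets when DETECTRON2_DATASETS is unset.
dataset_root = os.environ.get("DETECTRON2_DATASETS", "datasets")
for name in missing_dataset_dirs(dataset_root):
    print(f"missing: {os.path.join(dataset_root, name)}")
```

An empty output means every expected folder was found under the dataset root.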

Usage

Pretrained Weight
We have provided the pretrained SCAN-VitL weights and the finetuned Contextual-shifted CLIP weights. Please download them from here.

Evaluation

python train_net.py --eval-only --config-file <CONFIG_FILE> --num-gpus <NUM_GPU> OUTPUT_DIR <OUTPUT_PATH> MODEL.WEIGHTS <TRAINED_MODEL_PATH>

Here is an example:

python train_net.py --num-gpus 8 --eval-only --config-file configs/scan_vitL.yaml MODEL.WEIGHTS ./SCAN.pth DATASETS.TEST \(\"ade20k_sem_seg_val\",\) MODEL.CLIP_ADAPTER.REPLACE_RATIO 0.05 MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT 0.75 MODEL.CLIP_ADAPTER.MASK_THR 0.55
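
The escaped tuple in DATASETS.TEST is easy to get wrong when the command is scripted over several datasets. One possible way around the shell quoting is to build the argument list in Python, as sketched below. The helper `build_eval_command` is hypothetical and not part of the repository; the flags and config keys are the ones from the example above.

```python
import shlex


def build_eval_command(config_file, weights, num_gpus, test_dataset, overrides=None):
    """Build the argv list for an evaluation run of train_net.py."""
    argv = [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--eval-only",
        "--config-file", config_file,
        "MODEL.WEIGHTS", weights,
        # DATASETS.TEST expects a tuple literal; inside an argv list it needs
        # no backslash escaping because no shell parses it.
        "DATASETS.TEST", f'("{test_dataset}",)',
    ]
    for key, value in (overrides or {}).items():
        argv += [key, str(value)]
    return argv


argv = build_eval_command(
    "configs/scan_vitL.yaml", "./SCAN.pth", 8, "ade20k_sem_seg_val",
    overrides={
        "MODEL.CLIP_ADAPTER.REPLACE_RATIO": 0.05,
        "MODEL.CLIP_ADAPTER.CLIP_ENSEMBLE_WEIGHT": 0.75,
        "MODEL.CLIP_ADAPTER.MASK_THR": 0.55,
    },
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(argv))
```

The list can be passed to `subprocess.run(argv, check=True)` from the repository root to launch the evaluation.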

Training

Train the segmentation model:

python train_net.py --config-file <CONFIG_FILE> --num-gpus <NUM_GPU>

Here is an example:

python train_net.py --num-gpus 8 --config-file configs/scan_vitL.yaml

Fuse the segmentation model with the finetuned CLIP:

We have provided the finetuned CLIP weights. You can directly fuse the pretrained weights with the segmentation model to get the final model. The fuse command is:

cd tools
python replace_clip.py

You need to set "clip_ckpt" and "ovseg_model" in replace_clip.py to your CLIP checkpoint path and your segmentation model path.
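
For readers who want a rough idea of what such a fusion step does, the sketch below copies CLIP weights into a segmentation state dict. It is an illustration under assumptions, not the contents of replace_clip.py: the function name `fuse_clip_weights`, the target prefix `clip_adapter.clip_model.`, and the stripping of a leading `module.` are all guesses about how the checkpoints are keyed. Plain dictionaries stand in for the loaded state dicts.

```python
def fuse_clip_weights(seg_state, clip_state, prefix="clip_adapter.clip_model."):
    """Copy CLIP entries into a segmentation state dict under `prefix`.

    Both `prefix` and the "module." handling are assumptions for illustration,
    not facts taken from tools/replace_clip.py.
    """
    fused = dict(seg_state)  # shallow copy so the input mapping is left untouched
    skipped = []
    for key, value in clip_state.items():
        # Checkpoints saved from a distributed wrapper often carry a "module." prefix.
        clean = key[len("module."):] if key.startswith("module.") else key
        target = prefix + clean
        if target in fused:
            fused[target] = value
        else:
            skipped.append(target)  # key has no counterpart in the segmentation model
    return fused, skipped


# Toy example with strings in place of tensors.
seg = {"clip_adapter.clip_model.visual.proj": "old", "sem_seg_head.weight": "seg"}
clip = {"module.visual.proj": "new", "module.logit_scale": "unused"}
fused, skipped = fuse_clip_weights(seg, clip)
print(fused)
print(skipped)
```

With real checkpoints, the two mappings would come from `torch.load` and the fused result would be written back with `torch.save`; the exact nesting of the state dict inside each checkpoint file should be checked against replace_clip.py.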

(Optional) If you want to finetune the CLIP model from scratch, please follow ov-seg to prepare the corresponding data. The finetuning command is:

cd open_clip_training/src
bash scripts/finetune_VitL_with_mask.sh

Cite

If you find our work helpful, we'd appreciate it if you could cite our paper in your work.

@article{liu2023open,
  title={Open-Vocabulary Segmentation with Semantic-Assisted Calibration},
  author={Liu, Yong and Bai, Sule and Li, Guanbin and Wang, Yitong and Tang, Yansong},
  journal={arXiv preprint arXiv:2312.04089},
  year={2023}
}
