Official repository for the CVPR 2024 paper "Class Tokens Infusion for Weakly Supervised Semantic Segmentation" by Sung-Hoon Yoon, Hoyong Kwon, Hyeonseong Kim, and Kuk-Jin Yoon.
- Tested on Ubuntu 20.04 with Python 3.9, PyTorch 1.8.2, CUDA 11.7, and two NVIDIA RTX 3090 GPUs.
- If you encounter out-of-memory (OOM) errors, try reducing the batch size (default: 32); note that this has not been verified.
- You can create the conda environment from the provided yaml file:
conda env create -f environment.yaml
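After creating the environment, a quick sanity check can confirm that PyTorch and CUDA are visible (a minimal sketch; the environment name is defined in environment.yaml and is shown here as cti purely for illustration):
conda activate cti    # replace cti with the name listed in environment.yaml
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"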
- The PASCAL VOC 2012 development kit: place the VOC2012 folder under the ./data folder.
- Download the MS COCO images from the official COCO website here.
- Download the semantic segmentation annotations for the MS COCO dataset here (refer to RIB).
- Directory hierarchy (a download sketch follows the tree):
./data
├── VOC2012
└── COCO2014
    ├── SegmentationClass   # GT dir
    ├── train2014           # train images downloaded from the official COCO website
    └── val2014             # val images downloaded from the official COCO website
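For reference, a hedged download sketch that produces the hierarchy above, run from the repository root. The VOC and COCO image URLs below are the standard public download servers; the COCO SegmentationClass ground truth must still be fetched via the RIB link above.
mkdir -p data/COCO2014
# PASCAL VOC 2012 development kit (the tar extracts to VOCdevkit/VOC2012)
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xf VOCtrainval_11-May-2012.tar
mv VOCdevkit/VOC2012 data/VOC2012
# MS COCO 2014 train/val images from the official image server
wget http://images.cocodataset.org/zips/train2014.zip http://images.cocodataset.org/zips/val2014.zip
unzip -q train2014.zip -d data/COCO2014
unzip -q val2014.zip -d data/COCO2014
# place the COCO semantic segmentation annotations (RIB link above) at data/COCO2014/SegmentationClass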
- ImageNet-pretrained weights for ViT are from deit_small_imagenet.pth. You need to place the weights at ./pretrained/deit_small_imagenet.pth.
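A minimal placement sketch, assuming the checkpoint has already been downloaded to the repository root under the file name deit_small_imagenet.pth:
mkdir -p pretrained
mv deit_small_imagenet.pth pretrained/deit_small_imagenet.pth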
With the following code, you can generate CAMs (seeds) to train the segmentation network. For further refinement, refer to RIB.
We will also update RIB (the transformer version) soon (July 2024).
- Please specify the name of your experiment.
- Training results are saved at ./experiment/[exp_name]
For PASCAL:
python train_trm.py --name [exp_name] --exp cti_cvpr24
For COCO:
python train_trm_coco.py --name [exp_name] --exp cti_coco_cvpr24
Note that the mIoU on the COCO training set is evaluated on a subset (5.2k images, not the full set of 80k images) for fast evaluation.
- Pretrained weights (PASCAL, seed: 69.5% mIoU) can be downloaded here (69.5_pascal.pth).
For the pretrained model (69.5%):
python infer_trm.py --name [exp_name] --load_pretrained [DIR_of_69.5%_ckpt] --load_epo 100 --dict
For a model you trained yourself:
python infer_trm.py --name [exp_name] --load_epo [EPOCH] --dict
python evaluation.py --name [exp_name] --task cam --dict_dir dict
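Putting the steps together, a hedged end-to-end example on PASCAL, using the hypothetical experiment name cti_run1 and assuming the checkpoint you want to evaluate was saved at epoch 100 (adjust --load_epo to your own training schedule):
python train_trm.py --name cti_run1 --exp cti_cvpr24
python infer_trm.py --name cti_run1 --load_epo 100 --dict
python evaluation.py --name cti_run1 --task cam --dict_dir dict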
If our code is useful for you, please consider citing our CVPR 2024 paper using the following BibTeX entry.
@inproceedings{yoon2024class,
title={Class Tokens Infusion for Weakly Supervised Semantic Segmentation},
author={Yoon, Sung-Hoon and Kwon, Hoyong and Kim, Hyeonseong and Yoon, Kuk-Jin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3595--3605},
year={2024}
}
You can also check out our earlier works published at ICCV 2021 (OC-CSE), ECCV 2022 (AEFT), and CVPR 2023 (ACR).
Besides, "Diffusion-Guided Weakly Supervised Semantic Segmentation" and "Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation" will be published at ECCV 2024. Check our GitHub! :)
We heavily borrow from the MCTformer and RIB repositories. Thanks for the excellent code!
[1] Xu, Lian, et al. "Multi-class Token Transformer for Weakly Supervised Semantic Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[2] Lee, Jungbeom, et al. "Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation." Advances in Neural Information Processing Systems 34 (2021): 27408-27421.