
[ICCV2023] Global Knowledge Calibration for Fast Open-Vocabulary Segmentation

Kunyang Han*, Yong Liu*, Jun Hao Liew, Henghui Ding, Yunchao Wei, Jiajun Liu, Yitong Wang, Yansong Tang, Yujiu Yang, Jiashi Feng, Yao Zhao (*equal contribution)

This repository contains the official implementation of "Global Knowledge Calibration for Fast Open-Vocabulary Segmentation".

Paper

📖 Abstract

Recent advancements in pre-trained vision-language models, such as CLIP, have enabled the segmentation of arbitrary concepts solely from textual inputs, a process commonly referred to as open-vocabulary semantic segmentation (OVS). However, existing OVS techniques confront a fundamental challenge: the trained classifier tends to overfit on the base classes observed during training, resulting in suboptimal generalization performance to unseen classes. To mitigate this issue, recent studies have proposed the use of an additional frozen pre-trained CLIP for classification. Nonetheless, this approach incurs heavy computational overheads as the CLIP vision encoder must be repeatedly forward-passed for each mask, rendering it impractical for real-world applications. To address this challenge, our objective is to develop a fast OVS model that can perform comparably or better without the extra computational burden of the CLIP image encoder during inference. To this end, we propose a core idea of preserving the generalizable representation when fine-tuning on known classes. Specifically, we introduce a text diversification strategy that generates a set of synonyms for each training category, which prevents the learned representation from collapsing onto specific known category names. Additionally, we employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP. Extensive experiments demonstrate that our proposed model achieves robust generalization performance across various datasets. Furthermore, we perform a preliminary exploration of open-vocabulary video segmentation and present a benchmark that can facilitate future open-vocabulary research in the video domain.
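
The snippet below is a minimal sketch, not the official implementation, of the two ideas the abstract describes: (1) text diversification, where each training category is represented by a set of synonyms rather than a single class name, and (2) a text-guided knowledge distillation loss that keeps the learned mask embeddings aligned with the similarity structure of the frozen CLIP text space. The synonym table, function names, temperature, and exact loss form are illustrative assumptions; refer to the paper and the forthcoming code release for the actual formulation.

```python
import random

import torch
import torch.nn.functional as F

# (1) Text diversification: each training category is described by a small set
# of synonyms (illustrative entries; the real synonym lists belong to the data
# preparation process listed in the Todo below).
SYNONYMS = {
    "sofa": ["sofa", "couch", "settee"],
    "person": ["person", "human", "pedestrian"],
}

def sample_class_prompts(class_names):
    """Pick one synonym per category each iteration so the learned
    representation does not collapse onto a single canonical class name."""
    return [random.choice(SYNONYMS.get(name, [name])) for name in class_names]

# (2) Text-guided knowledge distillation (sketch): keep the segmenter's mask
# embeddings consistent with how the frozen CLIP text space relates classes.
def text_guided_distillation_loss(mask_embeds, clip_text_embeds, gt_class_ids,
                                  temperature=0.07):
    """mask_embeds:      (num_masks, dim)   embeddings predicted for each mask
       clip_text_embeds: (num_classes, dim) frozen CLIP text embeddings
       gt_class_ids:     (num_masks,)       ground-truth class index per mask"""
    mask_embeds = F.normalize(mask_embeds, dim=-1)
    text_embeds = F.normalize(clip_text_embeds, dim=-1)

    # Student: similarity of each predicted mask embedding to every class prompt.
    student_logits = mask_embeds @ text_embeds.t() / temperature

    # Teacher: similarity of the frozen text embedding of the ground-truth class
    # to every class prompt (no gradients flow through the teacher).
    with torch.no_grad():
        teacher_logits = text_embeds[gt_class_ids] @ text_embeds.t() / temperature

    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")

# Toy usage with random tensors (512-dim embeddings, 3 masks, 2 classes).
if __name__ == "__main__":
    masks = torch.randn(3, 512)
    texts = torch.randn(2, 512)
    labels = torch.tensor([0, 1, 0])
    print(sample_class_prompts(["sofa", "person"]))
    print(text_guided_distillation_loss(masks, texts, labels).item())
```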


📖 Pipeline

📖 Visualization

📖 Results

🎤🎤🎤 Todo

  • Release the data preparation process and pretrained checkpoint
