tfzhou/ContrastiveSeg

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation,
Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu and Luc Van Gool
ICCV 2021 (Oral) (arXiv 2101.11939)

Abstract

Current semantic segmentation methods focus only on mining “local” context, i.e., dependencies between pixels within individual images, by context-aggregation modules (e.g., dilated convolution, neural attention) or structure-aware optimization criteria (e.g., IoU-like loss). However, they ignore “global” context of the training data, i.e., rich semantic relations between pixels across different images. Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting. The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes. It raises a pixel-wise metric learning paradigm for semantic segmentation, by explicitly exploring the structures of labeled pixels, which are long ignored in the field. Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.

We experimentally show that, with famous segmentation models (i.e., DeepLabV3, HRNet, OCR) and backbones (i.e., ResNet, HRNet), our method brings consistent performance improvements across diverse datasets (i.e., Cityscapes, PASCAL-Context, COCO-Stuff).
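
The core idea in the abstract (pull embeddings of the same class together, push embeddings of different classes apart) can be illustrated with a minimal NumPy sketch of a supervised, InfoNCE-style pixel contrast loss. This is not the repository's implementation: the function name `pixel_contrast_loss`, the array shapes, and the default temperature of 0.1 are assumptions made for illustration, and the sketch leaves out any sampling of pixels or memory of past embeddings.

```python
import numpy as np

def pixel_contrast_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss over pixel embeddings.

    embeddings: (N, D) array, one non-zero row per pixel; the pixels may
                come from several images, which is what makes the
                contrast "cross-image".
    labels:     (N,) integer class ids, one per pixel.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # L2-normalize so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    n = len(labels)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(n, dtype=bool)  # same class, excluding self
    neg_mask = ~same                          # different class

    # Subtracting the row maximum keeps exp() stable; the loss is unchanged.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    neg_sum = (exp * neg_mask).sum(axis=1, keepdims=True)

    # For anchor i and positive j: log( e_ij / (e_ij + sum of negatives of i) ).
    log_prob = logits - np.log(exp + neg_sum)

    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[valid] / pos_count[valid]
    return float(per_anchor.mean())
```

With embeddings that already cluster by class, the loss is close to zero; pairing each pixel with a dissimilar "positive" makes it large. In training, such a term would typically be added to the usual pixel-wise cross-entropy loss and dropped at test time, which is consistent with the "no extra overhead during testing" claim.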

Installation

This implementation is built on openseg.pytorch. Many thanks to the authors for their efforts.

Please follow the Getting Started guide for installation and dataset preparation.

Performance

Cityscapes Dataset

| Backbone | Model | Train Set | Val Set | Iterations | Batch Size | Contrast Loss | Memory | mIoU | Log | CKPT | Script |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-101 | DeepLab-V3 | train | val | 40000 | 8 | N | N | 72.75 | log | ckpt | scripts/cityscapes/deeplab/run_r_101_d_8_deeplabv3_train.sh |
| ResNet-101 | DeepLab-V3 | train | val | 40000 | 8 | Y | N | 77.67 | log | ckpt | scripts/cityscapes/deeplab/run_r_101_d_8_deeplabv3_contrast_train.sh |
| HRNet-W48 | HRNet-W48 | train | val | 40000 | 8 | N | N | 79.27 | log | ckpt | scripts/cityscapes/hrnet/run_h_48_d_4.sh |
| HRNet-W48 | HRNet-W48 | train | val | 40000 | 8 | Y | N | 80.18 | log | ckpt | scripts/cityscapes/hrnet/run_h_48_d_4_contrast.sh |

It seems that the DeepLab-V3 baseline does not produce the expected performance on the new codebase. I will tune this later.
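
For readers unfamiliar with the mIoU column, the following is a minimal sketch of how mean intersection-over-union is commonly computed from a confusion matrix. It is not the evaluation code of this repository: the function name `mean_iou` is hypothetical, and the `ignore_index=255` default is an assumption based on the usual Cityscapes convention for unlabeled pixels. Predictions are assumed to lie in `[0, num_classes)`.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that appear in pred or target."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Drop pixels whose ground-truth label is marked as "ignore".
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # confusion[t, p] = number of pixels of true class t predicted as class p.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection

    present = union > 0  # skip classes absent from both pred and target
    return float((intersection[present] / union[present]).mean())
```

Benchmark numbers such as those in the table are normally obtained by accumulating one confusion matrix over the whole validation set and computing the IoU once at the end, rather than averaging per-image scores.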

Study of the temperature

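| Backbone | Train Set | Val Set | Iterations | Batch Size | Temperature | mIoU |
| --- | --- | --- | --- | --- | --- | --- |
| HRNet-W48 | train | val | 40000 | 8 | 0.05 | 79.80 |
| HRNet-W48 | train | val | 40000 | 8 | 0.07 | 79.59 |
| HRNet-W48 | train | val | 40000 | 8 | 0.10 | 80.18 |
| HRNet-W48 | train | val | 40000 | 8 | 0.20 | 80.01 |
| HRNet-W48 | train | val | 40000 | 8 | 0.30 | 79.27 |
| HRNet-W48 | train | val | 40000 | 8 | 0.40 | 79.40 |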
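
To give some intuition for why the temperature matters, the sketch below shows how it changes the relative weight that a softmax-style contrastive objective places on negatives of different similarity. The helper `negative_weights` and the similarity values are hypothetical and chosen only for illustration; they are not taken from the repository or from the experiments above.

```python
import numpy as np

def negative_weights(similarities, temperature):
    """Softmax over cosine similarities scaled by 1 / temperature."""
    logits = np.asarray(similarities, dtype=float) / temperature
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical cosine similarities between one anchor pixel and three
# negatives, ordered from hardest (most similar) to easiest.
sims = [0.9, 0.5, 0.1]

for tau in (0.05, 0.1, 0.4):
    print(tau, np.round(negative_weights(sims, tau), 3))
```

A small temperature concentrates almost all of the weight on the hardest negative, while a larger one spreads it more evenly. The table suggests that, for this setup, a value around 0.1 gave the best mIoU among those tried.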

t-SNE Visualization

  • Pixel-wise Cross-Entropy Loss

  • Pixel-wise Contrastive Learning Objective

Citation

@inproceedings{Wang_2021_ICCV,
    author    = {Wang, Wenguan and Zhou, Tianfei and Yu, Fisher and Dai, Jifeng and Konukoglu, Ender and Van Gool, Luc},
    title     = {Exploring Cross-Image Pixel Contrast for Semantic Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2021},
    pages     = {7303-7313}
}