
[CVPR 2025] SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Official PyTorch implementation of SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

SegMAN

Main Results

Installation and data preparation

Step 1: Create a new environment

conda create -n segman python=3.10
conda activate segman

pip install torch==2.1.2 torchvision==0.16.2
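
To confirm the environment is set up correctly, you can run a quick check in Python (a minimal sanity check, not part of the official setup):

# Verify the PyTorch install and that CUDA is visible
import torch
print(torch.__version__)          # expected: 2.1.2
print(torch.cuda.is_available())  # should print True on a GPU machine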

Step 2: Install MMSegmentation v0.30.0 by following the installation guidelines and prepare the segmentation datasets by following data preparation. The following installation commands work for me:

pip install -U openmim
mim install mmcv-full
cd segmentation
pip install -v -e .

To support torch>=2.1.0, you also need to add from packaging import version to the imports in /miniconda3/envs/segman/lib/python3.10/site-packages/mmcv/parallel/_functions.py and replace line 75 with the following:

if version.parse(torch.__version__) >= version.parse('2.1.0'):
    streams = [_get_stream(torch.device("cuda", device)) for device in target_gpus]
else:
    streams = [_get_stream(device) for device in target_gpus]
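
If you are unsure where mmcv is installed in your environment, you can print the exact path of the file to patch (a small helper, not part of the official instructions):

# Locate the mmcv file that needs the torch>=2.1 patch
import mmcv.parallel._functions as functions
print(functions.__file__)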

Step 3: Install dependencies using the following commands.

To install NATTEN, adjust the following command to match your PyTorch and CUDA versions.

pip install natten==0.17.3+torch210cu121 -f https://shi-labs.com/natten/wheels/

The Selective Scan 2D kernel can be installed with:

cd kernels/selective_scan && pip install .
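
Compiling the kernel requires a local CUDA toolkit that matches the CUDA version of your PyTorch build. A quick pre-check (a minimal sketch using PyTorch's own utilities):

# Check that a CUDA toolkit is available for building custom extensions
import torch
from torch.utils.cpp_extension import CUDA_HOME
print(torch.version.cuda)  # CUDA version PyTorch was built with
print(CUDA_HOME)           # local toolkit used for compilation; should not be None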

Install other requirements:

pip install -r requirements.txt
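
After these steps, you can verify that the key packages resolve (a minimal sketch; the mmsegmentation entry assumes the editable install above registers under that distribution name):

# Print the installed versions of the core dependencies
from importlib.metadata import version, PackageNotFoundError
for pkg in ("torch", "mmcv-full", "natten", "mmsegmentation"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not found")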

Training

Download the ImageNet-1k pretrained weights here and place them in a pretrained/ folder. Then navigate to the segmentation directory:

cd segmentation

Scripts to reproduce our paper results are provided in ./scripts. Example training commands for SegMAN-B on ADE20K:

# Single-gpu
python tools/train.py local_configs/segman/base/segman_b_ade.py --work-dir outputs/EXP_NAME

# Multi-gpu
bash tools/dist_train.sh local_configs/segman/base/segman_b_ade.py <GPU_NUM> --work-dir outputs/EXP_NAME
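
Before launching a long run, it can help to check that the config resolves as expected. A minimal sketch using the mmcv Config API (field names assume the standard MMSegmentation config layout):

# Inspect a training config without starting a run
from mmcv import Config
cfg = Config.fromfile('local_configs/segman/base/segman_b_ade.py')
print(cfg.model.backbone.type)   # encoder used by this config
print(cfg.data.samples_per_gpu)  # per-GPU batch size
print(cfg.optimizer)             # optimizer settings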

Evaluation

Download the trained segmentation model weights from Google Drive. Then navigate to the segmentation directory:

cd segmentation

Example for evaluating SegMAN-B on ADE20K:

# Single-gpu
python tools/test.py local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file

# Multi-gpu
bash tools/dist_test.sh local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file <GPU_NUM>
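
If the evaluation numbers look off, a quick way to confirm you are pointing at the right checkpoint is to inspect it directly (a minimal sketch; assumes the standard MMCV checkpoint format with meta and state_dict keys):

# Inspect a downloaded checkpoint
import torch
ckpt = torch.load('/path/to/checkpoint_file', map_location='cpu')
print(list(ckpt.keys()))          # typically includes 'meta' and 'state_dict'
state = ckpt.get('state_dict', ckpt)
print(sum(p.numel() for p in state.values()) / 1e6, 'M parameters')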

ADE20K

Model Backbone (ImageNet-1k Top1 Acc) mIoU Params FLOPs Config Download
SegMAN-T SegMAN Encoder-T (76.2) 43.0 6.4M 6.2G config Google Drive
SegMAN-S SegMAN Encoder-S (84.0) 51.3 29.4M 25.3G config Google Drive
SegMAN-B SegMAN Encoder-B (85.1) 52.6 51.8M 58.1G config Google Drive
SegMAN-L SegMAN Encoder-L (85.5) 53.2 92.6M 97.1G config Google Drive

Cityscapes

Model Backbone (ImageNet-1k Top1 Acc) mIoU Params FLOPs Config Download
SegMAN-T SegMAN Encoder-T (76.2) 80.3 6.4M 52.5G config Google Drive
SegMAN-S SegMAN Encoder-S (84.0) 83.2 29.4M 218.4G config Google Drive
SegMAN-B SegMAN Encoder-B (85.1) 83.8 51.8M 479.0G config Google Drive
SegMAN-L SegMAN Encoder-L (85.5) 84.2 92.6M 769.0G config Google Drive

COCO-Stuff

Model Backbone (ImageNet-1k Top1 Acc) mIoU Params FLOPs Config Download
SegMAN-T SegMAN Encoder-T (76.2) 41.3 6.4M 6.2G config Google Drive
SegMAN-S SegMAN Encoder-S (84.0) 47.5 29.4M 25.3G config Google Drive
SegMAN-B SegMAN Encoder-B (85.1) 48.4 51.8M 58.1G config Google Drive
SegMAN-L SegMAN Encoder-L (85.5) 48.8 92.6M 97.1G config Google Drive

Encoder Pre-training

We provide scripts for pre-training the encoder from scratch.

Step 1: Download ImageNet-1k and use this script to extract it.

Step 2: Start training with

bash scripts/train_segman-s.sh

Visualization

You can visualize segmentation results using pre-trained checkpoints with the following command (run from the segmentation directory):

python image_demo.py \
img_path \
config_file \
checkpoint_file \
--palette 'ade20k' \
--out-file segman_demo.png \
--device 'cuda:0'

Replace img_path, config_file, and checkpoint_file with the paths to the image, config, and checkpoint you want to visualize. Select a palette from {ade20k, coco_stuff164k, cityscapes}.
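
image_demo.py is a thin wrapper around the MMSegmentation v0.x inference APIs. If you prefer to script the visualization yourself, a minimal sketch (paths are placeholders; assumes the standard MMSegmentation v0.x interface):

# Single-image inference and visualization with the MMSegmentation v0.x APIs
from mmseg.apis import init_segmentor, inference_segmentor
from mmseg.core.evaluation import get_palette

img_path = 'demo.png'                                      # placeholder paths
config_file = 'local_configs/segman/base/segman_b_ade.py'
checkpoint_file = '/path/to/checkpoint_file'

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img_path)              # per-pixel label map(s)
model.show_result(img_path, result, palette=get_palette('ade20k'),
                  out_file='segman_demo.png', opacity=0.5)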

Acknowledgements

Our implementation is based on MMSegmentation, NATTEN, VMamba, and SegFormer. We sincerely thank the authors.

Citation

@inproceedings{SegMAN,
    title={SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation},
    author={Yunxiang Fu and Meng Lou and Yizhou Yu},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
}
