[CVPR 2025] SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Official PyTorch implementation of SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Step 1: Create a new environment
conda create -n segman python=3.10
conda activate segman
pip install torch==2.1.2 torchvision==0.16.2
Step 2: Install MMSegmentation v0.30.0 by following the installation guidelines and prepare segmentation datasets by following data preparation. The following installation commands work for me:
pip install -U openmim
mim install mmcv-full
cd segmentation
pip install -v -e .
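As a quick, optional sanity check that mmcv-full and MMSegmentation are importable (this assumes only that the installs above succeeded), you can run:
python -c "import mmcv, mmseg; print(mmcv.__version__, mmseg.__version__)"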
To support torch>=2.1.0, you also need to add `from packaging import version` at the top of /miniconda3/envs/segman/lib/python3.10/site-packages/mmcv/parallel/_functions.py and replace Line 75 with the following:
if version.parse(torch.__version__) >= version.parse('2.1.0'):
    streams = [_get_stream(torch.device("cuda", device)) for device in target_gpus]
else:
    streams = [_get_stream(device) for device in target_gpus]
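The exact path to _functions.py depends on where your environment lives; if the path above does not match your setup, one way to locate the file is:
python -c "import mmcv.parallel._functions as f; print(f.__file__)"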
Step 3: Install dependencies using the following commands.
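The NATTEN wheel tag below encodes specific PyTorch and CUDA versions; if you are unsure which versions your environment has, one way to check (assuming the PyTorch install from Step 1 succeeded) is:
python -c "import torch; print(torch.__version__, torch.version.cuda)"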
To install NATTEN, modify the following command to match your PyTorch and CUDA versions.
pip install natten==0.17.3+torch210cu121 -f https://shi-labs.com/natten/wheels/
The Selective Scan 2D kernel can be installed with:
cd kernels/selective_scan && pip install .
Install other requirements:
pip install -r requirements.txt
Download the ImageNet-1k pretrained weights here and put them in a folder pretrained/.
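For example (the weight filename below is only a placeholder for whatever file you downloaded):
mkdir -p pretrained
mv /path/to/downloaded_encoder_weights.pth pretrained/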
Navigate to the segmentation directory:
cd segmentation
Scripts to reproduce our paper results are provided in ./scripts
Example training script for SegMAN-B on ADE20K:
# Single-gpu
python tools/train.py local_configs/segman/base/segman_b_ade.py --work-dir outputs/EXP_NAME
# Multi-gpu
bash tools/dist_train.sh local_configs/segman/base/segman_b_ade.py <GPU_NUM> --work-dir outputs/EXP_NAME
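For example, to train SegMAN-B on ADE20K with 8 GPUs (the GPU count and work-dir name here are arbitrary placeholders):
bash tools/dist_train.sh local_configs/segman/base/segman_b_ade.py 8 --work-dir outputs/segman_b_ade20k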
Download trained weights for segmentation models at Google Drive. Navigate to the segmentation directory:
cd segmentation
Example for evaluating SegMAN-B on ADE20K:
# Single-gpu
python tools/test.py local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file
# Multi-gpu
bash tools/dist_test.sh local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file <GPU_NUM>
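Note that tools/test.py in MMSegmentation 0.x generally expects at least one evaluation or output option; if the commands above ask for one, appending --eval mIoU should report the mIoU directly, for example:
python tools/test.py local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file --eval mIoU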
ADE20K
| Model | Backbone (ImageNet-1k Top1 Acc) | mIoU | Params | FLOPs | Config | Download |
|---|---|---|---|---|---|---|
| SegMAN-T | SegMAN Encoder-T (76.2) | 43.0 | 6.4M | 6.2G | config | Google Drive |
| SegMAN-S | SegMAN Encoder-S (84.0) | 51.3 | 29.4M | 25.3G | config | Google Drive |
| SegMAN-B | SegMAN Encoder-B (85.1) | 52.6 | 51.8M | 58.1G | config | Google Drive |
| SegMAN-L | SegMAN Encoder-L (85.5) | 53.2 | 92.6M | 97.1G | config | Google Drive |
Cityscapes
| Model | Backbone (ImageNet-1k Top1 Acc) | mIoU | Params | FLOPs | Config | Download |
|---|---|---|---|---|---|---|
| SegMAN-T | SegMAN Encoder-T (76.2) | 80.3 | 6.4M | 52.5G | config | Google Drive |
| SegMAN-S | SegMAN Encoder-S (84.0) | 83.2 | 29.4M | 218.4G | config | Google Drive |
| SegMAN-B | SegMAN Encoder-B (85.1) | 83.8 | 51.8M | 479.0G | config | Google Drive |
| SegMAN-L | SegMAN Encoder-L (85.5) | 84.2 | 92.6M | 769.0G | config | Google Drive |
COCO-Stuff 164K
| Model | Backbone (ImageNet-1k Top1 Acc) | mIoU | Params | FLOPs | Config | Download |
|---|---|---|---|---|---|---|
| SegMAN-T | SegMAN Encoder-T (76.2) | 41.3 | 6.4M | 6.2G | config | Google Drive |
| SegMAN-S | SegMAN Encoder-S (84.0) | 47.5 | 29.4M | 25.3G | config | Google Drive |
| SegMAN-B | SegMAN Encoder-B (85.1) | 48.4 | 51.8M | 58.1G | config | Google Drive |
| SegMAN-L | SegMAN Encoder-L (85.5) | 48.8 | 92.6M | 97.1G | config | Google Drive |
We provide scripts for pre-training the encoder from scratch.
Step 1: Download ImageNet-1k and use this script to extract it.
Step 2: Start training with
bash scripts/train_segman-s.sh
You can visualize segmentation results using pre-trained checkpoints with the following command (run from the segmentation directory):
python image_demo.py \
img_path \
config_file \
checkpoint_file \
--palette 'ade20k' \
--out-file segman_demo.png \
--device 'cuda:0'
Replace img_path, config_file, and checkpoint_file with the paths to the image, model config, and checkpoint you want to visualize. Select a palette from {ade20k, coco_stuff164k, cityscapes}.
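For instance, a filled-in invocation might look like the following (the image path is a placeholder for your own file):
python image_demo.py \
demo_image.jpg \
local_configs/segman/base/segman_b_ade.py \
/path/to/checkpoint_file \
--palette 'ade20k' \
--out-file segman_demo.png \
--device 'cuda:0'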
Our implementation is based on MMSegmentation, NATTEN, VMamba, and SegFormer. We sincerely thank the authors.
@inproceedings{SegMAN,
title={SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation},
author={Yunxiang Fu and Meng Lou and Yizhou Yu},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}


