
University of Science and Technology of China¹, Alibaba Cloud², University of Oxford³, The Alan Turing Institute⁴, Trinity College Dublin⁵, Alibaba DAMO Academy⁶
- (April 10, 2025): We're thrilled to share that Zig-RiR has been accepted to IEEE TMI-2025! 🎊.
Abstract: Medical image segmentation has made significant strides with the development of basic models. Specifically, models that combine CNNs with transformers can successfully extract both local and global features. However, these models inherit the transformer's quadratic computational complexity, limiting their efficiency. Inspired by the recent Receptance Weighted Key Value (RWKV) model, which achieves linear complexity for long-distance modeling, we explore its potential for medical image segmentation. Directly applying vision-RWKV yields suboptimal results due to insufficient local feature exploration and disrupted spatial continuity, so we propose a novel nested structure, Zigzag RWKV-in-RWKV (Zig-RiR), to address these issues. It consists of Outer and Inner RWKV blocks that capture both global and local features without disrupting spatial continuity. We treat local patches as "visual sentences" and use the Outer Zig-RWKV to explore global information. Then, we decompose each sentence into sub-patches ("visual words") and use the Inner Zig-RWKV to further explore local information among words, at negligible computational cost. We also introduce a Zigzag-WKV attention mechanism to ensure spatial continuity during token scanning. By aggregating visual word and sentence features, Zig-RiR can effectively explore both global and local information while preserving spatial continuity. Experiments on four medical image segmentation datasets of both 2D and 3D modalities demonstrate the superior accuracy and efficiency of our method: when tested on 1024 × 1024 high-resolution medical images, it runs 14.4 times faster than the state-of-the-art method and reduces GPU memory usage by 89.5%.
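The sentence/word decomposition described in the abstract can be illustrated with a small pure-Python sketch. This is not the repository's implementation: the function name `split_into_patches`, the 8×8 toy image, and the patch sizes 4 and 2 are assumptions chosen only for illustration.

```python
from typing import List

Grid = List[List[int]]


def split_into_patches(grid: Grid, size: int) -> List[Grid]:
    """Split a 2D grid into non-overlapping size x size patches, row-major."""
    height, width = len(grid), len(grid[0])
    if height % size or width % size:
        raise ValueError("grid dimensions must be divisible by the patch size")
    patches = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            patches.append([row[left:left + size] for row in grid[top:top + size]])
    return patches


# Toy 8x8 "image" whose pixel values are their row-major indices (assumed data).
image = [[r * 8 + c for c in range(8)] for r in range(8)]

# Outer level: 4x4 patches play the role of "visual sentences".
sentences = split_into_patches(image, 4)

# Inner level: each sentence is split into 2x2 sub-patches, the "visual words".
words = [split_into_patches(sentence, 2) for sentence in sentences]

print(len(sentences))   # 4 visual sentences
print(len(words[0]))    # 4 visual words per sentence
print(words[0][0])      # [[0, 1], [8, 9]]
```

In the actual model the Outer block would operate on sentence-level features and the Inner block on word-level features; the sketch only shows how the two levels of partitioning nest.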
Overview of our Zig-RiR with hierarchical encoder-decoder structure.
Zig-RiR adopts a U-shaped architecture consisting of a convolutional stem, a Zig-RiR encoder, and a plain decoder. The key innovation lies in the Zig-RiR block, which features a nested RWKV-in-RWKV structure and a novel Zigzag WKV attention mechanism.
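To illustrate why scan order matters for spatial continuity, the sketch below compares a plain row-major scan with a zigzag (snake) scan over a small token grid. The exact scan pattern used by Zigzag WKV is not specified in this README, so the row-wise snake order and the names `raster_order`, `zigzag_order`, and `max_jump` are assumptions for illustration only.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def raster_order(height: int, width: int) -> List[Coord]:
    """Row-major scan: every row is read left to right."""
    return [(r, c) for r in range(height) for c in range(width)]


def zigzag_order(height: int, width: int) -> List[Coord]:
    """Snake scan (assumed pattern): even rows left to right, odd rows right to left."""
    order = []
    for r in range(height):
        cols = range(width) if r % 2 == 0 else range(width - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def max_jump(order: List[Coord]) -> int:
    """Largest Manhattan distance between consecutive tokens in a scan."""
    return max(abs(r1 - r0) + abs(c1 - c0)
               for (r0, c0), (r1, c1) in zip(order, order[1:]))


print(zigzag_order(2, 3))            # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
print(max_jump(raster_order(4, 4)))  # 4: the raster scan jumps back at every row end
print(max_jump(zigzag_order(4, 4)))  # 1: consecutive tokens are always grid neighbours
```

Under this assumed pattern, consecutive tokens in the flattened sequence are always spatially adjacent, which is the property the text refers to as preserving spatial continuity.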
Qualitative comparison on skin lesion segmentation (ISIC) and multi-organ segmentation (Synapse & ACDC) tasks. Our proposed Zig-RiR produces more accurate segmentations than existing methods.
The code is tested with PyTorch 1.11.0 and CUDA 11.3. After cloning the repository, follow the steps below for installation:
- Create and activate conda environment
conda create --name zig_rir python=3.8
conda activate zig_rir
- Install PyTorch and torchvision
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
- Install other dependencies
pip install -r requirements.txt
When dealing with 3D datasets Synapse and ACDC, we follow the same dataset preprocessing as in nnFormer.
The dataset folders for Synapse should be organized as follows:
./DATASET_Synapse/
  ├── unetr_pp_raw/
      ├── unetr_pp_raw_data/
          ├── Task02_Synapse/
              ├── imagesTr/
              ├── imagesTs/
              ├── labelsTr/
              ├── labelsTs/
              ├── dataset.json
          ├── Task002_Synapse
      ├── unetr_pp_cropped_data/
          ├── Task002_Synapse
The dataset folders for ACDC should be organized as follows:
./DATASET_Acdc/
  ├── unetr_pp_raw/
      ├── unetr_pp_raw_data/
          ├── Task01_ACDC/
              ├── imagesTr/
              ├── imagesTs/
              ├── labelsTr/
              ├── labelsTs/
              ├── dataset.json
          ├── Task001_ACDC
      ├── unetr_pp_cropped_data/
          ├── Task001_ACDC
Please refer to "Setting up the datasets" in the nnFormer repository for more details. Alternatively, you can download the preprocessed datasets for Synapse and ACDC and extract them under the project directory.
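Before training, it can help to confirm that the folders match the layouts above. The following is a minimal sketch, not part of the repository: the helper `missing_entries` is hypothetical, and the relative paths it checks are an assumed reading of the Synapse listing (task folders nested under `unetr_pp_raw/unetr_pp_raw_data/` and `unetr_pp_raw/unetr_pp_cropped_data/`).

```python
from pathlib import Path
from typing import List

# Assumed nesting of the Synapse listing; adjust if your layout differs.
SYNAPSE_LAYOUT = [
    "unetr_pp_raw/unetr_pp_raw_data/Task02_Synapse/imagesTr",
    "unetr_pp_raw/unetr_pp_raw_data/Task02_Synapse/imagesTs",
    "unetr_pp_raw/unetr_pp_raw_data/Task02_Synapse/labelsTr",
    "unetr_pp_raw/unetr_pp_raw_data/Task02_Synapse/labelsTs",
    "unetr_pp_raw/unetr_pp_raw_data/Task02_Synapse/dataset.json",
    "unetr_pp_raw/unetr_pp_raw_data/Task002_Synapse",
    "unetr_pp_raw/unetr_pp_cropped_data/Task002_Synapse",
]


def missing_entries(root: str, expected: List[str]) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("./DATASET_Synapse", SYNAPSE_LAYOUT)
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Synapse dataset layout looks complete.")
```

The same helper can be reused for ACDC by passing `./DATASET_Acdc` and a list built from the `Task01_ACDC` and `Task001_ACDC` entries.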
The following scripts can be used for training our Zig-RiR model on the datasets:
###############2D dataset###############
We also provide 2D training scripts for the 3D Synapse and ACDC datasets, based on the slice preprocessing script from 2D TransUnet.
CUDA_VISIBLE_DEVICES=0 python train.py --dataset ISIC16 --end_epoch 200 --warm_epochs 5 --lr 0.0003 --train_batchsize 8 --crop_size 512 512 --nclass 2
###############3D dataset###############
We follow the official UNETR++ repository when training on 3D datasets.
CUDA_VISIBLE_DEVICES=0 python /zig_rir3d/run/run_training.py 3d_fullres unetr_pp_trainer_synapse 2 0
CUDA_VISIBLE_DEVICES=0 python /zig_rir3d/run/run_training.py 3d_fullres unetr_pp_trainer_acdc 1 0
1- For the 2D ISIC dataset, run the following command for evaluation:
CUDA_VISIBLE_DEVICES=0 python test2d.py --dataset ISIC16 --end_epoch 200 --warm_epochs 5 --lr 0.0003 --train_batchsize 8 --crop_size 512 512 --nclass 2
2- For the 3D Synapse dataset, find your saved Synapse weights and place model_final_checkpoint.model in the following path:
zig_rir3d/evaluation/unetr_pp_synapse_checkpoint/unetr_pp/3d_fullres/Task002_Synapse/unetr_pp_trainer_synapse__unetr_pp_Plansv2.1/fold_0/
Then, run
bash evaluation_scripts/run_evaluation_synapse.sh
3- For the 3D ACDC dataset, find your saved ACDC weights and place model_final_checkpoint.model in the following path:
zig_rir3d/evaluation/unetr_pp_acdc_checkpoint/unetr_pp/3d_fullres/Task001_ACDC/unetr_pp_trainer_acdc__unetr_pp_Plansv2.1/fold_0/
Then, run
bash evaluation_scripts/run_evaluation_acdc.sh
This repository is built on the UNETR++ and nnFormer repositories.
If you use our work, please consider citing:
@article{chen2025zig,
title={Zig-RiR: Zigzag RWKV-in-RWKV for Efficient Medical Image Segmentation},
author={Chen, Tianxiang and Zhou, Xudong and Tan, Zhentao and Wu, Yue and Wang, Ziyang and Ye, Zi and Gong, Tao and Chu, Qi and Yu, Nenghai and Lu, Le},
journal={IEEE Transactions on Medical Imaging},
year={2025},
publisher={IEEE}
}
Should you have any questions, you may contact the first two authors at txchen@mail.ustc.edu.cn and xd_zhou@mail.ustc.edu.cn.



