txchen-USTC/Zig-RiR

[TMI 2025] Zig-RiR: Zigzag RWKV-in-RWKV for Efficient Medical Image Segmentation

Tianxiang Chen1,2, Xudong Zhou1, Zhentao Tan1,2, Yue Wu2, Ziyang Wang3,4, Zi Ye*5, Tao Gong1, Qi Chu1, Nenghai Yu1, and Le Lu6

University of Science and Technology of China1, Alibaba Cloud2, University of Oxford3, The Alan Turing Institute4, Trinity College Dublin5, Alibaba DAMO Academy6

paper

🚀 News

  • (April 10, 2025): We're thrilled to share that Zig-RiR has been accepted to IEEE TMI-2025! 🎊.

Abstract: Medical image segmentation has made significant strides with the development of basic models. Specifically, models that combine CNNs with transformers can successfully extract both local and global features. However, these models inherit the transformer's quadratic computational complexity, limiting their efficiency. Inspired by the recent Receptance Weighted Key Value (RWKV) model, which achieves linear complexity for long-distance modeling, we explore its potential for medical image segmentation. While directly applying vision-RWKV yields suboptimal results due to insufficient local feature exploration and disrupted spatial continuity, we propose a novel nested structure, Zigzag RWKV-in-RWKV (Zig-RiR), to address these issues. It consists of Outer and Inner RWKV blocks to adeptly capture both global and local features without disrupting spatial continuity. We treat local patches as "visual sentences" and use the Outer Zig-RWKV to explore global information. Then, we decompose each sentence into subpatches ("visual words") and use the Inner Zig-RWKV to further explore local information among words, at negligible computational cost. We also introduce a Zigzag-WKV attention mechanism to ensure spatial continuity during token scanning. By aggregating visual word and sentence features, our Zig-RiR can effectively explore both global and local information while preserving spatial continuity. Experiments on four medical image segmentation datasets of both 2D and 3D modalities demonstrate the superior accuracy and efficiency of our method, outperforming the state-of-the-art method 14.4 times in speed and reducing GPU memory usage by 89.5% when testing on 1024 × 1024 high-resolution medical images.


Architecture overview of Zig-RiR

Overview of our Zig-RiR with hierarchical encoder-decoder structure.

Architecture Overview

Zig-RiR adopts a U-shaped architecture consisting of a convolutional stem, a Zig-RiR encoder, and a plain decoder. The key innovation lies in the Zig-RiR block, which features a nested RWKV-in-RWKV structure and a novel Zigzag WKV attention mechanism.

Zig-RiR Block
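The nested "visual sentence" and "visual word" decomposition from the abstract can be illustrated with a toy example. This is only a sketch of the two-level patch split, not the repository's implementation: the helper name `split_into_patches`, the 8x8 grid, and the patch sizes 4 and 2 are all assumptions chosen for readability.

```python
def split_into_patches(grid, patch):
    """Split a 2D list (H x W) into non-overlapping patch x patch blocks, row-major."""
    h, w = len(grid), len(grid[0])
    if h % patch or w % patch:
        raise ValueError("grid size must be divisible by patch size")
    return [
        [row[c:c + patch] for row in grid[r:r + patch]]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]

# Toy 8x8 "image" whose pixel values are their raster indices.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

# Outer level: four 4x4 patches, playing the role of "visual sentences".
sentences = split_into_patches(image, 4)

# Inner level: each sentence is split into four 2x2 "visual words".
words = [split_into_patches(sentence, 2) for sentence in sentences]

print(len(sentences), len(words[0]))  # 4 4
print(words[0][0])                    # [[0, 1], [8, 9]]
```

In this picture the outer block would mix information across the four sentences, while the inner block would mix information across the four words inside each sentence.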

Zigzag RWKV Block
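To see why scan order matters for spatial continuity, the sketch below compares a plain raster scan with a serpentine (row-reversing) scan. The exact scan pattern used by Zigzag-WKV is not spelled out in this README, so the serpentine order here is an assumption used purely for illustration, and the function names are hypothetical.

```python
def raster_order(h, w):
    """Row-by-row scan, always left to right."""
    return [(r, c) for r in range(h) for c in range(w)]

def serpentine_order(h, w):
    """Row-by-row scan that reverses direction on every other row."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order

def max_step(order):
    """Largest Manhattan distance between consecutive tokens in a scan."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )

print(max_step(raster_order(4, 4)))      # 4: jump from the end of a row to the start of the next
print(max_step(serpentine_order(4, 4)))  # 1: consecutive tokens are always neighbours
```

With the raster scan, the token at the end of each row is followed by a token on the opposite side of the image. The serpentine scan keeps every consecutive pair spatially adjacent, which is the kind of continuity the abstract refers to.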


Visualization Results

ISIC & Synapse & ACDC Dataset

Qualitative comparison on skin lesion segmentation (ISIC) and multi-organ segmentation (Synapse & ACDC) tasks. Our proposed Zig-RiR achieves more accurate segmentation than existing methods.

Synapse Results


Installation

The code is tested with PyTorch 1.11.0 and CUDA 11.3. After cloning the repository, follow the steps below to install:

  1. Create and activate conda environment
conda create --name zig_rir python=3.8
conda activate zig_rir
  2. Install PyTorch and torchvision
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
  3. Install other dependencies
pip install -r requirements.txt
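After installation, a quick check like the one below can confirm that the environment matches the tested versions. It is a minimal sketch using only the standard library; the helper name `check_versions` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Versions listed in the installation steps above.
EXPECTED = {"torch": "1.11.0", "torchvision": "0.12.0"}

def check_versions(expected):
    """Return {package: status}; status is 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Ignore local build tags such as '+cu113' when comparing.
        base = found.split("+")[0]
        report[name] = "ok" if base == wanted else f"found {found}"
    return report

if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name}: {status}")
```

A status other than `ok` does not necessarily mean the code will fail, only that the environment differs from the one the code was tested with.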

Dataset

For the 3D datasets Synapse and ACDC, we follow the same preprocessing as nnFormer.

The dataset folders for Synapse should be organized as follows:

./DATASET_Synapse/
  ├── unetr_pp_raw/
      ├── unetr_pp_raw_data/
           ├── Task02_Synapse/
              ├── imagesTr/
              ├── imagesTs/
              ├── labelsTr/
              ├── labelsTs/
              ├── dataset.json
           ├── Task002_Synapse
       ├── unetr_pp_cropped_data/
           ├── Task002_Synapse

The dataset folders for ACDC should be organized as follows:

./DATASET_Acdc/
  ├── unetr_pp_raw/
      ├── unetr_pp_raw_data/
           ├── Task01_ACDC/
              ├── imagesTr/
              ├── imagesTs/
              ├── labelsTr/
              ├── labelsTs/
              ├── dataset.json
           ├── Task001_ACDC
       ├── unetr_pp_cropped_data/
           ├── Task001_ACDC

Please refer to "Setting up the datasets" in the nnFormer repository for more details. Alternatively, you can download the preprocessed datasets for Synapse and ACDC and extract them under the project directory.
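Before training, it can help to confirm that the raw task folders match the layouts shown above. The following is a small sketch using `pathlib`; the function name `missing_entries` is hypothetical, and it only checks the raw task folder, not the cropped data.

```python
from pathlib import Path

# Sub-folders and files expected inside each raw task folder (from the trees above).
TASK_ENTRIES = ["imagesTr", "imagesTs", "labelsTr", "labelsTs", "dataset.json"]

def missing_entries(dataset_root, task_name):
    """Return the expected paths that do not exist under dataset_root."""
    task_dir = Path(dataset_root) / "unetr_pp_raw" / "unetr_pp_raw_data" / task_name
    return [str(task_dir / entry) for entry in TASK_ENTRIES
            if not (task_dir / entry).exists()]

if __name__ == "__main__":
    for root, task in [("./DATASET_Synapse", "Task02_Synapse"),
                       ("./DATASET_Acdc", "Task01_ACDC")]:
        missing = missing_entries(root, task)
        print(task, "OK" if not missing else f"missing: {missing}")
```

An empty result means every expected entry was found; otherwise the returned paths show what still needs to be extracted or moved.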

Training

The following scripts can be used for training our Zig-RiR model on the datasets:

############### 2D dataset ###############
# We also provide 2D training scripts for the 3D Synapse and ACDC datasets, following the slice preprocessing script in 2D TransUNet.
CUDA_VISIBLE_DEVICES=0 python train.py --dataset ISIC16 --end_epoch 200 --warm_epochs 5 --lr 0.0003 --train_batchsize 8 --crop_size 512 512 --nclass 2

############### 3D dataset ###############
# We follow the official UNETR++ repository when training on 3D datasets.
CUDA_VISIBLE_DEVICES=0 python /zig_rir3d/run/run_training.py 3d_fullres unetr_pp_trainer_synapse 2 0
CUDA_VISIBLE_DEVICES=0 python /zig_rir3d/run/run_training.py 3d_fullres unetr_pp_trainer_acdc 1 0
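When running several 2D experiments, it may be convenient to assemble the training command programmatically. The sketch below rebuilds the `train.py` invocation shown above from keyword arguments. The flag names are taken from that command, while the helper `build_train_command` and its defaults are assumptions made for illustration.

```python
import shlex

def build_train_command(dataset, epochs=200, warm_epochs=5, lr=0.0003,
                        batch_size=8, crop_size=(512, 512), nclass=2):
    """Assemble the argument list for the 2D train.py command shown above."""
    return [
        "python", "train.py",
        "--dataset", dataset,
        "--end_epoch", str(epochs),
        "--warm_epochs", str(warm_epochs),
        "--lr", str(lr),
        "--train_batchsize", str(batch_size),
        "--crop_size", *map(str, crop_size),
        "--nclass", str(nclass),
    ]

cmd = build_train_command("ISIC16")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, with `CUDA_VISIBLE_DEVICES` set in the environment to select the GPU.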

Evaluation

1- For the 2D ISIC dataset, run the following command for evaluation:

CUDA_VISIBLE_DEVICES=0 python test2d.py --dataset ISIC16 --end_epoch 200 --warm_epochs 5 --lr 0.0003 --train_batchsize 8 --crop_size 512 512 --nclass 2

2- For the 3D Synapse dataset, find your saved Synapse weights and place model_final_checkpoint.model in the following path:

zig_rir3d/evaluation/unetr_pp_synapse_checkpoint/unetr_pp/3d_fullres/Task002_Synapse/unetr_pp_trainer_synapse__unetr_pp_Plansv2.1/fold_0/

Then, run

bash evaluation_scripts/run_evaluation_synapse.sh

3- For the 3D ACDC dataset, find your saved ACDC weights and place model_final_checkpoint.model in the following path:

zig_rir3d/evaluation/unetr_pp_acdc_checkpoint/unetr_pp/3d_fullres/Task001_ACDC/unetr_pp_trainer_acdc__unetr_pp_Plansv2.1/fold_0/

Then, run

bash evaluation_scripts/run_evaluation_acdc.sh
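Copying the checkpoint into place by hand is error-prone because the destination paths are long. The sketch below automates that step with `shutil`. The destination folders are quoted from the evaluation steps above, while the helper `place_checkpoint` and the `repo_root` parameter are hypothetical.

```python
import shutil
from pathlib import Path

# Destination folders quoted from the evaluation steps above.
CHECKPOINT_DIRS = {
    "synapse": "zig_rir3d/evaluation/unetr_pp_synapse_checkpoint/unetr_pp/3d_fullres/"
               "Task002_Synapse/unetr_pp_trainer_synapse__unetr_pp_Plansv2.1/fold_0",
    "acdc": "zig_rir3d/evaluation/unetr_pp_acdc_checkpoint/unetr_pp/3d_fullres/"
            "Task001_ACDC/unetr_pp_trainer_acdc__unetr_pp_Plansv2.1/fold_0",
}

def place_checkpoint(src, dataset, repo_root="."):
    """Copy a trained checkpoint to the folder the evaluation script reads from."""
    dest_dir = Path(repo_root) / CHECKPOINT_DIRS[dataset]
    dest_dir.mkdir(parents=True, exist_ok=True)  # create missing folders
    dest = dest_dir / "model_final_checkpoint.model"
    shutil.copy2(src, dest)
    return dest
```

For example, `place_checkpoint("path/to/model_final_checkpoint.model", "synapse")` run from the repository root would copy the file into the Synapse folder, after which the matching evaluation script can be run.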

Acknowledgement

This repository is built on the UNETR++ and nnFormer repositories.

Citation

If you use our work, please consider citing:

@article{chen2025zig,
  title={Zig-RiR: Zigzag RWKV-in-RWKV for Efficient Medical Image Segmentation},
  author={Chen, Tianxiang and Zhou, Xudong and Tan, Zhentao and Wu, Yue and Wang, Ziyang and Ye, Zi and Gong, Tao and Chu, Qi and Yu, Nenghai and Lu, Le},
  journal={IEEE Transactions on Medical Imaging},
  year={2025},
  publisher={IEEE}
}

Contact

If you have any questions, please contact the first two authors at txchen@mail.ustc.edu.cn and xd_zhou@mail.ustc.edu.cn.
