# NEMESIS: Superpatch-based 3D Medical Image Self-Supervised Pretraining via Noise-Enhanced Dual-Masking

IEEE AICAS 2026
NEMESIS is a self-supervised pretraining framework for 3D CT volumes. It addresses the core challenges of applying Vision Transformers (ViTs) to volumetric medical images, namely memory constraints and annotation scarcity, through three complementary ideas:
- Superpatch processing: Randomly crop 128³ sub-volumes from full CT scans, enabling ViT-scale pretraining without memory-prohibitive full-volume attention.
- Dual-masking (MATB): Apply both plane-wise (axial/xy) and axis-wise (depth/z) masking jointly, exploiting the natural anisotropy of CT acquisition.
- NEMESIS Tokens (NTs): Learnable tokens that attend over unmasked patch slices via multi-head cross-attention, providing a compact summary of visible context for the decoder.
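The NT mechanism in the last bullet, learnable tokens that query the visible patches through multi-head cross-attention, can be sketched dependency-free. This is an illustrative NumPy toy with identity projections; the real model uses learned query/key/value projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context, n_heads=4):
    """Toy multi-head cross-attention: `queries` (the NEMESIS tokens) attend
    over `context` (embeddings of unmasked patches) and return a compact
    summary of the visible content. Identity projections keep it minimal."""
    n_q, d = queries.shape
    n_k, _ = context.shape
    dh = d // n_heads
    q = queries.reshape(n_q, n_heads, dh).transpose(1, 0, 2)  # (H, Nq, dh)
    k = context.reshape(n_k, n_heads, dh).transpose(1, 0, 2)  # (H, Nk, dh)
    v = k                                                     # values = keys here
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))    # (H, Nq, Nk)
    out = attn @ v                                            # (H, Nq, dh)
    return out.transpose(1, 0, 2).reshape(n_q, d)             # merge heads
```

With identical context rows the attention weights are uniform, so each output token reduces to the shared context row, which is a quick sanity check of the summarisation behaviour.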
Noise injection during reconstruction further regularises the encoder representations.
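The dual-masking idea can be sketched on a patch grid as follows. This is an illustrative NumPy toy, not the repo's MATB implementation; the grid sizes and the rule for combining the two masks are assumptions:

```python
import numpy as np

def dual_mask(grid_d=8, grid_h=8, grid_w=8, mask_ratio=0.5, seed=0):
    """Toy dual-masking on a (D, H, W) patch grid: axis-wise masking drops
    whole depth (z) slices; plane-wise masking drops in-plane (xy) patches
    within the surviving slices. True = masked (hidden from the encoder)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((grid_d, grid_h, grid_w), dtype=bool)

    # Axis-wise (z): hide a fraction of depth slices entirely.
    hidden = rng.choice(grid_d, size=int(grid_d * mask_ratio), replace=False)
    mask[hidden] = True

    # Plane-wise (xy): hide a fraction of patches in each remaining slice.
    n_plane = int(grid_h * grid_w * mask_ratio)
    for d in set(range(grid_d)) - set(hidden.tolist()):
        flat = np.zeros(grid_h * grid_w, dtype=bool)
        flat[rng.choice(grid_h * grid_w, size=n_plane, replace=False)] = True
        mask[d] = flat.reshape(grid_h, grid_w)
    return mask
```

Note that combining the two masks raises the effective masking ratio above either one alone (0.75 here for a 0.5 ratio on both axes), which is consistent with the aggressive masking typical of masked autoencoders.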
| Method | AUROC | F1 |
|---|---|---|
| NEMESIS (frozen) | 0.9633 | 0.8791 |
| SuPreM (fine-tuned) | 0.9493 | 0.8602 |
| VoCo (fine-tuned) | 0.9387 | 0.8441 |
| ResNet3D-50 | 0.9312 | 0.8279 |
| Random ViT | 0.8843 | 0.7915 |
The frozen NEMESIS encoder outperforms fully fine-tuned competing methods while using 32× fewer GFLOPs.
```bash
git clone https://github.com/hsjung/NEMESIS.git
cd NEMESIS

# Conda (recommended)
conda env create -f environment.yml
conda activate nemesis

# or pip (core model only)
pip install -e .

# with benchmark dependencies
pip install -e ".[benchmark]"
```

Download pretrained weights from the HuggingFace Hub:

```bash
pip install huggingface_hub
huggingface-cli download whilethis/NEMESIS MAE_768_0.5.pt --local-dir pretrained/
```

See pretrained/README.md for details and alternative download methods.
NEMESIS was pretrained on a mixed dataset of publicly available CT scans. Prepare a JSON index file in the following format:

```json
{
  "train": [{"image": "/path/to/scan.nii.gz"}, ...],
  "val":   [{"image": "/path/to/scan.nii.gz"}, ...],
  "test":  [{"image": "/path/to/scan.nii.gz"}, ...]
}
```

Public datasets used during pretraining:
```
data/
  BTCV/
    imagesTr/    # NIfTI CT volumes (*.nii.gz)
    labelsTr/    # NIfTI label maps (*.nii.gz)
```

Download BTCV from Synapse and place it under `data/BTCV/`.
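The JSON index described above can be generated with a short helper. This is a sketch assuming a simple random split; the split fractions, output path, and function name are illustrative, not the repo's official preprocessing:

```python
import json
import random
from pathlib import Path

def build_index(root, out_json="data/combined_dataset.json",
                val_frac=0.1, test_frac=0.1, seed=0):
    """Build the {train/val/test} JSON index expected by the pretraining
    script from a directory tree of .nii.gz volumes. The random shuffle
    split is an assumption for illustration."""
    scans = sorted(str(p) for p in Path(root).rglob("*.nii.gz"))
    random.Random(seed).shuffle(scans)
    n_test = int(len(scans) * test_frac)
    n_val = int(len(scans) * val_frac)
    index = {
        "test":  [{"image": p} for p in scans[:n_test]],
        "val":   [{"image": p} for p in scans[n_test:n_test + n_val]],
        "train": [{"image": p} for p in scans[n_test + n_val:]],
    }
    Path(out_json).parent.mkdir(parents=True, exist_ok=True)
    Path(out_json).write_text(json.dumps(index, indent=2))
    return index
```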
```bash
python scripts/pretrain.py \
    --config configs/pretrain.yaml \
    --exp_name NEMESIS_pretrain \
    --data_json data/combined_dataset.json \
    --epochs 50 \
    --batch_size 4 \
    --mask_ratio 0.5 \
    --device_ids 0
```

Key options:
| Argument | Default | Description |
|---|---|---|
| `--config` | `configs/pretrain.yaml` | YAML config file |
| `--exp_name` | required | Experiment name (creates `results/<name>/`) |
| `--data_json` | required | Path to dataset JSON index |
| `--epochs` | 50 | Total training epochs |
| `--batch_size` | 4 | Batch size per GPU |
| `--mask_ratio` | 0.5 | Masking ratio for both axes (plane + axis) |
| `--embed_dim` | 768 | Encoder embedding dimension |
| `--device_ids` | 0 | Comma-separated GPU IDs |
| `--amp` | off | Enable mixed-precision training |
| `--resume` | — | Path to checkpoint to resume from |
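The same options could also live in the YAML config; a hypothetical fragment mirroring the table defaults (field names are assumptions, not the shipped `configs/pretrain.yaml`):

```yaml
# Hypothetical pretrain.yaml fragment; field names mirror the CLI flags
# above and are assumptions, not the shipped config.
epochs: 50
batch_size: 4
mask_ratio: 0.5
embed_dim: 768
amp: false
```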
```bash
# Run all baselines + NEMESIS
bash scripts/run_benchmarks.sh 0   # GPU 0

# Or run a single config
python benchmark/scripts/train_classification.py \
    --config benchmark/configs/btcv_cls_nemesis.yaml \
    --device_ids 0
```

Results are written to `results/<experiment_name>/`.
Benchmark configs available:
| Config | Model |
|---|---|
| `btcv_cls_nemesis.yaml` | NEMESIS (frozen encoder) |
| `btcv_cls_nemesis_finetune.yaml` | NEMESIS (fine-tuned encoder) |
| `btcv_cls_random_vit.yaml` | Random ViT (untrained) |
| `btcv_cls_resnet3d.yaml` | ResNet3D-50 |
| `btcv_cls_voco.yaml` | VoCo (SwinUNETR) |
| `btcv_cls_suprem.yaml` | SuPreM (SwinUNETR) |
For VoCo and SuPreM, download their pretrained weights separately:
- VoCo: `pretrained/VoCo_B_SSL_head.pt` (VoCo official repo)
- SuPreM: `pretrained/supervised_suprem_swinunetr_2100.pth` (SuPreM official repo)
```
NEMESIS/
├── nemesis/               # Core model package
│   └── models/
│       └── mae.py         # MAEgic3DMAE, MAEgicEncoder, MAEgicDecoder
├── benchmark/             # Downstream evaluation
│   ├── configs/           # YAML configs per method
│   ├── datasets/          # BTCV, Synapse, KiTS23, MSD Pancreas
│   ├── models/            # Classifier/segmentation heads
│   ├── scripts/           # train_classification.py, train_segmentation.py
│   └── training/          # Trainers, metrics
├── configs/
│   └── pretrain.yaml      # Default pretraining config
├── scripts/
│   ├── pretrain.py        # Pretraining entry point
│   └── run_benchmarks.sh  # Run all benchmarks
├── pretrained/            # Place .pt weights here (see README inside)
├── data/                  # Place datasets here (gitignored)
├── requirements.txt
└── environment.yml
```
If you use NEMESIS in your research, please cite:
```bibtex
@inproceedings{jung2026nemesis,
  title     = {{NEMESIS}: Superpatch-based 3{D} Medical Image Self-Supervised Pretraining
               via Noise-Enhanced Dual-Masking},
  author    = {Jung, Hyeonseok and others},
  booktitle = {IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)},
  year      = {2026},
}
```

MIT License. See LICENSE for details.