Towards All-Day Perception for Off-Road Driving: A Large-Scale Multispectral Dataset and Comprehensive Benchmark
Shuo Wang, Jilin Mei, Wenfei Guan, Shuai Wang, Yan Xing, Chen Min†, Yu Hu†
Institute of Computing Technology, Chinese Academy of Sciences
Off-road nighttime autonomous driving suffers from unreliable visible-light perception, making the infrared modality crucial for accurate freespace detection. We present the IRON dataset, the first large-scale infrared dataset for off-road temporal freespace detection under all-day conditions, comprising 24,314 densely annotated infrared images with synchronized RGB images. Building on this dataset, we propose IRONet, a novel flow-free framework for temporal freespace detection that aggregates historical context via a memory-attention mechanism and a carefully designed mask decoder. IRONet achieves 82.93% IoU and 90.66% F1 at 32 FPS on IRON, and generalizes robustly to the RGB modality on ORFD and Rellis-3D.
The IRON dataset provides the first large-scale infrared video sequences for temporal off-road freespace detection, covering diverse terrains and illumination conditions.
Each column shows a different scene type; rows show the aligned RGB image, infrared image, and freespace annotation respectively.
The IRON dataset is available on Baidu Netdisk:
https://pan.baidu.com/s/1UYPkj6nHYQRu2SFo7UuwGw?pwd=eiz6 (extraction code: eiz6)
IRONet is a flow-free temporal segmentation framework consisting of three stages:
- **Multi-Scale Feature Extraction**: a ConvMAE-pretrained ViT backbone with a PSP-FPN neck extracts multi-scale infrared features {F_t^i}.
- **Memory Attention**: a FIFO memory bank stores mask-aware historical features; cross-attention with 3D spatiotemporal positional embeddings yields temporally enriched features F̂_t.
- **Memory Decoder**: a SAM-style decoder with two key innovations:
  - **SGMC (Semantic Guided Memory Compensation)**: re-initializes decoder semantics when the memory contains no freespace signal (e.g., at sequence start, under occlusion, or during sharp turns).
  - **ADT (Alternating Dual-task Training)**: prevents semantic shortcutting by alternating the segmentation target between freespace and background, maintaining strong supervision of the temporal modules.
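The mask-aware FIFO memory and cross-attention steps above can be illustrated with a simplified numpy sketch. The class and function names here are illustrative, not the repository's API, and learned projections plus the 3D spatiotemporal positional embeddings are omitted:

```python
from collections import deque
import numpy as np

class MemoryBank:
    """FIFO memory bank of mask-aware frame features (illustrative sketch)."""
    def __init__(self, capacity=5):
        self.buffer = deque(maxlen=capacity)  # oldest entry evicted automatically

    def push(self, feat, mask):
        # Gate features by the predicted freespace mask before storing,
        # making the stored memory "mask-aware".
        self.buffer.append(feat * mask)

    def stacked(self):
        # (T, N, C): T stored frames, N tokens, C channels
        return np.stack(list(self.buffer))

def cross_attention(query, memory):
    """Single-head cross-attention of current-frame tokens over memory tokens.
    query: (N, C); memory: (T, N, C). Projections/positional embeddings omitted."""
    keys = memory.reshape(-1, memory.shape[-1])           # (T*N, C)
    scores = query @ keys.T / np.sqrt(query.shape[-1])    # (N, T*N)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over memory
    return weights @ keys                                 # (N, C) enriched tokens

rng = np.random.default_rng(0)
bank = MemoryBank(capacity=5)
for _ in range(7):                        # push 7 frames; only the last 5 survive
    feat = rng.random((16, 8))            # 16 tokens, 8 channels
    mask = (rng.random((16, 1)) > 0.5).astype(float)
    bank.push(feat, mask)

enriched = cross_attention(rng.random((16, 8)), bank.stacked())
print(bank.stacked().shape, enriched.shape)  # (5, 16, 8) (16, 8)
```

The bounded deque captures the FIFO behavior: once capacity is reached, the oldest frame's features are discarded as new ones arrive.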
| Method | Backbone | Prec. (%) | Rec. (%) | F1 (%) | IoU (%) | Params (M) | FPS |
|---|---|---|---|---|---|---|---|
| U-Net | β | 71.30 | 90.12 | 79.61 | 66.13 | 31.04 | 21 |
| SegFormer | ViT-S | 82.75 | 92.41 | 87.31 | 77.48 | 27.48 | 33 |
| ROD | ViT-S | 86.12 | 89.06 | 87.56 | 77.88 | 33.43 | 68 |
| DeepLabV3+ | ResNet-101 | 87.25 | 90.31 | 88.75 | 79.78 | 58.75 | 103 |
| Mask2Former | ResNet-50 | 87.80 | 92.22 | 89.95 | 81.74 | 43.95 | 23 |
| IRONet_3F (ours) | ViT-B | 88.15 | 93.07 | 90.55 | 82.73 | 104.49 | 23 |
| IRONet_5F (ours) | ViT-S | 90.85 | 90.49 | 90.66 | 82.93 | 40.05 | 32 |
Cross-dataset comparisons on the ORFD Dataset and the Rellis-3D Dataset (result tables omitted here; see the checkpoints table below for per-dataset numbers).
Baseline results reproduced from official code. Bold = best, underline = second best.
- Python 3.7+
- PyTorch 1.11+
- CUDA 11.3+
- mmcv-full 1.5.x
- mmsegmentation 0.24.x
# 1. Clone the repository
git clone https://github.com/wsnbws/IRON.git
cd IRON
# 2. Create conda environment
conda create -n ironet python=3.8 -y
conda activate ironet
# 3. Install PyTorch (example: CUDA 11.3)
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
# 4. Install mmcv-full
pip install mmcv-full==1.5.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
# 5. Install mmsegmentation
pip install mmsegmentation==0.24.0
# 6. Install other dependencies
pip install timm einops scipy
Download the ConvMAE pretrained backbone weights:
| Backbone | Pretrain Data | Download |
|---|---|---|
| ViT-S (ConvMAE) | ImageNet-1K | convmae_small.pth |
| ViT-B (ConvMAE) | ImageNet-1K | convmae_base.pth |
Place downloaded weights in ./pretrained/.
data/
βββ IRON/
βββ images/
β βββ training/
β β βββ seq_001/
β β β βββ image_000000_*.jpg
β β β βββ ...
β β βββ ...
β βββ testing/
β βββ ...
βββ annotations/
βββ training/
βββ testing/
Update the dataset root in configs/_base_/datasets/drivable_video.py:
data_root = 'data/IRON'
Download from the official ORFD repository and update paths in configs/_base_/datasets/orfd_video.py.
Download from the official RELLIS-3D repository and update paths in configs/_base_/datasets/rellis_video.py.
# IRONet_5F on IRON dataset (best model)
bash train.sh configs/ironet/ironet_vits_iron_5f.py work_dirs/ironet_5f pretrained/convmae_small.pth 1
# IRONet_3F ViT-S on IRON dataset
bash train.sh configs/ironet/ironet_vits_iron_3f.py work_dirs/ironet_3f pretrained/convmae_small.pth 1
# IRONet_3F ViT-B on IRON dataset
bash train.sh configs/ironet/ironet_vitb_iron_3f.py work_dirs/ironet_vitb_3f pretrained/convmae_base.pth 1
# 4-GPU training example
bash train.sh configs/ironet/ironet_vits_iron_5f.py work_dirs/ironet_5f pretrained/convmae_small.pth 4
# ORFD
bash train.sh configs/ironet/ironet_vits_orfd_5f.py work_dirs/ironet_orfd_5f pretrained/convmae_small.pth 1
# Rellis-3D
bash train.sh configs/ironet/ironet_vits_rellis_5f.py work_dirs/ironet_rellis_5f pretrained/convmae_small.pth 1
# Evaluate with visualization output
bash test.sh configs/ironet/ironet_vits_iron_5f.py work_dirs/ironet_5f/checkpoints/best_model.pth results/
# Evaluate metrics only (no visualization)
python custom_test.py configs/ironet/ironet_vits_iron_5f.py work_dirs/ironet_5f/checkpoints/best_model.pth --eval
# Multi-GPU evaluation (4 GPUs)
bash tools/dist_test.sh configs/ironet/ironet_vits_iron_5f.py work_dirs/ironet_5f/checkpoints/best_model.pth 4
Pre-trained IRONet checkpoints will be released upon paper acceptance.
| Config | Dataset | IoU (%) | F1 (%) | FPS | Download |
|---|---|---|---|---|---|
| ironet_vits_iron_3f | IRON | 82.73 | 90.55 | 23 | Coming soon |
| ironet_vits_iron_5f | IRON | 82.93 | 90.66 | 32 | Coming soon |
| ironet_vits_orfd_5f | ORFD | 94.8 | 97.3 | 32 | Coming soon |
| ironet_vits_rellis_5f | Rellis-3D | – | 95.67 | 32 | Coming soon |
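The IoU and F1 values reported above follow the standard binary-segmentation definitions. A minimal sketch (not the repository's evaluation script) for computing them from predicted and ground-truth freespace masks:

```python
import numpy as np

def freespace_metrics(pred, gt):
    """Precision, Recall, F1, and IoU for binary freespace masks.
    pred, gt: boolean arrays of the same shape (illustrative sketch)."""
    tp = np.logical_and(pred, gt).sum()     # freespace predicted and present
    fp = np.logical_and(pred, ~gt).sum()    # freespace predicted, absent
    fn = np.logical_and(~pred, gt).sum()    # freespace missed
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return prec, rec, f1, iou

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
prec, rec, f1, iou = freespace_metrics(pred, gt)  # tp=2, fp=1, fn=1
```

Note that IoU = tp / (tp + fp + fn), so it is always at most F1 for the same confusion counts, which matches the gap between the IoU and F1 columns above.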
If you find this work useful, please cite:
@article{wang2025iron,
title={Towards All-Day Perception for Off-Road Driving: A Large-Scale Multispectral Dataset and Comprehensive Benchmark},
author={Wang, Shuo and Mei, Jilin and Guan, Wenfei and Wang, Shuai and Xing, Yan and Min, Chen and Hu, Yu},
journal={IEEE Transactions on ...},
year={2025}
}
This project builds upon mmsegmentation, ConvMAE, and SAM2. We thank the authors for their open-source contributions.
This project is released under the MIT License.
