This is the code for the paper ESS: Learning Event-based Semantic Segmentation from Still Images (PDF) by Zhaoning Sun*, Nico Messikommer*, Daniel Gehrig, and Davide Scaramuzza. For an overview of our method, check out our video.
If you use any of this code, please cite the following publication:
@Article{Sun22eccv,
author = {Zhaoning Sun* and Nico Messikommer* and Daniel Gehrig and Davide Scaramuzza},
title = {ESS: Learning Event-based Semantic Segmentation from Still Images},
journal = {European Conference on Computer Vision (ECCV)},
year = {2022},
}
Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, chiefly due to the novelty of the sensor and the lack of high-quality, labeled datasets. In this work, we introduce ESS, which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, to spur further research in event-based semantic segmentation, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for exciting new research directions in fields previously inaccessible to event cameras.
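As a rough intuition for the embedding-alignment idea, the toy sketch below trains a shared segmentation head on labeled images while pulling event embeddings towards image embeddings. It is purely illustrative: the encoders, the alignment term, and the class count are stand-ins and do not reproduce the actual ESS architecture or objective implemented in this repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: ESS uses a recurrent event encoder and an E2VID-based
# image branch; these two small CNNs only mark the two branches.
image_encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
event_encoder = nn.Sequential(nn.Conv2d(5, 16, 3, padding=1), nn.ReLU())
seg_head = nn.Conv2d(16, 11, 1)  # shared head, e.g. 11 classes as in DSEC-Semantic

images = torch.rand(2, 3, 64, 64)            # labeled still images
event_voxels = torch.rand(2, 5, 64, 64)      # unlabeled event representation
labels = torch.randint(0, 11, (2, 64, 64))   # labels exist for images only

img_feat = image_encoder(images)
ev_feat = event_encoder(event_voxels)

# Supervised task loss on the image branch ...
task_loss = F.cross_entropy(seg_head(img_feat), labels)
# ... plus a generic alignment term pulling the two embedding spaces
# together (a placeholder for the paper's alignment objective, which does
# not require paired or per-pixel-aligned images and events).
align_loss = F.l1_loss(ev_feat, img_feat)

(task_loss + align_loss).backward()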
If desired, a conda environment can be created using the following command:
conda create -n <env_name>
As an initial step, the wheel package needs to be installed with the following command:
pip install wheel
The required Python packages are listed in the requirements.txt file.
pip install -r requirements.txt
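As an optional sanity check (not part of the repository), the snippet below verifies that a few packages the code presumably relies on, such as PyTorch, h5py, and PyYAML, import correctly and whether a CUDA device is visible; the authoritative dependency list is requirements.txt.

# Optional sanity check: confirm core dependencies import and CUDA is visible.
import torch, h5py, yaml

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("h5py", h5py.__version__, "| PyYAML", yaml.__version__)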
The pre-trained E2VID model needs to be downloaded here and placed in /e2vid/pretrained/.
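To confirm the checkpoint ended up in the right place, it can be opened directly with PyTorch; the filename below is a placeholder for the downloaded file.

# Optional check that the E2VID checkpoint is readable from the expected
# location; <checkpoint_name> is a placeholder for the downloaded file.
import torch

ckpt = torch.load("e2vid/pretrained/<checkpoint_name>.pth.tar", map_location="cpu")
# Checkpoints of this kind are typically dictionaries; listing the keys
# shows what was saved (model state, architecture options, ...).
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))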
The DSEC-Semantic dataset can be downloaded here. The dataset should have the following format:
├── DSEC_Semantic
│ ├── train
│ │ ├── zurich_city_00_a
│ │ │ ├── semantic
│ │ │ │ ├── left
│ │ │ │ │ ├── 11classes
│ │ │ │ │ │ └── data
│ │ │ │ │ │ ├── 000000.png
│ │ │ │ │ │ └── ...
│ │ │ │ │ └── 19classes
│ │ │ │ │ └── data
│ │ │ │ │ ├── 000000.png
│ │ │ │ │ └── ...
│ │ │ │ └── timestamps.txt
│ │ │ └── events
│ │ │ └── left
│ │ │ ├── events.h5
│ │ │ └── rectify_map.h5
│ │ └── ...
│ └── test
│ ├── zurich_city_13_a
│ │ └── ...
│ └── ...
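For a first look at the data laid out above, the labels and event files can be inspected with standard tools; this is only an illustration, the loader used for training is the DSEC data loader bundled with this code.

# Illustrative only: peek at one recording of DSEC-Semantic.
import h5py
import numpy as np
from PIL import Image

seq = "DSEC_Semantic/train/zurich_city_00_a"

# Semantic labels are per-frame PNGs; each pixel holds a class index.
label = np.array(Image.open(f"{seq}/semantic/left/11classes/data/000000.png"))
print("label shape:", label.shape, "| classes present:", np.unique(label))

# Events (and the rectification map) are HDF5 files; rather than assuming
# their internal layout, list every group/dataset they contain.
with h5py.File(f"{seq}/events/left/events.h5", "r") as f:
    f.visit(print)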
The original DDD17 dataset with semantic segmentation labels can be downloaded here. Additionally, we provide a pre-processed DDD17 dataset with semantic labels here. Please do not forget to cite DDD17 and Ev-SegNet if you are using DDD17 with semantic labels.
The Cityscapes dataset can be downloaded here.
The settings for the training can be specified in config/settings_XXXX.yaml.
Two different models can be trained:
- ess: ESS UDA / ESS supervised (event labels + frame labels)
- ess_supervised: ESS supervised (event labels only)
The following command starts the training:
CUDA_VISIBLE_DEVICES=<GPU_ID> python train.py --settings_file config/settings_XXXX.yaml
For testing the pre-trained models, please set load_pretrained_weights=True and specify the path of the pre-trained weights in pretrained_file.
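Independently of the training framework, the downloaded weights can be inspected with plain PyTorch before pointing pretrained_file at them; the path below is a placeholder.

# Sketch only: inspect the downloaded checkpoint with plain PyTorch.
import torch

ckpt = torch.load("path/to/pretrained_weights.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state_dict), "entries, first few:")
for name in list(state_dict)[:5]:
    print(" ", name)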
To download the pre-trained weights for the models on DDD17 and DSEC in the UDA setting, please fill in your details in this Google Form.
Several network architectures were adapted from: https://github.com/uzh-rpg/rpg_e2vid
The general training framework was inspired by: https://github.com/uzh-rpg/rpg_ev-transfer
The DSEC data loader was adapted from: https://github.com/uzh-rpg/DSEC
The optimizer was adapted from: https://github.com/LiyuanLucasLiu/RAdam
The DICE loss was adapted from: https://github.com/Guocode/DiceLoss.Pytorch