
Deep Event Visual Odometry

Simon Klenk1,2*    Marvin Motzet1,2*    Lukas Koestler1,2    Daniel Cremers1,2

*equal contribution

1Technical University of Munich (TUM)    2Munich Center for Machine Learning (MCML)

International Conference on 3D Vision (3DV) 2024, Davos, CH

Video: Watch DEVO on YouTube

Paper | arXiv | BibTeX

Abstract

Event cameras offer the exciting possibility of tracking the camera's pose during high-speed motion and in adverse lighting conditions. Despite this promise, existing event-based monocular visual odometry (VO) approaches demonstrate limited performance on recent benchmarks. To address this limitation, some methods resort to additional sensors such as IMUs, stereo event cameras, or frame-based cameras. Nonetheless, these additional sensors limit the application of event cameras in real-world devices since they increase cost and complicate system requirements. Moreover, relying on a frame-based camera makes the system susceptible to motion blur and to failure in high-dynamic-range (HDR) scenes. To remove the dependency on additional sensors and to push the limits of using only a single event camera, we present Deep Event VO (DEVO), the first monocular event-only system with strong performance on a large number of real-world benchmarks. DEVO sparsely tracks selected event patches over time. A key component of DEVO is a novel deep patch selection mechanism tailored to event data. We significantly decrease the pose tracking error on seven real-world benchmarks by up to 97% compared to event-only methods and often surpass or are close to stereo or inertial methods.

Overview

During training, DEVO takes event voxel grids $\{\mathbf{E}_t\}_{t=1}^N$, inverse depths $\{\mathbf{d}_t\}_{t=1}^N$, and camera poses $\{\mathbf{T}_t\}_{t=1}^N$ of a sequence of size $N$ as input. DEVO estimates poses $\{\hat{\mathbf{T}}_t\}_{t=1}^N$ and depths $\{\hat{\mathbf{d}}_t\}_{t=1}^N$ of the sequence. Our novel patch selection network predicts a score map $\mathbf{S}_t$ to highlight optimal 2D coordinates $\mathbf{P}_t$ for optical flow and pose estimation. A recurrent update operator iteratively refines the sparse patch-based optical flow $\hat{\mathbf{f}}$ between event grids by predicting $\Delta\hat{\mathbf{f}}$ and updates poses and depths through a differentiable bundle adjustment (DBA) layer, weighted by $\omega$, for each revision. Ground truth optical flow $\mathbf{f}$ for supervision is computed using poses and depth maps. At inference, DEVO samples from a multinomial distribution based on the pooled score map $\mathbf{S}_t$.
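
To make the inference-time selection step concrete, here is a minimal sketch of sampling patch centers from a multinomial distribution over a pooled score map. The function name, pooling size, and patch count below are illustrative assumptions, not DEVO's actual implementation.

import torch
import torch.nn.functional as F

# Illustrative sketch (names and defaults are assumptions, not DEVO's code):
# draw patch centers from a multinomial over the pooled score map S_t.
def sample_patch_centers(score_map, num_patches=96, pool=4):
    # score_map: (H, W) non-negative scores from the patch selection network
    pooled = F.avg_pool2d(score_map[None, None], kernel_size=pool)[0, 0]
    probs = pooled.flatten()
    idx = torch.multinomial(probs / probs.sum().clamp(min=1e-8), num_patches, replacement=False)
    ys = idx // pooled.shape[1]
    xs = idx % pooled.shape[1]
    # map pooled-cell indices back to pixel coordinates (cell centers)
    return torch.stack([xs * pool + pool // 2, ys * pool + pool // 2], dim=-1)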

Setup

The code was tested on Ubuntu 22.04 and CUDA Toolkit 11.x. We use Anaconda to manage our Python environment.

First, clone the repo

git clone https://github.com/tum-vision/DEVO.git --recursive
cd DEVO

Then, create and activate the Anaconda environment

conda env create -f environment.yml
conda activate devo

Next, install the DEVO package

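# download the Eigen 3.4.0 headers into thirdparty/ (needed to build DEVO)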
wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip
unzip eigen-3.4.0.zip -d thirdparty

# install DEVO
pip install .

Only for Training

Please note that the training data has a total size of about 1.1 TB (RGB: 300 GB, events: 370 GB).

First, download all RGB images and depth maps of TartanAir from the left camera (~500GB) to <TARTANPATH>

python thirdparty/tartanair_tools/download_training.py --output-dir <TARTANPATH> --rgb --depth --only-left

Next, generate event voxel grids using vid2e

python # TODO release simulation
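
Until the simulation code is released, the following is a generic sketch of how raw events (x, y, t, polarity) are typically binned into a voxel grid with bilinear weighting along the time axis; the bin count and weighting scheme are common choices and not necessarily the exact settings used for DEVO.

import numpy as np

# Generic sketch (not the released pipeline): accumulate events into a
# (num_bins, H, W) voxel grid with bilinear weights along the time axis.
def events_to_voxel_grid(xs, ys, ts, ps, H, W, num_bins=5):
    # xs, ys: integer pixel coordinates; ts: timestamps; ps: polarities in {0, 1}
    voxel = np.zeros((num_bins, H, W), dtype=np.float32)
    t_norm = (num_bins - 1) * (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9)
    pol = 2.0 * ps.astype(np.float32) - 1.0   # map polarity {0,1} -> {-1,+1}
    left = np.floor(t_norm).astype(np.int64)
    right = np.clip(left + 1, 0, num_bins - 1)
    w = t_norm - left
    np.add.at(voxel, (left, ys, xs), pol * (1.0 - w))
    np.add.at(voxel, (right, ys, xs), pol * w)
    return voxel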

We provide precomputed scene information (including the frame graph for co-visibility used by clip sampling), since building this dataset is expensive.

# download data (~450MB)
./download_data.sh

Only for Evaluation

We provide a model pretrained on our simulated event data.

# download model (~40MB)
./download_model.sh

Training

Make sure you have run ./download_data.sh. Your directory structure should look as follows

├── datasets
    ├── TartanAirEvs
        ├── abandonedfactory
        ├── abandonedfactory_night
        ├── ...
        ├── westerndesert
    ...

To train, run the command below (log files will be written to runs/<your name>). The model is evaluated on the validation split every 10k iterations.

python train.py -c="config/DEVO_base.conf" --name=<your name>

Evaluation

Replace XXX with the name of the target dataset's evaluation script and point --datapath to that dataset:

python evals/eval_evs/eval_XXX_evs.py --datapath=<path to xxx dataset> --weights="DEVO.pth" --stride=1 --trials=1 --expname=<your name>

News

  • Code and model are released.
  • TODO: Release code for simulation

Citation

If you find our work useful, please cite our paper:

@article{klenk2023devo,
  title     = {Deep Event Visual Odometry},
  author    = {Klenk, Simon and Motzet, Marvin and Koestler, Lukas and Cremers, Daniel},
  journal   = {arXiv preprint arXiv:2312.09800},
  year      = {2023}
}

Acknowledgments

We thank the authors of the repositories our code builds on for publicly releasing their work.

This work was supported by the ERC Advanced Grant SIMULACRON.