MonoNeRF

This is the official implementation of our ICCV 2023 paper "MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos".

MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
Fengrui Tian, Shaoyi Du, Yueqi Duan
in ICCV 2023

arxiv / paper / video

Introduction

In this paper, we target the problem of learning a generalizable dynamic radiance field from monocular videos. Unlike most existing NeRF methods that rely on multiple views, monocular videos contain only one view at each timestamp and therefore suffer from ambiguity along the view direction when estimating point features and scene flows. Previous studies such as DynNeRF disambiguate point features by positional encoding, which is not transferable and severely limits generalization. As a result, these methods have to train one independent model per scene and incur heavy computational costs when applied to the growing number of monocular videos in real-world applications. To address this, we propose MonoNeRF, which simultaneously learns point features and scene flows with point trajectory and feature correspondence constraints across frames. More specifically, we learn an implicit velocity field to estimate point trajectories from temporal features with a Neural ODE, followed by a flow-based feature aggregation module that gathers spatial features along the point trajectory. We jointly optimize temporal and spatial features in an end-to-end manner. Experiments show that MonoNeRF can learn from multiple scenes and supports new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.

Environment Setup

The code is tested with

  • Ubuntu 16.04
  • Anaconda 3
  • Python 3.8.12
  • CUDA 11.1
  • A100 or 3090 GPUs

To get started, please create the conda environment mononerf by running

conda create --name mononerf python=3.8
conda activate mononerf

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install imageio==2.19.2 pyhocon==0.3.60 pyparsing==2.4.7 configargparse==1.5.3 tensorboard==2.13.0 ipdb==0.13.13 imgviz==1.7.2 imageio-ffmpeg==0.4.8
pip install mmcv-full==1.7.1

Then install MMAction2 v0.24.1 manually.

git clone https://github.com/open-mmlab/mmaction2.git
cd mmaction2
git checkout v0.24.1
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without re-installation.

Install torchdiffeq if you want to use a Neural ODE for calculating trajectories.

pip install torchdiffeq==0.0.1
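
MonoNeRF estimates point trajectories by integrating an implicit velocity field with a Neural ODE. The Python snippet below is a minimal, hypothetical sketch of that idea using torchdiffeq's odeint; the velocity network, its size, and the time grid are illustrative assumptions, not the layers used in this repository.

# Minimal sketch (not the repository's model): integrate a learned velocity
# field v(t, x) with a Neural ODE to obtain point trajectories.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class VelocityField(nn.Module):
    # Hypothetical MLP predicting dx/dt for a batch of 3D points.
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, t, x):
        # Condition the velocity on time by appending t to every point.
        t_col = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_col], dim=-1))

velocity = VelocityField()
points_t0 = torch.randn(1024, 3)           # 3D points sampled at the first frame
times = torch.linspace(0.0, 1.0, steps=8)  # normalized frame timestamps

# trajectory[k] holds the points advected to times[k]; shape (8, 1024, 3).
trajectory = odeint(velocity, points_t0, times)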

Install other dependencies.

pip install tqdm Pillow==9.1.1
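
As an optional sanity check (this snippet is not part of the repository), the following confirms that the pinned packages import and that the CUDA build of PyTorch sees a GPU:

# Optional environment check for the packages installed above.
import torch
import torchvision
import mmcv
import mmaction
import torchdiffeq  # importability check only
import imageio

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)  # expect 1.8.0+cu111 / 11.1
print("torchvision:", torchvision.__version__)                           # expect 0.9.0+cu111
print("mmcv:", mmcv.__version__)                                         # expect 1.7.1
print("mmaction:", mmaction.__version__)                                 # expect 0.24.1
print("imageio:", imageio.__version__)                                   # expect 2.19.2
print("GPU available:", torch.cuda.is_available())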

Finally, clone the MonoNeRF project:

git clone https://github.com/tianfr/MonoNeRF.git
cd MonoNeRF

Dynamic Scene Dataset

The Dynamic Scene Dataset is used to quantitatively evaluate our method. Please refer to the official dataset page to download the data. Here we provide the download link from DynamicNeRF for the training data.

wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/data.zip
unzip data.zip
rm data.zip

We also provide a dataset link on Google Drive that contains both training and evaluation data; the evaluation code is at train/evaluation.py.
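
train/evaluation.py is the authoritative evaluation code; purely as an illustration of the kind of metric used for quantitative comparison, PSNR between a rendered frame and its ground truth (both scaled to [0, 1]) can be computed as follows.

# Illustrative helper only, not the repository's evaluation code.
import numpy as np

def psnr(rendered: np.ndarray, ground_truth: np.ndarray) -> float:
    # PSNR in dB for images with values in [0, 1]; higher is better.
    mse = np.mean((rendered.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(1.0 / mse))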

Backbone Checkpoints

Download the pretrained SlowOnly model from the MMAction2 website.

mkdir checkpoints
wget -P checkpoints/ https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912-3f9ce182.pth
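
To verify the download, the checkpoint can be inspected with plain PyTorch. This is an optional check; MMAction2 checkpoints normally store their weights under a state_dict key, and the snippet falls back to the raw dictionary otherwise.

# Optional check that the SlowOnly checkpoint downloaded correctly.
import torch

ckpt_path = "checkpoints/slowonly_imagenet_pretrained_r50_8x8x1_150e_kinetics400_rgb_20200912-3f9ce182.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)   # fall back to the raw dict just in case
print(len(state_dict), "parameter tensors")
print(next(iter(state_dict)))               # name of the first backbone tensor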

Training

Environment Initialization

export PYTHONPATH=.

All training procedures run on GPU 0 by default.

Multiple scenes

You can train a model from scratch by running:

chmod +x Balloon1_Balloon2.sh
./Balloon1_Balloon2.sh 0

Unseen frames

Train a model for rendering novel views on unseen frames:

chmod +x Balloon2_unseen_frames.sh
./Balloon2_unseen_frames.sh 0

Unseen scenes

Test the generalization ability on unseen scenes:

chmod +x generalization_from_Balloon1_Balloon2.sh
./generalization_from_Balloon1_Balloon2.sh 0 2000

Train a model on your own sequence (adapted from DynNeRF)

  1. Set some paths
ROOT_PATH=/path/to/the/MonoNeRF/folder
DATASET_NAME=name_of_the_video_without_extension
DATASET_PATH=$ROOT_PATH/data/$DATASET_NAME

and install COLMAP manually. Then download the MiDaS and RAFT weights:

cd $ROOT_PATH
wget --no-check-certificate https://filebox.ece.vt.edu/~chengao/free-view-video/weights.zip
unzip weights.zip
rm weights.zip
  2. Prepare training images and background masks from a video.
cd $ROOT_PATH/train/utils
python generate_data.py --videopath /path/to/the/video
  3. Use COLMAP to obtain camera poses.
colmap feature_extractor \
--database_path $DATASET_PATH/database.db \
--image_path $DATASET_PATH/images_colmap \
--ImageReader.mask_path $DATASET_PATH/background_mask \
--ImageReader.single_camera 1

colmap exhaustive_matcher \
--database_path $DATASET_PATH/database.db

mkdir $DATASET_PATH/sparse
colmap mapper \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images_colmap \
    --output_path $DATASET_PATH/sparse \
    --Mapper.num_threads 16 \
    --Mapper.init_min_tri_angle 4 \
    --Mapper.multiple_models 0 \
    --Mapper.extract_colors 0
  4. Save camera poses into the format that NeRF reads.
cd $ROOT_PATH/train/utils
python generate_pose.py --dataset_path $DATASET_PATH
  5. Estimate monocular depth.
cd $ROOT_PATH/train/utils
python generate_depth.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/midas_v21-f6b98070.pt
  6. Predict optical flows.
cd $ROOT_PATH/train/utils
python generate_flow.py --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/raft-things.pth
  7. Obtain motion masks (code adapted from NSFF).
cd $ROOT_PATH/train/utils
python generate_motion_mask.py --dataset_path $DATASET_PATH
  8. Train a model. Please change expname and dataset_file_lists in mononerf_conf/exp/your_own_scene/your_own_scene.conf (a conf-reading sketch follows this list).
cd $ROOT_PATH/
chmod +x your_own_scene.sh
./your_own_scene.sh 0
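
The conf files appear to be HOCON configs parsed with pyhocon (installed earlier). Below is a minimal sketch of reading the two fields that step 8 asks you to edit; the key paths are assumptions, so adjust them if they sit deeper in the conf.

# Minimal sketch: read the fields step 8 asks you to edit.
from pyhocon import ConfigFactory

conf = ConfigFactory.parse_file("mononerf_conf/exp/your_own_scene/your_own_scene.conf")

# Keys assumed to be top-level; adjust the paths if the conf nests them.
print("expname:", conf.get("expname", None))
print("dataset_file_lists:", conf.get("dataset_file_lists", None))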

License

This work is licensed under the MIT License. See LICENSE for details.

If you find this code useful for your research, please consider citing the following paper:

@inproceedings{23iccv/tian_mononerf,
    author    = {Tian, Fengrui and Du, Shaoyi and Duan, Yueqi},
    title     = {{MonoNeRF}: Learning a Generalizable Dynamic Radiance Field from Monocular Videos},
    booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023}
}

Acknowledgement

Our code is built upon NeRF, NeRF-pytorch, NSFF, DynamicNeRF, pixelNeRF, and Occupancy Flow. Our flow prediction code is modified from RAFT. Our depth prediction code is modified from MiDaS.

Contact

If you have any questions, please feel free to contact Fengrui Tian.
