Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Wenyan Cong1∗, Hanxue Liang2,1∗, Peihao Wang1, Zhiwen Fan1, Tianlong Chen1, Mukund Varma T3,1, Yi Wang1, Zhangyang Wang1
1University of Texas at Austin, 2University of Cambridge, 3Indian Institute of Technology Madras
* denotes equal contribution.
This repository is built on top of GNT's official repository.
Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feedforward inference pipeline. While those feedforward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing between larger overall model capacity and flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness respectively, which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings.
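The core idea described above — top-k expert routing plus a shared "permanent" expert that is always active — can be sketched in plain NumPy. This is an illustrative toy, not the paper's GNT-MOVE layer: the function name `moe_layer`, the linear experts, and all shapes are our assumptions for exposition.

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=2):
    """Toy MoE dispatch for one token (illustrative, not the GNT-MOVE code).

    x: (d,) token feature
    experts: list of (d, d) weight matrices; experts[0] plays the role of
             the shared "permanent" expert that is always selected
    gate_w: (num_experts, d) gating weights
    """
    logits = gate_w @ x
    # Softmax over expert logits to get routing probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Route to the top-k experts, and always include the permanent expert 0.
    selected = set(np.argsort(probs)[-k:]) | {0}
    # Weighted sum of the selected experts' outputs.
    out = np.zeros_like(x)
    for i in sorted(selected):
        out += probs[i] * (experts[i] @ x)
    return out
```

In GNT-MOVE this routing is applied inside the transformer's view aggregation, with the permanent expert enforcing cross-scene consistency; the sketch only shows the dispatch mechanics.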
Clone this repository:
git clone https://github.com/VITA-Group/GNT-MOVE.git
cd GNT-MOVE/
The code is tested with Python 3.8, CUDA 11.1, and PyTorch 1.10.1. Additional dependencies include:
torchvision
ConfigArgParse
imageio
matplotlib
numpy
opencv_contrib_python
Pillow
scipy
imageio-ffmpeg
lpips
scikit-image
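One way to set this environment up (a sketch assuming conda is available; check pytorch.org for the exact wheel matching your CUDA version):

```
conda create -n gnt-move python=3.8 -y
conda activate gnt-move
# PyTorch 1.10.1 with CUDA 11.1 — see pytorch.org "previous versions" for your setup
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install ConfigArgParse imageio matplotlib numpy opencv_contrib_python \
    Pillow scipy imageio-ffmpeg lpips scikit-image
```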
We reuse the training and evaluation datasets from IBRNet. All datasets must be downloaded to a data/ directory within the project folder and organized as follows:
├──data/
    ├──ibrnet_collected_1/
    ├──ibrnet_collected_2/
    ├──real_iconic_noface/
    ├──spaces_dataset/
    ├──RealEstate10K-subset/
    ├──google_scanned_objects/
    ├──nerf_synthetic/
    ├──nerf_llff_data/
We refer to IBRNet's repository for downloading and preparing the data. For convenience, we consolidate the instructions below:
mkdir data
cd data/
# IBRNet captures
gdown https://drive.google.com/uc?id=1rkzl3ecL3H0Xxf5WTyc2Swv30RIyr1R_
unzip ibrnet_collected.zip
# LLFF
gdown https://drive.google.com/uc?id=1ThgjloNt58ZdnEuiCeRf9tATJ-HI0b01
unzip real_iconic_noface.zip
## [IMPORTANT] remove scenes that appear in the test set
cd real_iconic_noface/
rm -rf data2_fernvlsb data2_hugetrike data2_trexsanta data3_orchid data5_leafscene data5_lotr data5_redflower
cd ../
# Spaces dataset
git clone https://github.com/augmentedperception/spaces_dataset
# RealEstate 10k
## make sure to install ffmpeg - sudo apt-get install ffmpeg
git clone https://github.com/qianqianwang68/RealEstate10K_Downloader
cd RealEstate10K_Downloader
python3 generate_dataset.py train
cd ../
# Google Scanned Objects
gdown https://drive.google.com/uc?id=1w1Cs0yztH6kE3JIz7mdggvPGCwIKkVi2
unzip google_scanned_objects_renderings.zip
# Blender dataset
gdown https://drive.google.com/uc?id=18JxhpWD-4ZmuFKLzKlAw-w5PpzZxXOcG
unzip nerf_synthetic.zip
# LLFF dataset (eval)
gdown https://drive.google.com/uc?id=16VnMcF1KJYxN9QId6TClMsZRahHNMW5g
unzip nerf_llff_data.zip
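After downloading, a quick sanity check can confirm that all expected dataset folders are in place. This helper is not part of the repository; the function name and the standalone script layout are ours.

```python
import os

# Dataset folders the README expects under data/ (see the layout above).
EXPECTED_DIRS = [
    "ibrnet_collected_1",
    "ibrnet_collected_2",
    "real_iconic_noface",
    "spaces_dataset",
    "RealEstate10K-subset",
    "google_scanned_objects",
    "nerf_synthetic",
    "nerf_llff_data",
]

def missing_datasets(root="data"):
    """Return the expected dataset folders that are absent under `root`."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]

if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All expected dataset folders found.")
```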
If you find our work / code implementation useful for your own research, please cite our paper.
@inproceedings{gntmove2023,
  title={Enhancing Ne{RF} akin to Enhancing {LLM}s: Generalizable Ne{RF} Transformer with Mixture-of-View-Experts},
  author={Wenyan Cong and Hanxue Liang and Peihao Wang and Zhiwen Fan and Tianlong Chen and Mukund Varma T and Yi Wang and Zhangyang Wang},
  booktitle={ICCV},
  year={2023}
}