Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts

Wenyan Cong¹*, Hanxue Liang²,¹*, Peihao Wang¹, Zhiwen Fan¹, Tianlong Chen¹, Mukund Varma T³,¹, Yi Wang¹, Zhangyang Wang¹

¹University of Texas at Austin, ²University of Cambridge, ³Indian Institute of Technology Madras

* denotes equal contribution.

This repository is built on top of GNT's official repository.

Introduction

Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feedforward inference pipeline. While those feedforward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing between larger overall model capacity and flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness respectively, which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings.

Installation

Clone this repository:

git clone https://github.com/VITA-Group/GNT-MOVE.git
cd GNT-MOVE/

The code is tested with Python 3.8, CUDA 11.1, and PyTorch 1.10.1. Additional dependencies include (an install sketch follows this list):

torchvision
ConfigArgParse
imageio
matplotlib
numpy
opencv_contrib_python
Pillow
scipy
imageio-ffmpeg
lpips
scikit-image
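
A minimal environment setup might look like the following; the conda environment name (gntmove) is an assumption here, not part of the original instructions:

# assumed environment name; Python version from the tested setup above
conda create -n gntmove python=3.8 -y
conda activate gntmove

# PyTorch 1.10.1 and matching torchvision built against CUDA 11.1
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html

# remaining dependencies from the list above
pip install ConfigArgParse imageio matplotlib numpy opencv-contrib-python Pillow scipy imageio-ffmpeg lpips scikit-image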

Datasets

We reuse the training and evaluation datasets from IBRNet. All datasets must be downloaded to a data/ directory inside the project folder and must follow the organization below.

├──data/
    ├──ibrnet_collected_1/
    ├──ibrnet_collected_2/
    ├──real_iconic_noface/
    ├──spaces_dataset/
    ├──RealEstate10K-subset/
    ├──google_scanned_objects/
    ├──nerf_synthetic/
    ├──nerf_llff_data/

We refer to IBRNet's repository to download and prepare data. For ease, we consolidate the instructions below:

mkdir data
cd data/

# IBRNet captures
gdown https://drive.google.com/uc?id=1rkzl3ecL3H0Xxf5WTyc2Swv30RIyr1R_
unzip ibrnet_collected.zip

# LLFF
gdown https://drive.google.com/uc?id=1ThgjloNt58ZdnEuiCeRf9tATJ-HI0b01
unzip real_iconic_noface.zip

## [IMPORTANT] remove scenes that appear in the test set
cd real_iconic_noface/
rm -rf data2_fernvlsb data2_hugetrike data2_trexsanta data3_orchid data5_leafscene data5_lotr data5_redflower
cd ../

# Spaces dataset
git clone https://github.com/augmentedperception/spaces_dataset

# RealEstate 10k
## make sure to install ffmpeg - sudo apt-get install ffmpeg
git clone https://github.com/qianqianwang68/RealEstate10K_Downloader
cd RealEstate10K_Downloader
python3 generate_dataset.py train
cd ../

# Google Scanned Objects
gdown https://drive.google.com/uc?id=1w1Cs0yztH6kE3JIz7mdggvPGCwIKkVi2
unzip google_scanned_objects_renderings.zip

# Blender dataset
gdown https://drive.google.com/uc?id=18JxhpWD-4ZmuFKLzKlAw-w5PpzZxXOcG
unzip nerf_synthetic.zip

# LLFF dataset (eval)
gdown https://drive.google.com/uc?id=16VnMcF1KJYxN9QId6TClMsZRahHNMW5g
unzip nerf_llff_data.zip
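
Once the downloads finish, a quick sanity check that data/ matches the layout above can catch a failed or partial download early. This loop is a convenience sketch, not part of the original instructions; run it from the project root:

# verify that every expected dataset directory exists under data/
for d in ibrnet_collected_1 ibrnet_collected_2 real_iconic_noface spaces_dataset \
         RealEstate10K-subset google_scanned_objects nerf_synthetic nerf_llff_data; do
    [ -d "data/$d" ] && echo "ok:      data/$d" || echo "MISSING: data/$d"
done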

Usage
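
Since this code is built on GNT's official repository, training and evaluation plausibly follow GNT's command-line interface. The script names, flags, and config path below (train.py, eval.py, --eval_scenes, configs/gnt_full.txt) are assumptions carried over from GNT, not confirmed by this repository:

# cross-scene training on the mixed datasets above (GNT-style interface, assumed)
python3 train.py --config configs/gnt_full.txt

# evaluation on a held-out LLFF scene (scene name is an arbitrary example)
python3 eval.py --config configs/gnt_full.txt --eval_scenes fern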

Cite this work

If you find our work or code implementation useful for your own research, please cite our paper:

@inproceedings{
    gntmove2023,
    title={Enhancing Ne{RF} akin to Enhancing {LLM}s: Generalizable Ne{RF} Transformer with Mixture-of-View-Experts},
    author={Wenyan Cong and Hanxue Liang and Peihao Wang and Zhiwen Fan and Tianlong Chen and Mukund Varma and Yi Wang and Zhangyang Wang},
    booktitle={ICCV},
    year={2023}
}
