3DEnhancer employs a multi-view diffusion model to enhance multi-view images, thus improving 3D models.

Introducing 3DEnhancer

Despite advances in neural rendering, the scarcity of high-quality 3D datasets and the inherent limitations of multi-view diffusion models restrict view synthesis and 3D model generation to low resolutions with suboptimal multi-view consistency. In this study, we present a novel 3D enhancement pipeline, dubbed 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Our method includes a pose-aware encoder and a diffusion-based denoiser to refine low-quality multi-view images, along with data augmentation and a multi-view attention module with epipolar aggregation to maintain consistent, high-quality 3D outputs across views. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles. Extensive evaluations show that 3DEnhancer significantly outperforms existing methods, boosting both multi-view enhancement and per-instance 3D optimization tasks.
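As an illustration of the cross-view idea, below is a minimal PyTorch sketch (our own simplification, not the repository's code) of an attention layer that mixes features across views at corresponding token positions; the actual 3DEnhancer module additionally aggregates along epipolar lines, which this sketch omits:

import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    # Illustrative only: for every spatial token, attend across the V views.
    # 3DEnhancer's real module aggregates along epipolar lines rather than
    # only same-position tokens; names and shapes here are our assumptions.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, V, N, C) with N = H*W tokens per view
        B, V, N, C = x.shape
        # Each spatial token contributes a length-V sequence of per-view features
        tokens = x.permute(0, 2, 1, 3).reshape(B * N, V, C)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(B, N, V, C).permute(0, 2, 1, 3)

# e.g. 2 scenes, 4 views, 16x16 latent tokens, 64 channels:
# y = CrossViewAttention(64)(torch.randn(2, 4, 256, 64))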
- [2025/03/08] Our inference code and Gradio demo are released.
- [2024/12/25] Our paper and project page are now live. Merry Christmas!
- Clone Repo
git clone --recurse-submodules https://github.com/Luo-Yihang/3DEnhancer
cd 3DEnhancer
- Create Conda Environment
conda create -n 3denhancer python=3.10 -y
conda activate 3denhancer
- Install Python Dependencies
Important: Install Torch and Xformers based on your CUDA version. For example, for Torch 2.1.0 + CUDA 11.8:
# Install Torch and Xformers
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -U xformers --index-url https://download.pytorch.org/whl/cu118
# Install other dependencies
pip install -r requirements.txt
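To sanity-check the environment before downloading weights, a quick probe (our suggestion, not part of the repo) confirms that the CUDA build of Torch sees your GPU and that xformers imports cleanly:

import torch
import xformers

# Expect e.g. "2.1.0+cu118 11.8 True" for the Torch 2.1.0 + CUDA 11.8 install above
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(xformers.__version__)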
Download the pretrained model from Hugging Face and place it under pretrained_models/3DEnhancer:
mkdir -p pretrained_models/3DEnhancer
wget -P pretrained_models/3DEnhancer https://huggingface.co/Luo-Yihang/3DEnhancer/resolve/main/model.safetensors
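Alternatively, if you prefer Python over wget, the official huggingface_hub client can fetch the same file:

from huggingface_hub import hf_hub_download

# Places model.safetensors under pretrained_models/3DEnhancer/
hf_hub_download(
    repo_id="Luo-Yihang/3DEnhancer",
    filename="model.safetensors",
    local_dir="pretrained_models/3DEnhancer",
)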
The code has been tested on NVIDIA A100 and V100 GPUs. An NVIDIA GPU with at least 18GB of memory is required.
We provide example inputs in assets/examples/mv_lq, where each subfolder contains four sequential multi-view images. Perform inference on multi-view images using an aligned prompt and noise_level. For example:
python inference.py \
--input_folder assets/examples/mv_lq/vase \
--output_folder results/vase \
--prompt "vase" \
--noise_level 0
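To enhance every bundled example in one pass, a small driver loop works; note our assumption (not stated by the repo) that each subfolder name doubles as a usable prompt:

import subprocess
from pathlib import Path

root = Path("assets/examples/mv_lq")
for folder in sorted(p for p in root.iterdir() if p.is_dir()):
    # Assumption: the folder name (e.g. "vase") is a reasonable text prompt
    subprocess.run(
        [
            "python", "inference.py",
            "--input_folder", str(folder),
            "--output_folder", f"results/{folder.name}",
            "--prompt", folder.name,
            "--noise_level", "0",
        ],
        check=True,
    )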
For more options, refer to inference.py.
The script app.py provides a simple web demo for generating and enhancing multi-view images, as well as reconstructing 3D models using LGM.
Install the modified Gaussian splatting rasterizer (with depth and alpha rendering) required by LGM:
git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization
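A one-line import check (our suggestion) verifies that the CUDA extension compiled correctly:

# The package installed by ashawkey's fork is named diff_gaussian_rasterization
import diff_gaussian_rasterization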
Download the LGM pretrained weights from Hugging Face and place them under pretrained_models/LGM:
mkdir -p pretrained_models/LGM
wget -P pretrained_models/LGM https://huggingface.co/ashawkey/LGM/resolve/main/model_fp16_fixrot.safetensors
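As with the 3DEnhancer weights, huggingface_hub offers a Python alternative to wget:

from huggingface_hub import hf_hub_download

# Places model_fp16_fixrot.safetensors under pretrained_models/LGM/
hf_hub_download(
    repo_id="ashawkey/LGM",
    filename="model_fp16_fixrot.safetensors",
    local_dir="pretrained_models/LGM",
)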
After installing the dependencies, start the demo with:
python app.py
The web demo is also available on Hugging Face Spaces! 🎉
- [x] Release paper and project page.
- [x] Release inference code.
- [x] Release Gradio demo.
This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.
If you find our code or paper helpful, please consider citing:
@article{luo20243denhancer,
  title={3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement},
  author={Yihang Luo and Shangchen Zhou and Yushi Lan and Xingang Pan and Chen Change Loy},
  journal={arXiv preprint arXiv:2412.18565},
  year={2024},
}
If you have any questions, please feel free to reach us at luo_yihang@outlook.com.