
Luo-Yihang/3DEnhancer


3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement

3DEnhancer employs a multi-view diffusion model to enhance multi-view images, thus improving 3D models.

📖 For more visual results, check out our project page.

Introducing 3DEnhancer

Despite advances in neural rendering, due to the scarcity of high-quality 3D datasets and the inherent limitations of multi-view diffusion models, view synthesis and 3D model generation are restricted to low resolutions with suboptimal multi-view consistency. In this study, we present a novel 3D enhancement pipeline, dubbed 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Our method includes a pose-aware encoder and a diffusion-based denoiser to refine low-quality multi-view images, along with data augmentation and a multi-view attention module with epipolar aggregation to maintain consistent, high-quality 3D outputs across views. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles. Extensive evaluations show that 3DEnhancer significantly outperforms existing methods, boosting both multi-view enhancement and per-instance 3D optimization tasks.

🔥 News

  • [2025/03/08] Our inference code and Gradio demo are released.
  • [2024/12/25] Our paper and project page are now live. Merry Christmas!

🔧 Installation

  1. Clone Repo

    git clone --recurse-submodules https://github.com/Luo-Yihang/3DEnhancer
    cd 3DEnhancer
  2. Create Conda Environment

    conda create -n 3denhancer python=3.10 -y
    conda activate 3denhancer
  3. Install Python Dependencies

    Important: Install Torch and Xformers based on your CUDA version. For example, for Torch 2.1.0 + CUDA 11.8:

    # Install Torch and Xformers
    pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
    pip install -U xformers --index-url https://download.pytorch.org/whl/cu118
    
    # Install other dependencies
    pip install -r requirements.txt
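
After installation, it can help to confirm which versions actually ended up in the environment, since Torch and Xformers need to match your CUDA setup. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is illustrative and not part of this repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages named in the installation step above.
    for name, version in installed_versions(["torch", "torchvision", "xformers"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `+cu118` suffix in the reported Torch version would indicate the CUDA 11.8 build from the example above.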
    

💾 Pretrained Weights

Download the pretrained model from Hugging Face and place it under pretrained_models/3DEnhancer:

mkdir -p pretrained_models/3DEnhancer
wget -P pretrained_models/3DEnhancer https://huggingface.co/Luo-Yihang/3DEnhancer/resolve/main/model.safetensors
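
An interrupted download can leave a truncated file or an HTML error page in place of the weights. As a rough sanity check, the sketch below reads the safetensors header directly: the format begins with an 8-byte little-endian length followed by that many bytes of JSON. The helper name `read_safetensors_header` is hypothetical, and this checks only the header, not the tensor data.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict.

    Raises ValueError if the file is too short or the declared header
    size does not fit in the file (e.g. a truncated or non-safetensors file).
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        (header_size,) = struct.unpack("<Q", prefix)
        if header_size > file_size - 8:
            raise ValueError("declared header size exceeds file size")
        raw = f.read(header_size)
    return json.loads(raw.decode("utf-8"))


# Example usage (path taken from the download step above):
# header = read_safetensors_header("pretrained_models/3DEnhancer/model.safetensors")
# print(len(header), "header entries")
```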

💻 Inference

The code has been tested on NVIDIA A100 and V100 GPUs. An NVIDIA GPU with at least 18GB of memory is required.
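
To check the memory requirement before running anything, one option is to query `nvidia-smi`. This is a sketch under two assumptions: that `nvidia-smi` is on your PATH, and that "18GB" is read as 18 GiB (18432 MiB). The function names are illustrative only.

```python
import subprocess

REQUIRED_MIB = 18 * 1024  # assumption: "18GB" interpreted as 18 GiB


def parse_memory_mib(nvidia_smi_output):
    """Parse one total-memory value (in MiB) per non-empty line."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_gpu_memory_mib():
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_mib(result.stdout)


# Example usage on a machine with an NVIDIA driver installed:
# for index, mib in enumerate(query_gpu_memory_mib()):
#     print(f"GPU {index}: {mib} MiB, sufficient: {mib >= REQUIRED_MIB}")
```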

We provide example inputs in assets/examples/mv_lq, where each subfolder contains four sequential multi-view images. Perform inference on multi-view images using an aligned prompt and noise_level. For example:

python inference.py \
    --input_folder assets/examples/mv_lq/vase \
    --output_folder results/vase \
    --prompt "vase" \
    --noise_level 0

For more options, refer to inference.py.
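
To process every example subfolder in one go, the command above can be generated per folder. The sketch below only builds the argument lists, using the four flags shown in the example. Using the subfolder name as the prompt is an assumption based on the `vase` example and may not suit every input; `build_commands` is a hypothetical helper, not part of this repository.

```python
import sys
from pathlib import Path


def build_commands(input_root, output_root, noise_level=0):
    """Build one inference.py command per subfolder of input_root."""
    commands = []
    for folder in sorted(Path(input_root).iterdir()):
        if not folder.is_dir():
            continue  # skip stray files next to the example folders
        commands.append([
            sys.executable, "inference.py",
            "--input_folder", str(folder),
            "--output_folder", str(Path(output_root) / folder.name),
            "--prompt", folder.name,  # assumption: folder name doubles as the prompt
            "--noise_level", str(noise_level),
        ])
    return commands


# Example usage, run from the repository root:
# import subprocess
# for cmd in build_commands("assets/examples/mv_lq", "results"):
#     subprocess.run(cmd, check=True)
```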

⚡ Demo

The script app.py provides a simple web demo for generating and enhancing multi-view images, as well as reconstructing 3D models using LGM.

Install the modified Gaussian splatting rasterizer (with depth and alpha rendering) required by LGM:

git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization

Download the LGM pretrained weights from Hugging Face and place them under pretrained_models/LGM:

mkdir -p pretrained_models/LGM
wget -P pretrained_models/LGM https://huggingface.co/ashawkey/LGM/resolve/main/model_fp16_fixrot.safetensors

After installing the dependencies, start the demo with:

python app.py
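
If the demo fails at startup, a quick preflight check of the pieces listed above can narrow down the cause. This sketch assumes the two weight files keep the names from the `wget` commands and that the rasterizer installs under the module name `diff_gaussian_rasterization`; the helper `missing_requirements` is illustrative and not part of this repository.

```python
import importlib.util
from pathlib import Path

# File locations follow the download steps in this README.
REQUIRED_FILES = [
    "pretrained_models/3DEnhancer/model.safetensors",
    "pretrained_models/LGM/model_fp16_fixrot.safetensors",
]
# Assumption: import name of the modified rasterizer package.
REQUIRED_MODULES = ["diff_gaussian_rasterization"]


def missing_requirements(repo_root="."):
    """Return human-readable descriptions of anything the demo would lack."""
    problems = []
    for rel in REQUIRED_FILES:
        if not (Path(repo_root) / rel).is_file():
            problems.append(f"missing file: {rel}")
    for module in REQUIRED_MODULES:
        if importlib.util.find_spec(module) is None:
            problems.append(f"module not importable: {module}")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```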

The web demo is also available on Hugging Face Spaces! 🎉

📆 TODO

  • [x] Release paper and project page.
  • [x] Release inference code.
  • [x] Release Gradio demo.

📃 License

This project is licensed under the NTU S-Lab License 1.0. Redistribution and use should follow this license.

📝 Citation

If you find our code or paper helpful, please consider citing:

@article{luo20243denhancer,
    title={3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement},
    author={Yihang Luo and Shangchen Zhou and Yushi Lan and Xingang Pan and Chen Change Loy},
    journal={arXiv preprint arXiv:2412.18565},
    year={2024}
}

📫 Contact

If you have any questions, please feel free to reach us at luo_yihang@outlook.com.

About

[CVPR 2025] 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
