
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Yukang Cao*, Yan-Pei Cao*, Kai Han, Ying Shan, Kwan-Yee K. Wong

Paper page

Please refer to our webpage for more visualizations.

Abstract

We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While encouraging results have been reported by recent methods on text-guided 3D generation of common objects, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. To tackle this challenge, DreamAvatar utilizes a trainable NeRF to predict density and color for 3D points and pretrained text-to-image diffusion models to provide 2D self-supervision. Specifically, we leverage the SMPL model to provide shape and pose guidance for the generation. We introduce a dual-observation-space design that jointly optimizes a canonical space and a posed space related by a learnable deformation field, which facilitates the generation of more complete textures and of geometry faithful to the target pose. We also jointly optimize losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem and improve facial details in the generated avatars. Extensive evaluations demonstrate that DreamAvatar significantly outperforms existing methods, establishing a new state of the art for text-and-shape guided 3D human avatar generation.

Installation

See installation.md for additional information, including installation via Docker.

The following steps have been tested on Ubuntu 20.04.

  • You must have an NVIDIA graphics card with at least 48 GB of VRAM, and CUDA must be installed.
  • Install Python >= 3.8.
  • (Optional, Recommended) Create a virtual environment:
python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip
  • Install PyTorch >= 1.12. We have tested torch1.12.1+cu113 and torch2.0.0+cu118, but other versions should also work (a quick sanity check is shown after this list).
# torch1.12.1+cu113
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# or torch2.0.0+cu118
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/facebookresearch/pytorch3d.git@v0.7.1
pip install git+https://github.com/NVIDIAGameWorks/kaolin.git
  • (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
pip install ninja
  • Install dependencies:
pip install -r requirements.txt
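
After installing the dependencies, you can verify that PyTorch sees your GPU and report the available VRAM. This is a minimal sanity-check sketch, not part of the repository:

# sanity check for the PyTorch/CUDA setup
import torch

print(torch.__version__)           # e.g. 1.12.1+cu113 or 2.0.0+cu118
print(torch.cuda.is_available())   # should print True
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1024**3:.1f} GB VRAM")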

Download SMPL-X model

  • Please follow the instructions in SMPL-X to download the required models.
./smpl_data
├── SMPLX_NEUTRAL.pkl
├── SMPLX_FEMALE.pkl
└── SMPLX_MALE.pkl
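
Once the files are in place, you can check that they load. A minimal sketch, assuming the smplx pip package (pip install smplx), which is not listed in the requirements above:

# verify the SMPL-X files load; assumes `pip install smplx`
import smplx

# given a directory, smplx.SMPLX loads SMPLX_<GENDER>.<ext> from it
model = smplx.SMPLX(model_path="./smpl_data", gender="neutral", ext="pkl")
output = model()              # zero shape and pose parameters
print(output.vertices.shape)  # expected: torch.Size([1, 10475, 3])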

Training canonical DreamAvatar

# avatar generation with 512x512 NeRF rendering, ~48GB VRAM
python launch.py --config configs/dreamavatar.yaml --train --gpu 0 system.prompt_processor.prompt="Wonder Woman"
# if you don't have enough VRAM, try training with 64x64 NeRF rendering
python launch.py --config configs/dreamavatar.yaml --train --gpu 0 system.prompt_processor.prompt="Wonder Woman" data.width=64 data.height=64 data.batch_size=1
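
To queue several generations in one go, a small wrapper such as the following can help; the prompt list is purely illustrative:

# hypothetical helper: one full-body generation per prompt, run sequentially
import subprocess

prompts = ["Wonder Woman", "Joker", "Captain America"]  # example prompts
for prompt in prompts:
    subprocess.run(
        ["python", "launch.py", "--config", "configs/dreamavatar.yaml",
         "--train", "--gpu", "0",
         f"system.prompt_processor.prompt={prompt}"],
        check=True,  # stop the queue if one run fails
    )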

Resume from checkpoints

If you want to resume from a checkpoint, do:

# resume training from the last checkpoint, you may replace last.ckpt with any other checkpoints
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# if the training has completed, you can still continue training for a longer time by setting trainer.max_steps
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt trainer.max_steps=20000
# you can also perform testing using resumed checkpoints
python launch.py --config path/to/trial/dir/configs/parsed.yaml --test --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt
# note that the above commands use parsed configuration files from previous trials
# which will continue using the same trial directory
# if you want to save to a new trial directory, replace parsed.yaml with raw.yaml in the command

# only load weights from a saved checkpoint but don't resume training (i.e., don't load optimizer states):
python launch.py --config path/to/trial/dir/configs/parsed.yaml --train --gpu 0 system.weights=path/to/trial/dir/ckpts/last.ckpt
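
If you are unsure what a checkpoint contains (for example, at which step training stopped), you can inspect it directly. A minimal sketch, assuming standard PyTorch Lightning .ckpt files; the path is a placeholder:

# peek inside a checkpoint
import torch

ckpt = torch.load("path/to/trial/dir/ckpts/last.ckpt", map_location="cpu")
print(sorted(ckpt.keys()))      # typically 'state_dict', 'optimizer_states', 'global_step', ...
print(ckpt.get("global_step"))  # handy when choosing trainer.max_steps for resuming
for name in list(ckpt["state_dict"])[:5]:
    print(name)                 # first few parameter names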

Misc.

If you want to cite our work, please use the following BibTeX entry:

@article{cao2023dreamavatar,
  title={DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models},
  author={Cao, Yukang and Cao, Yan-Pei and Han, Kai and Shan, Ying and Wong, Kwan-Yee K},
  journal={arXiv preprint arXiv:2304.00916},
  year={2023}
}

Acknowledgement

Thanks to the brilliant works of Threestudio and Stable-DreamFusion.
