DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer

Paper | Demo

Update

[30/03/2024]: The evaluation code is updated.
[07/02/2024]: The inference script is released.
[06/02/2024]: The model weights are released.

Get started

Environment Setup

conda create --name diffspeaker python=3.9
conda activate diffspeaker

Install the MPI-IS mesh package by following the instructions in its repository. Depending on whether your system has the /usr/include/boost/ directory, the commands are likely to be:

git clone https://github.com/MPI-IS/mesh.git
cd mesh
sudo apt-get install libboost-dev
python -m pip install pip==20.2.4
BOOST_INCLUDE_DIRS=/usr/include/boost/ make all
python -m pip install --upgrade pip

Then install the rest of the dependencies.

cd ..
git clone https://github.com/theEricMa/DiffSpeaker.git
cd DiffSpeaker
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install imageio-ffmpeg
pip install -r requirements.txt
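After installation, a quick import check can confirm that the environment is usable. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the list of import names (`torch`, `torchvision`, `torchaudio`, `imageio_ffmpeg`, `psbody.mesh`) is an assumption based on the packages installed above.

```python
import importlib.util


def missing_packages(modules):
    """Return the subset of module names that cannot be located."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "psbody") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    wanted = ["torch", "torchvision", "torchaudio", "imageio_ffmpeg", "psbody.mesh"]
    print("missing:", missing_packages(wanted) or "none")
```

Run it inside the activated diffspeaker environment; any name it reports likely needs to be reinstalled.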

Model Weights

You can download the model parameters by clicking here. Place the checkpoints folder in the root directory of the project. It contains models trained on the BIWI and vocaset datasets, with wav2vec and hubert as the backbones.
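To confirm that the checkpoints folder landed in the right place, something like the sketch below may help. It is an illustration only: the helper name `list_checkpoints` is hypothetical, and the file extensions it looks for are an assumption, since the layout inside the folder is not described here.

```python
from pathlib import Path

# Assumed file extensions; the actual checkpoint format may differ.
CHECKPOINT_SUFFIXES = {".ckpt", ".pt", ".pth"}


def list_checkpoints(project_root="."):
    """Return checkpoint-like files under <project_root>/checkpoints, sorted."""
    folder = Path(project_root) / "checkpoints"
    if not folder.is_dir():
        raise FileNotFoundError(f"expected a folder at {folder}")
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )
```

Calling `list_checkpoints()` from the project root should return one or more paths if the folder was placed correctly.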

Prediction

For the BIWI model, use the script below to perform inference on your chosen audio files. Specify the audio file using the --example argument.

sh scripts/demo/demo_biwi.sh

For the vocaset model, run the following script.

sh scripts/demo/demo_vocaset.sh
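Before pointing a demo script at your own audio, it can be useful to check that the file is readable. The sketch below uses only Python's standard `wave` module; the helper name `wav_info` is hypothetical, and it assumes the input is an uncompressed PCM WAV file. The sample rate the models expect is not stated here, so the function only reports what it finds.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration
```

If `wave.open` raises `wave.Error`, the file is probably compressed or not a WAV file and may need converting first.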

Evaluation

To reproduce the metrics reported in the paper, use the scripts in scripts/diffusion/biwi_evaluation and scripts/diffusion/vocaset_evaluation. For example, to evaluate DiffSpeaker on the BIWI dataset with the hubert backbone, run the following script.

sh scripts/diffusion/biwi_evaluation/diffspeaker_hubert_biwi.sh
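To see which evaluation scripts are available without guessing file names, the two directories can be listed. This is a minimal sketch with a hypothetical helper name, `list_eval_scripts`; it assumes only the two directory paths given above and that the scripts end in .sh.

```python
from pathlib import Path

# Directory paths as given in this README.
EVAL_DIRS = (
    "scripts/diffusion/biwi_evaluation",
    "scripts/diffusion/vocaset_evaluation",
)


def list_eval_scripts(repo_root="."):
    """Map each evaluation directory to the sorted .sh file names it contains."""
    root = Path(repo_root)
    result = {}
    for rel in EVAL_DIRS:
        folder = root / rel
        # A missing directory yields an empty list rather than an error.
        names = sorted(p.name for p in folder.glob("*.sh")) if folder.is_dir() else []
        result[rel] = names
    return result
```

Each listed name can then be run with sh from the repository root, in the same way as the example above.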

Training

Data Preparation

Model Training

mkdir experiments
