This repository provides PyTorch implementations of audio-driven face mesh and blendshape models.
Currently, it supports the following models:
- Audio2Face
- VOCA
- FaceFormer
The following feature extractors are available (a short usage sketch follows the list):
- Wav2Vec
- MFCCExtractor
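As a rough illustration of what these two feature paths produce, the sketch below extracts both Wav2Vec 2.0 and MFCC features with torchaudio. Using torchaudio directly is an assumption made here for a self-contained example; it does not show this repository's actual Wav2Vec / MFCCExtractor API.

```python
# Sketch: the two audio feature paths, via torchaudio (an assumption;
# not this repository's Wav2Vec / MFCCExtractor classes).
import torch
import torchaudio

# Load speech and resample to 16 kHz mono.
waveform, sr = torchaudio.load("speech.wav")              # (channels, samples)
waveform = torchaudio.functional.resample(waveform, sr, 16_000)
waveform = waveform.mean(dim=0, keepdim=True)             # mono: (1, samples)

# Wav2Vec 2.0: contextual per-frame embeddings from a pretrained model.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
wav2vec = bundle.get_model().eval()
with torch.inference_mode():
    features, _ = wav2vec.extract_features(waveform)
print(features[-1].shape)                                 # (1, frames, 768)

# MFCC: classic cepstral coefficients per frame.
mfcc = torchaudio.transforms.MFCC(sample_rate=16_000, n_mfcc=13)
coeffs = mfcc(waveform)                                   # (1, 13, frames)
print(coeffs.shape)
```

Either feature sequence then serves as the per-frame audio conditioning for the models listed above.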
This repository uses VOCASET, introduced in 'Capture, Learning, and Synthesis of 3D Speaking Styles' (CVPR 2019), as its template dataset.
Additionally, the FLAME_sample template has been extracted and converted to assets/FLAME_sample.obj, and the Renderer has been redesigned. As a result, the psbody library, which can cause installation issues for Apple Silicon users, is not required in this repository.
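For reference, the converted template can be inspected without psbody. The sketch below uses trimesh as a stand-in loader (an assumption; the redesigned Renderer may handle meshes differently):

```python
# Sketch: loading the converted FLAME template without psbody.
# trimesh is an assumption here, not necessarily what the Renderer uses.
import numpy as np
import trimesh

template = trimesh.load("assets/FLAME_sample.obj", process=False)
vertices = np.asarray(template.vertices)  # (5023, 3) for the FLAME topology
faces = np.asarray(template.faces)        # triangle indices into vertices

print(vertices.shape, faces.shape)
```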
References
- VOCASET
- Cudeiro, Daniel, et al. "Capture, Learning, and Synthesis of 3D Speaking Styles." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019.
- TimoBolkart/voca
- Fan, Yingruo, et al. "FaceFormer: Speech-Driven 3D Facial Animation with Transformers." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022.
- Karras, Tero, et al. "Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion." ACM Transactions on Graphics (SIGGRAPH). 2017.