YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

In our recent paper, we propose the YourTTS model. YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art voice similarity with reasonable quality. This is important because it enables synthesis for speakers whose voice or recording characteristics differ greatly from those seen during training.

Audio samples

Visit our website for audio samples.

Implementation

All of our experiments were implemented in Coqui TTS.

Colab Demos

Checkpoints

All released checkpoints are licensed under CC BY-NC-ND 4.0.

Model	URL
Speaker Encoder	link
Exp 1. YourTTS-EN(VCTK)	link
Exp 1. YourTTS-EN(VCTK) + SCL	link
Exp 2. YourTTS-EN(VCTK)-PT	link
Exp 2. YourTTS-EN(VCTK)-PT + SCL	link
Exp 3. YourTTS-EN(VCTK)-PT-FR	link
Exp 3. YourTTS-EN(VCTK)-PT-FR + SCL	link
Exp 4. YourTTS-EN(VCTK+LibriTTS)-PT-FR + SCL	link

Results replicability

For replicability, we make the audio files used to calculate the Mean Opinion Score (MOS) available here. In addition, we provide the MOS ratings for each audio file here.
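As a rough illustration of what a MOS result summarizes: a MOS is the mean of listener ratings, and it is commonly reported together with a 95% confidence interval. The sketch below is not the official evaluation procedure; it assumes ratings on a 1 to 5 scale and a normal-approximation interval (z = 1.96), and `mos_with_ci` is a hypothetical helper.

```python
import math
import statistics
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumptions: ratings are on a 1-to-5 scale and a normal approximation
    (z = 1.96) is acceptable; the official evaluation may use another method.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    mos = float(statistics.mean(ratings))
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * std_err


if __name__ == "__main__":
    # Hypothetical ratings from five listeners for one system.
    mos, half_width = mos_with_ci([4, 5, 3, 4, 4])
    print(f"MOS: {mos:.2f} ± {half_width:.2f}")
```

With many listeners, a t-based interval and a z-based interval converge; for small rating pools, a t quantile would give a slightly wider interval.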

To recompute our MOS results, follow the instructions here. To predict the test sequences and compute the Speaker Encoder Cosine Similarity (SECS) results, please use the Jupyter Notebooks available here.
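SECS compares the speaker embedding of a reference recording with that of the corresponding synthesized audio using cosine similarity, so values closer to 1 indicate more similar voices. The sketch below is only an illustration. It assumes the embeddings were already extracted with some speaker encoder (not shown) and that the reported score is a plain mean over paired utterances. `cosine_similarity` and `secs` are hypothetical helpers, not functions from the notebooks.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def secs(reference: List[Sequence[float]], synthesized: List[Sequence[float]]) -> float:
    """Mean cosine similarity over (reference, synthesized) embedding pairs.

    Assumption: the score is a plain average over paired utterances, with
    embeddings produced beforehand by a speaker encoder (not shown here).
    """
    if not reference or len(reference) != len(synthesized):
        raise ValueError("need equal, non-zero numbers of reference and synthesized embeddings")
    scores = [cosine_similarity(r, s) for r, s in zip(reference, synthesized)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real speaker embeddings are much larger.
    refs = [[0.2, 0.9, 0.1], [0.5, 0.5, 0.7]]
    synths = [[0.25, 0.85, 0.05], [0.4, 0.6, 0.6]]
    print(f"SECS: {secs(refs, synths):.3f}")
```

Cosine similarity ignores vector magnitude, so the score reflects only the direction of the speaker embeddings, which is what a speaker-identity comparison usually needs.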
