Skip to content

wbjnpu/DiffSinger

 
 

Repository files navigation

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

arXiv GitHub Stars downloads | Open In Colab | Interactive🤗 TTS | Interactive🤗 SVS

This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose DiffSinger (for Singing-Voice-Synthesis) and DiffSpeech (for Text-to-Speech).

🎉 🎉 🎉 Updates:

  • Sep.11, 2022: 🔌 DiffSinger-PN. Add plug-in PNDM, ICLR 2022 in our laboratory, to accelerate DiffSinger freely.
  • Jul.27, 2022: Update documents for SVS. Add easy inference A & B; Add Interactive SVS running on HuggingFace🤗 SVS.
  • Mar.2, 2022: MIDI-B-version.
  • Mar.1, 2022: NeuralSVB, for singing voice beautifying, has been released.
  • Feb.13, 2022: NATSpeech, the improved code framework, which contains the implementations of DiffSpeech and our NeurIPS-2021 work PortaSpeech has been released.
  • Jan.29, 2022: support MIDI-A-version SVS.
  • Jan.13, 2022: support SVS, release PopCS dataset.
  • Dec.19, 2021: support TTS. HuggingFace🤗 TTS

🚀 News:

  • Feb.24, 2022: Our new work, NeuralSVB was accepted by ACL-2022 arXiv. Demo Page.
  • Dec.01, 2021: DiffSinger was accepted by AAAI-2022.
  • Sep.29, 2021: Our recent work PortaSpeech: Portable and High-Quality Generative Text-to-Speech was accepted by NeurIPS-2021 arXiv .
  • May.06, 2021: We submitted DiffSinger to Arxiv arXiv.

Environments

conda create -n your_env_name python=3.8
source activate your_env_name 
pip install -r requirements_2080.txt   (GPU 2080Ti, CUDA 10.2)
or pip install -r requirements_3090.txt   (GPU 3090, CUDA 11.4)

Documents

Overview

Mel Pipeline Dataset Pitch Input F0 Prediction Acceleration Method Vocoder
DiffSpeech (Text->F0, Text+F0->Mel, Mel->Wav) Ljspeech None Explicit Shallow Diffusion NSF-HiFiGAN
DiffSinger (Lyric+F0->Mel, Mel->Wav) PopCS Ground-Truth F0 None Shallow Diffusion NSF-HiFiGAN
DiffSinger (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav) OpenCpop MIDI Explicit Shallow Diffusion NSF-HiFiGAN
FFT-Singer (Lyric+MIDI->F0, Lyric+F0->Mel, Mel->Wav) OpenCpop MIDI Explicit Invalid NSF-HiFiGAN
DiffSinger (Lyric+MIDI->Mel, Mel->Wav) OpenCpop MIDI Implicit None Pitch-Extractor + NSF-HiFiGAN
DiffSinger+PNDM (Lyric+MIDI->Mel, Mel->Wav) OpenCpop MIDI Implicit PLMS Pitch-Extractor + NSF-HiFiGAN

Tensorboard

tensorboard --logdir_spec exp_name
Tensorboard

Audio Demos

Old audio samples can be found in our demo page. Audio samples generated by this repository are listed here:

TTS audio samples

Speech samples (test set of LJSpeech) can be found in resources/demos_1213.

SVS audio samples

Singing samples (test set of PopCS) can be found in resources/demos_0112.

Citation

@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}}

Acknowledgements

Our codes are based on the following repos:

Also thanks Keon Lee for fast implementation of our work.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.9%
  • Batchfile 0.1%