Source code for the paper "Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables", accepted at ISMIR 2023.
- Install the Python requirements.
pip install -r requirements.txt
- Download the MPop600 dataset. The dataset is available on request only. Please contact the dataset's third author, Yi-Jhe Lee, to get the raw files.
- Resample the data to 24 kHz.
python scripts/resample_dir.py **/f1/ output_dir --sr 24000
- Generate F0 labels (stored as .pv files).
python scripts/wav2f0.py output_dir
- Train with the configuration config.yaml we used in the paper (available under ckpts/).
python main.py fit --config config.yaml --dataset.init_args.wav_dir output_dir
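Before starting a long training run, it can help to confirm that the resampling and F0 steps above produced what the training step expects. The following is a minimal sketch, not part of the repository: `check_prepared_dir` is a hypothetical helper, it assumes each `.pv` file sits next to its `.wav` file with the same stem, and it reads only PCM WAV headers through the standard `wave` module.

```python
import wave
from pathlib import Path


def check_prepared_dir(wav_dir, expected_sr=24000):
    """Return a list of human-readable problems found under wav_dir."""
    problems = []
    for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
        # Only the header is needed to read the sampling rate.
        with wave.open(str(wav_path), "rb") as f:
            sr = f.getframerate()
        if sr != expected_sr:
            problems.append(
                f"{wav_path}: sampling rate is {sr}, expected {expected_sr}"
            )
        # Assumption: the F0 label shares the wav file's directory and stem.
        if not wav_path.with_suffix(".pv").exists():
            problems.append(f"{wav_path}: missing F0 label file")
    return problems
```

Calling `check_prepared_dir("output_dir")` returns an empty list when every file is at 24 kHz and has a matching label file.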
Evaluate a trained checkpoint on the test set:
python main.py test --config config.yaml --ckpt_path checkpoint.ckpt --data.init_args.duration 6 --data.init_args.overlap 0 --data.init_args.batch_size 16
Measure the real-time factor (RTF) on a test file:
python test_rtf.py config.yaml checkpoint.ckpt test.wav
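For readers unfamiliar with the metric, a real-time factor (RTF) is commonly defined as processing time divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below illustrates that definition only. It does not reproduce `test_rtf.py`; `real_time_factor` and its `synthesize` argument are hypothetical names, and the 24 kHz default simply mirrors the resampling step.

```python
import time


def real_time_factor(synthesize, num_samples, sample_rate=24000, repeats=3):
    """Time `synthesize()` and divide by the duration of the audio it yields.

    `synthesize` is any zero-argument callable (hypothetical stand-in for a
    model's inference call) that produces `num_samples` audio samples.
    """
    audio_seconds = num_samples / sample_rate
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        synthesize()
        # Keep the fastest run to reduce the effect of timing noise.
        best = min(best, time.perf_counter() - start)
    return best / audio_seconds
```

Taking the best of several runs is one design choice among many; averaging after a warm-up run is an equally reasonable alternative.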
- MOS: computes the mean opinion score (MOS) from the GO Listen rating file.
- time-domain l2 experiment: the notebook used for the time-domain L2 loss ablation study in the paper.
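As an illustration of the kind of computation a MOS notebook performs, here is a minimal sketch that takes ratings already parsed into a list per system and returns the mean with a normal-approximation 95% confidence half-width. The format of the GO Listen rating file is not covered here, the function name `mos_with_ci` and the example data are hypothetical, and the notebook itself may compute its intervals differently.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of numeric opinion scores."""
    mean = statistics.mean(ratings)
    if len(ratings) < 2:
        # A single rating has no sample variance to estimate.
        return mean, 0.0
    # Normal approximation: z * standard error of the mean.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings, keyed by system name.
example_ratings = {"system_a": [4, 5, 3, 4], "system_b": [2, 3, 3, 2]}
for name, scores in example_ratings.items():
    mean, half_width = mos_with_ci(scores)
    print(f"{name}: {mean:.2f} +/- {half_width:.2f}")
```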