GitHub - v-manhlt3/FlowVocoder

FlowVocoder: A small Footprint Neural Vocoder based Normalizing Flow for Speech Synthesis

Setup

Clone this repo and install the requirements:

git clone https://github.com/tienmanhptit1312/FlowVocoder.git
cd FlowVocoder
pip install -r requirements.txt

Install Apex for mixed-precision training.

Train your model

Download the LJ Speech dataset, then uncompress it in the directory where you downloaded it.

Copy the wave files from the LJ-Speech directory to the FlowVocoder directory.

cp -r [LJ-Speech dataset's directory]/wavs [FlowVocoder's directory]

Make a list of the file names to use for training/testing.
```
ls wavs/*.wav | tail -n+1311 > train_files.txt
ls wavs/*.wav | head -n1310 > test_files.txt
```
Here, -n1310 reserves the first 1310 audio clips for model testing. The remaining clips are used for training.
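
The same split can also be scripted in Python, which makes it easy to confirm that no clip ends up in both lists. This is a minimal sketch and not part of the repository: the helper name split_file_list is hypothetical, and it assumes the clips sit in wavs/ as in the commands above.

```python
from pathlib import Path


def split_file_list(paths, n_test=1310):
    """Return (train, test): the first n_test sorted paths go to test."""
    # Plain string sort; the order produced by `ls` may differ by locale.
    ordered = sorted(str(p) for p in paths)
    return ordered[n_test:], ordered[:n_test]


if __name__ == "__main__":
    train, test = split_file_list(Path("wavs").glob("*.wav"))
    # The two lists must not share any clip.
    assert not set(train) & set(test)
    Path("train_files.txt").write_text("\n".join(train) + "\n")
    Path("test_files.txt").write_text("\n".join(test) + "\n")
```
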

Edit the configuration file and train the model.

Below are the example commands using flowvocoder.json:
```
python train.py -c configs/flowvocoder.json --tr
```
Single-node multi-GPU training is automatically enabled with DataParallel (instead of DistributedDataParallel, for simplicity).

For mixed-precision training, set "fp16_run": true in the configuration file.

You can load trained weights from a saved checkpoint by setting the checkpoint_path variable in the config file to its path.

checkpoint_path accepts either an explicit checkpoint path or, when resuming from weights averaged over multiple checkpoints, the parent directory. It takes about a week to train this model on two NVIDIA V100 GPUs with batch-size=2. For reproduction purposes, you can download our pretrained model, trained for about 1M iterations: link.
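
The configuration edits described above (fp16_run and checkpoint_path) can also be applied from a script. The sketch below is an illustration only: the helpers set_config_key and update_config are hypothetical, and because the layout of configs/flowvocoder.json is not shown here, the code assumes only that each key already exists somewhere in the JSON and searches for it recursively.

```python
import json


def set_config_key(node, key, value):
    """Set every existing occurrence of `key` in nested JSON data; return the count."""
    count = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                count += 1
            else:
                count += set_config_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            count += set_config_key(item, key, value)
    return count


def update_config(path, **changes):
    """Rewrite the JSON file at `path` with the given key/value changes."""
    with open(path) as f:
        config = json.load(f)
    for key, value in changes.items():
        # Refuse to invent keys: the training script defines which ones exist.
        if set_config_key(config, key, value) == 0:
            raise KeyError(f"{key!r} not found in {path}")
    with open(path, "w") as f:
        json.dump(config, f, indent=4)


if __name__ == "__main__":
    update_config(
        "configs/flowvocoder.json",
        fp16_run=True,
        checkpoint_path="experiments/flowvocoder/flowvocoder_5000",
    )
```
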

Examples

To resume from a single checkpoint, insert checkpoint_path: "experiments/flowvocoder/flowvocoder_5000" in the config file, then run:
```
python train.py -c configs/flowvocoder.json --tr
```
To load weights averaged over the 10 most recent checkpoints, insert checkpoint_path: "experiments/flowvocoder" in the config file, then run:
```
python train.py -a 10 -c configs/flowvocoder.json
```
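
For intuition, averaging over recent checkpoints amounts to picking the newest checkpoint files and taking an element-wise mean of their parameters. The sketch below is not the code in train.py, whose implementation is not shown here: both helper names are hypothetical, and the only assumption taken from this README is the checkpoint naming pattern (for example flowvocoder_5000). The averaging function works on any values that support + and /, such as floats, arrays, or tensors.

```python
import re


def latest_checkpoints(names, n):
    """Return the n names with the highest trailing iteration number."""
    if n <= 0:
        return []
    numbered = []
    for name in names:
        match = re.search(r"_(\d+)$", name)
        if match:  # skip files that do not look like checkpoints
            numbered.append((int(match.group(1)), name))
    # Sort by iteration number and keep the last n entries.
    return [name for _, name in sorted(numbered)[-n:]]


def average_state_dicts(state_dicts):
    """Element-wise mean of parameter dicts that share the same keys."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    averaged = {}
    for key in state_dicts[0]:
        total = state_dicts[0][key]
        for other in state_dicts[1:]:
            total = total + other[key]
        averaged[key] = total / len(state_dicts)
    return averaged
```
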
Synthesize waveforms from the trained model.

Set checkpoint_path in the config file and pass --synthesize to train.py. The model generates a waveform for each file listed in test_files.txt.
```
python train.py --synthesize -c configs/flowvocoder.json
```
If fp16_run is true, the model uses FP16 (half-precision) arithmetic for faster performance on GPUs equipped with Tensor Cores.
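
To get a feel for what half precision trades for that speed, the standard-library sketch below rounds Python floats to the nearest FP16 value. It is an illustration only and unrelated to the repository's code; to_half is a hypothetical helper built on the struct module's "e" (IEEE 754 binary16) format.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_half(0.1))     # 0.0999755859375
print(to_half(2049.3))  # 2050.0 (values above 2048 are spaced 2 apart)
```
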

Reference

NVIDIA Tacotron2: https://github.com/NVIDIA/tacotron2

WaveFlow: https://github.com/L0SG/WaveFlow
