End-to-end Text-to-Speech with Generative Adversarial Networks

This repository contains an implementation of, and end-to-end training scripts for, text-to-speech models based on End-to-End Adversarial Text-to-Speech (Donahue et al., 2020).

Usage

To set up the Python environment, run

python -m venv ttsgan
source ttsgan/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Split the audio files from the LJ-Speech dataset into training and test lists by running

ls LJSpeech-1.1/wavs/*.wav | tail -n+11 > train_files.txt
ls LJSpeech-1.1/wavs/*.wav | head -n10 > test_files.txt
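For environments without a POSIX shell, the same file lists can be produced in Python. This is a minimal sketch, not part of the repository: the helper names `split_wavs` and `write_list` are hypothetical, and it assumes the intent is to hold out the first 10 files (in sorted order) for testing and train on the rest, with no file appearing in both lists.

```python
from pathlib import Path


def split_wavs(wav_dir, num_test=10):
    """Return (train, test) lists of .wav paths with no file in both.

    Hypothetical helper: holds out the first `num_test` files in
    sorted order for testing and keeps the remainder for training.
    """
    files = sorted(str(p) for p in Path(wav_dir).glob("*.wav"))
    return files[num_test:], files[:num_test]


def write_list(paths, out_file):
    """Write one path per line, matching the layout that `ls > file` produces."""
    Path(out_file).write_text("".join(f"{p}\n" for p in paths))
```

Calling `split_wavs("LJSpeech-1.1/wavs")` and passing the two results to `write_list` with `train_files.txt` and `test_files.txt` yields the two lists. Python sorts by code point while `ls` sorts by locale; for LJ-Speech file names such as `LJ001-0001.wav` the two orderings should agree.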

Specify the path to the LJ-Speech metadata.csv via the --metadata_file flag. Download the CMU Pronouncing Dictionary (CMUdict) and specify its path via the --cmudict_file flag.
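To make the two input files concrete, here is a hedged sketch of how they can be read. It is not the repository's loader, and the function names `load_metadata` and `load_cmudict` are hypothetical. It assumes the standard LJ-Speech layout (pipe-separated `ID|transcription|normalized transcription`, no header row) and the standard CMUdict layout (comment lines starting with `;;;`, then `WORD  PH1 PH2 ...` per line).

```python
def load_metadata(path):
    """Map utterance ID -> normalized transcript from an LJ-Speech metadata.csv.

    Lines are split on "|" by hand because the transcripts contain
    quotation marks that a quoting-aware CSV reader could misinterpret.
    """
    transcripts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            transcripts[parts[0]] = parts[-1]
    return transcripts


def load_cmudict(path):
    """Map each dictionary entry to its list of ARPAbet phonemes.

    Alternate pronunciations such as "HELLO(1)" are kept under their
    own keys. latin-1 is an assumption chosen because it decodes any
    byte sequence without raising an error.
    """
    entries = {}
    with open(path, encoding="latin-1") as f:
        for line in f:
            if not line.strip() or line.startswith(";;;"):
                continue  # skip blank lines and comments
            word, *phones = line.split()
            entries.setdefault(word, phones)
    return entries
```

Looking up the upper-cased words of a transcript in the dictionary returned by `load_cmudict` gives one phoneme sequence per known word. How out-of-vocabulary words, punctuation, and stress digits are handled is left open here, since that depends on the model's input vocabulary.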

To train, simply run

python train.py -c config.yml
