Pronunciation correction in vector quantized PPG representation space

Work in progress: inspired by the paper Zero-Shot Foreign Accent Conversion without a Native Reference.

This repository implements the translator module described in the above paper.

Installation

  • Install ffmpeg.
  • Install Kaldi.
  • Install PyKaldi.
  • Install the required packages using the environment.yml file.
  • Download the pretrained TDNN-F model, extract it, and set PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh to the pretrained model directory (a setup sketch follows this list).
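
A minimal setup sketch, assuming a conda-based install from environment.yml; the environment name, archive name, and paths below are placeholders, not values taken from this repository:

# create and activate the environment (placeholder name; use the one defined in environment.yml)
conda env create -f environment.yml
conda activate vq-bnf-translator

# extract the downloaded TDNN-F model archive (placeholder file name)
mkdir -p /path/to/pretrained_model
tar -xzf tdnnf_model.tar.gz -C /path/to/pretrained_model

# then edit kaldi_scripts/extract_features_kaldi.sh so that its variables point at your installs, e.g.:
#   KALDI_ROOT=/path/to/kaldi
#   PRETRAIN_ROOT=/path/to/pretrained_model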

Dataset

  • Acoustic model: LibriSpeech. Download the pretrained TDNN-F acoustic model here.
    • You also need to set KALDI_ROOT and PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh accordingly.
  • Vector quantization: ARCTIC and L2-ARCTIC; see here for the detailed training process.
  • Translator (seq2seq model): ARCTIC and L2-ARCTIC. See here for a merged version. All pretrained models are available here (to be updated).

Directory layout (format your dataset to match the structure below)

dataset_root
├── speaker 1
├── speaker 2
│   ├── wav          # contains all the wav files from speaker 2
│   └── kaldi        # Kaldi files (auto-generated after running kaldi_scripts)
.
.
└── speaker N
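
A quick sanity check for this layout, assuming the root directory is named dataset_root (placeholder path); every speaker directory should contain a wav/ subfolder before feature extraction:

# report any speaker directory that is missing its wav/ subfolder
for spk in dataset_root/*/; do
    [ -d "${spk}wav" ] || echo "missing wav/ in ${spk}"
done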

Quick Start

See the inference script.

Training

  • Use Kaldi to extract BNFs for each speaker; run this for every speaker (a loop over all speakers is sketched below the command)
./kaldi_scripts/extract_features_kaldi.sh /path/to/speaker
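
The script processes one speaker directory at a time, so a simple loop over the layout above covers all speakers (dataset_root is a placeholder path):

# run feature extraction once per speaker directory
for spk in dataset_root/*/; do
    ./kaldi_scripts/extract_features_kaldi.sh "${spk%/}"
done
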
  • Preprocessing
python preprocess_bnfs.py path/to/dataset
python make_data.py   # edit this file to specify the dataset path
  • Vector-quantize the BNFs; see here.

  • Set the training parameters; see conf/.

  • Train the model:

./train.sh
  • Synthesizer code and training: see here.