
Vector Quantize PPGs/Bottleneck features

Code for vector quantizing speech features, including mel-spectrograms and phonetic posteriorgrams (PPGs) / bottleneck features (BNFs). This repo trains an independent module to vector quantize BNFs.
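
Concretely, quantizing BNFs means replacing each continuous feature frame with its nearest codeword in a learned codebook. The sketch below is a minimal VQ-VAE-style quantizer in PyTorch, shown for illustration only; the class name, codebook size, and feature dimension are assumptions, not this repo's actual module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup over BNF frames (VQ-VAE style sketch)."""

    def __init__(self, num_codes=256, code_dim=256, beta=0.25):
        super().__init__()
        self.beta = beta
        # Codebook: one learnable embedding vector per discrete code.
        self.codebook = nn.Embedding(num_codes, code_dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)

    def forward(self, bnf):
        # bnf: (batch, frames, code_dim) continuous bottleneck features.
        flat = bnf.reshape(-1, bnf.size(-1))
        # Squared L2 distance from every frame to every codeword.
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        indices = dist.argmin(dim=1)
        quantized = self.codebook(indices).view_as(bnf)
        # Codebook loss pulls codewords toward the input frames; the commitment
        # term matters when the features come from a trainable encoder.
        loss = (F.mse_loss(quantized, bnf.detach())
                + self.beta * F.mse_loss(bnf, quantized.detach()))
        # Straight-through estimator so gradients can pass to an upstream encoder if one exists.
        quantized = bnf + (quantized - bnf).detach()
        return quantized, indices.view(bnf.shape[:-1]), loss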

For usage in voice conversion, see here

Installation

  • Install ffmpeg.
  • Install Kaldi.
  • Install PyKaldi.
  • Install the Python packages listed in the environment.yml file.
  • Download the pretrained TDNN-F model, extract it, and set PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh to the pretrained model directory.

Dataset

  • Acoustic Model: LibriSpeech. Download the pretrained TDNN-F acoustic model here.
    • You also need to set KALDI_ROOT and PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh accordingly.
  • Vector Quantization: ARCTIC and L2-ARCTIC; see here for the detailed training process.

All pretrained models are available here (to be updated).

Directory layout (format your dataset to match the layout below)

dataset_root
├── speaker 1
├── speaker 2
│   ├── wav          # contains all the wav files from speaker 2
│   └── kaldi        # Kaldi files (auto-generated after running kaldi_scripts)
.
.
└── speaker N
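
A quick sanity check for the layout above is to walk the dataset root and confirm that each speaker directory contains a wav subfolder with audio. This helper is hypothetical, not part of the repo:

from pathlib import Path

def check_dataset_layout(dataset_root):
    """Print each speaker directory and how many wav files it contains."""
    root = Path(dataset_root)
    for speaker in sorted(p for p in root.iterdir() if p.is_dir()):
        wav_dir = speaker / "wav"
        n_wavs = len(list(wav_dir.glob("*.wav"))) if wav_dir.is_dir() else 0
        status = "ok" if n_wavs else "missing or empty wav/"
        print(f"{speaker.name}: {n_wavs} wav files ({status})")

check_dataset_layout("path/to/dataset_root")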

Quick Start

See the inference script
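
The exact entry point is defined by the repo's inference script; purely as a rough sketch of what quantizing already-extracted BNFs looks like, reusing the VectorQuantizer class sketched above (the checkpoint and feature paths are hypothetical):

import torch

# Hypothetical paths; the real checkpoint and feature locations come from the repo's scripts.
vq = VectorQuantizer(num_codes=256, code_dim=256)
vq.load_state_dict(torch.load("checkpoints/vq_bnf.pt", map_location="cpu"))
vq.eval()

bnf = torch.load("dataset_root/speaker_2/kaldi/utt0001_bnf.pt")  # (frames, code_dim)
with torch.no_grad():
    quantized, codes, _ = vq(bnf.unsqueeze(0))
print(codes.shape)  # one discrete code index per frame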

Training

  • Use Kaldi to extract BNFs for each speaker (repeat this for every speaker in the dataset):
./kaldi_scripts/extract_features_kaldi.sh /path/to/speaker
  • Preprocessing
python preprocess_bnfs.py path/to/dataset
python make_data_all.py  # edit this file to specify the dataset path
  • Set training parameters; see conf/

  • Train the VQ model

./train.sh
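
train.sh wraps the actual training entry point, and conf/ holds the real hyperparameters. Purely as an illustration of what a single optimization step over preprocessed BNFs involves, again reusing the VectorQuantizer sketch from the introduction (the codebook size and learning rate are assumptions):

import torch

# Illustrative only; real settings live in conf/ and train.sh.
vq = VectorQuantizer(num_codes=256, code_dim=256)
optimizer = torch.optim.Adam(vq.parameters(), lr=1e-4)

def train_step(bnf_batch):
    """bnf_batch: (batch, frames, code_dim) tensor of preprocessed BNFs."""
    # The Kaldi BNF extractor is pretrained and not updated here, so this step
    # only moves the codebook, driven by the loss returned by the quantizer.
    _, _, vq_loss = vq(bnf_batch)
    optimizer.zero_grad()
    vq_loss.backward()
    optimizer.step()
    return vq_loss.item()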