deep-tts

Deep Learning based Text-to-Speech Synthesis (under construction)

## To Do List

Feature extraction:
1. Linguistic: ourselves (or Festival as backup)
2. Acoustic: WORLD
State alignment:
1. Kaldi (refer to Merlin: the frame alignment and state information were obtained from forced alignment using a monophone HMM-based system with 5 emitting states per phone. Based on my previous experience, objective model measurements depend heavily on an accurate alignment.)
2. HTS as backup
Neural network: MXNet (or CNTK as backup)
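
To illustrate the state alignment item, here is a minimal sketch of how state-level forced-alignment output could be expanded into one label per acoustic frame. It assumes HTK-style boundaries in units of 100 ns and a 5 ms frame shift. The function name `states_to_frames`, the tuple layout, and the example segments are hypothetical, not part of this repository or of Kaldi, HTS, or Merlin.

```python
from typing import List, Tuple

# HTK-style label times are expressed in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000

# (start, end, phone, state_index); this layout is an assumption for illustration.
Segment = Tuple[int, int, str, int]


def states_to_frames(segments: List[Segment],
                     frame_shift_ms: float = 5.0) -> List[Tuple[str, int]]:
    """Expand state-level segments into one (phone, state) label per frame."""
    frame_units = int(round(frame_shift_ms * 1e-3 * HTK_UNITS_PER_SECOND))
    frames: List[Tuple[str, int]] = []
    for start, end, phone, state in segments:
        # Count frames from the boundary indices so rounding does not accumulate.
        n_frames = end // frame_units - start // frame_units
        frames.extend([(phone, state)] * n_frames)
    return frames


# One phone with 5 emitting states (made-up durations).
example_segments: List[Segment] = [
    (0, 100_000, "a", 1),
    (100_000, 250_000, "a", 2),
    (250_000, 300_000, "a", 3),
    (300_000, 400_000, "a", 4),
    (400_000, 500_000, "a", 5),
]
example_frames = states_to_frames(example_segments)
```

Because each frame inherits its label directly from these boundaries, a misplaced boundary mislabels every frame near it, which is one reason alignment accuracy affects objective measurements.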
