Extract duration from tacotron2 model #73

Closed
linhld0811 opened this issue Jul 1, 2020 · 3 comments
Labels
FastSpeech FastSpeech related problems. question ❓ Further information is requested Tacotron Tacotron related question.

Comments

@linhld0811

This is a question rather than a bug report. According to the FastSpeech2 training tutorial, we have to extract the duration from the alignment of the Tacotron2 model (the function get_duration_from_alignment in the file extract_duration.py).
I just want to know what exactly the term "duration" means. Can anyone help me figure out this definition?

@dathudeptrai dathudeptrai added FastSpeech FastSpeech related problems. question ❓ Further information is requested Tacotron Tacotron related question. labels Jul 1, 2020
@dathudeptrai dathudeptrai added this to In progress in Tacotron 2 Jul 1, 2020
@dathudeptrai dathudeptrai added this to In progress in FastSpeech2 Jul 1, 2020
@trfnhle
Collaborator

trfnhle commented Jul 1, 2020

The character sequence and the mel-spectrogram have different lengths: len(characters) < len(mel). FastSpeech maps characters to mel frames, so each character corresponds to a chunk of consecutive mel frames. The length of that chunk, counted in frames, is what we call the duration.
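To make the definition concrete, here is a minimal sketch of how durations could be read off an attention alignment. It assumes the alignment is a matrix of shape [num_characters, num_mel_frames] and assigns each mel frame to the character with the highest attention weight. The function name `durations_from_alignment` and the toy matrix are illustrative only, and this is not necessarily identical to what get_duration_from_alignment does in extract_duration.py.

```python
def durations_from_alignment(alignment):
    """Count, for each character, how many mel frames attend to it most.

    alignment[i][t] is the attention weight of character i at mel frame t
    (assumed layout: rows are characters, columns are mel frames).
    """
    num_chars = len(alignment)
    num_frames = len(alignment[0])
    durations = [0] * num_chars
    for t in range(num_frames):
        # Pick the character with the largest weight for this frame.
        best_char = max(range(num_chars), key=lambda i: alignment[i][t])
        durations[best_char] += 1
    return durations


# Toy example: 3 characters, 6 mel frames.
alignment = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.7, 0.6, 0.2, 0.1],
    [0.0, 0.0, 0.2, 0.4, 0.8, 0.9],
]
durations = durations_from_alignment(alignment)
print(durations)  # one frame count per character
```

Because every frame is assigned to exactly one character, the durations always sum to the number of mel frames, which is the property the length regulator in FastSpeech relies on.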

@linhld0811
Author

linhld0811 commented Jul 1, 2020

I have a question: can I take a time alignment (the duration in ms of each phoneme, extracted with Kaldi) and convert it to the duration format that you use for training FastSpeech2?

@azraelkuan
Collaborator

azraelkuan commented Jul 1, 2020

@linhld0811 Yes, but you should use the same phoneme set in TTS as you use in Kaldi.
check this: https://montreal-forced-aligner.readthedocs.io/
and this example file for ljspeech: https://github.com/ivanvovk/DurIAN/blob/master/filelists/train_filelist.txt
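As a rough sketch of the conversion asked about above: durations in milliseconds can be turned into mel-frame counts once the sample rate and hop size of the mel extraction are known. The values below (22050 Hz, hop size 256) and the function name `ms_to_frame_durations` are assumptions for illustration, not settings confirmed in this thread; use whatever your preprocessing config specifies. Rounding the cumulative boundaries, rather than each phoneme separately, keeps rounding errors from accumulating.

```python
def ms_to_frame_durations(durations_ms, sample_rate=22050, hop_size=256):
    """Convert per-phoneme durations in ms to per-phoneme mel-frame counts.

    sample_rate and hop_size are assumed values; they must match the
    settings used when the mel-spectrograms were extracted.
    """
    frames_per_ms = sample_rate / (hop_size * 1000.0)
    frame_durations = []
    elapsed_ms = 0.0
    prev_boundary = 0
    for d in durations_ms:
        elapsed_ms += d
        # Round the cumulative end time so errors do not add up.
        boundary = int(round(elapsed_ms * frames_per_ms))
        frame_durations.append(boundary - prev_boundary)
        prev_boundary = boundary
    return frame_durations


# Hypothetical aligner output: three phonemes lasting 100, 50 and 150 ms.
frame_durations = ms_to_frame_durations([100.0, 50.0, 150.0])
print(frame_durations)
```

The resulting counts should sum to the number of frames in the matching mel-spectrogram. If the total is off by a frame or two (for example because of padding or trimmed silence), the last duration usually has to be adjusted so that the two lengths agree.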

@trfnhle trfnhle closed this as completed Jul 1, 2020
@dathudeptrai dathudeptrai moved this from In progress to Done in Tacotron 2 Jul 1, 2020
@dathudeptrai dathudeptrai moved this from In progress to Done in FastSpeech2 Jul 1, 2020