Tacotron2 teacher forcing #43

Closed
superhg2012 opened this issue Jun 12, 2020 · 12 comments
Labels: question ❓ (Further information is requested), Tacotron (Tacotron related question)

superhg2012 commented Jun 12, 2020

@dathudeptrai The Training Sampler uses teacher forcing (ground-truth mel-spectrogram frames) during the whole training period. Did you run into the exposure bias problem? Is audio quality always good at inference time?
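
To make the question concrete, here is a minimal NumPy sketch (my own illustration, not this repo's code; `decoder_step` is a hypothetical stand-in for the real Tacotron2 decoder) of the teacher-forced loop and the train/inference mismatch I am asking about:

```python
# Illustration only: a toy autoregressive decoder loop showing teacher
# forcing vs. free-running inference (decoder_step is hypothetical).
import numpy as np

def decoder_step(prev_frame, state):
    # Stand-in for Tacotron2's prenet + LSTM stack + projection.
    new_state = 0.9 * state + 0.1 * prev_frame
    return new_state, new_state  # (predicted frame, new state)

def run_decoder(mel_gt, teacher_forcing=True):
    state = np.zeros_like(mel_gt[0])
    prev = np.zeros_like(mel_gt[0])  # <GO> frame
    outputs = []
    for t in range(len(mel_gt)):
        pred, state = decoder_step(prev, state)
        outputs.append(pred)
        # Training feeds the ground-truth frame back in (teacher forcing);
        # inference must feed the model's own prediction, so errors can
        # accumulate. This train/test mismatch is exposure bias.
        prev = mel_gt[t] if teacher_forcing else pred
    return np.stack(outputs)

mel_gt = np.random.randn(100, 80).astype(np.float32)  # 100 frames, 80 mel bins
train_mode = run_decoder(mel_gt, teacher_forcing=True)
infer_mode = run_decoder(mel_gt, teacher_forcing=False)
```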

dathudeptrai (Collaborator) commented Jun 12, 2020

@superhg2012 Audio quality is good after 60k training steps. Before that, the model is biased toward the ground-truth mels, so the quality is not good at inference.

superhg2012 (Author) commented Jun 12, 2020

If trained to 200k steps, what about the inference audio quality? Will it be better?

dathudeptrai commented Jun 12, 2020

@superhg2012 I trained to 120k and saw that the validation loss didn't decrease anymore, so I stopped. I think the quality is already very good at 120k, don't you think so?

@dathudeptrai dathudeptrai self-assigned this Jun 12, 2020
@dathudeptrai dathudeptrai added the question ❓ Further information is requested label Jun 12, 2020
@dathudeptrai dathudeptrai added this to In progress in Tacotron 2 Jun 12, 2020
@dathudeptrai dathudeptrai added the Tacotron Tacotron related question. label Jun 12, 2020

superhg2012 commented Jun 12, 2020

> @superhg2012 I trained to 120k and saw that the validation loss didn't decrease anymore, so I stopped. I think the quality is already very good at 120k, don't you think so?

I trained Tacotron and ran inference without the window constraint, and the audio quality is satisfying. I want to know whether the window or monotonic constraint during inference helps improve audio quality, or whether it is only useful for FastSpeech alignment.

dathudeptrai commented Jun 12, 2020

@superhg2012 The window constraint is used in case the alignment explodes when inferencing very long sentences. But somehow my model can infer samples of more than 3000 decoder steps ^^. I find the model without the window constraint is better. For training FastSpeech, you can decode Tacotron without the alignment constraint and use Tacotron's output mel-spectrograms for FastSpeech training; that is my FastSpeech V3 (a significant improvement over FastSpeech V1, which used window constraint + teacher forcing + ground-truth mels).
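
Roughly, the window constraint looks like this (a minimal sketch, not the exact code in this repo; `windowed_attention`, `prev_max_pos`, and `win` are illustrative names):

```python
# Illustration only: restrict attention at inference to a window around
# the previous alignment peak, so the alignment cannot jump/explode.
import numpy as np

def windowed_attention(energies, prev_max_pos, win=3):
    # energies: raw attention scores over encoder steps for one decoder step.
    mask = np.full_like(energies, -np.inf)
    lo = max(0, prev_max_pos - win)
    hi = min(len(energies), prev_max_pos + win + 1)
    mask[lo:hi] = 0.0  # only positions near the previous peak stay open
    masked = energies + mask
    probs = np.exp(masked - masked.max())  # softmax over the window
    return probs / probs.sum()

energies = np.random.randn(50).astype(np.float32)  # 50 encoder positions
alignment = windowed_attention(energies, prev_max_pos=10, win=3)
next_pos = int(alignment.argmax())  # becomes the window center next step
```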

superhg2012 (Author) commented Jun 12, 2020

Do you mean training FastSpeech V3 with predicted mel-spectrograms from Tacotron2 and alignments from Tacotron2, instead of ground-truth mel-spectrograms?

dathudeptrai commented Jun 12, 2020

@superhg2012 Yes: alignments from the Tacotron2 120k checkpoint without the window masking trick, and the predicted mels for FastSpeech training. You can listen to the audio samples on the validation set; it is a significant improvement.

superhg2012 (Author) commented Jun 12, 2020

Thank you! Were your alignments generated the same way as the mel-spectrograms, or generated in GTA mode?

dathudeptrai (Collaborator) commented Jun 12, 2020

No teacher forcing, no window masking; save the durations and the mels at the same time. You need to modify the code a bit :d
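
Something along these lines (a sketch of the idea, not the actual modification; `alignment_to_durations` and the file names are illustrative):

```python
# Illustration only: extract per-phoneme durations from a Tacotron
# alignment matrix while also saving the predicted mels from the same
# free-running (no teacher forcing, no window masking) decode.
import numpy as np

def alignment_to_durations(alignment, n_phonemes):
    # alignment: [decoder_steps, encoder_steps] attention weights.
    peaks = alignment.argmax(axis=-1)           # encoder position per frame
    durations = np.bincount(peaks, minlength=n_phonemes)
    return durations                            # frames assigned per phoneme

# Hypothetical decode output shapes: 200 decoder frames, 40 phonemes, 80 mel bins.
alignment = np.random.rand(200, 40)
mel_pred = np.random.randn(200, 80).astype(np.float32)

durations = alignment_to_durations(alignment, n_phonemes=40)
assert durations.sum() == mel_pred.shape[0]     # durations must cover every frame
np.save("durations.npy", durations)             # targets for FastSpeech's
np.save("mel_pred.npy", mel_pred)               # duration predictor and decoder
```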

superhg2012 (Author) commented Jun 12, 2020

All right, thank you! There are many differences from other open implementations of FastSpeech, whose alignments are generated with teacher forcing. Also, in FastSpeech 2 they point out that predicted mel-spectrograms from the teacher model (TransformerTTS) have some information loss compared with the ground-truth ones, since audio synthesized from generated mel-spectrograms is usually worse than audio synthesized from ground-truth ones. So, puzzling... but I will try your idea.

dathudeptrai commented Jun 12, 2020

@superhg2012 You can compare my results with other implementations to make your decision :))). There is no puzzle here. In FastSpeech 1 they use alignments extracted from Tacotron2, so they use predicted mels from Tacotron to train FastSpeech; that makes sense. In FastSpeech 2 they use durations extracted from the ground-truth mels by the Montreal Forced Aligner, so they train on ground-truth mels :)). So the durations and the mels should come from the same source :))).

superhg2012 (Author) commented Jun 12, 2020

Much clearer now, thanks again.

@dathudeptrai dathudeptrai moved this from In progress to Done in Tacotron 2 Jun 12, 2020