Description
Hi. It seems that `max_len` doesn't work properly.
I believe `mel_len` should be derived from `mel_input_length_all.max()`, not `mel_input_length_all.min()`.
As written, the crop length is effectively the minimum length in the batch, so `max_len` only takes effect when the batch minimum is already greater than `max_len`:
```python
mel_input_length_all = accelerator.gather(mel_input_length)  # for balanced load
mel_len = min([int(mel_input_length_all.min().item() / 2 - 1), max_len // 2])
mel_len_st = int(mel_input_length.min().item() / 2 - 1)
```
For example, if `max_len == 400` and the mel lengths in the batch range from 92 (minimum) to 600 (maximum), this formula gives (ignoring the `/ 2 - 1` bookkeeping) `mel_len = min(92, 400) = 92`.
Thus, every sample in the cropped batch ends up at most 92 frames long, because we do:

```python
gt.append(mels[bib, :, (random_start * 2):((random_start + mel_len) * 2)])
```
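Here is a minimal sketch of that arithmetic (the lengths in the tensor are made up for illustration, loosely based on the shapes printed below; only the `mel_len` formula itself comes from the training code):

```python
import torch

max_len = 400
# hypothetical batch of mel lengths; 92 is the batch minimum, 844 the maximum
mel_input_length_all = torch.tensor([600, 434, 92, 844])

# current formula: crop length (in half-frames) is driven by the batch minimum
mel_len = min(int(mel_input_length_all.min().item() / 2 - 1), max_len // 2)
print(mel_len)        # 45 -> crops of 45 * 2 = 90 frames

# with max() instead, max_len actually caps the crop
mel_len_fixed = min(int(mel_input_length_all.max().item() / 2 - 1), max_len // 2)
print(mel_len_fixed)  # 200 -> crops of 200 * 2 = 400 frames, i.e. max_len
```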
This means we always train on segments no longer than the shortest utterance in the batch. Here are some example shapes:
```python
print(mels.shape, gt.shape, st.shape, wav.shape)
# torch.Size([32, 80, 662]) torch.Size([32, 80, 92]) torch.Size([32, 80, 96]) torch.Size([32, 27600])
# torch.Size([32, 80, 434]) torch.Size([32, 80, 92]) torch.Size([32, 80, 92]) torch.Size([32, 27600])
# torch.Size([32, 80, 844]) torch.Size([32, 80, 92]) torch.Size([32, 80, 92]) torch.Size([32, 27600])
```
27600 / 300 = 92 (300 is the hop length), which matches the cropped length.
Also, for samples shorter than `max_len`, `random_start` crops away the beginning of the real frames while padding is kept instead.
Moreover, we skip many samples:

```python
if gt.shape[-1] < 80:
    continue
```
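A back-of-the-envelope check of when that condition fires, under my reading of the code (the helper below is my own, just to make the threshold explicit):

```python
# gt is mel_len * 2 frames long, and mel_len = int(batch_min / 2 - 1),
# so the batch is skipped whenever the shortest utterance in the
# gathered batch is below roughly 82 frames -- regardless of how
# long the other utterances are.
def batch_is_skipped(batch_min_frames: int, max_len: int = 400) -> bool:
    mel_len = min(int(batch_min_frames / 2 - 1), max_len // 2)
    return mel_len * 2 < 80

print(batch_is_skipped(92))  # False (gt would be 90 frames)
print(batch_is_skipped(80))  # True  (gt would be 78 frames)
```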
To fix this, we should crop only the samples whose length is greater than `max_len`.
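A minimal sketch of the idea, assuming per-sample lengths are available as `mel_input_length` (the helper name `crop_batch` and its signature are my own; this is an illustration of the idea, not a drop-in patch):

```python
import random
import torch

def crop_batch(mels: torch.Tensor, mel_input_length: torch.Tensor, max_len: int):
    """Crop only utterances longer than max_len; leave shorter ones intact.

    mels: (batch, n_mels, T) padded mel batch
    mel_input_length: (batch,) true length of each utterance in frames
    """
    target = min(int(mel_input_length.max().item()), max_len)
    gt = []
    for bib in range(mels.size(0)):
        length = int(mel_input_length[bib].item())
        if length > target:
            # sample a random window of `target` frames inside the utterance
            random_start = random.randint(0, length - target)
            gt.append(mels[bib, :, random_start:random_start + target])
        else:
            # shorter utterances keep all their real frames (plus existing padding)
            gt.append(mels[bib, :, :target])
    return torch.stack(gt)
```

This keeps `max_len` as a hard upper bound on the crop while short utterances keep all of their real frames.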
Did I notice a bug, or am I misunderstanding something?