max_len doesn't crop samples properly #290

@FormMe

Hi. It seems that max_len doesn't work properly.

mel_len should be computed from mel_input_length_all.max(), not mel_input_length_all.min().
As written, the crop length is capped by the shortest sample in the batch. With this formula, max_len only takes effect when the minimum length in the batch is greater than max_len.

mel_input_length_all = accelerator.gather(mel_input_length)  # for balanced load
mel_len = min([int(mel_input_length_all.min().item() / 2 - 1), max_len // 2])
mel_len_st = int(mel_input_length.min().item() / 2 - 1)

For example, if max_len == 400, the longest mel in the batch is 600 frames and the shortest is 92, then with this formula we effectively get mel_len = min(92, 400) = 92 (counting in mel frames and ignoring the /2 and -1 bookkeeping).
Thus every sample in the clipped batch has a length of at most 92, because we do

gt.append(mels[bib, :, (random_start * 2) : ((random_start + mel_len) * 2)])
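To make the arithmetic concrete, here is a minimal sketch in plain Python (no torch) that evaluates the quoted formula for the example lengths above and compares it with a max()-based variant. The helper names current_mel_len and max_based_mel_len are hypothetical and do not come from the training script. Because the slice above multiplies by 2, the crop length in frames is 2 * mel_len, so the exact value for a shortest sample of 92 frames comes out slightly below 92.

```python
def current_mel_len(lengths, max_len):
    """mel_len as computed by the quoted formula (units of 2 mel frames)."""
    return min(int(min(lengths) / 2 - 1), max_len // 2)


def max_based_mel_len(lengths, max_len):
    """Hypothetical variant that uses the longest sample in the batch instead."""
    return min(int(max(lengths) / 2 - 1), max_len // 2)


lengths = [600, 434, 92]  # mel lengths in one batch, taken from the example
max_len = 400

cur = current_mel_len(lengths, max_len)
alt = max_based_mel_len(lengths, max_len)

# The slice uses (random_start * 2) : ((random_start + mel_len) * 2),
# so the crop covers 2 * mel_len frames.
print(cur * 2, alt * 2)  # prints: 90 400
```

With min(), the crop follows the shortest sample (90 frames here). With max(), it is bounded by max_len (400 frames), though shorter samples would then need padding or masking.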

This means we always train on segments no longer than the shortest sample in the batch. Here are some example shapes:

print(mels.shape, gt.shape, st.shape, wav.shape)
torch.Size([32, 80, 662]) torch.Size([32, 80, 92]) torch.Size([32, 80, 96]) torch.Size([32, 27600])
torch.Size([32, 80, 434]) torch.Size([32, 80, 92]) torch.Size([32, 80, 92]) torch.Size([32, 27600])
torch.Size([32, 80, 844]) torch.Size([32, 80, 92]) torch.Size([32, 80, 92]) torch.Size([32, 27600])

27600 / 300 = 92 (300 is the hop length)

Also, random_start crops off the beginning of samples that are shorter than max_len, so padding is used in its place.
Moreover, we skip many samples:

if gt.shape[-1] < 80:
   continue

To fix it, we should crop only samples whose length is greater than max_len.
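A rough sketch of that idea, assuming each sample is cropped independently before batching. crop_if_longer is a hypothetical helper, not a function in the repository, and it works on plain Python lists to stay self-contained. In the real training loop the resulting variable-length segments would still need padding or masking, which is not shown here.

```python
import random


def crop_if_longer(frames, max_len, rng=random):
    """Return a random window of max_len frames, or the sample unchanged if it already fits."""
    n = len(frames)
    if n <= max_len:
        return frames  # short samples are kept whole, nothing is cropped away
    start = rng.randint(0, n - max_len)  # randint is inclusive on both ends
    return frames[start:start + max_len]


rng = random.Random(0)        # seeded only to make the example repeatable
short = list(range(92))       # stands in for a 92-frame mel
long = list(range(600))       # stands in for a 600-frame mel

cropped_short = crop_if_longer(short, 400, rng)
cropped_long = crop_if_longer(long, 400, rng)
print(len(cropped_short), len(cropped_long))  # prints: 92 400
```

This way the 92-frame sample keeps its beginning and the 600-frame sample is cut to max_len, instead of both being cut to the batch minimum.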

Did I find a bug, or am I misunderstanding something?
