Implement Sampler to create minibatches #3

Open
vimalmanohar opened this issue Sep 28, 2018 · 0 comments
Labels
preliminary: Preliminary work for testing; we might modify later (such as moving to C++ implementation)
python-only: Requires only python coding (no C++)

Comments

Owner

vimalmanohar commented Sep 28, 2018

Implement a sampler that creates minibatches based on utterance length, so that utterances of similar length are grouped together.
It needs to live in a separate module, outside the loss function module.

It has access to a list of utterances sorted by length so that same-size batches can be created easily.

  • Create batches of appropriate size.
  • Return randomly sampled batches.
  • Random sampling within a batch.
  • Utterances of different lengths should use different minibatch sizes. (This may differ from the usual approach, which uses the same minibatch size throughout; we have to use the Kaldi approach for this.)
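
The requirements above could be sketched roughly as follows. This is only an illustration, not the planned implementation: the names `make_batches`, `sample_batches` and `max_frames_per_batch` are hypothetical, and the frame-budget rule used to vary the minibatch size is an assumption standing in for whatever rule the Kaldi approach ends up using.

```python
import random
from typing import Iterator, List, Optional, Tuple


def make_batches(
    utt_lengths: List[Tuple[str, int]],
    max_frames_per_batch: int,
) -> List[List[str]]:
    """Group utterances of similar length into batches.

    The batch size is not fixed. A batch is closed once adding another
    utterance would push (num_utts * longest_utt) past the frame budget,
    so short utterances end up in larger batches than long ones.
    An utterance longer than the budget gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    longest = 0
    # Sort by length so that neighbours in the list have similar lengths.
    for utt_id, length in sorted(utt_lengths, key=lambda x: x[1]):
        new_longest = max(longest, length)
        if current and (len(current) + 1) * new_longest > max_frames_per_batch:
            batches.append(current)
            current, new_longest = [], length
        current.append(utt_id)
        longest = new_longest
    if current:
        batches.append(current)
    return batches


def sample_batches(
    batches: List[List[str]],
    seed: Optional[int] = None,
) -> Iterator[List[str]]:
    """Yield the batches in random order, shuffling utterances within each."""
    rng = random.Random(seed)
    order = list(range(len(batches)))
    rng.shuffle(order)
    for i in order:
        batch = list(batches[i])  # copy so the caller's batches are untouched
        rng.shuffle(batch)
        yield batch


utts = [("a", 10), ("b", 12), ("c", 50), ("d", 55), ("e", 11)]
batches = make_batches(utts, max_frames_per_batch=60)
for batch in sample_batches(batches, seed=0):
    print(batch)
```

Keeping batch construction separate from sampling means the batches can be built once from the sorted list and reshuffled every epoch.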

[Note from Dan: we could consider padding with silence or speed-perturbing (like Hossein does) to make sure the utterance lengths are not all distinct. Also, I’d like to be able to support chunks of utterances, but this is not urgent right now. You may have to have a mechanism to use different minibatch sizes for different utterance lengths, if you have very variable utterance lengths.]
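
One way to picture the padding idea in the note is to round every utterance length up to a multiple of some step, so that many utterances share the same padded length, and then to pick a minibatch size per padded length. The sketch below is only an illustration under assumptions: `padded_length`, `minibatch_size_for`, the step of 50 frames and the frame budget are all hypothetical, and the padding itself (with silence or otherwise) is left out.

```python
from collections import Counter
from typing import List


def padded_length(num_frames: int, multiple: int) -> int:
    """Round num_frames up to the nearest multiple of `multiple`.

    The number of frames to pad is padded_length(...) - num_frames.
    """
    return -(-num_frames // multiple) * multiple  # integer ceiling division


def minibatch_size_for(length: int, frames_per_batch: int, max_size: int) -> int:
    """Pick a minibatch size for utterances of the given (padded) length.

    Longer utterances get smaller minibatches; the result is clamped
    to the range [1, max_size].
    """
    return max(1, min(max_size, frames_per_batch // length))


lengths: List[int] = [98, 101, 120, 149, 150]
padded = [padded_length(n, multiple=50) for n in lengths]

# Five distinct lengths collapse into two padded lengths.
print(Counter(padded))

for length in sorted(set(padded)):
    size = minibatch_size_for(length, frames_per_batch=3000, max_size=64)
    print(length, size)
```

A larger step gives fewer distinct lengths at the cost of more padded frames per utterance.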

@vimalmanohar vimalmanohar added the python-only Requires only python coding (no C++) label Sep 28, 2018
@vimalmanohar vimalmanohar added the preliminary Preliminary work for testing; we might modify later (such as moving to C++ implementation) label Sep 28, 2018