Added a librispeech data generator. #419
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here.

CLAs look good, thanks!
@vince62s -- I work with @wingsbr. I was collaborating with Archy de Berker via Gitter. It looks like a co-worker of his (Majid) ended up posting another LibriSpeech data generator (which I haven't looked at yet). To your question about MFCC, I thought it would be best for the problem to leave pre-processing/spectral featurization/MFCC up to the user, to allow for experimentation. We'll make sure to link up with Majid to avoid duplication if/when we code up the modality bottom to handle these transformations.

Thanks for your feedback. As it is now (and it was similar for WSJ), I have the impression it is trying to take frames directly as inputs, which is really disturbing...
Agreed. It is an important TODO. I haven't seen good results published from working with raw waveforms. However, there are many papers with good results starting from spectral features. Even here there is a lot of room for experimentation (e.g., mel-scale, number of bins, window size, etc.). I think going to MFCC is over-engineering the features; it would be better to let the NN derive its own features from the spectrum. In any case, different researchers and different domains may warrant different approaches. Hence, I would argue against pushing the encoding into the TFRecords.
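[Editor's note: for illustration only, here is a rough numpy sketch of the kind of spectral featurization being discussed, with window size and bin count exposed as the experimental knobs mentioned above. Nothing here is in the PR; the function name and defaults are hypothetical.]

    import numpy as np

    def log_spectrogram(waveform, window_size=400, hop=160):
      # 400/160 samples correspond to a 25 ms window and a 10 ms hop at
      # 16 kHz -- exactly the sort of parameters left open to experiment.
      window = np.hanning(window_size)
      frames = [waveform[i:i + window_size] * window
                for i in range(0, len(waveform) - window_size + 1, hop)]
      # Magnitude spectrum per frame; number of bins = window_size // 2 + 1.
      spectra = np.abs(np.fft.rfft(frames, axis=-1))
      return np.log(spectra + 1e-6)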
@mschonwe Sorry, I did not see your pull request before sending mine; I closed mine. Regarding MFCC, I suggest putting the pre-processing in the problem.Problem class so that, as a dataset is generated, it is saved in the correct format. Something similar to input_space_id.
@mjlaali No worries - let's chat on Gitter to coordinate incorporating the signal processing.
zh794390558 left a comment:
Why not register this Problem rather than putting it into _SUPPORTED_PROBLEM_GENERATORS?
@zh794390558 Good idea. I expanded the librispeech generator to include a problem and modality and registered those rather than using _SUPPORTED_PROBLEM_GENERATORS.
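[Editor's note: for reference, the t2t registration pattern being adopted looks roughly like this; the class name and body are illustrative, not the PR's actual code.]

    from tensor2tensor.data_generators import problem
    from tensor2tensor.utils import registry

    @registry.register_problem
    class AudioLibrispeechCharacters(problem.Problem):
      """Registered under a snake_cased version of the class name, so it is
      discoverable by name instead of living in _SUPPORTED_PROBLEM_GENERATORS."""

      def generator(self, data_dir, tmp_dir, is_training):
        # Yield {"inputs": ..., "targets": ...} examples here.
        raise NotImplementedError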
| "http://www.openslr.org/resources/12/dev-other.tar.gz", | ||
| "dev-other" | ||
| ], | ||
| ]''' |
Why is this code commented out?
    class LibrispeechTextEncoder(text_encoder.TextEncoder):

      def encode(self, s):
        return [ord[c] for c in s]
Should include self._num_reserved_ids.
ord[c] is wrong.
Why not encode like self._num_reserved_ids + i?
Ah, good catch, I'll fix that syntax. Regarding num_reserved_ids, I based it on the timit generator:
https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/audio.py#L150
which doesn't offset for num_reserved_ids, but you're right that it makes sense, so I will do so here.
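[Editor's note: put together, the corrected encoder would look roughly like the sketch below, including a matching decode; this is not necessarily the exact committed code.]

    class LibrispeechTextEncoder(text_encoder.TextEncoder):

      def encode(self, s):
        # ord(c), not ord[c]; offset by num_reserved_ids so character ids
        # don't collide with reserved ids such as PAD and EOS.
        return [self._num_reserved_ids + ord(c) for c in s]

      def decode(self, ids):
        # Invert the offset applied in encode().
        return "".join([chr(i - self._num_reserved_ids) for i in ids])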
    def example_reading_spec(self):
      data_fields = {
          "inputs": tf.VarLenFeature(tf.int64),
          #"audio/channel_count": tf.FixedLenFeature([], tf.int64),
this can be reserved!
I'm sorry, I don't understand. What are you suggesting?
    def generator(self, data_dir, tmp_dir, training, eos_list=None, start_from=0, how_many=0):
      eos_list = [1]
not good.
Good catch, I had meant to fix that but forgot. Doing so now.
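[Editor's note: the fix presumably amounts to honoring a caller-supplied eos_list and only defaulting when none is given, e.g.:]

    def generator(self, data_dir, tmp_dir, training,
                  eos_list=None, start_from=0, how_many=0):
      # Only fall back to EOS id 1 when the caller supplied no list.
      eos_list = [1] if eos_list is None else eos_list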
    def hparams(self, defaults, unused_model_hparams):
      p = defaults
      p.stop_at_eos = int(False)
      p.input_modality = {"inputs": ("audio:librispeech_modality", None)}
registry.Modalities.AUDIO
Wouldn't that result in the base Audio modality being used, and bypass the custom signal processing added to bottom()?
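[Editor's note: for context, registering a custom audio modality so that the "audio:librispeech_modality" string resolves to a class with its own bottom() looks roughly like this; the class body is an illustrative placeholder, not the PR's actual signal processing.]

    import tensorflow as tf

    from tensor2tensor.utils import modality
    from tensor2tensor.utils import registry

    @registry.register_audio_modality("librispeech_modality")
    class LibrispeechModality(modality.Modality):

      def bottom(self, x):
        # Custom signal processing would replace the base audio modality's
        # bottom() here; a dense projection stands in as a placeholder.
        with tf.variable_scope(self.name):
          return tf.layers.dense(tf.to_float(x),
                                 self._model_hparams.hidden_size)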
        return problem.SpaceID.EN_CHR

      @property
      def num_shards(self):
transformer_base_single_gpu, is this used?
I'm not sure I understand the question, but I used transformer_base_single_gpu as the basis for the hparams because that was what was referenced in all of the examples:
https://github.com/tensorflow/tensor2tensor/blob/master/docs/new_problem.md
https://github.com/tensorflow/tensor2tensor/blob/master/docs/walkthrough.md
https://github.com/tensorflow/tensor2tensor/blob/master/README.md
lukaszkaiser left a comment:
Great thanks guys! Let's get it in and see how it trains :).
@wingsbr which paper did you base your experiment on? Maybe we have something in common.
@zh794390558 most of our work has been on Listen, Attend and Spell with various enhancements. We were about to start working on implementing arxiv.org/pdf/1610.03022v1.pdf which, in part, uses convolutions rather than the pBLSTM of LAS. We wanted to give the 'all you need is attention' transformer model a try, and utilize the framework extensions that t2t offers. So far we haven't gotten the transformer model to do much more than learn the LM from the labels. If the transformer model isn't viable for this task, perhaps we can collaborate on implementing a t2t Problem for a convolutional+RNN model (like 1610.03022v1).
@mschonwe Sorry for the late response. I work on Mandarin and also use the Listen, Attend and Spell model, now ...
Added a generator for the LibriSpeech datasets and included it in the supported generators in t2t-datagen.
Like the audio/TIMIT generator, it depends on sox for WAV file generation.
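[Editor's note: for reference, the sox dependency amounts to a conversion step like the following, since LibriSpeech ships audio as .flac; the helper name is illustrative and sox with FLAC support must be on the PATH.]

    import os
    import subprocess

    def flac_to_wav(flac_path):
      # Convert a LibriSpeech .flac file to .wav via the sox CLI.
      wav_path = flac_path.replace(".flac", ".wav")
      if not os.path.exists(wav_path):
        subprocess.check_call(["sox", flac_path, wav_path])
      return wav_path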