algorithmic_reverse_nlplike generator #57
Conversation
Update to v1.0.8
… following Zipf's Law
lukaszkaiser left a comment
Great, thanks! This looks almost ready, just 2 small questions. It's great to have this problem done :).
| """ | ||
| u = np.random.random(sample_len) | ||
| return [t+1 for t in np.searchsorted(distr_map, u)] # 0 pad and 1 EOS |
Is 1 enough here for both PAD and EOS? Just asking, if it is, please add a comment making that clear.
It would be enough, but I've kept thinking about this line, which is a little tricky. The numpy docs for numpy.random.random say the returned values are in the range [0.0, 1.0), so an exact zero (0.00000...0) is possible, though almost improbable. So maybe it's better to add a sanity check for the improbable zero, and to comment the line more clearly.
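For reference, a minimal self-contained sketch of the pattern under discussion (function names and layout are illustrative, not the merged code). Assuming the cumulative map starts with an explicit 0.0 entry, searchsorted maps u in (0, 1) to ranks 1..nbr_symbols, so the +1 shift leaves id 0 free for PAD and id 1 for EOS, and the improbable exact-zero draw is clipped away:

    import numpy as np

    def zipf_distribution(nbr_symbols, alpha):
        # Unnormalized Zipf weights for ranks 1..nbr_symbols; the leading
        # 0.0 makes searchsorted return ranks starting at 1 for any u > 0.
        weights = np.power(np.arange(1, nbr_symbols + 1), -alpha)
        cdf = np.r_[0.0, np.cumsum(weights)]
        return cdf / cdf[-1]

    def zipf_random_sample(distr_map, sample_len):
        u = np.random.random(sample_len)  # values in [0.0, 1.0)
        # An exact 0.0 is possible (though almost improbable) and would
        # land on rank 0 below, so nudge it to the smallest positive float.
        u = np.maximum(u, np.finfo(float).tiny)
        # Ranks here are 1..nbr_symbols; +1 reserves 0 (PAD) and 1 (EOS).
        return [t + 1 for t in np.searchsorted(distr_map, u)]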
tensor2tensor/bin/t2t-datagen
Outdated
| "algorithmic_reverse_nlplike_decimal8K": ( | ||
| lambda: algorithmic.reverse_generator_nlplike(8000, 40, 100000, | ||
| 10, 1.250), | ||
| lambda: algorithmic.reverse_generator_nlplike(8000, 400, 10000, |
I think keeping length 40 for both train and dev makes more sense for nlplike tasks. On purely algorithmic tasks we want to see generalization to much higher lengths; it's a nice-to-have in NLP too, but less important. Maybe just a little larger, like 60 or so?
Maybe 70 in train and 700 in dev; would that be better?
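If the 70/700 split were adopted, the registration would look something like the sketch below. This is purely illustrative: the parameter order mirrors the diff above, and the trailing arguments of the dev lambda (truncated in the diff) are assumed to match the train side.

    "algorithmic_reverse_nlplike_decimal8K": (
        # Train: max_length 70, 100000 cases.
        lambda: algorithmic.reverse_generator_nlplike(8000, 70, 100000,
                                                      10, 1.250),
        # Dev: max_length 700, 10000 cases (trailing args assumed).
        lambda: algorithmic.reverse_generator_nlplike(8000, 700, 10000,
                                                      10, 1.250)),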
…ax_length and add __pycache__ entry in .gitignore
lukaszkaiser left a comment
Looks good, thanks!
@lukaszkaiser I've struggled a bit with the Zipf distribution because numpy.random.zipf actually implements a zeta distribution (the two are similar but not equal), and it doesn't allow generating samples from a given range or choosing alpha values less than 1.0. So I followed the advice in these two Stack Overflow posts (first and second) and created one function to generate the distribution and another to generate samples (both with tests). As I said in the closed issue, I found that alpha (for the Zipf distribution) usually lies in the range [1.1-1.6] for modelling natural text, so to generate samples that could plausibly emulate an NLP-like task I chose (and tested) values that produce samples over the whole range while following the Zipf distribution.