Bug fixes in inference and data generation; faster token unescaping #162

vthorsteinsson · 2017-07-17T17:22:58Z

This PR contains three fixes/improvements:

In generator_utils.py, fixes a regression in the latest release of T2T: the code refers to tokenizer.token_counts, but no such variable/attribute exists.
In text_encoder.py, uses a regex to unescape tokens, which is much faster than the previous while loop. In the most common case, the regex matches nothing and the token string passes through unchanged to the last step, which cuts off the final underscore.
In trainer_utils.py, when decoding from a dataset, estimator.predict() is called with as_iterable=False as of the latest release of T2T. The decoding loop is changed accordingly, so that inputs, targets and outputs are iterated correctly.
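
For readers who want a feel for the regex approach to unescaping, here is a minimal sketch. It is not the code from this PR: the escape scheme (`\u` for an underscore, `\\` for a backslash, `\NNN;` for a decimal code point) and the names `_UNESCAPE_RE` and `unescape_token` are assumptions made for illustration only. It shows a single `re.sub` pass with a callback replacing a character-by-character while loop; when nothing matches, the string goes straight to the underscore cut-off.

```python
import re

# Hypothetical escape scheme, assumed for illustration only:
#   \u     stands for a literal underscore
#   \\     stands for a literal backslash
#   \123;  stands for the character with decimal code point 123
_UNESCAPE_RE = re.compile(r"\\u|\\\\|\\([0-9]+);")


def unescape_token(escaped_token):
    """Undo the assumed escaping, then cut off the terminating underscore."""

    def replace(match):
        # Group 1 is only set for the numeric form \NNN;
        if match.group(1) is None:
            return "_" if match.group(0) == "\\u" else "\\"
        try:
            return chr(int(match.group(1)))
        except (ValueError, OverflowError):
            # Code point out of range: drop the sequence.
            return ""

    # One regex pass; with no escapes present the string is unchanged.
    unescaped = _UNESCAPE_RE.sub(replace, escaped_token)
    # The terminator is checked on the escaped input, so an underscore
    # produced by unescaping is never mistaken for it.
    return unescaped[:-1] if escaped_token.endswith("_") else unescaped
```

Under these assumptions, `unescape_token("hello_")` takes the fast path (no regex match) and only loses its trailing underscore, while `unescape_token("a\\ub_")` yields `a_b`.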

lukaszkaiser

Looks good, thanks Villi.

vthorsteinsson added 6 commits July 7, 2017 17:24

Change mode to executable

617a794

Merge remote-tracking branch 'upstream/master'

3e4b686

Used regex in _unescape_token()

a2b1c60

Merge remote-tracking branch 'upstream/master'

fb808c3

Merge decoding of tokens using regex

d8d379c

Bug fixes in generator_utils and trainer_utils

e2ed8ed

lukaszkaiser approved these changes Jul 18, 2017

lukaszkaiser merged commit c91989c into tensorflow:master Jul 18, 2017

vthorsteinsson deleted the ice branch July 18, 2017 10:46
