This repository was archived by the owner on Jul 7, 2023. It is now read-only.

@vthorsteinsson
Contributor

This PR contains three fixes/improvements:

  1. In generator_utils.py, fixes a regression: the latest release of T2T refers to tokenizer.token_counts, but this attribute does not exist.

  2. In text_encoder.py, uses a regex to unescape tokens, which is much faster than the previous while loop. In the most common case, the regex matches nothing and the token string is passed through unchanged to the final step, which cuts off the trailing underscore.

  3. In trainer_utils.py, when decoding from a dataset, estimator.predict() is now called with as_iterable=False (as of the latest release of T2T). This requires a change to the decoding loop so that inputs, targets and outputs are iterated correctly.
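
To illustrate the idea behind fix 2, here is a minimal sketch of regex-based unescaping. It is not the code from this PR. The escape scheme is an assumption made for the example: `\\` stands for a backslash, `\u` for an underscore, `\NNN;` for the character with decimal code point NNN, and every escaped token ends with a trailing underscore. The names `_UNESCAPE_RE`, `_replace` and `unescape_token` are hypothetical.

```python
import re

# Assumed escape scheme (illustrative only, not taken from the PR):
#   \u     -> "_"
#   \\     -> "\"
#   \NNN;  -> the character with decimal code point NNN
_UNESCAPE_RE = re.compile(r"\\u|\\\\|\\([0-9]+);")


def _replace(match):
    """Map one matched escape sequence back to the character it encodes."""
    if match.group(1) is None:
        # No digits captured, so the match is either \u or \\.
        return "_" if match.group(0) == "\\u" else "\\"
    try:
        return chr(int(match.group(1)))
    except (ValueError, OverflowError):
        # Code point outside the valid Unicode range: drop it.
        return ""


def unescape_token(escaped_token):
    """Undo the assumed escaping in a single regex pass.

    When the token contains no escape sequence, re.sub returns the
    string unchanged, so only the trailing-underscore cut-off applies.
    """
    unescaped = _UNESCAPE_RE.sub(_replace, escaped_token)
    # Final underscore cut-off (assumes tokens end with a "_" marker).
    return unescaped[:-1] if unescaped.endswith("_") else unescaped
```

Under these assumptions, `unescape_token("foo_")` takes the fast path where the regex matches nothing, while `unescape_token("a\\ub_")` rewrites the single `\u` escape. A single `re.sub` call scans the string once, which is where the speed-up over a character-by-character while loop would come from.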

Contributor

@lukaszkaiser left a comment


Looks good, thanks Villi.

@lukaszkaiser merged commit c91989c into tensorflow:master Jul 18, 2017
@vthorsteinsson deleted the ice branch July 18, 2017 10:46
