Add official flag-parsing and benchmarking logging utils to Transformer #4163
Conversation
FLAGS, unparsed = parser.parse_known_args()
main(sys.argv)
tf.app.run()
Since this is a standalone library that downloads and processes the data, tf.app.run() does not do much here. If we want to use the TF libraries here, we should probably also change the flag-parsing part to absl flags, so that it's consistent across the module.
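For reference, a minimal sketch of the absl-style flag handling this would imply for the download script (the flag name and helper below are illustrative, not the actual diff):

```python
from absl import app as absl_app
from absl import flags

# Illustrative flag; the real script defines its own set of flags.
flags.DEFINE_string(
    name="data_dir", default="/tmp/translate_ende",
    help="Directory where the raw and processed data are written.")

FLAGS = flags.FLAGS


def main(_):
  # download_and_preprocess(FLAGS.data_dir)  # hypothetical helper in this script
  pass


if __name__ == "__main__":
  absl_app.run(main)
```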
])
bleu_writer.add_summary(summary, global_step)
bleu_writer.flush()
benchmark_logger.log_metric(
What's the difference between the eval_results and the value here?
The results from estimator.evaluate() are based on an approximate translation (long story short, the approximate translations rely heavily on the provided golden target values). The function evaluate_and_log_bleu uses the estimator.predict() path to compute the translations, where the golden values are not provided. The translations are then compared to the reference file to get the actual BLEU score.
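Roughly, the path being described looks like this (a sketch for clarity, not the exact code in the PR; translate_file stands in for whatever drives estimator.predict()):

```python
# Illustrative sketch of the predict-based BLEU evaluation described above.
def evaluate_and_log_bleu(estimator, bleu_source, bleu_ref, vocab_file_path):
  # Translate the source file via estimator.predict(); no golden targets given.
  translations_file = translate_file(estimator, bleu_source, vocab_file_path)
  # Score the model's own translations against the reference translations.
  uncased_score = bleu_wrapper(bleu_ref, translations_file, False)
  cased_score = bleu_wrapper(bleu_ref, translations_file, True)
  return uncased_score, cased_score
```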
Can this be added as a comment?
raise ValueError("Vocabulary file %s does not exist" % vocab_file_path)

def main(_):
I think you can follow Taylor's change in the ImageNet model, which creates a wrapper around the main function so that the flags object can be injected.
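Presumably something like this (a sketch of the wrapper pattern with absl flags; run_transformer is a hypothetical name for the extracted body):

```python
from absl import app as absl_app
from absl import flags


def run_transformer(flags_obj):
  """Runs the Transformer training/eval with an explicitly injected flags object."""
  # ... build params and the estimator from flags_obj, then run the training schedule ...


def main(_):
  run_transformer(flags.FLAGS)


if __name__ == "__main__":
  define_transformer_flags()
  absl_app.run(main)
```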
Indeed.
batch_size=params.batch_size  # for ExamplesPerSecondHook
)
benchmark_logger = logger.config_benchmark_logger(flags_obj.benchmark_log_dir)
benchmark_logger.log_run_info("transformer")
The interface has been updated with some extra params; please rebase and add values if needed.
Oooh +1 to saving the params. Very nice
* `--steps_between_evals`: Number of training steps to run between evaluations.

Only one of `train_epochs` or `train_steps` may be set. Since the default option is to evaluate the model after training for an epoch, it may take 4 or more hours between model evaluations. To get more frequent evaluations, use the flags `--train_steps=250000 --steps_between_evals=1000`.
Absl will enforce constraints on flags. For instance, after defining the flags, the validator code looks like:

msg = "--train_steps and --train_epochs were set. Only one may be defined."

@flags.multi_flags_validator(["train_epochs", "train_steps"], message=msg)
def _check_train_limits(flag_dict):
  return flag_dict["train_epochs"] is None or flag_dict["train_steps"] is None

And similarly for your other checks. (There is also a single-flag validator.)
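For comparison, the single-flag validator mentioned above would look roughly like this (the predicate and message are illustrative):

```python
# Illustrative single-flag check; the real constraint would match the model's needs.
@flags.validator("train_steps", message="--train_steps must be positive when set.")
def _check_train_steps(train_steps):
  return train_steps is None or train_steps > 0
```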
official/transformer/compute_bleu.py (outdated)
FLAGS, unparsed = parser.parse_known_args()
main(sys.argv)
tf.app.run()
Why is this file using:
- argparse? (Just for consistency.)
- tf.app.run instead of absl.app.run? (tf.app.run silently swallows typos.)
Changed to use absl
FLAGS, unparsed = parser.parse_known_args()
main(sys.argv)
tf.app.run()
ditto on argparse and tf.app.run.
# Print details of training schedule.
print("Training schedule:")
For consistency with the rest of official, these prints should probably be tf.logging.info calls.
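e.g., the print above would presumably become something like:

```python
# Sketch: use the TF logger instead of a bare print, as elsewhere in official.
tf.logging.info("Training schedule:")
```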
def define_transformer_flags():
  """Add flags for running transformer_main."""
  # Add common flags (data_dir, model_dir, train_epochs, etc.).
  flags_core.define_base(multi_gpu=False, export_dir=False)
you also want num_gpu=False.
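i.e., presumably something like the following, assuming define_base accepts a num_gpu switch alongside the others:

```python
# Assumption: define_base exposes a num_gpu toggle like the other base flags.
flags_core.define_base(multi_gpu=False, num_gpu=False, export_dir=False)
```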
train_epochs=None)

def validate_flags(flags_obj):
As noted above this can go away.
if not tf.gfile.Exists(FLAGS.bleu_ref):
  raise ValueError("BLEU source file %s does not exist" % FLAGS.bleu_ref)
# Define parameters based on flags
if flags_obj.params == "base":
This might be clearer as a global dict:

PARAMS_MAP = {
    "base": model_params.TransformerBaseParams,
    "big": model_params.TransformerBigParams,
}
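...so the branching above collapses to a simple lookup, something like:

```python
# Sketch of the lookup once PARAMS_MAP is defined at module level.
params = PARAMS_MAP[flags_obj.params]
```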
  raise ValueError("BLEU source file %s does not exist" % FLAGS.bleu_source)
if not tf.gfile.Exists(FLAGS.bleu_ref):
  raise ValueError("BLEU source file %s does not exist" % FLAGS.bleu_ref)
# Define parameters based on flags
I really like this pattern of packaging hyperparameters. Very clean.
params.epochs_between_evals = flags_obj.epochs_between_evals
params.repeat_dataset = single_iteration_train_epochs

if flags_obj.batch_size is not None:
How about `params.batch_size = flags_obj.batch_size or params.batch_size`? (Since we don't need to respect 0 as a legitimate batch size.)
estimator = tf.estimator.Estimator(
    model_fn=model_fn, model_dir=flags_obj.model_dir, params=params)
train_schedule(
As long as these are already nice and broken out, may I request that they be kwargs?
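i.e., something along these lines (the argument names are illustrative, loosely taken from the surrounding code, not the exact signature):

```python
# Illustrative kwargs-style call; the real parameter names may differ.
train_schedule(
    estimator=estimator,
    train_eval_iterations=train_eval_iterations,
    single_iteration_train_epochs=single_iteration_train_epochs,
    bleu_source=flags_obj.bleu_source,
    bleu_ref=flags_obj.bleu_ref,
    vocab_file_path=vocab_file_path)
```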
return tf.gfile.Exists(flags_dict["bleu_source"]) and (
    tf.gfile.Exists(flags_dict["bleu_ref"])) and (
        tf.gfile.Exists(vocab_file_path))
return (
Does anyone know how to make this look better? It is difficult to please the lint overlords.
return all([
    tf.gfile.Exists(flags_dict["bleu_source"]),
    tf.gfile.Exists(flags_dict["bleu_ref"]),
    tf.gfile.Exists(vocab_file_path),
])
thank you
We should start thinking about the checklist for official models...
- Are all base flags enabled?
- Is benchmarking enabled?
- Are all file references GFile?
- Is a SavedModel exported?
official/transformer/compute_bleu.py (outdated)
FLAGS, unparsed = parser.parse_known_args()
main(sys.argv)
flags.DEFINE_string(
    name="translation", short_name="t", default=None,
The collision of abbreviations is inevitable at this point. Maybe we should make a rule that we don't have abbreviations for flags defined within model modules. Otherwise, we might add a flag to the base set, get a conflict, and not realize it until someone tries to run an inheriting model and it either errors out at arg load time or treats flags in strange ways. Thoughts, @robieta?
I think at this point one-letter abbreviations are too dangerous to add, because they may collide not only with our own flags but with flags defined who knows where. Two letters seem reasonably safe.
If we're doing an end-to-end test, it will blow up on collisions, so I'm less worried that we will accidentally have internal collisions.
Is there an e2e test for every model? If not, should that be a requirement in our Official Model checklist?
Right now MNIST is missing it. It requires synthetic data to be hermetic, which I think is why MNIST doesn't have it yet.
I definitely think it is something we want for every model.
])
bleu_writer.add_summary(summary, global_step)
bleu_writer.flush()
benchmark_logger.log_metric(
Can this be added as a comment?
)
benchmark_logger = logger.config_benchmark_logger(flags_obj.benchmark_log_dir)
benchmark_logger.log_run_info(
    "transformer", "wmt_translate_ende", params.__dict__)
maybe include kwarg names here for clarity?
@karmel Thanks for the review. I'll add the SavedModel export in a separate PR, to keep the changes here related to flag parsing + logging.
Bump, PTAL
Just a few minor comments around flags.
official/transformer/compute_bleu.py (outdated)
    help=flags_core.help_wrap("File containing reference translation."))
flags.mark_flag_as_required("reference")

flags.DEFINE_multi_enum(
Can you add a "both" option and propagate that through (and make this just a DEFINE_enum)? With the multi-enum, one would have to input `--bleu_variant cased --bleu_variant uncased`, and may be confused when `--bleu_variant cased uncased` doesn't compute uncased. Unfortunately absl doesn't have a great way to define a list of enumerables.
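Perhaps something like this (a sketch; the help wording is illustrative):

```python
# Single enum with an explicit "both" value, as suggested above.
flags.DEFINE_enum(
    name="bleu_variant", default="both",
    enum_values=["both", "cased", "uncased"],
    help=flags_core.help_wrap(
        "Which BLEU variant(s) to compute: cased, uncased, or both."))
```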
"Specify one or more BLEU variants to calculate. Variants: \"cased\" " | ||
"or \"uncased\".")) | ||
|
||
FLAGS = flags.FLAGS |
Can you encapsulate everything the same way you did in transformer_main?
What is the reason for encapsulating everything?
In part it is conceptually easier if functions are pure, and in part because we may well want to use this in a way other than calling the file directly.
Oooh I see. Thanks for the explanation.
if FLAGS.train_epochs is None:
  FLAGS.train_epochs = DEFAULT_TRAIN_EPOCHS
train_eval_iterations = FLAGS.train_epochs // FLAGS.epochs_between_eval
if flags_obj.train_epochs is None:
We have adopted a convention of not modifying flags objects, and instead calling getter functions to retrieve values. See official.utils.flags._performance.get_loss_scale() as an example.
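A hedged sketch of that convention applied here (the getter name is hypothetical; get_loss_scale is the canonical example):

```python
# Read the flag and apply the default without ever mutating the flags object.
def get_train_epochs(flags_obj):
  if flags_obj.train_epochs is None:
    return DEFAULT_TRAIN_EPOCHS
  return flags_obj.train_epochs
```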
Yes, it makes sense not to alter the values of the flags afterwards. It would be great if these conventions could be listed (maybe as a sub-checkbox in karmel's list above).
Would you mind adding an "immutability" section to flags/README.md?
import random
import sys
import tarfile
import urllib
This is breaking Python 3. Need to use six.moves.urllib.
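i.e., only the import needs to change; six.moves resolves to the right module on both Python versions:

```python
# Python 2/3 compatible import; call sites like urllib.request.urlretrieve stay the same.
from six.moves import urllib
```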
@robieta Thanks for the comments. I edited compute_bleu, but I'm not sure I prefer this version to how it was before. Currently there isn't really a use case for calling the compute_bleu functions from other modules. In general, I don't think we should require an encapsulating function for scripts like compute_bleu.
You know what, I was wrong on the compute_bleu encapsulation. You have my blessing to change it back. Sorry for making you go through the trouble.
official/transformer/compute_bleu.py (outdated)
"""Print out the BLEU scores calculated from the files defined in flags.""" | ||
if flags_obj.bleu_variant in ("both", "uncased"): | ||
score = bleu_wrapper(flags_obj.reference, flags_obj.translation, False) | ||
print("Case-insensitive results:", score) |
I think a follow-up change will probably be needed to change all the prints to tf.logging for consistency.
inprogress_filepath, _ = urllib.request.urlretrieve(
    url, inprogress_filepath, reporthook=download_report_hook)
# Print newline to clear the carriage return from the download progress.
print()
Let's grep and replace all the print() calls with tf.logging.info.
The download progress hook rewrites the line to update the progress. I don't think that works well with tf.logging.info.
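For context, the progress hook is roughly of this shape (an illustrative sketch, not the file's exact implementation), which is why it rewrites the line with a carriage return instead of logging:

```python
import sys

# Illustrative reporthook for urlretrieve: overwrite the same terminal line.
def download_report_hook(count, block_size, total_size):
  percent = int(count * block_size * 100 / total_size)
  sys.stdout.write("\r%d%% downloaded" % percent)
  sys.stdout.flush()
```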
ah, sorry for missing that.
# Add transformer-specific flags
flags.DEFINE_enum(
    name="params", short_name="mp", default="big",
Naming a parameter "params" is quite evil since it does not provide much context. How about renaming it to param_set or param_template so that it's more explicit?
Evil is a bit harsh, don't you think?
I do agree that the argument name is vague. I'll change it to param_set.
Sorry for being too harsh. param_set SGTM.
name="params", short_name="mp", default="big", | ||
enum_values=["base", "big"], | ||
help=flags_core.help_wrap( | ||
"Parameter set to use when creating and training the model.")) |
I think it's also worth mentioning all the individual param values under this umbrella as well.
I left it out because the absl -h option shows the possible enum values. This is what the help text looks like:

-mp,--params: <base|big>:
    Parameter set to use when creating and training the model.
    (default: 'big')
What I am trying to say is that param_set=base will populate other params (a, b, and c), which is not showing up in the help text.
My bad, I completely misread that. Yes, I think it would be good to see the individual param values. Maybe not all of them, but at least the ones that change between the big and base parameter sets.
Something like "setting param_set=big increases the default batch size, as well as the hidden_size, filter_size, and num_heads topology hyperparameters. See transformer/model/model_params.py for details."?
Yes, like that. I've pushed the change that updates the help text
if FLAGS.train_epochs is None:
  FLAGS.train_epochs = DEFAULT_TRAIN_EPOCHS
train_eval_iterations = FLAGS.train_epochs // FLAGS.epochs_between_eval
if flags_obj.train_epochs is not None:
I think this if-else can be combined into:

train_epochs = flags_obj.train_epochs or DEFAULT_TRAIN_EPOCHS
That's much cleaner, thanks
benchmark_logger.log_run_info(
    model_name="transformer",
    dataset_name="wmt_translate_ende",
    run_params=params.__dict__)
I think it's better not to log all the params, since they include noise like data_dir. Maybe a more explicit set of params would be better here.
I'm thinking logging all the params might be useful (including data_dir) because it allows the run to be reproduced with the same dataset files as well as hyperparameters.
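If the noise does become a problem, a middle ground might be logging an explicit subset (a sketch; the key list is illustrative):

```python
# Illustrative: log only selected hyperparameters instead of the full __dict__.
HPARAMS_TO_LOG = ("batch_size", "hidden_size", "filter_size", "num_heads")
run_params = {k: v for k, v in params.__dict__.items() if k in HPARAMS_TO_LOG}
benchmark_logger.log_run_info(
    model_name="transformer",
    dataset_name="wmt_translate_ende",
    run_params=run_params)
```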
@qlzh727 Thanks again for all of the comments. They were very helpful. I've pushed the changes requested.
Please wait for the comments from karmel@, if there are any.