Add session hook for benchmark metric logging. #3672
Conversation
The current hook is very similar to LoggingTensorHook. Some of the functions are copied directly, since the original ones were not exposed for import. We should seek to eventually move this code to core when it is mature enough.
This looks sufficiently similar to LoggingTensorHook that I think we would be better off subclassing that, calling super for the methods with changes, and doing the post-work. We can overwrite an entire method if necessary. Thoughts?
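A minimal sketch of what that subclassing could look like, assuming the `metric_logger` constructor argument from this PR (names and attribute choices are illustrative, not the final implementation):

```python
import tensorflow as tf


class LoggingMetricHook(tf.train.LoggingTensorHook):
  """Logs benchmark metrics rather than printing tensor values."""

  def __init__(self, tensors, metric_logger=None,
               every_n_iter=None, every_n_secs=None, at_end=False):
    # The parent handles tensor bookkeeping and the trigger timer.
    super(LoggingMetricHook, self).__init__(
        tensors=tensors, every_n_iter=every_n_iter,
        every_n_secs=every_n_secs, at_end=at_end)
    self._logger = metric_logger

  def after_run(self, unused_run_context, run_values):
    # before_run (inherited) requested self._current_tensors when the
    # timer triggered; hand the results to the benchmark logger.
    if self._should_trigger:
      self._log_metric(run_values.results)
```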
```python
from __future__ import division
from __future__ import print_function

from official.utils.logging import logger
```
nit: local (official) imports should go below the third-party ones.
The existing hook is similar enough to LoggingTensorHook that we should eliminate duplication as much as possible.
Good point. Updated to inherit from LoggingTensorHook.
This is exciting. Two notes:
- Our lint tests are working! Please lint.
- After this gets merged, we will want to add the module and its test to the build file.
```python
This hook is very similar to tf.train.LoggingTensorHook, which logs given
tensors every N local steps, every N seconds, or at the end.
```
Some details on how it is different/what it is used for?
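One way the class docstring could spell that out, sketched here from the surrounding discussion (wording only, not the landed text):

```python
class LoggingMetricHook(tf.train.LoggingTensorHook):
  """Hook to write benchmark metrics to a structured log.

  Unlike tf.train.LoggingTensorHook, which prints human-readable text
  via tf.logging, this hook hands each value to a BenchmarkLogger so
  metrics land in a machine-parsable log under log_dir for later
  benchmark analysis.
  """
```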
```python
tensors: `dict` that maps string-valued tags to tensors/tensor names,
  or `iterable` of tensors/tensor names.
log_dir: `string`, directory path that metric hook should write log to.
metric_logger: `BenchmarkLogger`, the benchmark logger that hook should
```
An instance of BenchmarkLogger or a class?
```python
def begin(self):
  super(LoggingMetricHook, self).begin()
  if tf.train.get_global_step() is None:
```
I suspect the graph can optimize this out, but, for clarity: maybe call get_global_step once, assign it to a var, and then check and use that var below?
Done.
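For reference, the suggested shape (GLOBAL_STEP_TENSOR_NAME is the module constant from this diff):

```python
def begin(self):
  super(LoggingMetricHook, self).begin()
  # Fetch the global step once and reuse the local variable below.
  global_step = tf.train.get_global_step()
  if global_step is None:
    raise RuntimeError(
        "Global step should be created to use LoggingMetricHook.")
  self._current_tensors[GLOBAL_STEP_TENSOR_NAME] = global_step
```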
```python
self._current_tensors[GLOBAL_STEP_TENSOR_NAME] = tf.train.get_global_step()

def after_run(self, unused_run_context, run_values):
  if self._should_trigger:
```
Comments for all these methods would be helpful, since most people won't know what the parent class does.
Done.
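For instance, the end() override could carry a note like this (a sketch; `_log_at_end` is the flag LoggingTensorHook sets from the at_end constructor argument):

```python
def end(self, session):
  # Runs once as the session closes; the parent only enables this
  # path when the hook was constructed with at_end=True.
  if self._log_at_end:
    values = session.run(self._current_tensors)
    self._log_metric(values)
```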
```python
def _log_metric(self, tensor_values):
  self._timer.update_last_triggered_step(self._iter_count)
  global_step = tensor_values[GLOBAL_STEP_TENSOR_NAME]
  for tag in self._tag_order:
```
Probably worth noting that this comes from LoggingTensorHook, which captures the keys of tensors
during init.
Done.
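The landed comment could look like the following; the logger call assumes a `log_metric(name, value, global_step=...)` signature on BenchmarkLogger, and the `self._logger` attribute name is illustrative:

```python
def _log_metric(self, tensor_values):
  self._timer.update_last_triggered_step(self._iter_count)
  global_step = tensor_values[GLOBAL_STEP_TENSOR_NAME]
  # self._tag_order is set by LoggingTensorHook.__init__, which captures
  # the keys of `tensors` so metrics are emitted in a stable order.
  for tag in self._tag_order:
    self._logger.log_metric(tag, tensor_values[tag], global_step=global_step)
```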
```python
import time

from official.utils.logging import metric_hook
import tensorflow as tf
```
import order
```python
from tensorflow.python.framework import constant_op
from tensorflow.python.framework import ops
from tensorflow.python.ops import variables as variables_lib
from tensorflow.python.training import monitored_session
```
I haven't validated this for all, but certainly many of the classes/functions used below are available from the top-level tf import. Let's stick with that unless absolutely necessary.
Done. Thanks for the suggestion.
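For reference, the public TF 1.x equivalents look like this; the private `monitored_session._HookedSession` used by the test has no direct top-level counterpart, so this sketch covers only the symbols that do:

```python
import tensorflow as tf

t = tf.constant(42.0, name="foo")             # was constant_op.constant
train_op = tf.constant(3)
init_op = tf.global_variables_initializer()   # was variables_lib.global_variables_initializer
global_step_key = tf.GraphKeys.GLOBAL_STEP    # was ops.GraphKeys.GLOBAL_STEP
```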
```python
mon_sess = monitored_session._HookedSession(sess, [hook])
sess.run(variables_lib.global_variables_initializer())

# metric_log = os.path.join(self.log_dir, "metric.log")
```
nit: remove debugging lines
```python
if tf.train.get_global_step() is None:
  raise RuntimeError(
      "Global step should be created to use LoggingMetricHook.")
self._current_tensors[GLOBAL_STEP_TENSOR_NAME] = tf.train.get_global_step()
```
Will this create a problem if someone happens to pass in the global step tensor, or a tensor by the same name? TF is often more cautious about placeholder vars that it adds in; see, for example, https://github.com/tensorflow/tensorflow/blob/r1.6/tensorflow/python/estimator/inputs/numpy_io.py#L51 .
Understood. I got rid of the self-defined name and use the default ops name as the key. I will trust that if the user passes in a tensor with that name, it is also the global step tensor; otherwise they are shooting themselves in the foot. On the other hand, if the user passes the global step tensor as an input under a different name, they will just get an extra metric logged, which does not provide much value.
```python
train_op = constant_op.constant(3)

hook = metric_hook.LoggingMetricHook(
    tensors=[t.name], every_n_secs=1.0, at_end=at_end,
```
Can we make sure to test the case of actually having multiple tensors passed in, and also passed in as a dict? Also, we use tensor names in all of these tests; let's make sure to test also with the tensors themselves, which are allowed according to the docstring.
Done. Added tests with multiple tensors.
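A sketch of such a case, passing a dict of actual tensors rather than names; it assumes a logger test double bound to `self._logger` in the test fixture:

```python
a = tf.constant(1.0, name="a")
b = tf.constant(2.0, name="b")
hook = metric_hook.LoggingMetricHook(
    tensors={"tag_a": a, "tag_b": b},  # dict of real tensors, not names
    metric_logger=self._logger,
    every_n_secs=1.0)
```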
1. Update global step tensor handle. 2. Update tests. 3. Update document.
One last comment, but looks good.
```python
  raise RuntimeError(
      "Global step should be created to use LoggingMetricHook.")
if not self._current_tensors.has_key(ops.GraphKeys.GLOBAL_STEP):
  self._current_tensors[ops.GraphKeys.GLOBAL_STEP] = global_step_tensor
```
I think you should be able to just say global_step_tensor.name here, which also allows us to remove the dependency on ops.
Done.
Ah, looks like some py3 errors as well; should be easy to resolve.
```python
if global_step_tensor is None:
  raise RuntimeError(
      "Global step should be created to use LoggingMetricHook.")
if not self._current_tensors.has_key(ops.GraphKeys.GLOBAL_STEP):
```
.has_key() is deprecated in favor of "in".
Done.
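Combining this with the `.name` suggestion above, the check ends up looking like this sketch:

```python
if global_step_tensor is None:
  raise RuntimeError(
      "Global step should be created to use LoggingMetricHook.")
# Key by the tensor's own name (drops the ops dependency) and use the
# py3-safe `in` operator instead of dict.has_key().
if global_step_tensor.name not in self._current_tensors:
  self._current_tensors[global_step_tensor.name] = global_step_tensor
```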
```python
values = session.run(self._current_tensors)
self._log_metric(values)

def _log_metric(self, tensor_values):
```
It would be nice to sync timestamps for different tensors in the same measurement. Right now the times are slightly off, which could be annoying later.

```json
{"name": "train_accuracy", "timestamp": "2018-03-21T12:40:03.460442Z", ... "global_step": 33}
{"name": "learning_rate", "timestamp": "2018-03-21T12:40:03.460687Z", ... "global_step": 33}
```
Understood. I think Karmel also had a similar comment about bulk-logging metrics. Currently we can still align them via global_step. Will address this in a future change.
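If/when that future change lands, one option is to snapshot a single timestamp per trigger. Note that the `timestamp=` keyword below is hypothetical and would require extending BenchmarkLogger.log_metric:

```python
import datetime  # at module top

def _log_metric(self, tensor_values):
  self._timer.update_last_triggered_step(self._iter_count)
  global_step = tensor_values[GLOBAL_STEP_TENSOR_NAME]
  # Take one timestamp so every metric in this trigger shares it.
  now = datetime.datetime.utcnow()
  for tag in self._tag_order:
    # `timestamp=` is a hypothetical parameter, not the current API.
    self._logger.log_metric(tag, tensor_values[tag],
                            global_step=global_step, timestamp=now)
```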