Add golden tests to official. #3723

robieta · 2018-03-23T20:33:36Z

Added a simple subclass of tf.test.TestCase that allows specifying reference (gold standard) behavior and detects when layer definitions change. This test class is designed to be less brittle than the previous resnet_test.py: it restores weights instead of relying on TensorFlow's RNG, which can change with the implementation.

For instance, with the recent batch norm change in TensorFlow, the test issues a warning (since the graph changed) but does not fail.
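
To make the idea concrete, here is a minimal, framework-free sketch of a golden test using only the Python standard library. It is not the golden.py code from this PR: compute, structure_signature, and the JSON layout are hypothetical stand-ins for evaluating ops, comparing graph definitions, and the stored results.json. The point is the policy described above: a changed structure only warns, while changed numerical results fail.

```python
import json
import math
import os
import tempfile
import warnings


def compute(x):
  """Hypothetical stand-in for evaluating a layer on fixed inputs."""
  return [x * 0.5, x ** 2]


def structure_signature():
  """Hypothetical stand-in for a serialized graph definition."""
  return "mul->square"


def regenerate(data_dir):
  """Writes the reference (golden) data to disk."""
  payload = {"structure": structure_signature(), "results": compute(3.0)}
  with open(os.path.join(data_dir, "results.json"), "wt") as f:
    json.dump(payload, f)


def check(data_dir, signature, results):
  """Warns if the structure changed; fails if the numbers changed."""
  with open(os.path.join(data_dir, "results.json"), "rt") as f:
    expected = json.load(f)
  if expected["structure"] != signature:
    warnings.warn("Structure differs from the reference; regenerate soon.")
  for want, got in zip(expected["results"], results):
    assert math.isclose(want, got, rel_tol=1e-6), (want, got)


data_dir = tempfile.mkdtemp()
regenerate(data_dir)
check(data_dir, structure_signature(), compute(3.0))  # Passes silently.
```

Calling check with a different signature but the same results emits a warning and still passes; calling it with different results raises AssertionError.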

qlzh727

I haven't checked the golden test itself yet, but here are some comments to address first.

qlzh727 · 2018-03-23T20:38:02Z

official/resnet/layer_test.py

@@ -0,0 +1,206 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.


nit: 2018. I hope we have some macro to generate this, and people don't have to copy paste from random places.

I absolutely copy-paste. If there is a better way I'd love to know.

qlzh727 · 2018-03-23T20:39:33Z

official/resnet/layer_test.py

+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+


I think we are quite loose about import format. Usually one blank line is enough between imports. The only place that needs two blank lines is between classes.

Neat. It did look quite awkward.

qlzh727 · 2018-03-23T20:43:02Z

official/utils/testing/golden.py

+
+class BaseTest(tf.test.TestCase):
+  """TestCase subclass for performing golden tests.
+  """


I think you can just wrap this onto the previous line; ditto for all the comments below.

qlzh727 · 2018-03-23T20:48:25Z

official/utils/testing/golden.py

+  """TestCase subclass for performing golden tests.
+  """
+
+  def regenerate(self):


The doc above says this is a class method, whereas this is just an abstract instance method.

Good catch. Fixed.

qlzh727 · 2018-03-23T20:51:36Z

official/utils/testing/golden.py

+    """Convenience function for matrix testing.
+
+    Args:
+      input_array: Tensor (numpy array), from which key values are extracted.


then probably should just call it tensor?

It's tricky because it's not a tf.Tensor, it's the result of .eval() being called and is therefore a numpy array. I thought if I called it tensor it would imply it was an instance of tf.Tensor.

In that case, just say numpy array here and not tensor, since "tensor" is assumed to mean tf.Tensor.

qlzh727 · 2018-03-23T21:33:43Z

official/utils/testing/golden.py

+        saver.restore(sess=sess, save_path=os.path.join(
+            data_dir, self.ckpt_prefix))
+        if differences:
+          print()


What are you trying to print here?

I don't want the warning to get lost in a stream of warnings, so I line break to make it more obvious.

Since print() and logging.warning might go to different output streams, I suggest using tf.logging.warn for this new line as well, although an extra new line does not make the error message more obvious. The error log is usually very long and nobody reads all of it, so adding some keywords that are easy to search for might be a better idea.

I hadn't considered that they would go to different locations. I think I'll just drop it.
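
A small sketch of the stream issue, using the standard library logging module as a stand-in for tf.logging (whether tf.logging is configured the same way in a given environment is an assumption). The logger name and the GOLDEN_TEST_WARNING keyword are made up for illustration.

```python
import contextlib
import io
import logging

stdout_buf = io.StringIO()
stderr_buf = io.StringIO()

logger = logging.getLogger("stream_demo")  # Hypothetical logger name.
logger.setLevel(logging.WARNING)
logger.propagate = False  # Keep the demo isolated from root handlers.

with contextlib.redirect_stdout(stdout_buf):
  with contextlib.redirect_stderr(stderr_buf):
    # With no argument, StreamHandler binds to the current sys.stderr.
    logger.addHandler(logging.StreamHandler())
    print()  # The blank line goes to stdout...
    logger.warning("GOLDEN_TEST_WARNING: graph differs from reference")
    # ...but the warning goes to stderr, so the two are not adjacent.
```

After this runs, stdout_buf holds only the blank line and stderr_buf holds the warning, which is why a searchable keyword is more reliable than visual spacing.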

qlzh727 · 2018-03-23T21:34:03Z

official/utils/testing/golden.py

+      sess.run(init)
+      try:
+        saver.restore(sess=sess, save_path=os.path.join(
+            data_dir, self.ckpt_prefix))


I think this can fit on the previous line.

Sadly no. 4 characters too long.

qlzh727 · 2018-03-23T21:35:00Z

official/utils/testing/golden.py

+            data_dir, self.ckpt_prefix))
+        if differences:
+          print()
+          warnings.warn(


Can you use tf.logging.warning?

Hey, yes! And it seems to be a proper subclass of warnings so the "did the warning fire" test still works.

qlzh727 · 2018-03-23T21:36:28Z

official/utils/testing/golden.py

+      ops = [op.eval() for op in ops_to_eval]
+      if correctness_function is not None:
+        results = correctness_function(*ops)
+        with open(os.path.join(data_dir, "results.json"), "rt") as f:


nit: I think "r" is fine since "t" (text) mode is the default.

I know it's not strictly necessary, but since I use "b" elsewhere in the file I like to be explicit.
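
For reference, a quick sketch of the equivalence being discussed, using a temporary file rather than the PR's test data: "r" and "rt" both open in text mode, and only "rb" changes what is returned.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.json")
with open(path, "wt") as f:
  f.write("[32, 16, 16, 3]")

with open(path, "r") as f:   # Text mode is implied.
  implicit_text = f.read()
with open(path, "rt") as f:  # Text mode is spelled out.
  explicit_text = f.read()
with open(path, "rb") as f:  # Binary mode returns bytes instead of str.
  raw_bytes = f.read()
```

So the explicit "t" is purely a readability choice, which is the argument made above for keeping it next to the "b" uses.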

qlzh727 · 2018-03-23T21:39:21Z

official/utils/testing/golden.py

+        dtypes into builtin dtypes.
+    """
+
+    ops_to_eval = [] if ops_to_eval is None else ops_to_eval


I think this can be simplified as ops_to_eval = ops_to_eval or []
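
One caveat worth keeping in mind with this simplification, sketched below with two hypothetical helper functions: the "or" form replaces every falsy argument, not only None, so an empty tuple or the caller's own empty list is swapped for a fresh list.

```python
def with_none_check(ops_to_eval=None):
  """Replaces only None with a new empty list."""
  return [] if ops_to_eval is None else ops_to_eval


def with_or(ops_to_eval=None):
  """Replaces any falsy value (None, (), [], ...) with a new empty list."""
  return ops_to_eval or []


caller_list = []
same_object = with_none_check(caller_list) is caller_list   # Kept as-is.
replaced_object = with_or(caller_list) is not caller_list   # Swapped out.
```

For a function that only reads ops_to_eval, the two forms behave the same in practice; the difference matters only if the function mutates the list and the caller expects to see the change.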

karmel

Lots o' comments, but this is cool. Nicely done. What does the total size of the files end up being?

karmel · 2018-03-23T23:21:32Z

official/resnet/layer_test.py

+from official.utils.testing import golden  # pylint: disable=g-bad-import-order
+
+
+DATA_FORMAT = "channels_last"  # CPU instructions often preclude channels_last


The comment here is not true-- do you mean preclude channels_first?

Yes. Fixed.

karmel · 2018-03-23T23:24:18Z

official/resnet/layer_test.py

+      channels: The number of channels in the fake image.
+    """
+
+    name = "batch_size_{}__{}{}__version_{}__width_{}__channels_{}".format(


nit: consider dashes and underscores instead of single and double underscores for separation.

karmel · 2018-03-23T23:28:08Z

official/resnet/layer_test.py

+    self._resnet_block_ops(test=True, batch_size=32, bottleneck=False,
+                           projection=False, version=2, width=8, channels=4)
+
+  def regenerate(self):


Docstring me.

karmel · 2018-03-23T23:33:11Z

official/resnet/test_data/batch_norm/results.json

@@ -0,0 +1 @@
+[32, 16, 16, 3, 0.9722558259963989, 0.18413543701171875, 12374.20703125, 32, 16, 16, 3, 1.6126631498336792, -1.096894383430481, -0.041595458984375]


Proposal: test_data for all the models should live in one place, perhaps under utils/testing, rather than in the model directories, since most users don't care.

Discuss.

Pros:
Less complexity for users.
No longer need file magic in child classes.

Cons:
(sort of) violates the idea that tests live with the thing they test.

If you're fine with that separation I would definitely prefer that "hide the crimes" approach.

karmel · 2018-03-23T23:43:06Z

official/utils/testing/golden.py

@@ -0,0 +1,273 @@
+# Copyright 2018 The TensorFlow Authors. All Rights Reserved.


A more descriptive name for this module would be preferable. golden_data_tester, saved_data_tests... something like that.

I was under the impression that the convention is to eliminate redundancy in naming, hence testing.golden rather than something like testing.golden_test. Or have I misunderstood?

Then pick a name other than "golden," which is insufficiently descriptive. reference_data, or something similar. "Golden" can mean too many things.

karmel · 2018-03-24T03:39:15Z

official/utils/testing/golden_test.py

+import warnings
+
+import tensorflow as tf
+from official.utils.testing import golden  # pylint: disable=g-bad-import-order


Reminder: Build files will be needed for all of this.
nit: add the bad import line to the tf import, and then if there are other official imports, you won't have to add the flag for all.

Indeed. Also putting the bad-import on the tf line saved me a duplicate in layer_test.py. We certainly have our priorities in order!

karmel · 2018-03-24T03:40:02Z

official/utils/testing/golden_test.py

+  """
+
+  @property
+  def file(self):


nit: bad habit to shadow builtin names. Also, see above on keeping the files in one place.

Yes, this is now obsolete.

karmel · 2018-03-24T03:42:04Z

official/utils/testing/golden.py

+
+  def _manage_ops(self, name, graph, ops_to_eval=None, test=True,
+                  correctness_function=None):
+    """Utility function to handle repeated work of graph checking and saving.


"manage" and "handle" are very vague; can you name this function something more descriptive? evaluate_or_construct_test_case, if that's what it's doing?

karmel · 2018-03-24T03:44:55Z

official/resnet/layer_test.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+


Each file of this type should have a docstring explaining what failure means, under what condition files should be regenerated, and what the command line would be to do so.

robieta

Total test file size is ~800K.

Discussed tf.test.assert_equal_graph_def vs. pywrap_tensorflow.EqualGraphDefWrapper offline. If we use assert_equal_graph_def then the test can break due to changes in the implementation of TensorFlow. This way we know that the tests have to be updated at some point, but it isn't a test breaking issue. (Plus that way TF release can still run our tests, and if they break something it really is an issue.)

robieta · 2018-03-24T17:56:22Z

official/utils/testing/golden.py

+# ==============================================================================
+"""TensorFlow testing subclass to automate numerical testing.
+
+  Golden tests determine when behavior deviates from some "gold standard", and


"Compositors―people who layout printed material with type―made the original rule that placed periods and commas inside quotation marks to protect the small metal pieces of type from breaking off the end of the sentence." TIL

robieta · 2018-03-26T16:57:30Z

official/utils/testing/golden_test.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""This module tests generic behavior of golden tests.


Challenge accepted.

    class TurtleTest(TurtleTest):
      """Test whether there is a turtle underneath."""

      def __init__(self, *args, **kwargs):
        super(TurtleTest, self).__init__(*args, **kwargs)
        assert isinstance(self, TurtleTest)

The actual reason this is in here is that when writing the test class I called self.failureException() and only later realized it did nothing because I was supposed to raise it. So yeah... better to make sure that if things are broken it will actually detect them.
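
The pitfall described here is easy to reproduce with plain unittest. The sketch below is illustrative and not taken from golden_test.py; the class and method names are hypothetical, and the methods deliberately lack the test prefix so they only run when invoked explicitly.

```python
import unittest


class FailureDemo(unittest.TestCase):

  def demo_constructed_only(self):
    # Bug: this builds an exception object and discards it. Nothing is
    # raised, so the test passes.
    self.failureException("never surfaces")

  def demo_raised(self):
    # Correct: the exception has to be raised to register a failure.
    raise self.failureException("surfaces as a failure")


suite = unittest.TestSuite([
    FailureDemo("demo_constructed_only"),
    FailureDemo("demo_raised"),
])
result = unittest.TestResult()
suite.run(result)
```

Both methods run, but only demo_raised is recorded in result.failures, which is why a test that checks the test class itself is worth having.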

robieta · 2018-03-26T17:00:08Z

official/utils/testing/golden_test.py

+# ==============================================================================
+"""This module tests generic behavior of golden tests.
+
+  This test is not intended to test every layer of interest, and models should


"followed by the rest of the docstring starting at the same cursor position as the first quote of the first line." You are not wrong.

robieta · 2018-03-26T18:42:57Z

This got somewhat lost in the sea of responses, so repasting here:

Total test file size is ~800K.

Discussed tf.test.assert_equal_graph_def vs. pywrap_tensorflow.EqualGraphDefWrapper offline. If we use assert_equal_graph_def then the test can break due to changes in the implementation of TensorFlow. This way we know that the tests have to be updated at some point, but it isn't a test breaking issue. (Plus that way TF release can still run our tests, and if they break something it really is an issue.)

qlzh727 · 2018-03-26T20:44:33Z

official/resnet/layer_test.py

+  """Tests for core ResNet layers."""
+
+  @property
+  def my_name(self):


my_name is a weird name for a property. Usually the caller will just write bestTestInstance.my_name, and since that already carries the context of the instance, can we just call this "name" or "testName"?

I'll go with test_name. You're right, the "my" is odd and redundant.

qlzh727 · 2018-03-26T20:45:55Z

official/resnet/layer_test.py

+    """1D convolution with stride projector.
+
+    Args:
+      filters_out: Number of filters in the projection.


Argument types are important to document in Python, since it is a dynamically typed language.

I am going to defer this to a cleanup of all docstrings.

qlzh727 · 2018-03-26T20:46:53Z

official/resnet/layer_test.py

+
+    name = "batch-size-{}_{}{}_version-{}_width-{}_channels-{}".format(
+        batch_size, "bottleneck" if bottleneck else "building",
+        "_projection" if projection else "", version, width, channels


I would prefer one param per line for easy reading.
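
A sketch of the one-argument-per-line layout being requested, wrapped in a hypothetical helper (block_test_name is not a function in the PR; the template string and the conditional pieces are copied from the hunk above).

```python
def block_test_name(batch_size, bottleneck, projection, version, width,
                    channels):
  """Hypothetical helper that builds the reference-data directory name."""
  return "batch-size-{}_{}{}_version-{}_width-{}_channels-{}".format(
      batch_size,
      "bottleneck" if bottleneck else "building",
      "_projection" if projection else "",
      version,
      width,
      channels,
  )


name = block_test_name(batch_size=32, bottleneck=False, projection=False,
                       version=1, width=8, channels=4)
```

With these arguments the result is batch-size-32_building_version-1_width-8_channels-4, which matches the directory name that appears later in this PR.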

qlzh727 · 2018-03-26T20:59:01Z

official/utils/testing/golden.py

+            correctness_function=correctness_function
+        )
+      except:
+        tf.logging.error("Failed unittest {}".format(name))


In google3, we usually use %s as the placeholder and rely on the logger's default formatting to build the message, e.g. ("Failed unittest %s", name). I guess your code is also fine; I just want to be sure we are consistent.

Heh. I always want to nit people to replace all the %s with {}, but I fight the urge. I prefer the bracket style personally, but accept whatever everyone decides here.
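
Beyond style, the two forms differ in when the formatting happens. The sketch below uses the standard library logging module as a stand-in for tf.logging; the logger name and the Expensive class are hypothetical.

```python
import io
import logging

buf = io.StringIO()
logger = logging.getLogger("format_demo")  # Hypothetical logger name.
logger.propagate = False
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(buf))

name = "batch_norm"
logger.error("Failed unittest %s", name)         # Formatted by the logger.
logger.error("Failed unittest {}".format(name))  # Formatted by the caller.


class Expensive(object):
  """Counts how often it is converted to a string."""
  calls = 0

  def __str__(self):
    Expensive.calls += 1
    return "expensive"


# DEBUG is below the logger's threshold, so nothing is emitted either way.
logger.debug("Value: %s", Expensive())         # __str__ is never called.
logger.debug("Value: {}".format(Expensive()))  # __str__ is called anyway.
```

Both error calls produce the same line in the log. The difference only shows for suppressed messages: the %s form skips the string conversion, while the format call pays for it before the logger can decide to drop the message.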

robieta · 2018-03-26T22:25:53Z

I have addressed all comments.

karmel

Looking good. A few more requests.

karmel · 2018-03-27T01:20:47Z

official/resnet/layer_test.py

+  def test_batch_norm(self):
+    self._batch_norm_ops(test=True)
+
+  # Sadly python2 does not support "with self.subTest()"


No passive-aggressive commenting, please :)

I really want to be meta-passive aggressive right now.
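
For context, this is roughly what subTest offers on Python 3: a sketch with hypothetical names, not code from layer_test.py. Each iteration is reported separately, and a failure in one does not stop the loop.

```python
import unittest


class SubTestDemo(unittest.TestCase):

  def demo_versions(self):
    # Named without the test prefix so it only runs when invoked below.
    for version in (1, 2):
      with self.subTest(version=version):
        # Fails on purpose for version 2 to show per-iteration reporting.
        self.assertIn(version, (1,))


result = unittest.TestResult()
SubTestDemo("demo_versions").run(result)
```

Here result.failures contains a single entry labelled with version=2. Without subTest, as on Python 2, each parameter combination needs its own test method to get the same granularity.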

karmel · 2018-03-27T01:21:53Z

official/resnet/layer_test.py

+      data_format: channels_first or channels_last
+
+    Returns:
+      A 1 wide CNN projector function.


1 wide? Can you clarify?

karmel · 2018-03-27T01:49:14Z

official/utils/testing/golden_test.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""This module tests generic behavior of golden tests.


+1 for turtles all the way down.

karmel · 2018-03-27T02:00:23Z

...sting/reference_data/ResNet/batch-size-32_building_version-1_width-8_channels-4/results.json

@@ -0,0 +1 @@
+[32, 8, 8, 4, 0.23128163814544678, 0.22117376327514648, 4100.51806640625, 32, 8, 8, 4, 0.9646798372268677, 0.16614516079425812, 5799.6708984375]


Why are the names camel cased? Seems preferable to just lowercase everything, rather than try to remember how to capitalize all these model names. I will point out that cifar10 is actually CIFAR-10, so, there's no hope if you try to stick with how things pretend to be spelled.

Hard to argue with nihilism. Lower case it is.

…to reference graphs, and apply golden tests to ResNet.
update tests
use more concise logic for path property
delint
add some comments
delint
address PR comments
make resnet tests more concise, and supress warning test in py2
change resnet name template
more shuffling of data dirs
address PR comments and add tensorflow version info
Remove subTest due to py2
switch from tf.__version__ to tf.VERSION, and include tf.GIT_VERSION
supress lint error from json load unpack

robieta · 2018-03-27T17:48:47Z

I have addressed the additional requests.

robieta requested review from karmel, nealwu, qlzh727 and tfboyd March 23, 2018 20:33

robieta requested a review from k-w-w as a code owner March 23, 2018 20:33

googlebot added the cla: yes label Mar 23, 2018

robieta force-pushed the resnet_golden_tests branch from efbe554 to bff9c88 Compare March 23, 2018 21:01

qlzh727 requested changes Mar 23, 2018

robieta force-pushed the resnet_golden_tests branch from bff9c88 to 76101a3 Compare March 23, 2018 22:45

karmel suggested changes Mar 24, 2018

tensorflowbutler assigned benoitsteiner Mar 25, 2018

yifeif unassigned benoitsteiner Mar 26, 2018

robieta force-pushed the resnet_golden_tests branch from 76101a3 to 8f49279 Compare March 26, 2018 17:39

robieta commented Mar 26, 2018

robieta force-pushed the resnet_golden_tests branch from 8f49279 to 7960858 Compare March 26, 2018 18:38

robieta force-pushed the resnet_golden_tests branch 2 times, most recently from 53ccc78 to 6f5c979 Compare March 26, 2018 20:44

qlzh727 reviewed Mar 26, 2018

qlzh727 approved these changes Mar 26, 2018

robieta added the kokoro:force-run label Mar 26, 2018

kokoro-team removed the kokoro:force-run label Mar 26, 2018

karmel suggested changes Mar 27, 2018

Taylor Robie added 3 commits March 27, 2018 10:30

address PR comments

a700e61

address PR comments

a6d9c7f

robieta force-pushed the resnet_golden_tests branch from b62d7a3 to a6d9c7f Compare March 27, 2018 17:30

delint

92c74b9

karmel approved these changes Mar 27, 2018

robieta merged commit 587f579 into master Mar 27, 2018

robieta deleted the resnet_golden_tests branch March 27, 2018 19:51
