Skip to content

Conversation

robieta
Copy link
Contributor

@robieta robieta commented Mar 23, 2018

Added a simple subclass of tf.test.TestCase to allow specification of reference (gold standard) behavior and detect when layer definitions change. This test class is designed to be less brittle than the previous resnet_test.py by restoring weights instead of relying on tensorflow's RNG which can change with implementation.

For instance with respect to the recent batch norm change in TensorFlow, the test issues a warning (since the graph changed), but does not fail.

@robieta robieta requested review from karmel, nealwu, qlzh727 and tfboyd March 23, 2018 20:33
@robieta robieta requested a review from k-w-w as a code owner March 23, 2018 20:33
@robieta robieta force-pushed the resnet_golden_tests branch from efbe554 to bff9c88 Compare March 23, 2018 21:01
Copy link
Member

@qlzh727 qlzh727 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't check the test for golden yet, but some comment to address first.

@@ -0,0 +1,206 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2018. I hope we have some macro to generate this, and people don't have to copy paste from random places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely copy-paste. If there is a better way I'd love to know.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are quite loose for import format. Usually one line break is enough between imports. The only place need 2 lines are between classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat. It did look quite awkward.


class BaseTest(tf.test.TestCase):
"""TestCase subclass for performing golden tests.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just wrap this into previous line, and ditto for all the comment below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

"""TestCase subclass for performing golden tests.
"""

def regenerate(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc above says this is a class method, whereas this is just a abstract instance method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.

"""Convenience function for matrix testing.

Args:
input_array: Tensor (numpy array), from which key values are extracted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then probably should just call it tensor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky because it's not a tf.Tensor, it's the result of .eval() being called and is therefore a numpy array. I thought if I called it tensor it would imply it was an instance of tf.Tensor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just say numpy array and not tensor here in that case, which is assumed to mean tf.Tensor.

saver.restore(sess=sess, save_path=os.path.join(
data_dir, self.ckpt_prefix))
if differences:
print()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are u trying to print here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want the warning to get lost in a stream of warnings, so I line break to make it more obvious.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since print() and logging.warning might go to different output stream, I will suggest you to use tf.logging.warn for this new line as well, although an extra new line does not make the error message more obvious. Usually the error log is very long, and we will not go through all of them, so add some key words to make it easy for search might be a better idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered that they would go to different locations. I think I'll just drop it.

sess.run(init)
try:
saver.restore(sess=sess, save_path=os.path.join(
data_dir, self.ckpt_prefix))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can fit in previous line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly no. 4 characters too long.

data_dir, self.ckpt_prefix))
if differences:
print()
warnings.warn(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can u use tf.logging.warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, yes! And it seems to be a proper subclass of warnings so the "did the warning fire" test still works.

ops = [op.eval() for op in ops_to_eval]
if correctness_function is not None:
results = correctness_function(*ops)
with open(os.path.join(data_dir, "results.json"), "rt") as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think 'r' is fine since "t" (text) mode is a default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's not strictly necessary, but since I use "b" elsewhere in the file I like to be explicit.

dtypes into builtin dtypes.
"""

ops_to_eval = [] if ops_to_eval is None else ops_to_eval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified as ops_to_eval = ops_to_eval or []

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@robieta robieta force-pushed the resnet_golden_tests branch from bff9c88 to 76101a3 Compare March 23, 2018 22:45
Copy link
Contributor

@karmel karmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots o' comments, but this is cool. Nicely done. What does the total size of the files end up being?

from official.utils.testing import golden # pylint: disable=g-bad-import-order


DATA_FORMAT = "channels_last" # CPU instructions often preclude channels_last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here is not true-- do you mean preclude channels_first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

channels: The number of channels in the fake image.
"""

name = "batch_size_{}__{}{}__version_{}__width_{}__channels_{}".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider dashes and underscores instead of single and double underscores for separation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

self._resnet_block_ops(test=True, batch_size=32, bottleneck=False,
projection=False, version=2, width=8, channels=4)

def regenerate(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@@ -0,0 +1 @@
[32, 16, 16, 3, 0.9722558259963989, 0.18413543701171875, 12374.20703125, 32, 16, 16, 3, 1.6126631498336792, -1.096894383430481, -0.041595458984375] No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal: test_data for all the models should live in one place, perhaps under utils/testing, rather than in the model directories, since most users don't care.

Discuss.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pros:
Less complexity for users.
No longer need file magic in child classes.

Cons:
(sort of) violates the idea that tests live with the thing they test.

If you're fine with that separation I would definitely prefer that "hide the crimes" approach.

@@ -0,0 +1,273 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more descriptive name for this module would be preferable. golden_data_tester, saved_data_tests... something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that the convention is to eliminate redundancy in naming, hence testing.golden rather than something like testing.golden_test. Or have I misunderstood?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then pick a name other than "golden," which is insufficiently descriptive. reference_data, or something similar. "Golden" can mean too many things.

import warnings

import tensorflow as tf
from official.utils.testing import golden # pylint: disable=g-bad-import-order
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder: Build files will be needed for all of this.
nit: add the bad import line to the tf import, and then if there are other official imports, you won't have to add the flag for all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Also putting the bad-import on the tf line saved me a duplicate in layer_test.py. We certainly have our priorities in order!


class GoldenBaseTest(golden.BaseTest):
"""Class to ensure that golden testing runs properly.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"""

@property
def file(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: bad habit to shadow builtin names. Also, see above on keeping the files in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is now obsolete.


def _manage_ops(self, name, graph, ops_to_eval=None, test=True,
correctness_function=None):
"""Utility function to handle repeated work of graph checking and saving.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"manage" and "handle" are very vague; can you name this function something more descriptive? evaluate_or_construct_test_case, if that's what it's doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each file of this type should have a docstring explaining what failure means, under what condition files should be regenerated, and what the command line would be to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Contributor Author

@robieta robieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total test file size is ~800K.

Discussed tf.test.assert_equal_graph_def vs. pywrap_tensorflow.EqualGraphDefWrapper offline. If we use assert_equal_graph_def then the test can break due to changes in the implementation of TensorFlow. This way we know that the tests have to be updated at some point, but it isn't a test breaking issue. (Plus that way TF release can still run our tests, and if they break something it really is an issue.)

from official.utils.testing import golden # pylint: disable=g-bad-import-order


DATA_FORMAT = "channels_last" # CPU instructions often preclude channels_last
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

channels: The number of channels in the fake image.
"""

name = "batch_size_{}__{}{}__version_{}__width_{}__channels_{}".format(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

self._resnet_block_ops(test=True, batch_size=32, bottleneck=False,
projection=False, version=2, width=8, channels=4)

def regenerate(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@@ -0,0 +1 @@
[32, 16, 16, 3, 0.9722558259963989, 0.18413543701171875, 12374.20703125, 32, 16, 16, 3, 1.6126631498336792, -1.096894383430481, -0.041595458984375] No newline at end of file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pros:
Less complexity for users.
No longer need file magic in child classes.

Cons:
(sort of) violates the idea that tests live with the thing they test.

If you're fine with that separation I would definitely prefer that "hide the crimes" approach.

# ==============================================================================
"""TensorFlow testing subclass to automate numerical testing.

Golden tests determine when behavior deviates from some "gold standard", and
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Compositors―people who layout printed material with type―made the original rule that placed periods and commas inside quotation marks to protect the small metal pieces of type from breaking off the end of the sentence." TIL

# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""This module tests generic behavior of golden tests.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Challenge accepted.

class TurtleTest(TurtleTest):
  """Test whether there is a turtle underneath."""

  def __init__(self, *args, **kwargs):
    super(TurtleTest, self).__init__(*args, **kwargs)
    assert isinstance(self, TurtleTest)

The actual reason this is in here is that when writing the test class I called self.failureException() and only later realized it did nothing because I was supposed to raise it. So yeah... better to make sure that if things are broken it will actually detect them.

# ==============================================================================
"""This module tests generic behavior of golden tests.

This test is not intended to test every layer of interest, and models should
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"followed by the rest of the docstring starting at the same cursor position as the first quote of the first line." You are not wrong.

import warnings

import tensorflow as tf
from official.utils.testing import golden # pylint: disable=g-bad-import-order
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Also putting the bad-import on the tf line saved me a duplicate in layer_test.py. We certainly have our priorities in order!


class GoldenBaseTest(golden.BaseTest):
"""Class to ensure that golden testing runs properly.
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"""

@property
def file(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is now obsolete.

@robieta robieta force-pushed the resnet_golden_tests branch from 8f49279 to 7960858 Compare March 26, 2018 18:38
@robieta
Copy link
Contributor Author

robieta commented Mar 26, 2018

This got somewhat lost in the sea of responses, so repasting here:

Total test file size is ~800K.

Discussed tf.test.assert_equal_graph_def vs. pywrap_tensorflow.EqualGraphDefWrapper offline. If we use assert_equal_graph_def then the test can break due to changes in the implementation of TensorFlow. This way we know that the tests have to be updated at some point, but it isn't a test breaking issue. (Plus that way TF release can still run our tests, and if they break something it really is an issue.)

@robieta robieta force-pushed the resnet_golden_tests branch 2 times, most recently from 53ccc78 to 6f5c979 Compare March 26, 2018 20:44
"""Tests for core ResNet layers."""

@property
def my_name(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my_name is a weird function name, usually the caller will just call bestTestInstance.my_name, and since it already has the context of the instance, can we just call this "name" or "testName"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with test_name. You're right, the "my" is odd and redundant.

"""1D convolution with stride projector.

Args:
filters_out: Number of filters in the projection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arg type is important in python as a weak type language.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to defer this to a cleanup of all docstrings.


name = "batch-size-{}_{}{}_version-{}_width-{}_channels-{}".format(
batch_size, "bottleneck" if bottleneck else "building",
"_projection" if projection else "", version, width, channels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer one param per line for easy reading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

saver.restore(sess=sess, save_path=os.path.join(
data_dir, self.ckpt_prefix))
if differences:
print()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since print() and logging.warning might go to different output stream, I will suggest you to use tf.logging.warn for this new line as well, although an extra new line does not make the error message more obvious. Usually the error log is very long, and we will not go through all of them, so add some key words to make it easy for search might be a better idea.

correctness_function=correctness_function
)
except:
tf.logging.error("Failed unittest {}".format(name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In google3, we usually use %s as place holder and rely on default format to convert the message, eg ("Failed unittest %s", name). I guess your code is also fine, but just need to be sure for consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. I always want to nit people to replace all the %s with {}, but I fight the urge. I prefer the bracket style personally, but accept whatever everyone decides here.

@robieta
Copy link
Contributor Author

robieta commented Mar 26, 2018

I have addressed all comments.

Copy link
Contributor

@karmel karmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. A few more requests.

def test_batch_norm(self):
self._batch_norm_ops(test=True)

# Sadly python2 does not support "with self.subTest()"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No passive-aggressive commenting, please :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want to be meta-passive aggressive right now.

data_format: channels_first or channels_last

Returns:
A 1 wide CNN projector function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 wide? Can you clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified.

@@ -0,0 +1,273 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then pick a name other than "golden," which is insufficiently descriptive. reference_data, or something similar. "Golden" can mean too many things.

# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""This module tests generic behavior of golden tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for turtles all the way down.

@@ -0,0 +1 @@
[32, 8, 8, 4, 0.23128163814544678, 0.22117376327514648, 4100.51806640625, 32, 8, 8, 4, 0.9646798372268677, 0.16614516079425812, 5799.6708984375] No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the names camel cased? Seems preferable to just lowercase everything, rather than try to remember how to capitalize all these model names. I will point out that cifar10 is actually CIFAR-10, so, there's no hope if you try to stick with how things pretend to be spelled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to argue with nihilism. Lower case it is.

Taylor Robie added 3 commits March 27, 2018 10:30
…to reference graphs, and apply golden tests to ResNet.

update tests

use more concise logic for path property

delint

add some comments

delint

address PR comments

make resnet tests more concise, and supress warning test in py2

change resnet name template

more shuffling of data dirs

address PR comments and add tensorflow version info

Remove subTest due to py2

switch from tf.__version__ to tf.VERSION, and include tf.GIT_VERSION

supress lint error from json load unpack
@robieta robieta force-pushed the resnet_golden_tests branch from b62d7a3 to a6d9c7f Compare March 27, 2018 17:30
@robieta
Copy link
Contributor Author

robieta commented Mar 27, 2018

I have addressed the additional requests.

@robieta robieta merged commit 587f579 into master Mar 27, 2018
@robieta robieta deleted the resnet_golden_tests branch March 27, 2018 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants