-
Notifications
You must be signed in to change notification settings - Fork 45.5k
Add golden tests to official. #3723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
efbe554
to
bff9c88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't check the test for golden yet, but some comment to address first.
official/resnet/layer_test.py
Outdated
@@ -0,0 +1,206 @@ | |||
# Copyright 2017 The TensorFlow Authors. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 2018. I hope we have some macro to generate this, and people don't have to copy paste from random places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I absolutely copy-paste. If there is a better way I'd love to know.
from __future__ import absolute_import | ||
from __future__ import division | ||
from __future__ import print_function | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are quite loose for import format. Usually one line break is enough between imports. The only place need 2 lines are between classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat. It did look quite awkward.
official/utils/testing/golden.py
Outdated
|
||
class BaseTest(tf.test.TestCase): | ||
"""TestCase subclass for performing golden tests. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just wrap this into previous line, and ditto for all the comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
"""TestCase subclass for performing golden tests. | ||
""" | ||
|
||
def regenerate(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc above says this is a class method, whereas this is just a abstract instance method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed.
official/utils/testing/golden.py
Outdated
"""Convenience function for matrix testing. | ||
|
||
Args: | ||
input_array: Tensor (numpy array), from which key values are extracted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then probably should just call it tensor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky because it's not a tf.Tensor, it's the result of .eval() being called and is therefore a numpy array. I thought if I called it tensor it would imply it was an instance of tf.Tensor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just say numpy array and not tensor here in that case, which is assumed to mean tf.Tensor.
official/utils/testing/golden.py
Outdated
saver.restore(sess=sess, save_path=os.path.join( | ||
data_dir, self.ckpt_prefix)) | ||
if differences: | ||
print() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are u trying to print here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want the warning to get lost in a stream of warnings, so I line break to make it more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since print() and logging.warning might go to different output stream, I will suggest you to use tf.logging.warn for this new line as well, although an extra new line does not make the error message more obvious. Usually the error log is very long, and we will not go through all of them, so add some key words to make it easy for search might be a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't considered that they would go to different locations. I think I'll just drop it.
sess.run(init) | ||
try: | ||
saver.restore(sess=sess, save_path=os.path.join( | ||
data_dir, self.ckpt_prefix)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can fit in previous line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly no. 4 characters too long.
official/utils/testing/golden.py
Outdated
data_dir, self.ckpt_prefix)) | ||
if differences: | ||
print() | ||
warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can u use tf.logging.warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, yes! And it seems to be a proper subclass of warnings so the "did the warning fire" test still works.
ops = [op.eval() for op in ops_to_eval] | ||
if correctness_function is not None: | ||
results = correctness_function(*ops) | ||
with open(os.path.join(data_dir, "results.json"), "rt") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think 'r' is fine since "t" (text) mode is a default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's not strictly necessary, but since I use "b" elsewhere in the file I like to be explicit.
official/utils/testing/golden.py
Outdated
dtypes into builtin dtypes. | ||
""" | ||
|
||
ops_to_eval = [] if ops_to_eval is None else ops_to_eval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simplified as ops_to_eval = ops_to_eval or []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
bff9c88
to
76101a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots o' comments, but this is cool. Nicely done. What does the total size of the files end up being?
official/resnet/layer_test.py
Outdated
from official.utils.testing import golden # pylint: disable=g-bad-import-order | ||
|
||
|
||
DATA_FORMAT = "channels_last" # CPU instructions often preclude channels_last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment here is not true-- do you mean preclude channels_first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Fixed.
official/resnet/layer_test.py
Outdated
channels: The number of channels in the fake image. | ||
""" | ||
|
||
name = "batch_size_{}__{}{}__version_{}__width_{}__channels_{}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider dashes and underscores instead of single and double underscores for separation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
official/resnet/layer_test.py
Outdated
self._resnet_block_ops(test=True, batch_size=32, bottleneck=False, | ||
projection=False, version=2, width=8, channels=4) | ||
|
||
def regenerate(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
@@ -0,0 +1 @@ | |||
[32, 16, 16, 3, 0.9722558259963989, 0.18413543701171875, 12374.20703125, 32, 16, 16, 3, 1.6126631498336792, -1.096894383430481, -0.041595458984375] No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal: test_data for all the models should live in one place, perhaps under utils/testing, rather than in the model directories, since most users don't care.
Discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pros:
Less complexity for users.
No longer need file magic in child classes.
Cons:
(sort of) violates the idea that tests live with the thing they test.
If you're fine with that separation I would definitely prefer that "hide the crimes" approach.
@@ -0,0 +1,273 @@ | |||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more descriptive name for this module would be preferable. golden_data_tester, saved_data_tests... something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that the convention is to eliminate redundancy in naming, hence testing.golden
rather than something like testing.golden_test
. Or have I misunderstood?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then pick a name other than "golden," which is insufficiently descriptive. reference_data, or something similar. "Golden" can mean too many things.
import warnings | ||
|
||
import tensorflow as tf | ||
from official.utils.testing import golden # pylint: disable=g-bad-import-order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder: Build files will be needed for all of this.
nit: add the bad import line to the tf import, and then if there are other official imports, you won't have to add the flag for all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Also putting the bad-import on the tf line saved me a duplicate in layer_test.py. We certainly have our priorities in order!
|
||
class GoldenBaseTest(golden.BaseTest): | ||
"""Class to ensure that golden testing runs properly. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
""" | ||
|
||
@property | ||
def file(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: bad habit to shadow builtin names. Also, see above on keeping the files in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is now obsolete.
official/utils/testing/golden.py
Outdated
|
||
def _manage_ops(self, name, graph, ops_to_eval=None, test=True, | ||
correctness_function=None): | ||
"""Utility function to handle repeated work of graph checking and saving. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"manage" and "handle" are very vague; can you name this function something more descriptive? evaluate_or_construct_test_case, if that's what it's doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# ============================================================================== | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each file of this type should have a docstring explaining what failure means, under what condition files should be regenerated, and what the command line would be to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
76101a3
to
8f49279
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Total test file size is ~800K.
Discussed tf.test.assert_equal_graph_def vs. pywrap_tensorflow.EqualGraphDefWrapper offline. If we use assert_equal_graph_def then the test can break due to changes in the implementation of TensorFlow. This way we know that the tests have to be updated at some point, but it isn't a test breaking issue. (Plus that way TF release can still run our tests, and if they break something it really is an issue.)
official/resnet/layer_test.py
Outdated
from official.utils.testing import golden # pylint: disable=g-bad-import-order | ||
|
||
|
||
DATA_FORMAT = "channels_last" # CPU instructions often preclude channels_last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Fixed.
official/resnet/layer_test.py
Outdated
channels: The number of channels in the fake image. | ||
""" | ||
|
||
name = "batch_size_{}__{}{}__version_{}__width_{}__channels_{}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
official/resnet/layer_test.py
Outdated
self._resnet_block_ops(test=True, batch_size=32, bottleneck=False, | ||
projection=False, version=2, width=8, channels=4) | ||
|
||
def regenerate(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
@@ -0,0 +1 @@ | |||
[32, 16, 16, 3, 0.9722558259963989, 0.18413543701171875, 12374.20703125, 32, 16, 16, 3, 1.6126631498336792, -1.096894383430481, -0.041595458984375] No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pros:
Less complexity for users.
No longer need file magic in child classes.
Cons:
(sort of) violates the idea that tests live with the thing they test.
If you're fine with that separation I would definitely prefer that "hide the crimes" approach.
official/utils/testing/golden.py
Outdated
# ============================================================================== | ||
"""TensorFlow testing subclass to automate numerical testing. | ||
|
||
Golden tests determine when behavior deviates from some "gold standard", and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Compositors―people who layout printed material with type―made the original rule that placed periods and commas inside quotation marks to protect the small metal pieces of type from breaking off the end of the sentence." TIL
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# ============================================================================== | ||
"""This module tests generic behavior of golden tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Challenge accepted.
class TurtleTest(TurtleTest):
"""Test whether there is a turtle underneath."""
def __init__(self, *args, **kwargs):
super(TurtleTest, self).__init__(*args, **kwargs)
assert isinstance(self, TurtleTest)
The actual reason this is in here is that when writing the test class I called self.failureException() and only later realized it did nothing because I was supposed to raise it. So yeah... better to make sure that if things are broken it will actually detect them.
# ============================================================================== | ||
"""This module tests generic behavior of golden tests. | ||
|
||
This test is not intended to test every layer of interest, and models should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"followed by the rest of the docstring starting at the same cursor position as the first quote of the first line." You are not wrong.
import warnings | ||
|
||
import tensorflow as tf | ||
from official.utils.testing import golden # pylint: disable=g-bad-import-order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Also putting the bad-import on the tf line saved me a duplicate in layer_test.py. We certainly have our priorities in order!
|
||
class GoldenBaseTest(golden.BaseTest): | ||
"""Class to ensure that golden testing runs properly. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
""" | ||
|
||
@property | ||
def file(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is now obsolete.
8f49279
to
7960858
Compare
This got somewhat lost in the sea of responses, so repasting here: Total test file size is ~800K. Discussed tf.test.assert_equal_graph_def vs. pywrap_tensorflow.EqualGraphDefWrapper offline. If we use assert_equal_graph_def then the test can break due to changes in the implementation of TensorFlow. This way we know that the tests have to be updated at some point, but it isn't a test breaking issue. (Plus that way TF release can still run our tests, and if they break something it really is an issue.) |
53ccc78
to
6f5c979
Compare
official/resnet/layer_test.py
Outdated
"""Tests for core ResNet layers.""" | ||
|
||
@property | ||
def my_name(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my_name is a weird function name, usually the caller will just call bestTestInstance.my_name, and since it already has the context of the instance, can we just call this "name" or "testName"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with test_name. You're right, the "my" is odd and redundant.
"""1D convolution with stride projector. | ||
|
||
Args: | ||
filters_out: Number of filters in the projection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arg type is important in python as a weak type language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to defer this to a cleanup of all docstrings.
official/resnet/layer_test.py
Outdated
|
||
name = "batch-size-{}_{}{}_version-{}_width-{}_channels-{}".format( | ||
batch_size, "bottleneck" if bottleneck else "building", | ||
"_projection" if projection else "", version, width, channels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer one param per line for easy reading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
official/utils/testing/golden.py
Outdated
saver.restore(sess=sess, save_path=os.path.join( | ||
data_dir, self.ckpt_prefix)) | ||
if differences: | ||
print() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since print() and logging.warning might go to different output stream, I will suggest you to use tf.logging.warn for this new line as well, although an extra new line does not make the error message more obvious. Usually the error log is very long, and we will not go through all of them, so add some key words to make it easy for search might be a better idea.
correctness_function=correctness_function | ||
) | ||
except: | ||
tf.logging.error("Failed unittest {}".format(name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In google3, we usually use %s as place holder and rely on default format to convert the message, eg ("Failed unittest %s", name). I guess your code is also fine, but just need to be sure for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh. I always want to nit people to replace all the %s
with {}
, but I fight the urge. I prefer the bracket style personally, but accept whatever everyone decides here.
I have addressed all comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. A few more requests.
official/resnet/layer_test.py
Outdated
def test_batch_norm(self): | ||
self._batch_norm_ops(test=True) | ||
|
||
# Sadly python2 does not support "with self.subTest()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No passive-aggressive commenting, please :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really want to be meta-passive aggressive right now.
official/resnet/layer_test.py
Outdated
data_format: channels_first or channels_last | ||
|
||
Returns: | ||
A 1 wide CNN projector function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 wide? Can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified.
@@ -0,0 +1,273 @@ | |||
# Copyright 2018 The TensorFlow Authors. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then pick a name other than "golden," which is insufficiently descriptive. reference_data, or something similar. "Golden" can mean too many things.
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# ============================================================================== | ||
"""This module tests generic behavior of golden tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for turtles all the way down.
@@ -0,0 +1 @@ | |||
[32, 8, 8, 4, 0.23128163814544678, 0.22117376327514648, 4100.51806640625, 32, 8, 8, 4, 0.9646798372268677, 0.16614516079425812, 5799.6708984375] No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the names camel cased? Seems preferable to just lowercase everything, rather than try to remember how to capitalize all these model names. I will point out that cifar10 is actually CIFAR-10, so, there's no hope if you try to stick with how things pretend to be spelled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to argue with nihilism. Lower case it is.
…to reference graphs, and apply golden tests to ResNet. update tests use more concise logic for path property delint add some comments delint address PR comments make resnet tests more concise, and supress warning test in py2 change resnet name template more shuffling of data dirs address PR comments and add tensorflow version info Remove subTest due to py2 switch from tf.__version__ to tf.VERSION, and include tf.GIT_VERSION supress lint error from json load unpack
b62d7a3
to
a6d9c7f
Compare
I have addressed the additional requests. |
Added a simple subclass of tf.test.TestCase to allow specification of reference (gold standard) behavior and detect when layer definitions change. This test class is designed to be less brittle than the previous resnet_test.py by restoring weights instead of relying on tensorflow's RNG which can change with implementation.
For instance with respect to the recent batch norm change in TensorFlow, the test issues a warning (since the graph changed), but does not fail.