[determinism] Add GPU excepts, CPU d9m, and tests to crop_and_resize #48905

duncanriach
Contributor

This PR adds the following functionality, plus associated tests:

When TF_DETERMINISTIC_OPS is set to "true" or "1" (that is, when op-determinism is expected),

  1. the gradient w.r.t. image of tf.image.crop_and_resize, when running on CPU, will be bit-exactly reproducible from run to run, and
  2. an attempt to use the GPU kernels for the gradient w.r.t. either image or boxes of tf.image.crop_and_resize will cause a tf.errors.UnimplementedError to be thrown, along with an understandable message.
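
The behavior listed above can be pictured with a minimal, TensorFlow-free sketch. Everything here is a hypothetical stand-in rather than the code in this PR: `UnimplementedError` substitutes for tf.errors.UnimplementedError, and `op_determinism_required` and `crop_and_resize_backprop_image` are illustrative names that only return a label describing which path would be taken.

```python
import os


class UnimplementedError(Exception):
    """Hypothetical stand-in for tf.errors.UnimplementedError."""


def op_determinism_required():
    """Return True when TF_DETERMINISTIC_OPS is set to "true" or "1"."""
    return os.environ.get("TF_DETERMINISTIC_OPS", "") in ("true", "1")


def crop_and_resize_backprop_image(device):
    """Hypothetical dispatcher; returns a label instead of running a kernel."""
    if device == "GPU" and op_determinism_required():
        # No deterministic GPU kernel exists, so fail loudly instead of
        # silently producing run-to-run differences.
        raise UnimplementedError(
            "Deterministic GPU implementation of"
            " CropAndResizeBackpropImage not available")
    if device == "CPU" and op_determinism_required():
        return "deterministic CPU kernel"
    return "default " + device + " kernel"
```

With the variable unset, both devices take their default path; with it set to "1", the CPU path is the reproducible one and the GPU path raises.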

This PR is associated with RFC: Enabling Determinism in TensorFlow.

cc @sanjoy, @reedwm, @nluehr

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label May 4, 2021
@google-cla google-cla bot added the cla: yes label May 4, 2021
@duncanriach duncanriach force-pushed the crop-and-resize-nond9m-exceptions branch from 1a56261 to 49f0685 Compare May 4, 2021 23:46
@reedwm reedwm self-requested a review May 5, 2021 00:42
@gbaned gbaned self-assigned this May 5, 2021
@gbaned gbaned added the comp:core issues related to core part of tensorflow label May 5, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation May 5, 2021
@@ -1,4 +1,4 @@
-# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2020, 2021 The TensorFlow Authors. All Rights Reserved.
Member

According to this doc, copyright years should never be updated. From the link:

the year the file was created in full numeric form (e.g., 2010); don’t change this if you edit, move, or copy the file.

Contributor Author

The PR does not change the year that the file was created.

See

When making changes to code with an existing copyright notice: ... Optionally, add current copyright year.

The upcoming commit replaces "2020, 2021" with "2020-2021".

err1 = gradient_checker_v2.max_error(
*gradient_checker_v2.compute_gradient(
if (self.__class__.__name__ ==
"CropAndResizeOpDeterministicTest" and
Member

Instead of checking for the class name, I would check if os.environ['TF_DETERMINISTIC_OPS'] is "1".

Also, maybe just disable this test when determinism is used, since you already test the backprop error messages elsewhere.

Contributor Author

I chose to do it this way to document the provenance of this conditional branch (i.e. in a child class) and to encapsulate the mechanism by which op-determinism is enabled. However, I don't see any major downsides to changing this, specifically:

  1. if this conditional branch is removed by someone who doesn't understand why it's there or why it's needed then tests will fail and the person who made the change will be alerted to the error, and
  2. when TF_DETERMINISTIC_OPS is replaced with tf.config.*deterministic*, and TF_DETERMINISTIC_OPS is no longer set to '1' in the child class, then this test will fail because the exceptions will no longer be thrown.

Regarding "Also, maybe just disable this test when determinism is used, since you already test the backprop error messages elsewhere":

Testing that the functionality in this conditional branch is (mostly) correct, without testing the error message, ensures that when the exceptions are removed and deterministic GPU backprop functionality is added, this test will have to be modified to test that functionality. If the gradient tests are simply skipped when determinism is expected, there is a higher probability that the deterministic GPU backprop functionality will accidentally go untested.

The upcoming commit changes the condition to os.getenv('TF_DETERMINISTIC_OPS', '0') == '1'.
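
A minimal sketch of the pattern discussed in this thread, using only the standard library. The class names come from this PR, but everything else is an assumption made for illustration: `UnimplementedError` stands in for tf.errors.UnimplementedError, `compute_gradient_error` is a hypothetical placeholder for the gradient_checker_v2 calls, the tolerance is arbitrary, and setting the variable in `setUp` is just one possible way for the child class to enable op-determinism.

```python
import os
import unittest


class UnimplementedError(Exception):
    """Hypothetical stand-in for tf.errors.UnimplementedError."""


def compute_gradient_error():
    """Hypothetical placeholder for the real gradient check.

    Raises when op-determinism is requested, mimicking the GPU kernels
    described in this PR; otherwise returns a small numeric error.
    """
    if os.getenv("TF_DETERMINISTIC_OPS", "0") == "1":
        raise UnimplementedError("deterministic GPU backprop not available")
    return 1e-4


class CropAndResizeOpTestBase(unittest.TestCase):

    def testGradient(self):
        if os.getenv("TF_DETERMINISTIC_OPS", "0") == "1":
            # Determinism expected: the exception must be thrown. Once a
            # deterministic kernel exists, this branch fails and has to be
            # replaced by a real gradient check rather than a silent skip.
            with self.assertRaises(UnimplementedError):
                compute_gradient_error()
        else:
            self.assertLess(compute_gradient_error(), 2e-3)


class CropAndResizeOpDeterministicTest(CropAndResizeOpTestBase):

    def setUp(self):
        super().setUp()
        # Assumption: enable op-determinism per test and restore afterwards.
        self._saved = os.environ.get("TF_DETERMINISTIC_OPS")
        os.environ["TF_DETERMINISTIC_OPS"] = "1"

    def tearDown(self):
        if self._saved is None:
            os.environ.pop("TF_DETERMINISTIC_OPS", None)
        else:
            os.environ["TF_DETERMINISTIC_OPS"] = self._saved
        super().tearDown()
```

Because the inherited test asserts that the exception is thrown instead of skipping, removing the exception later makes the child test fail, which is the reminder argued for above.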

crop_size = constant_op.constant([3, 3], dtype=dtypes.int32)
return image, boxes, box_indices, crop_size

def _imageErrorMessage(self):
Member

Make this a class constant:

 _IMAGE_ERROR_MESSAGE = ("Deterministic GPU implementation of" +
                         " CropAndResizeBackpropImage not available")

And same for _boxesErrorMessage.

Contributor Author

Thanks for the simplification suggestion.

I originally put these messages at the class level with the intention of using them in the parent class, but I ended up deciding not to test the error messages in the parent class; there it seemed sufficient to simply check that the correct exception type was being thrown. The upcoming commit simplifies this so that there is nothing at all at the class level related to the exception messages.
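
The two levels of checking described here might look roughly like the following sketch. It is an illustration under stated assumptions, not the code from the upcoming commit: `UnimplementedError` stands in for tf.errors.UnimplementedError, `backprop_image_on_gpu` is a hypothetical function that always raises, and the test class and method names are made up.

```python
import unittest


class UnimplementedError(Exception):
    """Hypothetical stand-in for tf.errors.UnimplementedError."""


def backprop_image_on_gpu():
    """Hypothetical stand-in that always raises, as under op-determinism."""
    raise UnimplementedError(
        "Deterministic GPU implementation of"
        " CropAndResizeBackpropImage not available")


class ExceptionCheckSketch(unittest.TestCase):

    def testTypeOnly(self):
        # Parent-class style: only the exception type is checked.
        with self.assertRaises(UnimplementedError):
            backprop_image_on_gpu()

    def testTypeAndMessage(self):
        # Message written inline at the call site, so nothing related to
        # the exception message lives at the class level.
        with self.assertRaisesRegex(
            UnimplementedError,
            "Deterministic GPU implementation of"
            " CropAndResizeBackpropImage not available"):
            backprop_image_on_gpu()
```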

Comment on lines 145 to 147
This test assumes that the base op test runs all the same test cases when
deterministic ops are not enabled and will therefore detect erroneous
exception throwing in those cases.
Member

I don't quite understand this comment. What is the "base op test"?

Contributor Author

The upcoming commit changes "base op test" to "test_base.CropAndResizeOpTestBase" in this particular instance. It also makes various similar clarifications elsewhere in image_grad_deterministic_test.py, xent_op_deterministic_test.py, and sparse_xent_op_deterministic_test.py.

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes May 5, 2021
@duncanriach duncanriach requested a review from reedwm May 6, 2021 03:29
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer May 6, 2021
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels May 6, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label May 6, 2021
@copybara-service copybara-service bot merged commit 7317832 into tensorflow:master May 13, 2021
PR Queue automation moved this from Approved by Reviewer to Merged May 13, 2021
@duncanriach duncanriach deleted the crop-and-resize-nond9m-exceptions branch June 7, 2021 22:32