Opt out DEVICE_GPU_XLA_JIT and DEVICE_XLA_GPU from ResizeNearestNeigh… #31012

Merged
8 changes: 7 additions & 1 deletion tensorflow/compiler/jit/compilability_check_util.cc
@@ -263,9 +263,15 @@ bool RecursiveCompilabilityChecker::OpIsInaccurate(const Node& node) const {
 bool RecursiveCompilabilityChecker::OpIsSlow(const Node& node) const {
   // b/128001705: SelfAdjointEigV2 and Svd performance issues.
   // b/135640736: MatrixInverse performance issues.
+  // https://github.com/tensorflow/tensorflow/pull/31012:
+  // ResizeNearestNeighbor, ResizeBilinear, and ResizeBilinearGrad sometimes
+  // create convolutions too large for CuDNN to handle.
   return node.type_string() == "SelfAdjointEigV2" ||
          node.type_string() == "Svd" || node.type_string() == "Qr" ||
-         node.type_string() == "MatrixInverse";
+         node.type_string() == "MatrixInverse" ||
+         node.type_string() == "ResizeNearestNeighbor" ||
Contributor

Normally we file a bug internally with some details on why the op is slow. Since you can't file an internal bug, do you mind adding some comments on why these ops are slow / undesirable?

Contributor Author

To be precise, I am opting out these ops not because they are slow, but because they create convolutions that CuDNN can't handle. I looked around for lists similar to OpIsSlow, but didn't find anything more appropriate. Please let me know if there is a better way, or whether I should create another list.

I explained how the error is triggered in my first comment in this PR. I am attaching a Python script that reproduces the error. You can switch XLA on and off to observe the difference. Hopefully it is sufficient as a bug description.

import tensorflow as tf
import numpy as np

# TF 1.x graph-mode repro: upsample a [16, 256, 256, 16] batch to 512x512.
image = tf.placeholder(tf.float32, shape=[16, 256, 256, 16], name='image')
resize_nearest_neighbor = tf.image.resize_images(
    image, size=[512, 512],
    method=tf.image.ResizeMethod.NEAREST_NEIGHBOR, align_corners=True)
feed_dict = {image: np.random.random_sample([16, 256, 256, 16])}

sess = tf.Session()
with sess.as_default():
    # Evaluating the resize op is the step that fails when XLA is enabled.
    actual_resize_nearest_neighbor = resize_nearest_neighbor.eval(feed_dict)

Contributor

Sorry, I meant comments in the code.

Although I guess just linking to this discussion thread would be fine too.

Contributor Author

@sanjoy, I have added comments that link to this thread.
BTW, I suppose that users can switch cluster/allow_slow_ops at run-time. Can you show me how to do that?
P.S. I will do a merge to move #30336 forward, as I don't change image_resize_ops.cc. I will need your help to review it again.

Contributor

> @sanjoy, I have added comments that link to this thread.

Thanks!

> BTW, I suppose that users can switch cluster/allow_slow_ops at run-time. Can you show me how to do that?

TF users can't change these options directly. These bits are mainly used to control the behavior of RecursiveCompilabilityChecker by the various clients of the class (none of which are directly user facing).
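
To make the role of that bit concrete, here is a minimal stand-alone sketch of how a filter flag can gate a list of op names. It is a simplified model, not the TensorFlow source: the `OperationFilter` struct and the `IsOpCompilable` helper are hypothetical stand-ins, and ops are represented as plain strings instead of `Node` objects. Only the op names come from the diff in this PR.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the option bits a client passes to the checker.
struct OperationFilter {
  bool allow_slow_ops = false;
};

// Simplified model of OpIsSlow: membership in a fixed set of op type names.
// The names mirror the list in this PR's diff.
bool OpIsSlow(const std::string& op_type) {
  static const std::unordered_set<std::string> kSlowOps = {
      "SelfAdjointEigV2",      "Svd",            "Qr",
      "MatrixInverse",         "ResizeBilinear", "ResizeBilinearGrad",
      "ResizeNearestNeighbor"};
  return kSlowOps.count(op_type) > 0;
}

// Hypothetical helper: an op on the list is rejected for compilation unless
// the client explicitly set allow_slow_ops.
bool IsOpCompilable(const std::string& op_type,
                    const OperationFilter& filter) {
  if (!filter.allow_slow_ops && OpIsSlow(op_type)) {
    return false;
  }
  return true;
}
```

In this model the decision is made by whichever code constructs the filter, which matches the point above: the flag is set by clients of the checker, not by end users at run time.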

> P.S. I will do a merge to move #30336 forward, as I don't change image_resize_ops.cc. I will need your help to review it again.

Contributor Author

Is there any further comment for me to address? If not, please approve the PR.

Contributor

I already approved it the last time, but it looks like a CI build failed. I'll re-trigger the CI.

+         node.type_string() == "ResizeBilinear" ||
+         node.type_string() == "ResizeBilinearGrad";
 }

 bool RecursiveCompilabilityChecker::IsCompilableNode(