GPU-deterministic tf.image.resize (bilinear) #39243
Conversation
tf.image.resize with method=ResizeMethod.BILINEAR
const int b = idx / original_height;

int in_y_start = max(0, __float2int_ru(
    (out_y_center - 1 + 0.5) * inverse_height_scale - 0.5));
It looks like this would promote to double?
Good point. The whole argument to __float2int_ru() will end up being double. This issue happens on the next line of code as well, and then, of course, in the x-dimension versions. I'm considering changing every instance of 0.5 to 0.5f.
The most recent commit replaces the repeated magic number (0.5) with the offset variable (of type float).
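As an illustration of the promotion concern, here is a minimal, hypothetical sketch; the function and variable names are assumptions and this is not the PR's actual code:

```cuda
// Hypothetical device helper, for illustration only.
// With a bare 0.5 (a double literal), the arithmetic inside the parentheses
// would be promoted to double; using a float `offset` (or 0.5f) keeps the
// whole expression, including the argument to __float2int_ru(), in single
// precision.
__device__ int StartIndex(float out_center, float inverse_scale) {
  const float offset = 0.5f;
  return max(0, __float2int_ru(
      (out_center - 1 + offset) * inverse_scale - offset));
}
```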
@@ -338,6 +389,55 @@ __global__ void LegacyResizeBilinearGradKernel(
  }
}

template <typename T>
__global__ void LegacyResizeBilinearDeterministicGradKernel(
Is this the same kernel as ResizeBilinearDeterministicGradKernel, except for in_x/y_start? If so, would it make sense to share the kernel code and pass in the pixel offset?
Yes, this is the same code, except for the calculations of in_y_start, out_y_start, in_x_start, and out_x_start. I followed the existing pattern of replicating the code, as set by ResizeBilinearGradKernel and LegacyResizeBilinearGradKernel.

What you're suggesting is an improvement. I'm considering changing it for those existing, non-deterministic kernels as well. What do you think?
I just merged the legacy and non-legacy deterministic kernels (now running tests locally). Note that this makes the legacy kernel slightly slower than before, not only because of having to add the zero offsets, but also because the legacy kernel skipped an unnecessary clamping operation, which always runs in the merged kernel. I still think it's better to have one kernel.

Further refactoring, if it's done, will be submitted in a separate PR.
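For readers following along, here is a rough, self-contained sketch of the kind of merging described above; the names and structure are illustrative assumptions, not the code in this PR. The idea is that the legacy and half-pixel-centers paths differ only in the offset used when computing the contributing input range, so one kernel can take that offset as a runtime parameter, at the cost of the clamp always running:

```cuda
// Illustrative only: a single kernel body shared by both paths, selected by
// a runtime `offset` (0.0f for the legacy path, 0.5f for half-pixel centers).
__global__ void SharedStartIndexKernel(const int out_height,
                                       const float inverse_height_scale,
                                       const float offset, const int in_height,
                                       int* in_y_starts) {
  const int out_y_center = blockIdx.x * blockDim.x + threadIdx.x;
  if (out_y_center >= out_height) return;
  int in_y_start = max(0, __float2int_ru(
      (out_y_center - 1 + offset) * inverse_height_scale - offset));
  // This clamp was unnecessary in the legacy path but always runs once the
  // kernel is shared.
  in_y_starts[out_y_center] = min(in_y_start, in_height - 1);
}
```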
Commit pushed.
I also just looked into combining the existing, non-deterministic back-prop kernels into a single kernel with an offset parameter. I think it's doable and probably worth it. I won't submit another pull request for that right now because it would collide with this current pull request.
I think it's also possible to combine two of the forward kernels into a single kernel with an offset parameter. That could probably be done in the same future pull request.
Thanks Duncan! I only have some minor comments.
Thanks Christian! I'm working on some changes to address your comments.
@chsigg Could you take a look?
@chsigg Can you please review this PR? Thanks!
@chsigg Can you please review this PR? Thanks!
@duncanriach Sorry for the delay, can you please resolve conflicts? Thanks!
Thanks for pushing, @gbaned. I'll be on vacation for a week. When I get back, I'll resolve the conflicts.
It has been 14 days with no activity and the
Yes, this is being worked on. Please leave open.
51804c1 to 294065e
Hi @gbaned, the conflicts have been resolved and tested locally. I attempted to capture goodness from this conflicting commit that my changes obliterate. Please will you now move this towards merge?
@chsigg can you finish reviewing?
This pull request extends the deterministic functionality that is enabled by TF_DETERMINISTIC_OPS.

Prior to the changes delivered by this pull request, the CUDA back-prop kernels for tf.image.resize with method=ResizeMethod.BILINEAR introduced uncontrollable noise. After application of this pull request, setting the environment variable TF_DETERMINISTIC_OPS to "true" or "1" selects a new, deterministic CUDA back-prop kernel for this op.

The exact performance of the deterministic kernels relative to the pre-existing, non-deterministic kernels has not yet been tested. However, the performance is expected to be similar for pass-through (one-to-one re-sampling) and for down-sampling (the output has fewer pixels than the input). The performance should degrade linearly in each dimension (horizontal and vertical) as that dimension is up-sampled (the output has more pixels than the input). For example, the back-prop for a 1:2 deterministic up-sample in one dimension could take up to twice as long. The slowdown is also proportional to the product of the up-sampling factors in the two dimensions; for example, the back-prop for a 1:2 deterministic up-sample in both dimensions could take up to four times (2*2) as long.
This pull request also significantly improves the test coverage of the existing, non-deterministic functionality of method=ResizeMethod.BILINEAR, and fixes one of the tests for method=ResizeMethod.BICUBIC.