GPU-deterministic tf.image.resize (bilinear) #39243
Conversation
tf.image.resize with method=ResizeMethod.BILINEAR
const int b = idx / original_height;

int in_y_start = max(0, __float2int_ru(
    (out_y_center - 1 + 0.5) * inverse_height_scale - 0.5));
It looks like this would promote to double?
Good point. The whole argument to __float2int_ru() will end up being double. This issue happens on the next line of code as well, and then, of course, in the x-dimension versions. I'm considering changing every instance of 0.5 to 0.5f.
The most recent commit replaces the repeated magic number (0.5) with the offset variable (of type float).
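As an illustration of the promotion concern, here is a minimal, hypothetical sketch; the function and variable names are assumptions and this is not the PR's actual code:

```cuda
// Hypothetical device helper, for illustration only.
// With a bare 0.5 (a double literal), the arithmetic inside the parentheses
// would be promoted to double; using a float `offset` (or 0.5f) keeps the
// whole expression, including the argument to __float2int_ru(), in single
// precision.
__device__ int StartIndex(float out_center, float inverse_scale) {
  const float offset = 0.5f;
  return max(0, __float2int_ru(
      (out_center - 1 + offset) * inverse_scale - offset));
}
```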
@@ -338,6 +389,55 @@ __global__ void LegacyResizeBilinearGradKernel(
  }
}

template <typename T>
__global__ void LegacyResizeBilinearDeterministicGradKernel(
Is this the same kernel as ResizeBilinearDeterministicGradKernel, except for in_x/y_start? If so, would it make sense to share the kernel code and pass in the pixel offset?
Yes, this is the same code, except for the calculations of in_y_start, out_y_start, in_x_start, and out_x_start. I followed the existing pattern of replicating the code, as set by ResizeBilinearGradKernel and LegacyResizeBilinearGradKernel.

What you're suggesting is an improvement. I'm considering changing it for those existing, non-deterministic kernels as well. What do you think?
I just merged the legacy and non-legacy deterministic kernels (now running tests locally). Note that this makes the legacy kernel slightly slower than before, not only because of having to add the zero offsets, but also because the legacy kernel skipped an unnecessary clamping operation, which always runs in the merged kernel. I still think it's better to have one kernel.

Further refactoring, if it's done, will be submitted in a separate PR.
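For readers following along, here is a rough, self-contained sketch of the kind of merging described above; the names and structure are illustrative assumptions, not the code in this PR. The idea is that the legacy and half-pixel-centers paths differ only in the offset used when computing the contributing input range, so one kernel can take that offset as a runtime parameter, at the cost of the clamp always running:

```cuda
// Illustrative only: a single kernel body shared by both paths, selected by
// a runtime `offset` (0.0f for the legacy path, 0.5f for half-pixel centers).
__global__ void SharedStartIndexKernel(const int out_height,
                                       const float inverse_height_scale,
                                       const float offset, const int in_height,
                                       int* in_y_starts) {
  const int out_y_center = blockIdx.x * blockDim.x + threadIdx.x;
  if (out_y_center >= out_height) return;
  int in_y_start = max(0, __float2int_ru(
      (out_y_center - 1 + offset) * inverse_height_scale - offset));
  // This clamp was unnecessary in the legacy path but always runs once the
  // kernel is shared.
  in_y_starts[out_y_center] = min(in_y_start, in_height - 1);
}
```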
Commit pushed.
I also just looked into combining the existing, non-deterministic back-prop kernels into a single kernel with an offset parameter. I think it's doable and probably worth it. I won't submit another pull request for that right now because it would collide with this current pull request.
I think it's also possible to combine two of the forward kernels into a single kernel with an offset parameter. That could probably be done in the same future pull request.
Thanks Duncan! I only have some minor comments.
Thanks Christian! I'm working on some changes to address your comments.
@chsigg Could you take a look?
@chsigg Can you please review this PR? Thanks!
@chsigg Can you please review this PR? Thanks!
@duncanriach Sorry for the delay, can you please resolve conflicts? Thanks!
Thanks for pushing, @gbaned. I'll be on vacation for a week. When I get back, I'll resolve the conflicts.
It has been 14 days with no activity and the
Yes, this is being worked on. Please leave open.
51804c1 to 294065e
Hi @gbaned, the conflicts have been resolved and tested locally. I attempted to capture goodness from this conflicting commit that my changes obliterate. Please will you now move this towards merge?
@chsigg can you finish reviewing?
This pull request extends the deterministic functionality that is enabled by TF_DETERMINISTIC_OPS.

Prior to the changes delivered by this pull request, the CUDA back-prop kernels for tf.image.resize with method=ResizeMethod.BILINEAR introduced uncontrollable noise. After application of this pull request, setting the environment variable TF_DETERMINISTIC_OPS to "true" or "1" selects a new, deterministic CUDA back-prop kernel for this op.

The exact performance of the deterministic kernels relative to the pre-existing, non-deterministic kernels has not yet been tested. However, the performance is expected to be similar for pass-through (one-to-one re-sampling) and for down-sampling (the output has fewer pixels than the input). The performance should degrade linearly in each dimension (horizontal and vertical) as that dimension is up-sampled (the output has more pixels than the input). For example, the back-prop for a 1:2 deterministic up-sample in one dimension could take up to twice as long. The slowdown is also proportional to the product of the up-sampling factors in the two dimensions; for example, the back-prop for a 1:2 deterministic up-sample in both dimensions could take up to four times (2*2) as long.
This pull request also significantly improves the test coverage of the existing, non-deterministic functionality of method=ResizeMethod.BILINEAR, and fixes one of the tests for method=ResizeMethod.BICUBIC.