
GPU-deterministic tf.image.resize (bilinear) #39243

Conversation

@duncanriach (Contributor) commented May 7, 2020

This pull request extends the deterministic functionality that is enabled by TF_DETERMINISTIC_OPS.

Prior to the changes delivered by this pull request, the CUDA back-prop kernels for tf.image.resize with method=ResizeMethod.BILINEAR introduced uncontrollable, run-to-run noise into the gradients.

After application of this pull request, setting the environment variable TF_DETERMINISTIC_OPS to "true" or "1" selects a new, deterministic CUDA back-prop kernel for this op.
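For illustration, here is a minimal sketch of how such an environment-variable gate can be implemented; the helper below is hypothetical and is not this pull request's actual code:

// Hypothetical sketch, not TensorFlow's actual helper: read the
// environment variable and branch at kernel-launch time.
#include <cstdlib>
#include <cstring>

bool DeterministicOpsRequested() {
  const char* value = std::getenv("TF_DETERMINISTIC_OPS");
  if (value == nullptr) return false;
  return std::strcmp(value, "1") == 0 || std::strcmp(value, "true") == 0;
}

// if (DeterministicOpsRequested()) { /* launch deterministic back-prop kernel */ }
// else                             { /* launch pre-existing kernel */ }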

The exact performance of the deterministic kernels relative to the pre-existing, non-deterministic kernels has not yet been measured. However, it is expected to be similar for pass-through (one-to-one re-sampling) and for down-sampling (the output has fewer pixels than the input). Performance should degrade linearly in each dimension (horizontal and vertical) as that dimension is up-sampled (the output has more pixels than the input). For example, the back-prop for a 1:2 deterministic up-sample in one dimension could take up to twice as long. When both dimensions are up-sampled, the slowdown is proportional to the product of the two up-sampling factors; for example, the back-prop for a 1:2 deterministic up-sample in both dimensions could take up to four times (2×2) as long.
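For intuition about both properties, here is a simplified, hypothetical 1-D CUDA sketch of a gather-based back-prop kernel (all names and details are illustrative, not this PR's code). Each thread owns one input-gradient element and accumulates, in a fixed order, the output-gradient elements whose bilinear footprint covers it. The loop length grows linearly with the up-sampling factor, which is the source of the per-dimension slowdown estimate above, and the single non-atomic write per element is what makes the result deterministic (unlike an atomicAdd-based scatter, whose accumulation order varies from run to run):

// Hypothetical 1-D sketch: gather from grad_out instead of scattering
// into grad_in with atomicAdd. scale = in_size / out_size;
// inv_scale = out_size / in_size; offset = 0.5f for half-pixel centers.
__global__ void DeterministicBilinearGrad1D(
    const float* grad_out, int out_size, float scale, float inv_scale,
    float offset, float* grad_in, int in_size) {
  int in_x = blockIdx.x * blockDim.x + threadIdx.x;
  if (in_x >= in_size) return;
  // Output pixels whose 2-tap bilinear footprint includes in_x; the width
  // of this range, and hence the loop below, scales with inv_scale.
  int out_start = max(0, __float2int_ru((in_x - 1 + offset) * inv_scale - offset));
  int out_end = min(out_size - 1,
                    __float2int_rd((in_x + 1 + offset) * inv_scale - offset));
  float acc = 0.0f;
  for (int out_x = out_start; out_x <= out_end; ++out_x) {  // fixed order
    float in_center = (out_x + offset) * scale - offset;
    acc += fmaxf(0.0f, 1.0f - fabsf(in_center - in_x)) * grad_out[out_x];
  }
  grad_in[in_x] = acc;  // one non-atomic write per element => deterministic
}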

This pull request also significantly improves the test coverage of the existing, non-deterministic functionality of method=ResizeMethod.BILINEAR, and fixes one of the tests for method=ResizeMethod.BICUBIC.

@google-ml-butler google-ml-butler bot added the size:XL (CL Change Size: Extra Large) label May 7, 2020
@duncanriach duncanriach changed the title from "GPU-deterministic tf.image.resize(method=ResizeMethod.BILINEAR)" to "GPU-deterministic tf.image.resize with method=ResizeMethod.BILINEAR" May 7, 2020
@duncanriach duncanriach changed the title from "GPU-deterministic tf.image.resize with method=ResizeMethod.BILINEAR" to "GPU-deterministic tf.image.resize (BILINEAR)" May 7, 2020
@gbaned gbaned self-assigned this May 7, 2020
@gbaned gbaned requested a review from chsigg May 7, 2020 04:14
@duncanriach duncanriach changed the title from "GPU-deterministic tf.image.resize (BILINEAR)" to "GPU-deterministic tf.image.resize (bilinear)" May 7, 2020
const int b = idx / original_height;

int in_y_start = max(0, __float2int_ru(
(out_y_center - 1 + 0.5) * inverse_height_scale - 0.5));
A reviewer (Contributor) commented:

It looks like this would promote to double?

@duncanriach (Contributor, Author) commented May 12, 2020

Good point. The whole argument to __float2int_ru() will end up being double. This issue also occurs on the next line of code, and then, of course, in the x-dimension versions. I'm considering changing every instance of 0.5 to 0.5f.

@duncanriach (Contributor, Author) commented May 12, 2020

The most recent commit replaces the repeated magic number (0.5) with the offset variable (of type float).
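For example (an illustrative sketch using the offset variable mentioned above; the committed code may differ):

// Before: the 0.5 literals are doubles, so the argument to
// __float2int_ru() is computed in (and promoted to) double precision.
int in_y_start = max(0, __float2int_ru(
    (out_y_center - 1 + 0.5) * inverse_height_scale - 0.5));

// After (sketch): a float offset keeps the arithmetic in single precision.
const float offset = 0.5f;
int in_y_start = max(0, __float2int_ru(
    (out_y_center - 1 + offset) * inverse_height_scale - offset));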

@@ -338,6 +389,55 @@ __global__ void LegacyResizeBilinearGradKernel(
}
}

template <typename T>
__global__ void LegacyResizeBilinearDeterministicGradKernel(
A reviewer (Contributor) commented:

Is this the same kernel as ResizeBilinearDeterministicGradKernel, except in_x/y_start? If so, would it make sense to share the kernel code and pass in the pixel offset?

@duncanriach (Contributor, Author) commented May 12, 2020

Yes, this is the same code, except for the calculations of in_y_start, out_y_start, in_x_start, and out_x_start. I followed the existing pattern of replicating the code, as set by ResizeBilinearGradKernel and LegacyResizeBilinearGradKernel.

What you're suggesting is an improvement. I'm considering changing it for those existing, non-deterministic kernels as well. What do you think?

@duncanriach (Contributor, Author) commented May 12, 2020

I just merged the legacy and non-legacy deterministic kernels (tests are now running locally) and note that this makes the legacy kernel slightly slower than before, not only because of the added zero offsets, but also because the legacy kernel previously skipped an unnecessary clamping operation, which always runs in the merged kernel. I still think it's better to have one kernel.

Further re-factoring, if it's done, will be submitted in a separate PR.
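To illustrate the kind of merge being described (a hypothetical helper; the real kernels are 2-D and templated over the element type), the legacy and half-pixel-centers variants can share their start/end-index computation and differ only in the offset passed in, with the clamps now running on both paths:

// Hypothetical shared helper: offset = 0.5f selects half-pixel centers,
// offset = 0.0f selects the legacy convention. The clamps to
// [0, out_size) now run unconditionally, which is the small cost the
// legacy path pays after the merge.
__device__ inline void GatherRange(int in_x, float inv_scale, float offset,
                                   int out_size, int* out_start,
                                   int* out_end) {
  *out_start = max(0, __float2int_ru((in_x - 1 + offset) * inv_scale - offset));
  *out_end = min(out_size - 1,
                 __float2int_rd((in_x + 1 + offset) * inv_scale - offset));
}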

@duncanriach (Contributor, Author) commented

Commit pushed.

@duncanriach (Contributor, Author) commented

I also just looked into combining the existing, non-deterministic back-prop kernels into a single kernel with an offset parameter. I think it's doable and probably worth it. I won't submit another pull request for that right now because it would collide with this current pull request.

I think it's also possible to combine two of the forward kernels into a single kernel with an offset parameter. That could probably be done in the same future pull request.

@chsigg (Contributor) left a comment

Thanks Duncan! I only have some minor comments.

@duncanriach (Contributor, Author) commented

> Thanks Duncan! I only have some minor comments.

Thanks Christian! I'm working on some changes to address your comments.

@duncanriach duncanriach requested a review from chsigg May 12, 2020 03:12
@gbaned gbaned added the awaiting review (Pull request awaiting review) label May 15, 2020
@gbaned gbaned requested a review from cheshire June 8, 2020 17:01
@cheshire (Contributor) commented Jun 8, 2020

@chsigg Could you take a look?

@tensorflowbutler tensorflowbutler removed the awaiting review (Pull request awaiting review) label Jun 24, 2020
@gbaned gbaned added the awaiting review (Pull request awaiting review) label Jun 26, 2020
@gbaned (Contributor) commented Jul 3, 2020

@chsigg Can you please review this PR? Thanks!

@gbaned (Contributor) commented Jul 24, 2020

@chsigg Can you please review this PR? Thanks!

@duncanriach (Contributor, Author) commented

Hi @gbaned, I addressed @chsigg's original review comments the same day he made them (2020-05-11), and he seems not to have been able to re-review in the three months since. Is there someone else who could be assigned to review?

@cheshire (Contributor) commented

@reedwm

@tensorflowbutler tensorflowbutler removed the awaiting review (Pull request awaiting review) label Aug 13, 2020
@gbaned gbaned requested a review from reedwm August 13, 2020 13:41
@gbaned (Contributor) commented Aug 13, 2020

@duncanriach Sorry for the delay, can you please resolve conflicts? Thanks!

@duncanriach (Contributor, Author) commented

Thanks for pushing, @gbaned. I'll be on vacation for a week. When I get back, I'll resolve the conflicts.

@gbaned gbaned added the stat:awaiting response (Status - Awaiting response from author) label Aug 17, 2020
@tensorflowbutler tensorflowbutler added the stale (marks the issue/PR as stale, to be closed automatically if there is no activity) label Sep 1, 2020
@tensorflowbutler (Member) commented

It has been 14 days with no activity and the awaiting response label was assigned. Is this PR still valid? Assigning the stalled label. Please comment to reassure me that this is still being worked on.

@duncanriach (Contributor, Author) commented

Yes, this is being worked on. Please leave open.

@google-ml-butler google-ml-butler bot removed the stale (marks the issue/PR as stale, to be closed automatically if there is no activity) label Sep 1, 2020
@duncanriach duncanriach force-pushed the deterministic-image-resize-bilinear branch from 51804c1 to 294065e September 2, 2020 01:13
@duncanriach (Contributor, Author) commented

Hi @gbaned,

The conflicts have been resolved and tested locally. I attempted to capture the goodness from the conflicting commit that my changes obliterate.

Please will you now move this towards merge?

@gbaned gbaned removed the stat:awaiting response (Status - Awaiting response from author) label Sep 2, 2020
@gbaned gbaned requested review from chsigg and removed request for chsigg September 2, 2020 15:31
@reedwm (Member) commented Sep 4, 2020

@chsigg can you finish reviewing?

@reedwm reedwm removed their request for review September 4, 2020 00:54
@gbaned gbaned added the awaiting review (Pull request awaiting review) label Sep 7, 2020
@google-ml-butler google-ml-butler bot added the kokoro:force-run (Tests on submitted change) and ready to pull (PR ready for merge process) labels Sep 21, 2020
@kokoro-team kokoro-team removed the kokoro:force-run (Tests on submitted change) label Sep 21, 2020
@gbaned gbaned removed the awaiting review (Pull request awaiting review) label Sep 21, 2020
@tensorflow-copybara tensorflow-copybara merged commit 2ef1ff6 into tensorflow:master Sep 22, 2020
@duncanriach (Contributor, Author) commented

Thank you, @chsigg, @gbaned, @cheshire, and @reedwm.

Labels: cla: yes · ready to pull (PR ready for merge process) · size:XL (CL Change Size: Extra Large)