Use new GPU kernel for [unsorted] segment reductions #51392

benbarsdell · 2021-08-09T12:01:53Z

Optionally replaces the old atomics-based kernels with calls to SegmentReduceGPU (the same kernel already used for sparse segment reductions). This behavior is enabled by default, but the old kernels can be re-enabled by setting the environment variable TF_USE_ATOMIC_SEGMENT_REDUCTIONS=1. On Windows, the old kernels are always used due to a build issue with the new kernel.
This improves performance and guarantees that these ops are deterministic. In the future, we hope to remove the old kernels completely.
Also adds a GPU kernel registration for SegmentMean, which didn't previously exist.
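
The toggle described above could be modeled roughly as follows. This is only a sketch: `EnvFlagIsSet` and `UseNewSegmentReduceKernel` are hypothetical names, the rule that only the exact string "1" counts as set is an assumption, and the code in this PR may read the variable differently.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true only when the variable is set to exactly
// "1". The real parsing rules may differ; this is an assumption.
inline bool EnvFlagIsSet(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical selector mirroring the description: the new kernel is the
// default, the environment variable re-enables the old atomics-based kernels,
// and Windows always uses the old kernels.
inline bool UseNewSegmentReduceKernel(bool is_windows) {
  if (is_windows) return false;
  return !EnvFlagIsSet("TF_USE_ATOMIC_SEGMENT_REDUCTIONS");
}
```

Passing the platform as an argument is a choice made here so the logic can be exercised on any machine; a real build would more likely decide this at compile time.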

sanjoy · 2021-08-10T04:26:49Z

tensorflow/core/kernels/segment_reduction_ops_gpu_0.cu.cc

@@ -20,6 +20,22 @@ limitations under the License.

 namespace tensorflow {

+bool UseAtomicSegmentReductions() {


Can we call this UseNonDeterministicSegmentReductions?

sanjoy · 2021-08-10T05:26:04Z

tensorflow/core/kernels/segment_reduction_ops_gpu_0.cu.cc

-                                          functor::NonAtomicMaxOpGpu<T>,  \
-                                          functor::AtomicMaxOpGpu<T>>;
+#define DEFINE_SORTED_GPU_SPECS_INDEX(T, Index)                            \
+  template struct SegmentReductionFunctor<T, Index, functor::Zero<T>,      \


Can you please add /*EmptySegmentValueF=*/ and /*InitialValueF=*/ here to make it clear which arg is which? Same elsewhere.
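
For illustration, the argument-name comments requested here might look like the sketch below. Everything in it is hypothetical: `SegmentReductionSketch`, `Zero`, and `Lowest` are stand-ins, and the parameter order is an assumption that need not match the real `SegmentReductionFunctor`.

```cpp
#include <limits>

// Hypothetical stand-ins for the functors named in the diff.
template <typename T>
struct Zero {
  T operator()() const { return T(0); }
};

template <typename T>
struct Lowest {
  T operator()() const { return std::numeric_limits<T>::lowest(); }
};

// Hypothetical functor with two look-alike type parameters. The parameter
// order is an assumption made for this example only.
template <typename T, typename Index, typename EmptySegmentValueF,
          typename InitialValueF>
struct SegmentReductionSketch {
  static T EmptySegmentValue() { return EmptySegmentValueF()(); }
  static T InitialValue() { return InitialValueF()(); }
};

// Explicit instantiation with argument-name comments, so a reader can tell
// which functor fills which role without opening the template definition.
template struct SegmentReductionSketch<float, int,
                                       /*EmptySegmentValueF=*/Zero<float>,
                                       /*InitialValueF=*/Lowest<float>>;
```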

duncanriach · 2021-08-12T04:05:50Z

First pass through this. Nice work, @benbarsdell. A couple of thoughts:

The original code differentiated between atomic and nonatomic reductions (because there were sometimes both), but not for reasons of determinism. And, of course, atomics can be used in deterministic code. So, there's some lingering old symbology around atomic/nonatomic which has gotten tangled up with (or repurposed for) deterministic/nondeterministic operation. @sanjoy picked out one example of that above. There are other places in the code where perhaps "non-atomic" would ideally be replaced with "deterministic" and where "atomic" would ideally be replaced with "nondeterministic."
I'm wondering if we should add tests to confirm determinism, or if we should rely on it being deterministic by design. There will, of course, be backup testing from @reedwm's determinism auto-checker.
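
To make the atomic versus deterministic distinction concrete, here is a small CPU-side sketch. It is not the GPU kernel: `SegmentSumInOrder` is a hypothetical helper, and its `order` argument stands in for the arrival order of floating-point atomic adds, which can vary from run to run on a GPU.

```cpp
#include <cstddef>
#include <vector>

// Sums data into segments, visiting elements in the given order. The order is
// an explicit argument here so that the effect of reordering is reproducible.
inline std::vector<double> SegmentSumInOrder(
    const std::vector<double>& data, const std::vector<int>& segment_ids,
    int num_segments, const std::vector<std::size_t>& order) {
  std::vector<double> out(num_segments, 0.0);
  for (std::size_t i : order) {
    out[segment_ids[i]] += data[i];
  }
  return out;
}
```

With data `{1e16, 1.0, -1e16}` all in segment 0, visiting in order `{0, 1, 2}` gives 0 while `{0, 2, 1}` gives 1, because floating-point addition is not associative. The atomics themselves are not the source of nondeterminism; the unspecified accumulation order is. A determinism test along these lines would run the op repeatedly on such inputs and compare the results bitwise.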


benbarsdell · 2021-08-12T13:52:34Z

Thanks for the suggestions, Duncan. I'm facing some build issues right now but will push the additional atomic->deterministic rewordings tomorrow along with a rebase.

I haven't looked closely at how the determinism tests work but will take a look.

reedwm · 2021-08-26T01:54:57Z

This was rolled back in 44cdcda. Will investigate.

PiperOrigin-RevId: 394502922 Change-Id: I4e183dc73bf209e4623682d8d622117e9f1de28a

reedwm · 2021-09-02T19:41:18Z

Rolled forward in 9a1072e

Causes an internal performance regression. PiperOrigin-RevId: 394788152 Change-Id: I702fb4ec245823b96ce82f58b1b0d6c505b674c3

reedwm · 2021-09-04T01:16:43Z

Unfortunately this was rolled back again in 51ee415, since it caused a performance regression in an internal model. Also, the test segment_reduction_ops_deterministic_test.py failed on Windows, since the deterministic algorithms were disabled on Windows.

I think the best approach here is to keep using the old nondeterministic kernels by default, and only use the new ones if either determinism is enabled or an environment variable is set (say, TF_USE_DETERMINISTIC_SEGMENT_REDUCTIONS). That unblocks the determinism API, and we can debug the performance issue internally and potentially send you an example to reproduce it.

@benbarsdell, do you want to create a new PR with the new kernels disabled by default, by this Wednesday? If not, I can rollforward, disabling the new kernels by default. I want to get this in as soon as possible since it is required for determinism.
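
The selection policy proposed in this comment can be written down as a small predicate. This is a sketch of the proposal only, not code from any PR: `SegmentKernel` and `SelectSegmentKernel` are hypothetical names, and how the two inputs would be obtained (the determinism setting and the suggested TF_USE_DETERMINISTIC_SEGMENT_REDUCTIONS variable) is left out.

```cpp
// Hypothetical labels for the two kernel families discussed in this thread.
enum class SegmentKernel { kOldNondeterministic, kNewDeterministic };

// Proposed policy: the old kernels stay the default, and the new kernels are
// used only when determinism is enabled or the opt-in variable is set.
inline SegmentKernel SelectSegmentKernel(bool determinism_enabled,
                                         bool deterministic_env_var_set) {
  if (determinism_enabled || deterministic_env_var_set) {
    return SegmentKernel::kNewDeterministic;
  }
  return SegmentKernel::kOldNondeterministic;
}
```

Compared with the original default, this inverts the opt-in direction: users who do not ask for determinism keep the previous performance characteristics.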


google-ml-butler bot added the size:L CL Change Size: Large label Aug 9, 2021

google-cla bot added the cla: yes label Aug 9, 2021

gbaned self-assigned this Aug 9, 2021

gbaned added the comp:core issues related to core part of tensorflow label Aug 9, 2021

gbaned added this to Assigned Reviewer in PR Queue via automation Aug 9, 2021

gbaned requested a review from reedwm August 9, 2021 14:08

google-ml-butler bot added the awaiting review Pull request awaiting review label Aug 9, 2021

benbarsdell force-pushed the gpu-SegmentReductions-new-rebased2 branch from 3d67ee5 to bb4bf3f Compare August 9, 2021 14:49

reedwm requested review from sanjoy and removed request for reedwm August 9, 2021 22:30

sanjoy suggested changes Aug 10, 2021


PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Aug 10, 2021

tensorflowbutler removed the awaiting review Pull request awaiting review label Aug 12, 2021

benbarsdell added 6 commits August 12, 2021 19:20

Extend tests for [unsorted] segment reductions

4ff201a

- Extends existing tests to cover several different inner dimension sizes (important for the GPU implementations).

Add inline comments to segment reduction ops GPU

dd54f67

Rename UseAtomicSegmentReductions helper function

dbba7f6

Fix formatting in segment_reduction_ops_gpu.cu.h

50dcf52

Reword "atomic" to "deterministic" in comments/errors

0532f7f

benbarsdell force-pushed the gpu-SegmentReductions-new-rebased2 branch from 1988dd4 to 0532f7f Compare August 12, 2021 23:04

gbaned requested a review from sanjoy August 18, 2021 14:37

google-ml-butler bot added the awaiting review Pull request awaiting review label Aug 18, 2021

duncanriach mentioned this pull request Aug 19, 2021

[determinism] Add segment reduction op exceptions for GPU determinism #47772

Merged

sanjoy approved these changes Aug 25, 2021


PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Aug 25, 2021

google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 25, 2021

kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 25, 2021

gbaned added kokoro:force-run Tests on submitted change and removed awaiting review Pull request awaiting review labels Aug 25, 2021

kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 25, 2021

copybara-service bot merged commit 948248c into tensorflow:master Aug 26, 2021

PR Queue automation moved this from Approved by Reviewer to Merged Aug 26, 2021

google-ml-butler bot removed the ready to pull PR ready for merge process label Aug 26, 2021

duncanriach mentioned this pull request Aug 28, 2021

Message passing neural network determinism thwarted by tf.math.segment_sum and tf.gather NVIDIA/framework-reproducibility#25

Closed

copybara-service bot pushed a commit that referenced this pull request Sep 2, 2021

Rollforward of #51392.

9a1072e

PiperOrigin-RevId: 394502922 Change-Id: I4e183dc73bf209e4623682d8d622117e9f1de28a

copybara-service bot pushed a commit that referenced this pull request Sep 4, 2021

Rollback of #51392 (for the second time).

51ee415

Causes an internal performance regression. PiperOrigin-RevId: 394788152 Change-Id: I702fb4ec245823b96ce82f58b1b0d6c505b674c3

benbarsdell added a commit to benbarsdell/tensorflow that referenced this pull request Sep 7, 2021

Roll forward tensorflow#51392 again

0d53fe6

benbarsdell mentioned this pull request Sep 7, 2021

Replacement for #51392 #51861

Merged

arovir01 pushed a commit to arovir01/tensorflow that referenced this pull request Sep 17, 2021

Rollforward of tensorflow#51392.

d88a34d

PiperOrigin-RevId: 394502922 Change-Id: I4e183dc73bf209e4623682d8d622117e9f1de28a

arovir01 pushed a commit to arovir01/tensorflow that referenced this pull request Sep 17, 2021

Rollback of tensorflow#51392 (for the second time).

a6f745d

Causes an internal performance regression. PiperOrigin-RevId: 394788152 Change-Id: I702fb4ec245823b96ce82f58b1b0d6c505b674c3

arovir01 pushed a commit to arovir01/tensorflow that referenced this pull request Sep 17, 2021

Roll forward tensorflow#51392 again

f672b74
