Use new GPU kernel for [unsorted] segment reductions #51392
Force-pushed from 3d67ee5 to bb4bf3f.
@@ -20,6 +20,22 @@ limitations under the License.

namespace tensorflow {

bool UseAtomicSegmentReductions() {
Can we call this UseNonDeterministicSegmentReductions?
Done
functor::NonAtomicMaxOpGpu<T>, \
functor::AtomicMaxOpGpu<T>>;
#define DEFINE_SORTED_GPU_SPECS_INDEX(T, Index) \
template struct SegmentReductionFunctor<T, Index, functor::Zero<T>, \
Can you please add /*EmptySegmentValueF=*/ and /*InitialValueF=*/ here to make it clear which arg is which? Same elsewhere.
Done
First pass through this. Nice work, @benbarsdell. A couple of thoughts:
- Optionally replaces the old atomics-based kernels with calls to SegmentReduceGPU (the same kernel already used for sparse segment reductions). This behavior is enabled by default, but the old kernels can be re-enabled by setting the environment variable TF_USE_ATOMIC_SEGMENT_REDUCTIONS=1. On Windows, the old kernels are always used due to a build issue with the new kernel.
- This improves performance, and guarantees that these ops are deterministic. In future it is hoped that the old kernels can be removed completely.
- Also adds a GPU kernel registration for SegmentMean, which didn't previously exist.
- Extends existing tests to cover several different inner dimension sizes (important for the GPU implementations).
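
To illustrate why accumulation order matters for the determinism claim above, here is a minimal CPU-side sketch in plain C++ (not TensorFlow code). The function name `UnsortedSegmentSumInOrder` and its `order` parameter are hypothetical, introduced only to simulate the arbitrary order in which atomic adds may land; the real GPU kernels are not shown here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference implementation: sums data[i] into
// out[segment_ids[i]], visiting the elements in the sequence given by
// `order`. Varying `order` stands in for the unspecified arrival order
// of atomic adds; a deterministic kernel corresponds to a fixed order.
std::vector<float> UnsortedSegmentSumInOrder(
    const std::vector<float>& data, const std::vector<int>& segment_ids,
    const std::vector<std::size_t>& order, int num_segments) {
  std::vector<float> out(num_segments, 0.0f);
  for (std::size_t i : order) {
    out[segment_ids[i]] += data[i];  // float addition is not associative
  }
  return out;
}
```

With `data = {1e8f, 1.0f, -1e8f}` all assigned to segment 0, visiting the elements in order `{0, 1, 2}` yields 0 in IEEE-754 single precision (the `1.0f` is lost when added to `1e8f`), while order `{0, 2, 1}` yields 1. A reduction that always accumulates in the same order avoids this run-to-run variation.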
Thanks for the suggestions Duncan. I'm facing some build issues right now but will push the additional atomic->deterministic rewordings tomorrow along with a rebase. I haven't looked closely at how the determinism tests work but will take a look.
Force-pushed from 1988dd4 to 0532f7f.
This was rolled back in 44cdcda. Will investigate.
PiperOrigin-RevId: 394502922 Change-Id: I4e183dc73bf209e4623682d8d622117e9f1de28a
Rolled forward in 9a1072e.
Causes an internal performance regression. PiperOrigin-RevId: 394788152 Change-Id: I702fb4ec245823b96ce82f58b1b0d6c505b674c3
Unfortunately this was rolled back again in 51ee415, since it caused a performance regression in an internal model. Also, the test [...]

I think the best approach here is to keep using the old nondeterministic kernels by default, and only use the new ones if either determinism is enabled or an environment variable is set (say, [...]).

@benbarsdell, do you want to create a new PR with the new kernels disabled by default, by this Wednesday? If not, I can roll forward, disabling the new kernels by default. I want to get this in as soon as possible since it is required for determinism.
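
The default-off policy proposed in this comment could be sketched roughly as follows. This is a standalone illustration, not the TensorFlow implementation: the helper names, the `determinism_required` parameter, and the environment variable name `EXAMPLE_USE_DETERMINISTIC_SEGMENT_REDUCTIONS` are all placeholders, since the comment does not name the actual variable.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: true if the environment variable `name` is set
// to "1" or "true". Unset or any other value counts as false.
bool EnvFlagIsSet(const char* name) {
  const char* value = std::getenv(name);
  if (value == nullptr) return false;
  const std::string s(value);
  return s == "1" || s == "true";
}

// Hypothetical policy: keep the old nondeterministic kernels by
// default, and select the new deterministic ones only when determinism
// is required or the (placeholder) opt-in variable is set.
bool UseDeterministicSegmentReductions(bool determinism_required) {
  return determinism_required ||
         EnvFlagIsSet("EXAMPLE_USE_DETERMINISTIC_SEGMENT_REDUCTIONS");
}
```

Under this sketch, a process with no determinism requirement and no variable set keeps the old kernels, which matches the goal of avoiding the reported performance regression by default.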
cc @nluehr @reedwm @duncanriach