
Add GPU implementation of SparseReshape #47251

Merged

Conversation

benbarsdell
Contributor

This follows #46275.

cc @nluehr

@google-ml-butler google-ml-butler bot added the size:M (CL Change Size: Medium) label Feb 19, 2021
@google-cla google-cla bot added the cla: yes label Feb 19, 2021
@gbaned gbaned self-assigned this Feb 19, 2021
@gbaned gbaned added the comp:core (issues related to core part of tensorflow) label Feb 19, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Feb 19, 2021
@gbaned gbaned requested a review from sanjoy February 19, 2021 04:24
@@ -94,7 +94,7 @@ def testPropagatesFullyKnownDenseShapeWhenShapePartiallyKnown(self):
     self.assertAllEqual((2, 3 * 4), sp_output.shape)

   def testSameShape(self):
-    with self.session(use_gpu=False) as sess:
+    with self.session(use_gpu=True) as sess:
Contributor
You no longer need to set use_gpu=True explicitly; it defaults to True.

Comment on lines 36 to 49
  GPU_1D_KERNEL_LOOP(sparse_index, nnz) {
    const Tindex* input_index = &input_indices[sparse_index * input_rank];
    Tindex* output_index = &output_indices[sparse_index * output_rank];
    Tindex dense_index = 0;
    // Flatten input index from slowest- to fastest-changing dimension.
    for (int i = 0; i < input_rank; ++i) {
      dense_index = dense_index * input_shape[i] + input_index[i];
    }
    // Compute output index from fastest- to slowest-changing dimension.
    for (int i = output_rank; i-- > 0;) {
      Tindex output_size = output_shape[i];
      output_index[i] = dense_index % output_size;
      dense_index /= output_size;
    }
Contributor
Do you need to care about integer overflow in case the indices are in int32? Maybe always use int64 for dense_index?
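For illustration only, a minimal sketch of how the flattening loop could use a 64-bit accumulator (names taken from the snippet above; not necessarily the exact change that landed):

    // Accumulate the flattened index in 64 bits so that a large dense shape
    // cannot overflow the accumulator when Tindex is int32.
    int64 dense_index = 0;
    for (int i = 0; i < input_rank; ++i) {
      dense_index = dense_index * input_shape[i] + input_index[i];
    }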

      dense_index = dense_index * input_shape[i] + input_index[i];
    }
    // Compute output index from fastest- to slowest-changing dimension.
    for (int i = output_rank; i-- > 0;) {
Contributor
Please use the more idiomatic i >= 0; i--. :)
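A sketch of the same loop in the suggested style (iteration order and behavior unchanged):

    // Compute output index from fastest- to slowest-changing dimension.
    for (int i = output_rank - 1; i >= 0; i--) {
      Tindex output_size = output_shape[i];
      output_index[i] = dense_index % output_size;
      dense_index /= output_size;
    }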

  auto config = GetGpuLaunchConfig(nnz, device);
  return GpuLaunchKernel(ReshapeSparseTensorKernel<int64>, config.block_count,
                         config.thread_per_block, 0, device.stream(), nnz,
                         /* input_rank = */ input_rank,
Contributor
Please spell these as /*input_rank=*/; our internal tooling will then check that the param names match up.
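Concretely, the suggested spelling removes the spaces inside the argument comment so tooling can match it against the parameter name (illustrative only):

    /* input_rank = */ input_rank,   // before: not recognized by the tooling
    /*input_rank=*/input_rank,       // after: tooling checks the name matches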

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Feb 25, 2021
- Remove use_gpu=True because it is already the default.
- Use int64 for dense_index inside kernel to avoid integer overflow.
- Change reverse for-loop style.
- Reformat inline comments to ensure internal tooling picks them up.
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Feb 25, 2021
@google-ml-butler google-ml-butler bot added the kokoro:force-run and ready to pull labels Feb 25, 2021
@kokoro-team kokoro-team removed the kokoro:force-run label Feb 25, 2021
@gbaned gbaned added and removed the ready to pull label Feb 26, 2021
@copybara-service copybara-service bot merged commit e37b5e1 into tensorflow:master Feb 27, 2021
PR Queue automation moved this from Approved by Reviewer to Merged Feb 27, 2021