Add GPU implementation of SparseReshape #47251
Conversation
@@ -94,7 +94,7 @@ def testPropagatesFullyKnownDenseShapeWhenShapePartiallyKnown(self):
       self.assertAllEqual((2, 3 * 4), sp_output.shape)

   def testSameShape(self):
-    with self.session(use_gpu=False) as sess:
+    with self.session(use_gpu=True) as sess:
You no longer need to set use_gpu=True explicitly; it defaults to True.
GPU_1D_KERNEL_LOOP(sparse_index, nnz) {
  const Tindex* input_index = &input_indices[sparse_index * input_rank];
  Tindex* output_index = &output_indices[sparse_index * output_rank];
  Tindex dense_index = 0;
  // Flatten input index from slowest- to fastest-changing dimension.
  for (int i = 0; i < input_rank; ++i) {
    dense_index = dense_index * input_shape[i] + input_index[i];
  }
  // Compute output index from fastest- to slowest-changing dimension.
  for (int i = output_rank; i-- > 0;) {
    Tindex output_size = output_shape[i];
    output_index[i] = dense_index % output_size;
    dense_index /= output_size;
  }
Do you need to care about integer overflow when the indices are int32? Maybe always use int64 for dense_index?
    dense_index = dense_index * input_shape[i] + input_index[i];
  }
  // Compute output index from fastest- to slowest-changing dimension.
  for (int i = output_rank; i-- > 0;) {
Please use the more idiomatic i >= 0; i--. :)
auto config = GetGpuLaunchConfig(nnz, device);
return GpuLaunchKernel(ReshapeSparseTensorKernel<int64>, config.block_count,
                       config.thread_per_block, 0, device.stream(), nnz,
                       /* input_rank = */ input_rank,
Please spell these as /*input_rank=*/; our internal tooling will then check that the param names match up.
- Remove use_gpu=True because it is already the default.
- Use int64 for dense_index inside the kernel to avoid integer overflow.
- Change reverse for-loop style.
- Reformat inline comments so internal tooling picks them up.
This follows #46275.
cc @nluehr