Fix CUDA GatherND batch dimension validation regression #25069


Draft: wants to merge 2 commits into main

@Copilot Copilot AI commented Jun 14, 2025

Fixes a regression in which GatherND fails under CUDAExecutionProvider but runs correctly under CPUExecutionProvider. The CUDA run fails with:

gather_nd.cc:30 CheckBatchDimensionsMatch Batch dimensions differ at index 0: 1 != 3, tensor indices: 0, 1

Root Cause

The CUDA implementation had an additional CheckBatchDimensionsMatch validation that enforced strict matching of batch dimensions between input and indices tensors. This validation was not present in the CPU implementation, creating inconsistent behavior between execution providers.
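For reference, a batch-dimension check of this kind compares only the leading batch_dims axes of the input and indices shapes, which are the axes the ONNX definition of batch_dims requires to agree. The sketch below is a hypothetical helper (CheckLeadingBatchDims is an illustrative name, not the ONNX Runtime function) whose message mirrors the first part of the error quoted above. With batch_dims = 1, an input of shape {1, 4} and indices of shape {3, 2} produce "1 != 3"; with batch_dims = 0, no axes are compared.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not ONNX Runtime's CheckBatchDimensionsMatch).
// Returns an empty string when the first `batch_dims` axes of both shapes
// agree; otherwise returns a message in the style of the quoted error.
std::string CheckLeadingBatchDims(std::size_t batch_dims,
                                  const std::vector<int64_t>& input_shape,
                                  const std::vector<int64_t>& indices_shape) {
  if (batch_dims > input_shape.size() || batch_dims > indices_shape.size()) {
    return "batch_dims exceeds tensor rank";
  }
  // Only the leading batch_dims axes are compared; later axes may differ.
  for (std::size_t i = 0; i < batch_dims; ++i) {
    if (input_shape[i] != indices_shape[i]) {
      return "Batch dimensions differ at index " + std::to_string(i) + ": " +
             std::to_string(input_shape[i]) + " != " +
             std::to_string(indices_shape[i]);
    }
  }
  return "";
}
```

Because the loop runs batch_dims times, a check written this way cannot reject any shapes when batch_dims = 0.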

Solution

Removed the overly restrictive batch dimension validation from the CUDA implementation so that it matches the CPU behavior. The CPU implementation handles the same inputs correctly without this validation, which indicates that the check can be removed safely.

Changes

  • onnxruntime/core/providers/cuda/tensor/gather_nd.cc: Removed CheckBatchDimensionsMatch call that was causing the regression
  • onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc: Added regression test GatherND_flexible_input_shapes_regression to prevent this issue from recurring

Testing

The added test case verifies that GatherND produces the expected output for flexible input shapes with the default batch_dims=0, guarding against a recurrence of this regression.
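To illustrate why differing leading dimensions are legal when batch_dims=0, here is a minimal reference sketch of GatherND without batch dimensions. It is a hypothetical implementation (DenseTensor and GatherNDNoBatch are illustrative names, not ONNX Runtime code) following the ONNX rule that the output shape is indices.shape[:-1] + data.shape[k:], where k = indices.shape[-1]. The example inputs are made up and are not the data used in the PR's test: data of shape {1, 4} gathered with indices of shape {3, 2} yields an output of shape {3}, even though the leading dimensions 1 and 3 differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative dense tensor: a shape plus row-major float values.
struct DenseTensor {
  std::vector<int64_t> shape;
  std::vector<float> values;
};

// Hypothetical reference GatherND for batch_dims == 0 (not ONNX Runtime's kernel).
// data has rank r; indices has rank q >= 1 and last dimension k in [1, r].
DenseTensor GatherNDNoBatch(const DenseTensor& data,
                            const std::vector<int64_t>& indices_shape,
                            const std::vector<int64_t>& indices) {
  const std::size_t r = data.shape.size();
  const std::size_t q = indices_shape.size();
  if (q == 0) throw std::invalid_argument("indices must have rank >= 1");
  const int64_t k = indices_shape[q - 1];
  if (k < 1 || static_cast<std::size_t>(k) > r)
    throw std::invalid_argument("indices.shape[-1] must be in [1, rank(data)]");

  // Row-major strides: strides[i] is the product of data.shape[i+1:].
  std::vector<int64_t> strides(r, 1);
  for (int64_t i = static_cast<int64_t>(r) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * data.shape[i + 1];
  if (static_cast<int64_t>(data.values.size()) != strides[0] * data.shape[0])
    throw std::invalid_argument("data values do not match data shape");
  // Each index tuple selects one contiguous slice of shape data.shape[k:].
  const int64_t slice_size = strides[k - 1];

  // Output shape = indices.shape[:-1] + data.shape[k:].
  DenseTensor out;
  int64_t num_tuples = 1;
  for (std::size_t i = 0; i + 1 < q; ++i) {
    out.shape.push_back(indices_shape[i]);
    num_tuples *= indices_shape[i];
  }
  for (std::size_t i = static_cast<std::size_t>(k); i < r; ++i)
    out.shape.push_back(data.shape[i]);
  if (static_cast<int64_t>(indices.size()) != num_tuples * k)
    throw std::invalid_argument("indices size does not match indices_shape");

  for (int64_t t = 0; t < num_tuples; ++t) {
    int64_t offset = 0;
    for (int64_t j = 0; j < k; ++j) {
      int64_t idx = indices[t * k + j];
      if (idx < 0) idx += data.shape[j];  // negative indices count from the end
      if (idx < 0 || idx >= data.shape[j])
        throw std::out_of_range("index out of bounds");
      offset += idx * strides[j];
    }
    out.values.insert(out.values.end(), data.values.begin() + offset,
                      data.values.begin() + offset + slice_size);
  }
  return out;
}
```

With data {10, 20, 30, 40} of shape {1, 4} and index tuples (0, 1), (0, 3), (0, 0), the sketch returns {20, 40, 10} with shape {3}. No step compares data.shape[0] with indices.shape[0], which is the behavior the test expects from every provider.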

Fixes #25053.


Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] onnxruntime with CUDAExecutionProvider crashes: gather_nd.cc:30 CheckBatchDimensionsMatch Batch dimensions differ at index 0: 1 != 3, tensor indices: 0, 1" to "Fix CUDA GatherND batch dimension validation regression" on Jun 14, 2025
@Copilot Copilot AI requested a review from justinchuby June 14, 2025 22:41
Copilot finished work on behalf of justinchuby June 14, 2025 22:41
@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Comment on lines +333 to +335
test.AddOutput<float>("output", {3}, {0.625f, 0.0608f, 1.0f});

// This test should pass on all providers
Suggested change
test.AddOutput<float>("output", {3}, {0.625f, 0.0608f, 1.0f});
// This test should pass on all providers
test.AddOutput<float>("output", {3}, {0.625f, 0.0608f, 1.0f});
// This test should pass on all providers

Development

Successfully merging this pull request may close these issues.

onnxruntime with CUDAExecutionProvider crashes: gather_nd.cc:30 CheckBatchDimensionsMatch Batch dimensions differ at index 0: 1 != 3, tensor indices: 0, 1
2 participants