Fix CUDA GatherND batch dimension validation regression #25069

Copilot · 2025-06-14T22:29:35Z

Fixes a regression in which GatherND fails with the following error under CUDAExecutionProvider but runs correctly under CPUExecutionProvider:

gather_nd.cc:30 CheckBatchDimensionsMatch Batch dimensions differ at index 0: 1 != 3, tensor indices: 0, 1

Root Cause

The CUDA implementation ran an additional CheckBatchDimensionsMatch validation that required the batch dimensions of the input and indices tensors to match exactly. The CPU implementation performs no such check, so the two execution providers behaved inconsistently on the same model.
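For reference, the ONNX GatherND specification only requires the first `batch_dims` dimensions of the data and indices tensors to be equal. The sketch below applies that rule. `CheckLeadingBatchDims` is a hypothetical helper written for illustration, not ONNX Runtime's actual CheckBatchDimensionsMatch. With `batch_dims = 0`, no dimensions are compared at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not ONNX Runtime's CheckBatchDimensionsMatch.
// Returns an empty string when the first `batch_dims` axes of `data_shape`
// and `indices_shape` agree, otherwise a message describing the mismatch.
std::string CheckLeadingBatchDims(const std::vector<int64_t>& data_shape,
                                  const std::vector<int64_t>& indices_shape,
                                  std::size_t batch_dims) {
  if (batch_dims > data_shape.size() || batch_dims > indices_shape.size()) {
    return "batch_dims exceeds tensor rank";
  }
  // Only axes [0, batch_dims) are compared. With batch_dims == 0 the loop
  // never runs, so data and indices may have any leading dimensions.
  for (std::size_t i = 0; i < batch_dims; ++i) {
    if (data_shape[i] != indices_shape[i]) {
      return "Batch dimensions differ at index " + std::to_string(i) + ": " +
             std::to_string(data_shape[i]) + " != " +
             std::to_string(indices_shape[i]);
    }
  }
  return "";
}
```

For example, shapes [1, 4] and [3, 1] pass with `batch_dims = 0` and are rejected with `batch_dims = 1`, producing a message in the same style as the error above.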

Solution

Removed the overly restrictive batch dimension validation from the CUDA implementation so that it matches CPU behavior. The CPU implementation has been producing correct results without this validation, which indicates that the check can be removed safely.
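Why the leading dimensions of data and indices may legitimately differ is easiest to see from the GatherND output-shape rule in the ONNX specification: with `k = indices_shape[-1]`, the output shape is `indices_shape[:-1] + data_shape[batch_dims + k:]`. The following is a minimal sketch of that rule. The function name `GatherNDOutputShape` is hypothetical, and the validation messages are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sketch of the ONNX GatherND output-shape rule (hypothetical function name):
//   k = indices_shape[-1]
//   output_shape = indices_shape[:-1] + data_shape[batch_dims + k:]
// Leading axes of data and indices only have to match for the first
// batch_dims axes; with batch_dims == 0 they are independent.
std::vector<int64_t> GatherNDOutputShape(const std::vector<int64_t>& data_shape,
                                         const std::vector<int64_t>& indices_shape,
                                         std::size_t batch_dims) {
  if (batch_dims >= data_shape.size() || batch_dims >= indices_shape.size()) {
    throw std::invalid_argument("batch_dims must be < min(rank(data), rank(indices))");
  }
  const int64_t k = indices_shape.back();
  if (k < 1 || batch_dims + static_cast<std::size_t>(k) > data_shape.size()) {
    throw std::invalid_argument("indices_shape[-1] must be in [1, rank(data) - batch_dims]");
  }
  // Every axis of indices except the last one enumerates a separate gather.
  std::vector<int64_t> out(indices_shape.begin(), indices_shape.end() - 1);
  // Each gather yields the trailing slice of data after the indexed axes.
  const auto start = static_cast<std::ptrdiff_t>(batch_dims + static_cast<std::size_t>(k));
  out.insert(out.end(), data_shape.begin() + start, data_shape.end());
  return out;
}
```

With `batch_dims = 0`, data of shape [1, 4] and indices of shape [3, 2] give an output of shape [3], even though their leading dimensions (1 and 3) differ.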

Changes

onnxruntime/core/providers/cuda/tensor/gather_nd.cc: Removed the CheckBatchDimensionsMatch call that caused the regression.
onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc: Added the regression test GatherND_flexible_input_shapes_regression to prevent this issue from recurring.

Testing

The added test case verifies that GatherND handles flexible input shapes correctly under the default batch_dims=0, guarding against a recurrence of this regression.
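To make the batch_dims=0 case concrete, here is a small reference gather on row-major data. The function `GatherND0` and the example shapes and values are hypothetical illustrations, not the contents of GatherND_flexible_input_shapes_regression. A data tensor of shape [1, 4] is indexed by an indices tensor of shape [3, 2], so the leading dimensions are 1 and 3, the same pair reported in the error message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical reference GatherND for batch_dims == 0 on row-major data.
// Each length-k tuple in `indices` addresses the first k axes of `data`;
// the slice over the remaining axes is copied to the output in order.
// Note that the leading dimensions of data and indices are never compared.
std::vector<float> GatherND0(const std::vector<float>& data,
                             const std::vector<int64_t>& data_shape,
                             const std::vector<int64_t>& indices,
                             const std::vector<int64_t>& indices_shape) {
  if (indices_shape.empty()) throw std::invalid_argument("indices must have rank >= 1");
  const std::size_t r = data_shape.size();
  const int64_t k_signed = indices_shape.back();
  if (k_signed < 1 || static_cast<std::size_t>(k_signed) > r)
    throw std::invalid_argument("indices_shape[-1] must be in [1, rank(data)]");
  const auto k = static_cast<std::size_t>(k_signed);

  // Row-major strides of data, and the element count of one gathered slice.
  std::vector<int64_t> strides(r, 1);
  for (std::size_t i = r - 1; i > 0; --i) strides[i - 1] = strides[i] * data_shape[i];
  int64_t slice = 1;
  for (std::size_t i = k; i < r; ++i) slice *= data_shape[i];

  const std::size_t num_tuples = indices.size() / k;
  std::vector<float> out;
  out.reserve(num_tuples * static_cast<std::size_t>(slice));
  for (std::size_t t = 0; t < num_tuples; ++t) {
    int64_t offset = 0;
    for (std::size_t j = 0; j < k; ++j) {
      int64_t idx = indices[t * k + j];
      if (idx < 0) idx += data_shape[j];  // ONNX allows negative indices
      if (idx < 0 || idx >= data_shape[j]) throw std::out_of_range("index out of bounds");
      offset += idx * strides[j];
    }
    out.insert(out.end(), data.begin() + offset, data.begin() + offset + slice);
  }
  return out;
}
```

For data {10, 20, 30, 40} with shape [1, 4] and indices {0, 1, 0, 3, 0, 0} with shape [3, 2], this returns {20, 40, 10}, an output of shape [3].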

Fixes #25053.


Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

github-actions

You can commit the suggested changes from lintrunner.

github-actions · 2025-06-16T14:30:07Z

onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc

+  test.AddOutput<float>("output", {3}, {0.625f, 0.0608f, 1.0f});
+
+  // This test should pass on all providers


Suggested change

test.AddOutput<float>("output", {3}, {0.625f, 0.0608f, 1.0f});

// This test should pass on all providers


Initial plan for issue

590f5b0

Copilot AI assigned Copilot and justinchuby Jun 14, 2025

Copilot started work on behalf of justinchuby June 14, 2025 22:29

Fix CUDA GatherND batch dimension validation regression

7eaf4bc

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] onnxruntime with CUDAExecutionProvider crashes: gather_nd.cc:30 CheckBatchDimensionsMatch Batch dimensions differ at index 0: 1 != 3, tensor indices: 0, 1~~ Fix CUDA GatherND batch dimension validation regression Jun 14, 2025

Copilot AI requested a review from justinchuby June 14, 2025 22:41

Copilot finished work on behalf of justinchuby June 14, 2025 22:41

github-actions bot reviewed Jun 16, 2025
