[python/webgpu] faster indexing along multiple dimensions (w/o unnecessary copies) #23217
Unanswered · sluijs asked this question in Performance Q&A · Replies: 0 comments
I'm working with large rank-5 tensors (e.g. with shape `[20, 60, 512, 512, 1]`). In the model outlined below, I'm simply trying to index the tensor along multiple dimensions. Executing this with PyTorch on CPU, or with onnxruntime in Python, takes around 10–30 microseconds. However, executing with onnxruntime-web and the WebGPU execution provider takes ~120 ms, even when using pre-allocated GPU buffers.

The following graph seems to indicate that ONNX first indexes the first dimension (`x[index[0]]`) and then indexes the remaining dimensions sequentially. If I understand correctly, each step would create a copy of the underlying data. Is there a way to prevent these copies from happening, or to copy only the final slice?