[webgpu] support intel subgroup matrix on matmul_nbits #24898
This patch enables Intel subgroup matrix support for the matmul_nbits operator. For now it is supported only on the Vulkan backend and the xe-2lpg architecture; we will extend it to more subgroup matrix configurations and platforms.
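To make the gating described above concrete, here is a minimal sketch of a predicate that only selects the subgroup matrix path for the combination named in the description (Intel, xe-2lpg, Vulkan). The `AdapterDesc` struct, the `Backend` enum, and `CanUseIntelSubgroupMatrix` are hypothetical names for illustration; they are not the types or functions used in ONNX Runtime or Dawn, and the actual check in the PR may look different.

```cpp
#include <string_view>

// Hypothetical backend enum; not Dawn's or ONNX Runtime's real type.
enum class Backend { kVulkan, kD3D12, kMetal, kOther };

// Hypothetical adapter description used only for this sketch.
struct AdapterDesc {
  std::string_view vendor;        // e.g. "intel"
  std::string_view architecture;  // e.g. "xe-2lpg"
  Backend backend;                // graphics backend in use
  bool has_subgroup_matrix;       // device reports a subgroup matrix feature
};

// Returns true only for the one combination the PR description names.
// Every other vendor, architecture, or backend falls back to the
// existing code paths.
inline bool CanUseIntelSubgroupMatrix(const AdapterDesc& adapter) {
  return adapter.has_subgroup_matrix &&
         adapter.backend == Backend::kVulkan &&
         adapter.vendor == "intel" &&
         adapter.architecture == "xe-2lpg";
}
```

Keeping the check as a single allow-list predicate means that extending support to more configurations or platforms later only requires widening this one condition.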
Need to merge with the latest main branch to fix the CI pipeline issue.
Please do NOT merge this upstream today; I have some optimizations to merge tomorrow. Thanks.
xhcao, does this work on Windows? Would it be possible to share instructions on how to try the Vulkan backend with ORT WebGPU?
@sushraja-msft The PR enables the feature on the Intel "xe-2lpg" platform. On Windows, you can build onnxruntime with the command below. However, after upgrading Dawn yesterday, shader compilation fails. If I output
After tuning the workgroup size (128 -> 256) and the tile size of A, and removing the shared-memory tile for C, performance is about 20% better than the dp4a code path on LNL with Windows + Vulkan.
@sushraja-msft Could you help review this PR? Thanks.
7a712bc to fec4b17
It looks like no code path will run subgroup matrix in wasm (correct me if I'm wrong). Maybe simply excluding subgroup_matrix_matmul_nbits.cc and subgroup_matrix_matmul_nbits.h from the WebAssembly build would be an easier way to fix the wasm build?
You are right. Done. Thanks.
Sorry, it was my fault: I excluded one extra piece of code for the wasm target. I had built the wasm target on my local machine but did not hit the failure.
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).