[webgpu] Add Matmul8bits Support #24546

Merged
merged 22 commits into from
May 6, 2025

Conversation

qjia7
Contributor

@qjia7 qjia7 commented Apr 25, 2025

Description

This PR adds support for 8-bit quantization to the MatMulNBits operation in WebGPU.

It makes the following changes:

  1. Unifies the fallback path on MatMulNBitsProgram, which was originally the generation path for block size = 32. It now supports any block size without limitation, and the original, more complicated programs are removed.
  2. Enables MatMulNBitsWideTileProgram on all platforms.
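
For readers unfamiliar with block-wise quantization, here is a minimal reference sketch of what an 8-bit MatMulNBits computes, independent of block size. It is not the WGSL generated by MatMulNBitsProgram. The function name, the unpacked `N x K` weight layout, and the default zero point of 128 are assumptions made for illustration, and the tiny block sizes in the usage lines are chosen only to keep the arithmetic readable.

```python
def matmul_blockwise_int8(a, b_q, scales, block_size, zero_point=128):
    """Reference (non-optimized) block-wise 8-bit dequantize + matmul.

    a:          M x K list of floats (activations)
    b_q:        N x K list of uint8 values (quantized weights, one row per output column)
    scales:     N x ceil(K / block_size) list of floats (one scale per block along K)
    zero_point: assumed symmetric zero point for unsigned 8-bit weights
    """
    m, k, n = len(a), len(a[0]), len(b_q)
    out = [[0.0] * n for _ in range(m)]
    for row in range(m):
        for col in range(n):
            acc = 0.0
            for kk in range(k):
                # Each group of `block_size` consecutive K elements shares one scale.
                scale = scales[col][kk // block_size]
                acc += a[row][kk] * (b_q[col][kk] - zero_point) * scale
            out[row][col] = acc
    return out


# Hypothetical data: one activation row, one output column, K = 4.
a = [[1.0, 2.0, 3.0, 4.0]]
b_q = [[130, 126, 128, 132]]

result_two_blocks = matmul_blockwise_int8(a, b_q, [[0.5, 0.25]], block_size=2)
result_one_block = matmul_blockwise_int8(a, b_q, [[0.5]], block_size=4)
```

The same loop handles both calls because the block size only affects which scale is looked up, which is the property a single fallback program needs in order to serve every block size.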

@qjia7 qjia7 marked this pull request as ready for review April 28, 2025 14:39
@qjia7 qjia7 requested review from sushraja-msft and guschmue April 28, 2025 14:39
@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Apr 28, 2025
@qjia7 qjia7 marked this pull request as draft April 29, 2025 08:58
@qjia7
Contributor Author

qjia7 commented Apr 29, 2025

@sushanthr As we discussed offline, I have split the int8 support for dp4/subgroupMatrix into a separate PR, #24590, to simplify the review. The corresponding comments have been resolved in that PR. Thanks.

guschmue pushed a commit that referenced this pull request Apr 30, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu.

This PR is separated from #24546 for easier review.
@qjia7 qjia7 marked this pull request as ready for review April 30, 2025 12:38
sushraja-msft
sushraja-msft previously approved these changes May 2, 2025
@sushraja-msft sushraja-msft merged commit 5160c67 into main May 6, 2025
91 of 98 checks passed
@sushraja-msft sushraja-msft deleted the matmul8bits branch May 6, 2025 16:38
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request May 12, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu.

This PR is separated from microsoft#24546 for easier review.
baijumeswani pushed a commit that referenced this pull request May 14, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu.

This PR is separated from #24546 for easier review.
Labels
ep:WebGPU ort-web webgpu provider
4 participants