[webgpu] Add Matmul8bits Support #24546
Merged
sushanthr
reviewed
Apr 28, 2025
onnxruntime/contrib_ops/webgpu/quantization/subgroup_matrix_matmul_nbits.cc
sushanthr
reviewed
Apr 28, 2025
onnxruntime/contrib_ops/webgpu/quantization/dp4a_matmul_nbits.cc
sushanthr
reviewed
Apr 28, 2025
onnxruntime/contrib_ops/webgpu/quantization/dp4a_matmul_nbits.cc
sushanthr
reviewed
Apr 28, 2025
sushanthr
reviewed
Apr 28, 2025
@sushanthr As we discussed offline, I have split the int8 support for dp4/subgroupMatrix into a separate PR, #24590, to simplify the review. The corresponding comments have been resolved in that PR. Thanks.
guschmue
pushed a commit
that referenced
this pull request
Apr 30, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu. This PR is separated from #24546 for easier review.
sushraja-msft
approved these changes
May 2, 2025
sushraja-msft
previously approved these changes
May 2, 2025
guschmue
approved these changes
May 6, 2025
ankitm3k
pushed a commit
to intel/onnxruntime
that referenced
this pull request
May 12, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu. This PR is separated from microsoft#24546 for easier review.
baijumeswani
pushed a commit
that referenced
this pull request
May 14, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu. This PR is separated from #24546 for easier review.
baijumeswani
pushed a commit
that referenced
this pull request
May 14, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu. This PR is separated from #24546 for easier review.
Description
This PR adds support for 8-bit quantization in the MatMulNBits operation in WebGPU. It does the following:
- MatMulNBitsProgram as the fallback path, which was originally the generation path for block size = 32. It now supports any block size without limitations, and the original complicated programs are removed.
- MatMulNBitsWideTileProgram for all platforms.
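To show what 8-bit block quantization with an arbitrary block size means numerically, here is a minimal CPU reference sketch. It is not the WebGPU shader code from this PR. The function name `MatMul8BitsReference`, the flat `[N][K]` weight layout, and the fixed zero point of 128 are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: Y[m][n] = sum_k A[m][k] * dequant(B)[n][k].
// Assumed layout (illustration only, not the operator's packed format):
//   a      : [M][K] row-major float activations
//   b_q    : [N][K] row-major uint8 quantized weights
//   scales : [N][blocks_per_row], one scale per block of `block_size` values along K
// Assumption: every block uses a zero point of 128.
std::vector<float> MatMul8BitsReference(const std::vector<float>& a,
                                        const std::vector<uint8_t>& b_q,
                                        const std::vector<float>& scales,
                                        size_t M, size_t N, size_t K,
                                        size_t block_size) {
  // Round up so a trailing partial block still gets its own scale.
  const size_t blocks_per_row = (K + block_size - 1) / block_size;
  constexpr int kZeroPoint = 128;  // assumed default for 8-bit weights

  std::vector<float> y(M * N, 0.0f);
  for (size_t m = 0; m < M; ++m) {
    for (size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (size_t k = 0; k < K; ++k) {
        // The block index depends only on k, so any block size works here.
        const float scale = scales[n * blocks_per_row + k / block_size];
        const float w =
            static_cast<float>(static_cast<int>(b_q[n * K + k]) - kZeroPoint) * scale;
        acc += a[m * K + k] * w;
      }
      y[m * N + n] = acc;
    }
  }
  return y;
}
```

For example, with M = N = 1, K = 4, block_size = 2, activations {1, 2, 3, 4}, quantized weights {129, 130, 127, 126} and scales {0.5, 2.0}, the dequantized weights are {0.5, 1.0, -2.0, -4.0} and the result is -19.5. The tiny block size is chosen only to keep the arithmetic easy to follow.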