
[WebGPU] Unify core implementations of GEMM and MatMul #24586


Open
wants to merge 14 commits into
base: main

Conversation

@xiaofeihan1 (Contributor) commented Apr 29, 2025

Description

This PR extracts the core implementations into gemm_utils.cc, which is used to generate the shaders for both the GEMM and MatMul ops. The core implementations include the scalar and vec4 versions of GEMM and MatMul.

Motivation and Context

GEMM and MatMul share a lot of code, so we want to extract the common parts to unify their implementations.
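
To make the structure concrete, here is a minimal sketch of the shape the refactor takes. The names below (GemmMatMulVariant, GenerateGemmMatMulCore, the GemmProgram members, and the header path) are illustrative assumptions, not the actual gemm_utils.cc API:

// Hypothetical sketch only -- the real helper names live in gemm_utils.cc/.h.
#include "core/providers/webgpu/shader_helper.h"  // assumed header location

// Which flavor of the shared inner loop to emit.
enum class GemmMatMulVariant { kScalar, kVec4 };

// Appends the shared WGSL core to `shader`; the GEMM and MatMul callers only
// differ in how A/B are read and how alpha/beta or a broadcast bias is applied.
Status GenerateGemmMatMulCore(ShaderHelper& shader, GemmMatMulVariant variant,
                              bool has_bias, bool is_channels_last);

// Each op's GenerateShaderCode() then becomes a thin wrapper, e.g. for Gemm:
Status GemmProgram::GenerateShaderCode(ShaderHelper& shader) const {
  const auto variant = use_vec4_ ? GemmMatMulVariant::kVec4 : GemmMatMulVariant::kScalar;
  return GenerateGemmMatMulCore(shader, variant, has_bias_, /*is_channels_last=*/false);
}

MatMul's program class would call the same shared helper with its own read/write wiring.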

@xiaofeihan1 xiaofeihan1 changed the title [WIP][WebGPU] Add vec1 implementation for GEMM [WIP][WebGPU] Unify core implementations of GEMM and MatMul May 12, 2025
@xiaofeihan1 xiaofeihan1 marked this pull request as ready for review May 14, 2025 02:18
@xiaofeihan1 xiaofeihan1 changed the title [WIP][WebGPU] Unify core implementations of GEMM and MatMul [WebGPU] Unify core implementations of GEMM and MatMul May 14, 2025
@fs-eire fs-eire added the ep:WebGPU ort-web webgpu provider label May 14, 2025
@xiaofeihan1 xiaofeihan1 requested a review from qjia7 May 16, 2025 09:17
qjia7 previously approved these changes May 19, 2025

@qjia7 (Contributor) left a comment

LGTM.
@fs-eire @guschmue Please take a look, thanks.

@qjia7 qjia7 requested review from fs-eire and guschmue May 19, 2025 01:40
@guschmue (Contributor)

testing it with a bunch of models that use Gemm

@guschmue (Contributor)

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@guschmue (Contributor)

I see a shader compile issue for a couple of models in MatMul. Let me try to isolate it.

@guschmue (Contributor) commented May 19, 2025

In HandleMaybeBiasForMatMul:
output_element_t needs to be output_value_t ?

if (has_bias) {
  shader.AdditionalImplementation() << " value = value + output_element_t(" << (is_channels_last ? "bias[colIn]" : "bias[row]") << ");\n";
}

Surprised that no unit test runs into it.
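
For reference, a minimal sketch of the fix being suggested (same helper as quoted above; only the cast changes):

if (has_bias) {
  // output_value_t matches the type of `value` (a vec4 in the vec4 path), which
  // output_element_t does not, so the cast uses the value-type alias instead.
  shader.AdditionalImplementation() << " value = value + output_value_t(" << (is_channels_last ? "bias[colIn]" : "bias[row]") << ");\n";
}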

@guschmue (Contributor)

With the issue above fixed locally, the 90-model test now reports only numerical issues:
total: 93, setup: 92, pass: 64
Pass should be around 84.

Simplest model with numerical issues:
https://huggingface.co/onnx-community/mobilenetv4_conv_small.e2400_r224_in1k

@xiaofeihan1 (Contributor, Author)

Surprised that no unit test runs into it.

Thanks for the callout!!! I will add some test cases to cover this scenario. Previously there was only a small-matrix (M, N, K < 8) test case with bias.
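
For illustration, a rough sketch of such a case using the OpTester harness from gemm_test.cc; the test name, dimensions, and values here are hypothetical:

TEST(GemmOpTest, GemmWithBroadcastBiasLargeMatrix) {
  // Hypothetical sketch: M, N, K > 8 so the vec4/tiled path is exercised,
  // plus a broadcast bias input C so the bias handling code is covered.
  OpTester test("Gemm", 13);
  constexpr int64_t M = 16, K = 16, N = 16;
  test.AddAttribute("alpha", 1.0f);
  test.AddAttribute("beta", 1.0f);
  std::vector<float> a(M * K, 1.0f), b(K * N, 1.0f), c(N, 0.5f);
  test.AddInput<float>("A", {M, K}, a);
  test.AddInput<float>("B", {K, N}, b);
  test.AddInput<float>("C", {N}, c);  // 1-D bias, broadcast across rows
  // Each output element is dot(ones, ones) + 0.5 = K + 0.5.
  std::vector<float> y(M * N, static_cast<float>(K) + 0.5f);
  test.AddOutput<float>("Y", {M, N}, y);
  test.Run();
}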

@xiaofeihan1 (Contributor, Author)

With the issue above fixed locally, 90 model test now reports only numerical issues: total: 93, setup: 92, pass: 64 pass should be around 84.

Simplest model with numerical issues: https://huggingface.co/onnx-community/mobilenetv4_conv_small.e2400_r224_in1k

Thanks for helping with the testing! I'll look into why there's a problem.

@fs-eire (Contributor) commented May 22, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@xiaofeihan1 (Contributor, Author)

With the issue above fixed locally, 90 model test now reports only numerical issues: total: 93, setup: 92, pass: 64 pass should be around 84.

Simplest model with numerical issues: https://huggingface.co/onnx-community/mobilenetv4_conv_small.e2400_r224_in1k

Thanks Guenther! I have fixed the numerical issue and added more tests to cover this scenario. Could you run the 90-model test again with the latest commit?

@fs-eire (Contributor) commented May 22, 2025

Seems like the build breaks with the following error:

/onnxruntime_src/onnxruntime/test/providers/cpu/math/gemm_test.cc:1060:15: error: unused variable ‘bias_value’ [-Werror=unused-variable]
 1060 |         float bias_value = 0.0f;
      |               ^~~~~~~~~~

@xiaofeihan1 xiaofeihan1 requested a review from guschmue May 23, 2025 03:50
@xiaofeihan1 (Contributor, Author)

Seems like the build breaks with the following error:

/onnxruntime_src/onnxruntime/test/providers/cpu/math/gemm_test.cc:1060:15: error: unused variable ‘bias_value’ [-Werror=unused-variable]
 1060 |         float bias_value = 0.0f;
      |               ^~~~~~~~~~

Thanks. Done

@fs-eire (Contributor) commented May 23, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@guschmue (Contributor)

Let me kick off the 90-model test.

@guschmue (Contributor)

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 5 pipeline(s).

@guschmue (Contributor)

The 90-model test looks good now

@xiaofeihan1 (Contributor, Author)

The 90-model test looks good now

Cool!! Thanks Guenther for testing!
@fs-eire Let's move forward with merging this PR. Any comments are welcome.

Labels: ep:WebGPU ort-web webgpu provider
4 participants