[wasm] Optimize WASM relaxed simd MlasGemmQuantKernel #25048
Conversation
This change optimizes MlasGemmQuantKernel for the WASM relaxed SIMD build.

| Mlas bench / RPL laptop / node v24.1.0 | baseline | opt | diff |
|----------------------------------------|----------|-----|------|
| QGEMM/UnsignedANoPackB/M:384/N:1024/K:1024/Batch:1/Threads:4/real_time | 2452212 | 1708338 | 44% |
| QGEMM/UnsignedANoPackB/M:384/N:1024/K:3072/Batch:1/Threads:4/real_time | 9053789 | 6395584 | 42% |
| QGEMM/UnsignedANoPackB/M:384/N:1024/K:4096/Batch:1/Threads:4/real_time | 12109727 | 8189719 | 48% |
| QGEMM/UnsignedANoPackB/M:384/N:4096/K:1024/Batch:1/Threads:4/real_time | 11787607 | 7926226 | 49% |
Pull Request Overview
This PR optimizes the WASM relaxed SIMD QGEMM micro kernel by introducing a 6x8 kernel implementation to improve performance. Key changes include:
- Introduction of a generic row-count based implementation (6x8 and 1x8) using the templated function GemmQuantKernelNx8Impl.
- Refactoring of the accumulation and pointer management logic, including a new DotPairAdd helper for the fused multiply-accumulate operations (see the sketch after this list).
- Updated kernel stride configuration in the dispatch structure to align with the 6x8 kernel design.
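For readers unfamiliar with the relaxed SIMD intrinsics involved, here is a minimal sketch of what a DotPairAdd-style helper can look like. The name, signature, and surrounding usage are assumptions for illustration only and may not match the PR's actual code; it requires a toolchain targeting WASM relaxed SIMD (e.g. clang with `-mrelaxed-simd`).

```cpp
// Hypothetical sketch only -- not the PR's implementation.
// Shows the core accumulate step a quantized GEMM micro-kernel repeats:
// multiply signed 8-bit lanes of one operand by 7-bit lanes of the other,
// sum groups of four products, and add them into an int32x4 accumulator.
#include <wasm_simd128.h>

static inline v128_t DotPairAdd(v128_t a, v128_t b, v128_t acc) {
    // i32x4.relaxed_dot_i8x16_i7x16_add_s: per-lane dot product of four
    // 8-bit pairs, added lane-wise to the 32-bit accumulator. On x64 with
    // AVX-VNNI this typically lowers to a single VPDPBUSD instruction.
    return wasm_i32x4_relaxed_dot_i8x16_i7x16_add_s(a, b, acc);
}

// Illustrative use inside an inner loop over the packed K dimension:
//   v128_t acc0 = wasm_i32x4_splat(0);
//   v128_t AVec = wasm_v128_load(a_ptr);
//   v128_t BVec = wasm_v128_load(b_ptr);
//   acc0 = DotPairAdd(AVec, BVec, acc0);
```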
Comments suppressed due to low confidence (2)
onnxruntime/core/mlas/lib/qgemm_kernel_wasmrelaxedsimd.cpp:493
- [nitpick] Consider renaming the lambda 'Tail' to a more descriptive name (e.g., 'outputTail' or 'storeTail') to clarify its purpose in handling partial column outputs.
auto Tail = [&](size_t cols, auto load_c, auto store_c) {
onnxruntime/core/mlas/lib/qgemm_kernel_wasmrelaxedsimd.cpp:541
- [nitpick] It would be helpful to add an inline comment clarifying why the CountM parameter is passed as 0 (since it is ignored) to aid reader understanding (a possible wording is sketched below).
return GemmQuantKernelNx8Impl<6>(A, B, C, PackedCountK, 0, CountN, ldc,
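As a purely illustrative sketch of the suggested inline comment (hypothetical wording; it assumes CountM is genuinely unused once the row count is fixed by the template argument), it could read:

```cpp
// Hypothetical comment wording only -- not the PR's actual source.
// CountM is ignored here: the <6> template argument already fixes the number
// of rows GemmQuantKernelNx8Impl processes, so 0 is passed as a placeholder.
```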
Thanks @guschmue for reviewing this change! The latest change incorporates Copilot's suggestions. PTAL, thanks! As for the CI failure in the last run (Windows GPU CUDA CI), the log shows four identical failures related to HuggingFace model downloads, which appear unrelated to this change.
A CI re-run might help clear the transient failures.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
Description
This change introduces a 6x8 QGEMM micro kernel for the WASM relaxed SIMD build.
Motivation and Context
This change optimizes QGEMM performance for the WASM relaxed SIMD build, particularly on x64 devices with AVX-VNNI, where the relaxed dot-product instructions lower to VNNI instructions.