graph: backend: dnnl: backend refactor and sdpa v1 kernel support quantize SDPA #3423

xiang1guo · 2025-06-16T03:25:56Z

Background

This is follow-up work to #2930 and #2931. The PR mainly focuses on supporting quantized SDPA with the internal dnnl_sdpa, which helps reduce graph compilation time and also simplifies the backend optimization passes.

Changes

- DNNL backend refactor:
  - attach fusion info to op attr directly
  - rename fusion_info_mgr to reflect its current usage
- Support compressed SDPA with internal dnnl_sdpa.
- Support legacy GQA pattern with internal dnnl_sdpa.
- Merge sdp_primitive_kernel_t and sdp_primitive_v1_kernel_t.
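
For readers unfamiliar with the refactor, the difference between looking fusion info up through a manager and attaching it to the op directly can be sketched roughly as below. This is only an illustration under assumptions: every type and function name here (`fusion_info_sketch_t`, `manager_sketch_t`, `op_with_key_t`, `op_with_attr_t`, `fuse_post_op`) is hypothetical and not the backend's actual code, and the "before" layout with an integer key is an assumption about the earlier design, not something stated in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-op fusion record (not the real type).
struct fusion_info_sketch_t {
    std::vector<std::string> post_ops; // names of fused post-ops
};

// Assumed "before" layout: a separate manager owns every record and
// each op only stores an integer key into it.
struct manager_sketch_t {
    std::vector<fusion_info_sketch_t> infos;

    // Create an empty record and return its key.
    int64_t init_info() {
        infos.emplace_back();
        return static_cast<int64_t>(infos.size()) - 1;
    }

    fusion_info_sketch_t &get(int64_t key) {
        return infos.at(static_cast<std::size_t>(key));
    }
};

struct op_with_key_t {
    int64_t fusion_info_key = -1; // -1 means "no fusion info yet"
};

// "After" layout: the op carries the record itself as an attribute,
// so passes no longer need access to a manager to read or update it.
struct op_with_attr_t {
    std::unordered_map<std::string, fusion_info_sketch_t> attrs;

    bool has_attr(const std::string &name) const {
        return attrs.count(name) != 0;
    }
};

// Record a fused post-op through the manager (indirect lookup).
inline void fuse_post_op(
        manager_sketch_t &mgr, op_with_key_t &op, const std::string &name) {
    if (op.fusion_info_key < 0) op.fusion_info_key = mgr.init_info();
    mgr.get(op.fusion_info_key).post_ops.push_back(name);
}

// Record a fused post-op directly on the op attribute.
inline void fuse_post_op(op_with_attr_t &op, const std::string &name) {
    // operator[] default-constructs the record on first use.
    op.attrs["fusion_info"].post_ops.push_back(name);
}
```

In the second form, a pass that copies or rewrites an op takes the fusion record along with it, which is one plausible reason such a change would simplify the optimization passes.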

TODO

Support CPU decompose kernel with internal dnnl_sdpa.

Testing results:

Of the 218 MHA test cases, 67 ukernel-optimized cases now run successfully in the sdp_primitive_v1_kernel_t kernel.

Compressed SDPA dot graph: [images: before fusion, after fusion]

Legacy GQA dot graph: [images: before fusion, after fusion]

TaoLv

Can you please split the fusion info refactor into a separate PR, for better review experience?

xiang1guo · 2025-06-16T05:34:08Z

> Can you please split the fusion info refactor into a separate PR, for better review experience?

ok, sure, will do that.

xiang1guo self-assigned this Jun 16, 2025

xiang1guo added the component:graph-api label Jun 16, 2025

xiang1guo requested a review from a team as a code owner June 16, 2025 03:25

github-actions bot added the component:tests label Jun 16, 2025

xiang1guo mentioned this pull request Jun 16, 2025

graph: backend: dnnl: sdpa v1 kernel support quantize SDPA and backend refactor #3334

Closed

3 tasks

TaoLv reviewed Jun 16, 2025

xiang1guo added 5 commits June 15, 2025 22:46

graph: backend: dnnl: refactor backend to make fusion info as attr

7ed3107

graph: backend: dnnl: rename fusion_info_mgr_t

e65a570

graph: backend: dnnl: support int8 pattern in sdpa v1 kernel

e487c17

graph: backend: dnnl: support legacy gqa pattern in v1 kernel

21ea4a8

graph: backend: dnnl: merge 2 sdpa primitive kernel

1f15393

xiang1guo force-pushed the xiang/main/backend-refactor branch from 5b6dac7 to 1f15393 Compare June 16, 2025 05:46
