
graph: backend: dnnl: backend refactor and sdpa v1 kernel support for quantized SDPA #3423


Open: xiang1guo wants to merge 5 commits into main from xiang/main/backend-refactor


xiang1guo (Contributor)

Background

This is follow-up work to #2930 and #2931. The PR mainly adds support for quantized SDPA through the internal dnnl_sdpa, which reduces graph compilation time and simplifies the backend optimization pass.

Works

  • DNNL backend refactor:
    • attach fusion info to op attr directly
    • rename fusion_info_mgr to reflect its current usage
  • Support compressed SDPA with internal dnnl_sdpa.
  • Support legacy GQA pattern with internal dnnl_sdpa.
  • Merge sdp_primitive_kernel_t and sdp_primitive_v1_kernel_t
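As background for the compressed SDPA and legacy GQA items above, here is a minimal plain-Python sketch of the math these patterns express. It assumes per-tensor scales and zero points on integer K and V, and it assumes each group of query heads shares one K/V head. The function names (`dequantize`, `sdpa_head`, `compressed_gqa_sdpa`) are hypothetical. This is not the dnnl_sdpa kernel or the oneDNN Graph API, and it does not reflect the exact quantization granularity the PR supports.

```python
import math
from typing import List

Matrix = List[List[float]]


def dequantize(q: List[List[int]], scale: float, zero_point: int) -> Matrix:
    """Per-tensor asymmetric dequantization (assumed): x = (q - zero_point) * scale."""
    return [[(v - zero_point) * scale for v in row] for row in q]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_head(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for a single attention head."""
    inv_sqrt_d = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [inv_sqrt_d * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        probs = softmax(scores)
        out.append([sum(p * v_row[c] for p, v_row in zip(probs, v))
                    for c in range(len(v[0]))])
    return out


def compressed_gqa_sdpa(q_heads, k_heads_q, v_heads_q, k_scale, k_zp, v_scale, v_zp):
    """Hypothetical reference: integer K/V are dequantized, then each group of
    query heads attends to the single K/V head it shares (GQA)."""
    num_q, num_kv = len(q_heads), len(k_heads_q)
    if num_q % num_kv != 0:
        raise ValueError("number of query heads must be a multiple of K/V heads")
    group = num_q // num_kv
    k_heads = [dequantize(k, k_scale, k_zp) for k in k_heads_q]
    v_heads = [dequantize(v, v_scale, v_zp) for v in v_heads_q]
    # Query head h reads K/V head h // group.
    return [sdpa_head(q, k_heads[h // group], v_heads[h // group])
            for h, q in enumerate(q_heads)]


if __name__ == "__main__":
    # Two query heads share one integer K/V head (group size 2).
    out = compressed_gqa_sdpa(
        q_heads=[[[1.0, 0.0]], [[0.0, 1.0]]],
        k_heads_q=[[[2, 2], [2, 2]]],
        v_heads_q=[[[10, 0], [0, 10]]],
        k_scale=0.5, k_zp=0, v_scale=0.1, v_zp=0,
    )
    print(out)  # → [[[0.5, 0.5]], [[0.5, 0.5]]]
```

Because both keys are identical, the softmax is uniform and each query head returns the mean of the dequantized V rows. Both query heads read the same K/V head, so their outputs match.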

TODO

  • Support CPU decompose kernel with internal dnnl_sdpa.

Testing results:

Of the 218 MHA test cases, 67 ukernel-optimized cases now run successfully in the sdp_primitive_v1_kernel_t kernel.

  • compressed SDPA dot graph:
    [image: before fusion]
    [image: after fusion]

  • legacy GQA dot graph:
    [image: before fusion]
    [image: after fusion]

@xiang1guo xiang1guo self-assigned this Jun 16, 2025
@xiang1guo xiang1guo added the component:graph-api Codeowner: @oneapi-src/onednn-graph label Jun 16, 2025
@xiang1guo xiang1guo requested a review from a team as a code owner June 16, 2025 03:25
@github-actions github-actions bot added the component:tests Codeowner: @oneapi-src/onednn-arch label Jun 16, 2025
@TaoLv (Contributor) left a comment

Can you please split the fusion info refactor into a separate PR, for better review experience?

@xiang1guo (Contributor, Author)

> Can you please split the fusion info refactor into a separate PR, for better review experience?

OK, sure, will do that.

@xiang1guo xiang1guo force-pushed the xiang/main/backend-refactor branch from 5b6dac7 to 1f15393 Compare June 16, 2025 05:46