
graph: backend: dnnl: sdpa v1 kernel support quantize SDPA and backend refactor #3334


Closed · xiang1guo wants to merge 3 commits


@xiang1guo xiang1guo (Contributor) commented May 27, 2025

Background

This is follow-up work to #2930 and #2931. The PR focuses on supporting quantized SDPA through the internal dnnl_sdpa primitive, which helps reduce graph compilation time and simplifies the backend optimization passes.

Works

  • Support compressed SDPA with internal dnnl_sdpa.
  • Support legacy GQA pattern with internal dnnl_sdpa.
  • Merge sdp_primitive_kernel_t and sdp_primitive_v1_kernel_t.
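For reference, the computation that a compressed SDPA pattern expresses can be sketched in plain Python. This is a minimal sketch, assuming integer K and V tensors with a single per-tensor scale and zero point that are dequantized to float before the usual softmax(QK^T / sqrt(d)) V. All helper names are hypothetical; this is not the dnnl_sdpa implementation or any oneDNN API.

```python
import math

# Hypothetical reference semantics for compressed SDPA; not oneDNN code.


def dequantize(q, scale, zero_point):
    """Map integer values back to float: x = (q - zero_point) * scale."""
    return [[(v - zero_point) * scale for v in row] for row in q]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    # Naive product on nested lists: each row of a dotted with each column of b.
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m):
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sdpa(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for a single head."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def compressed_sdpa(q, k_q, k_scale, k_zp, v_q, v_scale, v_zp):
    """Dequantize integer K/V (assumed per-tensor params), then run float SDPA."""
    k = dequantize(k_q, k_scale, k_zp)
    v = dequantize(v_q, v_scale, v_zp)
    return sdpa(q, k, v)
```

For example, `compressed_sdpa([[1.0, 0.0]], [[2, 0], [0, 2]], 0.5, 0, [[4, 0], [0, 4]], 0.25, 0)` gives the same result as float SDPA on identity K and V. A fused kernel can avoid materializing the dequantized K and V, which is one reason to map the whole pattern onto a single primitive rather than separate ops.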

TODO

  • Support CPU decompose kernel with internal dnnl_sdpa.

Fused graph

  • compressed SDPA dot graph:
    before fusion: [image]
    after fusion: [image]

  • legacy GQA dot graph:
    before fusion: [image]
    after fusion: [image]
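In grouped-query attention (GQA), each K/V head is shared by a group of query heads. The head mapping can be sketched as below, assuming the number of query heads is a multiple of the number of K/V heads. The function names are hypothetical, and the sketch does not describe how the legacy GQA graph pattern or dnnl_sdpa actually encodes GQA.

```python
# Hypothetical illustration of GQA head grouping; not oneDNN code.


def kv_head_for(q_head, num_q_heads, num_kv_heads):
    """Index of the shared K/V head used by query head `q_head`."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def expand_kv_heads(kv_heads, num_q_heads):
    """Repeat each K/V head group_size times so position h pairs with query head h.

    This is the explicit "broadcast K/V to every query head" form. Indexing the
    shared head with kv_head_for() gives the same pairing without the copies.
    """
    num_kv_heads = len(kv_heads)
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return [head for head in kv_heads for _ in range(group_size)]
```

For example, with 4 query heads and 2 K/V heads, `expand_kv_heads(["kv0", "kv1"], 4)` pairs query heads 0 and 1 with `kv0` and query heads 2 and 3 with `kv1`.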

@xiang1guo xiang1guo self-assigned this May 27, 2025
@xiang1guo xiang1guo added the component:graph-api Codeowner: @oneapi-src/onednn-graph label May 27, 2025
@xiang1guo xiang1guo force-pushed the xiang/main/int8-internal-sdpa branch from 6f82e35 to 0f9aa9a May 27, 2025 13:51
@xiang1guo xiang1guo changed the title [Draft] graph: backend: dnnl: sdpa v1 kernel support int8 and backend refactor [Draft] graph: backend: dnnl: sdpa v1 kernel support quantize SDPA and backend refactor May 27, 2025
@xiang1guo xiang1guo marked this pull request as ready for review May 28, 2025 01:47
@xiang1guo xiang1guo requested a review from a team as a code owner May 28, 2025 01:47
@xiang1guo xiang1guo changed the title [Draft] graph: backend: dnnl: sdpa v1 kernel support quantize SDPA and backend refactor graph: backend: dnnl: sdpa v1 kernel support quantize SDPA and backend refactor May 28, 2025
@xiang1guo xiang1guo (Contributor, Author) commented:

We have an alternative solution in #3423.

@xiang1guo xiang1guo closed this Jun 16, 2025