head size 512 for sdpa by syurkevi · Pull Request #2976 · uxlfoundation/oneDNN

syurkevi · 2025-03-28T21:47:58Z

Description

This PR enables head sizes up to 512 for the SDPA microkernel. Good configurations were found by profiling on PVC, BMG, LNL, and DG2.
Most shapes with head size 512 still show good speedups over a primitives-based implementation. Even larger head sizes may still be beneficial; more investigation to follow.
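
To make the "head size up to 512" limit concrete, here is a minimal sketch of first-fit bucket selection by head size. It is an illustration only: the bucket list `kHeadSizeBuckets` and the function `pick_head_size_bucket` are hypothetical names and values, not the actual contents of `micro_sdpa.cpp`, and it reflects the limit as stated in the description above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <optional>

// Hypothetical list of tuned head-size buckets (an assumption for this
// sketch; the real configurations are defined in the oneDNN sources).
constexpr std::array<int, 5> kHeadSizeBuckets = {32, 64, 128, 256, 512};

// Returns the smallest bucket that can hold `head_size`, or std::nullopt
// when the head size is non-positive or exceeds the largest bucket.
std::optional<int> pick_head_size_bucket(int head_size) {
    // First-fit selection relies on the buckets being in ascending order.
    assert(std::is_sorted(kHeadSizeBuckets.begin(), kHeadSizeBuckets.end()));
    if (head_size <= 0) return std::nullopt;
    for (int bucket : kHeadSizeBuckets) {
        if (head_size <= bucket) return bucket;
    }
    return std::nullopt; // no tuned configuration; a caller would fall back
}
```

Under these assumptions a head size of 300 maps to the 512 bucket, while anything above 512 gets no bucket and would have to fall back to another implementation.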

Checklist

General

Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
Have you formatted the code using clang-format?

syurkevi · 2025-03-28T21:52:02Z

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_graph
enable test_device_gpu
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg

src/gpu/intel/ocl/micro_sdpa.cpp

tests/gtests/internals/test_sdpa.cpp

TaoLv · 2025-04-02T02:03:03Z

@petercad @syurkevi Is it possible to further extend this to 576 to support MLA? There is the same request for flash attention.

syurkevi · 2025-04-02T03:36:36Z

> @petercad @syurkevi Is it possible to further extend this to 576 to support MLA? There is the same request for flash attention.

576 is close enough to 512 that the current configs may be applicable as-is. I've pushed a test branch internally with this change if you'd like to try it out (I have some concerns about correctness). If 576 is all that's needed, I'll perform some additional testing and bump the limit in this PR.

If we want even larger head sizes, at some point the register demand will be too high for the kernel to be performant (or to run at all, depending on the hardware). Something like head_size=1024 will require more tuning and testing.

TaoLv · 2025-04-02T13:57:50Z

@syurkevi Good to know. Feel free to add it in another PR. I think 576 will be enough for now.

atkassen · 2025-04-02T17:16:58Z

In src/gpu/intel/ocl/micro_sdpa.cpp,

on (new) line 79:

        if (!q_config_str.empty() && quantized) config_str = std::move(q_config_str);

before (new) line 133:
```
            break;
```

to address some coverity issues. Thanks!
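
The surrounding source is not shown in this thread, so the following is only a guess at the kind of pattern these two suggestions target: moving from a string that is no longer needed instead of copying it, and ending a `switch` case with an explicit `break`. The function `select_config`, the `Arch` enum, and the config names are hypothetical stand-ins, not oneDNN code.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical architecture tag used only for this sketch.
enum class Arch { xe_hpg, xe_hpc, xe2 };

// Picks a default config name per architecture, then lets a non-empty
// quantized override replace it. All names are placeholders.
std::string select_config(Arch arch, bool quantized, std::string q_config_str) {
    std::string config_str;
    switch (arch) {
        case Arch::xe_hpg: config_str = "hpg_default"; break;
        case Arch::xe_hpc: config_str = "hpc_default"; break;
        // An explicit break on the final case keeps the switch uniform and
        // avoids an accidental fall-through if a case is added later.
        case Arch::xe2: config_str = "xe2_default"; break;
    }
    assert(!config_str.empty()); // every enumerator assigns a default

    // q_config_str is not used after this point, so moving it avoids a
    // needless copy of the string buffer.
    if (!q_config_str.empty() && quantized) config_str = std::move(q_config_str);
    return config_str;
}
```

For example, `select_config(Arch::xe_hpc, true, "tuned_q")` returns the override, while the same call with `quantized == false` keeps the per-architecture default.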

syurkevi · 2025-04-03T02:42:39Z

> @syurkevi Good to know. Feel free to add it in another PR. I think 576 will be enough for now.

Bad news here after some testing.
There are no good quantized configs for head size 576. There are a few OK configs for fp16 (except on DG2), which I did enable.
We'd likely need to either update the microkernel generation or change the kernel to iterate over the head size with the existing configs; both options are quite involved.
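
One way to read "iterate over the head size with the existing configs" is to split the reduction over the head dimension into chunks no wider than an already-tuned size and accumulate the partial results. The scalar sketch below illustrates that idea for a single query/key dot product; it is not the microkernel, and `chunked_dot` and its parameters are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes dot(q, k) over the head dimension in chunks of at most `chunk`
// elements, accumulating one partial sum per chunk. A chunk of 0 means
// "process the whole head dimension at once".
float chunked_dot(const std::vector<float>& q, const std::vector<float>& k,
                  std::size_t chunk) {
    assert(q.size() == k.size()); // both vectors span the same head size
    const std::size_t head_size = q.size();
    if (chunk == 0) chunk = head_size;

    float acc = 0.0f;
    for (std::size_t start = 0; start < head_size; start += chunk) {
        // The last chunk may be shorter, e.g. 576 = 512 + 64.
        const std::size_t end = std::min(start + chunk, head_size);
        float partial = 0.0f;
        for (std::size_t i = start; i < end; ++i) partial += q[i] * k[i];
        acc += partial;
    }
    return acc;
}
```

With a head size of 576 and a chunk of 512, the loop runs twice, over 512 and then 64 elements. This only covers the reduction in the query/key product; applying the same idea to the output head dimension of the value product would need separate handling.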

syurkevi · 2025-04-03T02:56:05Z

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
enable benchdnn_graph
enable test_device_gpu
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg

syurkevi requested review from a team as code owners March 28, 2025 21:47

github-actions bot added labels platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel) and component:tests (Codeowner: @oneapi-src/onednn-arch) Mar 28, 2025

dzarukin reviewed Mar 31, 2025

src/gpu/intel/ocl/micro_sdpa.cpp (outdated, resolved)

tests/gtests/internals/test_sdpa.cpp (outdated, resolved)

petercad approved these changes Apr 1, 2025


syurkevi added 3 commits April 1, 2025 19:01

xe: sdpa: add configs for head_size of 512

c0d0e96

tests: sdpa: add complex_fusion tests for head size 512

dcc4c0e

xe: sdpa: enable 32-wide block loads for DG2

57ae168

xe: sdpa: refactor config selection to separate header

36ead63

syurkevi force-pushed the syurkevi/sdpa_h512 branch from 48978d1 to 6778889 Compare April 2, 2025 03:18

syurkevi requested a review from a team as a code owner April 2, 2025 03:18

dzarukin approved these changes Apr 2, 2025


TaoLv approved these changes Apr 2, 2025


syurkevi added 3 commits April 2, 2025 19:38

xe: sdpa: update configs for xe2 granularity

d8ff633

xe: sdpa: address coverity issues

2b2b35f

xe: sdpa: enable head size 576 for f16

5e59499

syurkevi force-pushed the syurkevi/sdpa_h512 branch from 6778889 to 5e59499 Compare April 3, 2025 02:41

syurkevi merged commit 814d0e9 into main Apr 3, 2025
22 of 23 checks passed

syurkevi deleted the syurkevi/sdpa_h512 branch April 3, 2025 17:09

TaoLv mentioned this pull request Apr 7, 2025

doc: graph: a few document fixes #3035

Merged

vpirogov mentioned this pull request May 10, 2025

oneDNN v3.8 release notes #3064

Merged

syurkevi merged 7 commits into main from syurkevi/sdpa_h512

5 participants
