
sdpa fma f16 #3422


Open · syurkevi wants to merge 4 commits into main from syurkevi/sdpa_fma_f16

Conversation

@syurkevi (Contributor) commented Jun 13, 2025

This PR adds FMA support to SDPA for the f16 data type, which allows SDPA to work on older platforms that don't have systolic hardware support.
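For context, here is a conceptual sketch (not code from this PR) of the technique: f16 inputs, products formed and summed with plain FMA instead of systolic matrix instructions, and an f32 accumulator. The function name is hypothetical, and it assumes a compiler with `_Float16` support (recent gcc/clang).

```cpp
// Conceptual sketch only -- not the kernel code from this PR.
// f16 q/k values are multiplied and accumulated with scalar FMA into an
// f32 accumulator, which is what an FMA-only (non-systolic) GPU falls back to.
#include <cmath>
#include <cstddef>

float dot_f16_fma_f32acc(const _Float16 *q, const _Float16 *k, std::size_t d) {
    float acc = 0.0f; // f32 accumulation, matching the systolic kernel's behavior
    for (std::size_t i = 0; i < d; ++i) {
        // one fused multiply-add per element; on GPU this maps to mad/fma
        // instructions rather than DPAS/systolic matrix instructions
        acc = std::fma(static_cast<float>(q[i]), static_cast<float>(k[i]), acc);
    }
    return acc;
}
```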

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

Performance improvements

Performance was measured and tuned relative to a primitives-only (matmul-based) implementation:
[performance comparison chart]

syurkevi requested a review from a team as a code owner on June 13, 2025 16:39
github-actions bot added the platform:gpu-intel (Codeowner: @oneapi-src/onednn-gpu-intel) label on Jun 13, 2025
syurkevi force-pushed the syurkevi/sdpa_fma_f16 branch from 8574d21 to 66937bb on June 13, 2025 16:55
syurkevi requested a review from a team as a code owner on June 13, 2025 16:55
github-actions bot added the component:tests (Codeowner: @oneapi-src/onednn-arch) label on Jun 13, 2025
syurkevi force-pushed the syurkevi/sdpa_fma_f16 branch 2 times, most recently from 5c9dc1f to 90c7bfb on June 13, 2025 17:13
syurkevi force-pushed the syurkevi/sdpa_fma_f16 branch from 90c7bfb to 51d3588 on June 13, 2025 18:05
github-actions bot removed the component:tests (Codeowner: @oneapi-src/onednn-arch) label on Jun 13, 2025
syurkevi force-pushed the syurkevi/sdpa_fma_f16 branch from 51d3588 to 4285c75 on June 13, 2025 18:13
@TaoLv (Contributor) commented Jun 14, 2025

Hi @syurkevi, what does this mean from a user perspective? Is it f16 acc automatically, or will users need to turn on a flag to enable it? With this, do we expect the same numerical behavior between the micro-kernel implementation and the matmul-primitive-based implementation? Thanks!

@syurkevi (Contributor, Author) replied:

> Hi @syurkevi, what does this mean from a user perspective? Is it f16 acc automatically, or will users need to turn on a flag to enable it? With this, do we expect the same numerical behavior between the micro-kernel implementation and the matmul-primitive-based implementation? Thanks!

F16 support is now automatically available on MTL, where it was previously unsupported. The accumulation mode matches the behavior of the systolic SDPA kernel (f32 acc), so results should match our current implementation.
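As a standalone illustration (not oneDNN code) of why keeping the f32 accumulator matters for matching results: summing many small f16 terms in an f16 accumulator stalls once the running sum grows large, while an f32 accumulator keeps the low-order bits. Both the systolic and the FMA SDPA paths accumulate in f32, so they are expected to round the same way. This toy program assumes a compiler with `_Float16` support.

```cpp
#include <cstdio>

int main() {
    _Float16 acc_f16 = (_Float16)0.0f;
    float acc_f32 = 0.0f;
    for (int i = 0; i < 4096; ++i) {
        _Float16 p = (_Float16)0.25f;      // a representative per-element product
        acc_f16 = (_Float16)(acc_f16 + p); // rounds to f16 after every add
        acc_f32 += 0.25f;                  // full f32 accumulation
    }
    // Typical output: the f16 accumulator stalls near 512, while f32 reaches 1024.
    std::printf("f16 acc = %f, f32 acc = %f\n",
            static_cast<double>(static_cast<float>(acc_f16)),
            static_cast<double>(acc_f32));
    return 0;
}
```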

@syurkevi (Contributor, Author) commented:

make test
disable benchdnn_all
enable benchdnn_graph
enable test_device_gpu
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg

syurkevi force-pushed the syurkevi/sdpa_fma_f16 branch from 4285c75 to 6b68be7 on June 19, 2025 04:01
syurkevi force-pushed the syurkevi/sdpa_fma_f16 branch from 6b68be7 to de3cdf9 on June 19, 2025 04:04
@syurkevi (Contributor, Author) commented:

make test
disable benchdnn_all
enable benchdnn_graph
enable test_device_gpu
enable arch_gpu_xe-hpc
enable arch_gpu_xe-hpg-atsm
enable arch_gpu_xe-hpg-dg2
enable arch_gpu_xe-lp
enable arch_gpu_xe-lpg
enable arch_gpu_xe-lpg+
enable arch_gpu_xe2-hpg-bmg
enable arch_gpu_xe2-lpg
