make test |
|
@petercad @syurkevi Is it possible to further extend this to 576 to support MLA? There is the same request for flash attention. |
48978d1 to 6778889
576 is close enough to 512 that the current configs may be applicable as is. I've pushed a test branch internally with this change if you'd like to try it out (I have some concerns about correctness). If 576 is all that's needed, I'll perform some additional testing and bump the number in this PR. If we want even larger head sizes, at some point the register demand will be too high for the kernel to be performant (or to run at all, depending on the hardware). Something like head_size=1024 will require more tuning and testing. |
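For a rough sense of the register-demand point above, here is a back-of-envelope sketch. It assumes, purely for illustration, that each thread keeps a full head_size-wide f32 output accumulator per query row it owns. The function names and the tile size of 2 query rows are hypothetical and are not taken from the kernel configs in this PR; real kernels split the work across threads differently.

```python
# Back-of-envelope estimate of how accumulator storage scales with head size.
# ASSUMPTION: each thread holds head_size accumulator values per query row,
# stored as 4-byte floats. Tile sizes are hypothetical, not from the PR.

def accumulator_bytes_per_thread(head_size, queries_per_thread, acc_bytes=4):
    """Bytes one thread needs for its slice of the output accumulator."""
    return head_size * queries_per_thread * acc_bytes


def relative_demand(head_size, baseline_head_size=512):
    """Accumulator footprint relative to the baseline head size.

    With a fixed tile configuration the footprint grows linearly
    with head size, so this is just the ratio of the two sizes.
    """
    return head_size / baseline_head_size


if __name__ == "__main__":
    for d in (512, 576, 1024):
        need = accumulator_bytes_per_thread(d, queries_per_thread=2)
        print(f"head_size={d}: {need} bytes/thread, "
              f"{relative_demand(d):.3f}x the 512 footprint")
```

Under these assumptions, 576 needs 12.5% more accumulator space than 512, while 1024 needs twice as much, which is consistent with 576 possibly reusing the existing configs and 1024 needing separate tuning.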
|
@syurkevi Good to know. Feel free to add it in another PR. I think 576 will be enough for now. |
|
In
to address some Coverity issues. Thanks! |
6778889 to 5e59499
Bad news here after some testing. |
|
make test |
Description
This PR enables head sizes up to 512 (head size <= 512) for the SDPA microkernel. Good configurations were found by profiling on PVC, BMG, LNL, and DG2.
Most shapes with head size 512 still show good speedups over a primitives-based implementation. Even larger head sizes may still be beneficial; more investigation to follow.
Checklist
General
Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?