-
-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[V1][Attention] Split triton_attn in triton-only and rocm specific backends #24648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[V1][Attention] Split triton_attn in triton-only and rocm specific backends #24648
Conversation
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR splits the triton_attn
backend into triton_attn
and rocm_attn
to facilitate easier maintenance and platform-specific adaptations. The review focuses on identifying potential issues related to correctness and maintainability, particularly in the new rocm_attn.py
file and modifications to platforms/rocm.py
.
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
"RocmAttentionImpl") | ||
|
||
self.fp8_dtype = current_platform.fp8_dtype() | ||
self.force_prefill_decode_attn = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for simplicity we can remove this variable altogether, to not have 2 identical unified attention codepaths
The logic (for ROCm) could be just:
if backend == ROCM_ATTN_VLLM_V1:
use rocm_attn with split attention (or aiter)
else:
use triton_attn with unified attention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a good idea, but maybe as a follow-up PR? I think the focus of this PR should be to enable such optimizations easily later.
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ckends (vllm-project#24648) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
…ckends (vllm-project#24648) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Signed-off-by: charlifu <charlifu@amd.com>
…ckends (#24648) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Purpose
This PR splits the
triton_attn
backend of V1 into two backends: One triton-only and platform-independenttriton_attn
and one rocm-specificrocm_attn
, including aiter kernels.This facilitates easier maintenance of both backends. Also, adaptations to the vllm-internal triton backend then don't need to ensure full compatibility with the external library aiter. For example, the standardization of kv-cache layouts in #21624 was blocked by this (and is also solved in this PR).
The selection of the backends is updated to select the new rocm_attn in case any of the aiter-specific variables are set.
Both backends still use the same triton kernels, but the logic to select them is now separated.
CC: @tdoublep @SageMoore @LucasWilkinson @jvlunteren
Test Plan
Correctness tests and manual comparison of the file diff between old and new split backends.
Test Result
Correctness
on main, H100:
with this PR on H100:
on main, MI300:
with this PR on MI300:
to test the "new" rocm backend, on MI300:
git diff
Initially, I renamed
triton_attn.py
torocm_attn.py
withgit mv...
. However, github now showsrocm_attn.py
as completely new file. However, the difference to the triton_attn in main is quite minimal, they are only renamings.So, to facilitate the PR review, I created the diff manually:
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.