[V0 Deprecation] Remove placeholder attn #25510
Conversation
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Code Review

This pull request removes the placeholder attention backend and the associated `is_attention_free` flag, which is a good cleanup as it's no longer needed for Mamba models in V1. The changes are consistent across the modified files. However, I've found a broken test case that needs to be addressed to ensure the integrity of the test suite.
Thanks for doing this!
LGTM, thanks for the work!
Nice
Purpose
Remove placeholder attention backend. It is no longer needed for Mamba models in V1, since each mamba/linear attention layer has its own "real" attention backend.
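For illustration, here is a minimal Python sketch of the before/after shape of this change; all names in it (`PlaceholderAttnBackend`, `LinearAttnBackend`, `MambaLayer`, `select_backend`) are hypothetical stand-ins, not the actual vLLM classes:

```python
# Hedged sketch of the idea behind this PR; names are hypothetical,
# not the real vLLM V1 API.

class LinearAttnBackend:
    """Stand-in for the "real" backend each mamba/linear attention layer owns."""
    name = "linear_attn"


class MambaLayer:
    def __init__(self):
        # In V1, each mamba/linear attention layer carries its own backend,
        # so a model-level placeholder backend is no longer needed.
        self.attn_backend = LinearAttnBackend()


def select_backend(layer):
    # Before: a model-level check along the lines of
    #     if model_config.is_attention_free: return PlaceholderAttnBackend()
    # After: simply ask the layer for its own backend.
    return layer.attn_backend


if __name__ == "__main__":
    print(select_backend(MambaLayer()).name)  # -> linear_attn
```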
Test Plan
Let's see if CI passes
Test Result
Essential Elements of an Effective PR Description Checklist
- Documentation update, such as supported_models.md and examples for a new model.