
Conversation

WoosukKwon
Collaborator

Summary

  • restrict CUDA and ROCm attention backend selection to the V1 engine and raise errors when a removed V0 backend is requested (see the sketch after this list)
  • update runtime helpers and tests to consume the V1 attention metadata/backends and add local ALiBi utilities for kernel tests (also sketched below)
  • skip V0-only model initialization coverage and drop the legacy attention selector tests that depended on V0 backends
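
As a hedged illustration of the first two bullets, here is a minimal sketch — not vLLM's actual code — of (a) a selector that raises a RuntimeError when a removed V0 backend is requested and (b) a local ALiBi slopes helper of the kind a kernel test might carry. The names `resolve_attn_backend`, `_REMOVED_V0_BACKENDS`, and `alibi_slopes`, along with the example backend names, are all hypothetical.

```python
import math

# Hypothetical sketch of the V1-only guard described above; the backend
# names and the function/registry names are illustrative, not vLLM's API.
_REMOVED_V0_BACKENDS = {"XFORMERS", "ROCM_FLASH"}  # example V0 names

def resolve_attn_backend(name: str) -> str:
    """Return the requested backend name, raising for removed V0 backends."""
    if name in _REMOVED_V0_BACKENDS:
        raise RuntimeError(
            f"Attention backend {name!r} was removed together with the V0 "
            "engine; select a V1 backend instead."
        )
    return name

def alibi_slopes(n_heads: int) -> list[float]:
    """Standard ALiBi slope schedule (Press et al., 2022): powers of
    2**(-8/n), with extra slopes interpolated from the next power of two
    when n_heads is not itself a power of two."""
    def pow2_slopes(n: int) -> list[float]:
        base = 2.0 ** (-8.0 / n)
        return [base ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return pow2_slopes(n_heads)
    closest = 2 ** math.floor(math.log2(n_heads))
    # Take every other slope (powers 1, 3, 5, ...) from the 2*closest schedule.
    extra = pow2_slopes(2 * closest)[0::2][: n_heads - closest]
    return pow2_slopes(closest) + extra
```

For instance, `alibi_slopes(8)` yields `[2**-1, 2**-2, ..., 2**-8]`, matching the slope schedule commonly used in ALiBi kernel tests.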

Testing

  • pytest tests/kernels/attention/test_prefix_prefill.py::test_contexted_kv_attention --maxfail=1 (fails: missing tblib dependency)

https://chatgpt.com/codex/tasks/task_b_68d02a781064832dacffd35e5f979636

@mergify mergify bot added the documentation, deepseek, qwen, rocm, and kv-connector labels Sep 21, 2025
@WoosukKwon WoosukKwon added the ready label Sep 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request undertakes a significant refactoring to remove the legacy V0 attention backends, streamlining the codebase to use V1 backends exclusively. The changes are comprehensive: numerous V0 backend files are deleted, tests are updated to target the V1 backends, and platform-specific code now raises an error when a removed backend is requested. This consolidation simplifies the attention backend infrastructure. The changes appear consistent and well-executed, and raising a RuntimeError for V0 backend requests gives this breaking change a clear failure mode.

@hmellor hmellor moved this to In Progress in V0 Deprecation Sep 21, 2025
@WoosukKwon WoosukKwon force-pushed the codex/remove-v0-attention-backends-and-tests branch from 9e88a2e to bb26845 on September 21, 2025 at 22:40
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
@WoosukKwon WoosukKwon merged commit bc6e542 into main Sep 21, 2025
12 of 19 checks passed
@WoosukKwon WoosukKwon deleted the codex/remove-v0-attention-backends-and-tests branch September 21, 2025 23:03
@github-project-automation github-project-automation bot moved this from In Progress to Done in V0 Deprecation Sep 21, 2025
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Sep 22, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Labels
codex, deepseek, documentation, kv-connector, qwen, ready, rocm