
Conversation

WoosukKwon
Collaborator

Summary

  • restrict CUDA and ROCm attention backend selection to the V1 engine and raise errors when a removed V0 backend is requested (see the sketch after this list)
  • update runtime helpers and tests to consume the V1 attention metadata/backends and add local ALiBi utilities for kernel tests (also sketched below)
  • skip V0-only model initialization coverage and drop the legacy attention selector tests that depended on V0 backends
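
As a hedged illustration of the first two bullets, here is a minimal sketch — not vLLM's actual code — of (a) a selector that raises a RuntimeError when a removed V0 backend is requested and (b) a local ALiBi slopes helper of the kind a kernel test might carry. The names `resolve_attn_backend`, `_REMOVED_V0_BACKENDS`, and `alibi_slopes`, along with the example backend names, are all hypothetical.

```python
import math

# Hypothetical sketch of the V1-only guard described above; the backend
# names and the function/registry names are illustrative, not vLLM's API.
_REMOVED_V0_BACKENDS = {"XFORMERS", "ROCM_FLASH"}  # example V0 names

def resolve_attn_backend(name: str) -> str:
    """Return the requested backend name, raising for removed V0 backends."""
    if name in _REMOVED_V0_BACKENDS:
        raise RuntimeError(
            f"Attention backend {name!r} was removed together with the V0 "
            "engine; select a V1 backend instead."
        )
    return name

def alibi_slopes(n_heads: int) -> list[float]:
    """Standard ALiBi slope schedule (Press et al., 2022): powers of
    2**(-8/n), with extra slopes interpolated from the next power of two
    when n_heads is not itself a power of two."""
    def pow2_slopes(n: int) -> list[float]:
        base = 2.0 ** (-8.0 / n)
        return [base ** (i + 1) for i in range(n)]

    if math.log2(n_heads).is_integer():
        return pow2_slopes(n_heads)
    closest = 2 ** math.floor(math.log2(n_heads))
    # Take every other slope (powers 1, 3, 5, ...) from the 2*closest schedule.
    extra = pow2_slopes(2 * closest)[0::2][: n_heads - closest]
    return pow2_slopes(closest) + extra
```

For instance, `alibi_slopes(8)` yields `[2**-1, 2**-2, ..., 2**-8]`, matching the slope schedule commonly used in ALiBi kernel tests.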

Testing

  • pytest tests/kernels/attention/test_prefix_prefill.py::test_contexted_kv_attention --maxfail=1 (fails: missing tblib dependency)

https://chatgpt.com/codex/tasks/task_b_68d02a781064832dacffd35e5f979636

@mergify mergify bot added the documentation, deepseek, qwen, rocm, and kv-connector labels Sep 21, 2025
@WoosukKwon WoosukKwon added the ready label Sep 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request undertakes a significant refactoring to remove the legacy V0 attention backends, streamlining the codebase to use V1 backends exclusively. The changes are comprehensive: numerous V0 backend files are deleted, tests are updated to target the V1 backends, and platform-specific code now raises an error when a removed backend is requested. This consolidation simplifies the attention backend infrastructure. The changes appear consistent and well-executed, and raising a RuntimeError for V0 backend requests gives this breaking change a clear failure mode.

@hmellor hmellor moved this to In Progress in V0 Deprecation Sep 21, 2025
@WoosukKwon WoosukKwon force-pushed the codex/remove-v0-attention-backends-and-tests branch from 9e88a2e to bb26845 on September 21, 2025 at 22:40
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
@WoosukKwon WoosukKwon merged commit bc6e542 into main Sep 21, 2025
12 of 19 checks passed
@WoosukKwon WoosukKwon deleted the codex/remove-v0-attention-backends-and-tests branch September 21, 2025 23:03
@github-project-automation github-project-automation bot moved this from In Progress to Done in V0 Deprecation Sep 21, 2025
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Sep 22, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Labels
codex, deepseek, documentation, kv-connector, qwen, ready, rocm