[NIXL] Add support for MLA caches with different latent dim #25902
Conversation
Code Review

This pull request introduces support for variable KV cache shapes per layer in MLA models by replacing the scalar `block_len` with a list `block_lens`. The changes are primarily within `nixl_connector.py` and appear to correctly implement the intended feature. However, I've identified a critical issue in the calculation of `remote_block_size` when using FlashInfer with MLA models, which would lead to a failed assertion. I have provided a suggested fix for this issue. The rest of the changes are consistent and well-implemented.
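For illustration, here is a minimal sketch of the per-layer block-length idea the review describes. The helper name `compute_block_lens` and the concrete dims are assumptions for this sketch, not the actual `nixl_connector.py` code:

```python
from math import prod

# Hypothetical helper: each layer's KV block size in bytes is derived
# from that layer's own cache shape, so a layer with a different last
# dim (e.g. the Indexer cache) gets its own entry in the list.
def compute_block_lens(kv_cache_shapes: list[tuple[int, ...]],
                       elem_size: int) -> list[int]:
    return [prod(shape) * elem_size for shape in kv_cache_shapes]

# Illustrative values: block_size=64 tokens, bf16 (2 bytes/elem);
# 576 = assumed MLA latent dim (512 kv_lora_rank + 64 rope dim),
# 128 = assumed Indexer cache head dim.
block_size = 64
shapes = [(block_size, 576), (block_size, 128)]
print(compute_block_lens(shapes, elem_size=2))  # [73728, 16384]
```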
LGTM!
Why is the shape of the KV caches different in MLA? It seems that each layer's KV cache shape is the same. https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/config.json

Check out the Indexer cache.
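As a hedged illustration of why a single scalar no longer suffices (layer count and dims are assumed for the sketch, not read from the config):

```python
# A scalar block_len implicitly assumes every layer's block has the
# same byte length; the Indexer cache breaks that assumption.
# Illustrative values: bf16 (2 bytes/elem), block_size=64 tokens.
mla_block_len = 64 * 576 * 2      # assumed MLA latent dim 576
indexer_block_len = 64 * 128 * 2  # assumed Indexer head dim 128

block_lens = [mla_block_len] * 61 + [indexer_block_len]
assert len(set(block_lens)) > 1   # heterogeneous -> need a list, not a scalar
```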
This PR enables the transfer of KV caches with different shapes (last dim only, for now) for MLA models, and in particular for the new DeepSeek-V3.2, allowing its Indexer cache to be sent/received as well in a disaggregated setup.

It does so by extending `block_len` (NHD in a regular KV cache) to `block_len_per_layer`, allowing each layer to define its own "stride". This approach could be reused for dense models too, but for now it is restricted to MLA, since the dense case should first be aligned with the ongoing HMA integration effort.

Related to #25101.
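A hedged sketch of what a per-layer stride buys (hypothetical helper; the real addressing logic lives in `nixl_connector.py`):

```python
# Byte offset of a KV block inside one layer's cache region, using that
# layer's own block length as the stride (hypothetical helper).
def block_offset(base_addr: int, block_id: int,
                 block_len_per_layer: list[int], layer: int) -> int:
    return base_addr + block_id * block_len_per_layer[layer]

# With a single scalar block_len, both layers below would be forced to
# share one stride; per-layer strides let the Indexer layer differ.
strides = [73728, 16384]  # same illustrative values as above
print(block_offset(0, 3, strides, layer=0))  # 221184
print(block_offset(0, 3, strides, layer=1))  # 49152
```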