
Conversation

@NickLucche (Collaborator) commented Sep 29, 2025

This PR enables the transfer of KV caches whose shapes differ across layers (last dim only, for now) for MLA models, and in particular for the new DeepSeek-V3.2, allowing its Indexer cache to be sent/received as well in a disaggregated setup.

It does so by extending block_len (the per-block byte length, NHD layout in a regular KV cache) to block_len_per_layer, allowing each layer to define its own "stride".
This approach could be reused for dense models too, but for now it is restricted to MLA, since the dense case should first be aligned with the ongoing HMA integration effort.

Related to #25101.
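
At a glance, the change replaces one scalar stride with a per-layer list. A minimal sketch of the idea follows; apart from the block_len_per_layer name, the types and dims below are illustrative assumptions, not the actual vLLM structures:

```python
from dataclasses import dataclass

@dataclass
class LayerCacheSpec:
    """Illustrative per-layer cache description (hypothetical type)."""
    block_size: int  # tokens per block
    entry_dim: int   # last dim of the cache entry; this is what now varies
    dtype_size: int  # bytes per element

def compute_block_len_per_layer(layers: list[LayerCacheSpec]) -> list[int]:
    # One byte stride per layer instead of a single block_len shared by all.
    return [l.block_size * l.entry_dim * l.dtype_size for l in layers]

# Example: an MLA latent cache next to a narrower Indexer cache (dims assumed).
layers = [
    LayerCacheSpec(block_size=64, entry_dim=576, dtype_size=2),  # MLA latent
    LayerCacheSpec(block_size=64, entry_dim=128, dtype_size=2),  # Indexer
]
print(compute_block_len_per_layer(layers))  # [73728, 16384]
```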

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces support for variable KV cache shapes per layer in MLA models by replacing the scalar block_len with a list block_lens. The changes are primarily within nixl_connector.py and appear to correctly implement the intended feature. However, I've identified a critical issue in the calculation of remote_block_size when using FlashInfer with MLA models, which would lead to a failed assertion. I have provided a suggested fix for this issue. The rest of the changes are consistent and well-implemented.
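
For context, the computation the bot is flagging has this general shape; the variable names follow nixl_connector.py, but this is a simplified sketch of the pattern rather than the actual patch:

```python
def infer_remote_block_size(block_len: int, slot_size_bytes: int,
                            use_flashinfer: bool, use_mla: bool) -> int:
    """Recover tokens-per-block from a remote agent's byte stride (sketch).

    Assumption being illustrated: FlashInfer packs K and V into a single
    contiguous region, so the stride covers two entries per token and is
    halved -- but an MLA cache holds one latent entry per token, so the
    halving must be skipped, or the result is half the true block size
    and trips the equality check against the local block size.
    """
    remote_block_size = block_len // slot_size_bytes
    if use_flashinfer and not use_mla:
        remote_block_size //= 2
    return remote_block_size
```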

@heheda12345 heheda12345 added this to the v0.11.0 Cherry Picks milestone Sep 29, 2025
@heheda12345 (Collaborator) left a comment


LGTM!

@heheda12345 enabled auto-merge (squash) September 29, 2025 22:36
@github-actions bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Sep 29, 2025
NickLucche and others added 2 commits September 30, 2025 08:46
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@NickLucche force-pushed the nixl-mla-different-latent-dim branch from 32fc471 to 5d90305 on September 30, 2025 08:46
Signed-off-by: NickLucche <nlucches@redhat.com>
@heheda12345 merged commit 80608ba into vllm-project:main Sep 30, 2025
46 checks passed
simon-mo pushed a commit that referenced this pull request Oct 1, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
@princepride (Contributor) commented:

Why is the KV cache shape different in MLA? It looks like every layer's KV cache has the same shape: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/config.json

@NickLucche (Collaborator, Author) replied:

Check out the Indexer cache.
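
For readers landing here: DeepSeek-V3.2's sparse-attention Indexer keeps its own per-token key cache alongside the MLA latent cache, and the two differ in their last dimension, which is exactly the heterogeneity this PR handles. A rough illustration, with dims based on the published config but to be treated as assumptions:

```python
# Illustrative numbers; check DeepSeek-V3.2's config.json for the source fields.
kv_lora_rank = 512       # compressed KV latent dim
qk_rope_head_dim = 64    # decoupled RoPE dim
mla_entry_dim = kv_lora_rank + qk_rope_head_dim  # 576 elements per token
indexer_entry_dim = 128  # Indexer key dim (assumed)
# Different last dims => different block_len per layer, hence block_len_per_layer.
```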

pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
(vllm-project#25902)

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Labels: kv-connector, ready (ONLY add when PR is ready to merge/full CI is needed), v1