Skip to content

Conversation

Jialin
Copy link
Contributor

@Jialin Jialin commented Sep 16, 2025

Purpose

Majority of the the Generation 0 collect objects are related to KVCacheBlocks, and we found most of them are actually referring to empty blocks.
Screenshot 2025-09-28 at 7 52 23 AM

In this PR, we mainly made 2 changes

  • change KVCacheBlocks.blocks from tuple[list] to tuple[tuple] to indicate immutable after creating
  • pre-construct a empty KVCacheBlock member variable in KVCacheManager for reuse (in order to avoid GC)

Test Plan & Test Result

Model: facebook/opt-125m
Prefill-Heavy workload: Input 2000, Output 48
Decode-Heavy workload: Input 48, Output 2000

We could see, GC costs are significantly smaller with the PR (especially for decode-heavy workload). It could roughly convert to 3-4% throughput improvements.
Screenshot 2025-09-28 at 7 41 05 AM


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added frontend multi-modality Related to multi-modality (#4194) v1 tpu Related to Google TPUs and removed tpu Related to Google TPUs labels Sep 16, 2025
@Jialin Jialin changed the title Gc none block [Core] Replace empty list with None in KVCacheBlocks for GC optimization Sep 16, 2025
@Jialin Jialin marked this pull request as ready for review September 16, 2025 14:14
@Jialin Jialin requested a review from heheda12345 as a code owner September 16, 2025 14:14
@Jialin
Copy link
Contributor Author

Jialin commented Sep 16, 2025

Resolve #24321

@Jialin
Copy link
Contributor Author

Jialin commented Sep 19, 2025

Discussed with @heheda12345 offline. The main discussion point is to come up with solutions to minimize the code signature changes and avoid if else:

  • A potential solution would be let the allocate_new_blocks interface to returned a static KVCacheBlocks contains ImmutableEmptyList (where ImmutableEmptyList extends list with all mutations got removed). In this way, all signatures are kept as is, and all object allocations around empty KVCacheBlocks are removed.

Change the PR to draft before addressing the concern.

@Jialin Jialin marked this pull request as draft September 19, 2025 18:50
@Jialin Jialin marked this pull request as ready for review September 28, 2025 14:55
Copy link
Collaborator

@heheda12345 heheda12345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Can't imagine it can be implemented in such a clean way.

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jialin

Comment on lines 412 to 417
if any(blocks):
# Only create new KVCacheBlocks for non-empty blocks
return KVCacheBlocks(
tuple(tuple(block_per_group) for block_per_group in blocks))
else:
return self.empty_block_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: simplify

Suggested change
if any(blocks):
# Only create new KVCacheBlocks for non-empty blocks
return KVCacheBlocks(
tuple(tuple(block_per_group) for block_per_group in blocks))
else:
return self.empty_block_list
if not any(blocks):
return self.empty_block_list
# Only create new KVCacheBlocks for non-empty blocks
return KVCacheBlocks(
tuple(tuple(block_per_group) for block_per_group in blocks))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also per other comment, in the non-empty case we should maybe just return KVCacheBlocks(blocks)? And allow the sub block lists to be tuple or list ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick discussion with @heheda12345 2 weeks ago about the design. As we're creating new precomputed empty blocks, we would love to claim immutability of KVCacheBlocks in order to avoid mis-operations (e.g. append items on the empty KVCacheBlocks or additional if else checks on types or existence (our first approach was replacing tuple as optional[tuple]).

So if possible, we would love to prefer keeping it as tuple.

Copy link

mergify bot commented Oct 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Jialin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 2, 2025
Jialin added 3 commits October 3, 2025 11:36
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Jialin added 2 commits October 3, 2025 12:06
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@heheda12345
Copy link
Collaborator

@Jialin can you rebase the PR?
See https://vllm-dev.slack.com/archives/C07R5Q1Q2BB/p1759663228844749 for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
frontend multi-modality Related to multi-modality (#4194) v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants