Contributor

@Jialin Jialin commented Sep 14, 2025

Purpose

Instances of dict[block_id, KVCacheBlock] are currently the top GC objects. However, most of the time each BlockHashWithGroupId simply maps to a single KVCacheBlock. So we replace dict[block_id, KVCacheBlock] with Union[KVCacheBlock, dict[block_id, KVCacheBlock]] in the block cache, and store a bare KVCacheBlock whenever possible to reduce the GC overhead.
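A minimal sketch of the idea, not the code in this PR: `KVCacheBlock` here is a hypothetical stand-in with only a `block_id`, the hash key is a plain string, and `BlockCacheSketch` and its method names are assumptions made for illustration. It shows a cache entry stored as a bare block in the common case and upgraded to a dict only when a second block shares the same hash.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class KVCacheBlock:
    """Hypothetical stand-in for the real KVCacheBlock; only block_id is modeled."""
    block_id: int


class BlockCacheSketch:
    """Maps a block hash to one block (stored bare) or several (stored in a dict)."""

    def __init__(self) -> None:
        self._cache: dict[str, Union[KVCacheBlock, dict[int, KVCacheBlock]]] = {}

    def insert(self, key: str, block: KVCacheBlock) -> None:
        existing = self._cache.get(key)
        if existing is None:
            # Common case: a single block per hash, so no per-key dict is allocated.
            self._cache[key] = block
        elif isinstance(existing, KVCacheBlock):
            # A second block shares this hash: upgrade the entry to a dict.
            self._cache[key] = {existing.block_id: existing, block.block_id: block}
        else:
            existing[block.block_id] = block

    def get_one(self, key: str) -> Optional[KVCacheBlock]:
        entry = self._cache.get(key)
        if entry is None:
            return None
        if isinstance(entry, KVCacheBlock):
            return entry
        # Any block with this hash will do; return the first one.
        return next(iter(entry.values()))

    def pop(self, key: str, block_id: int) -> Optional[KVCacheBlock]:
        entry = self._cache.get(key)
        if entry is None:
            return None
        if isinstance(entry, KVCacheBlock):
            if entry.block_id != block_id:
                return None
            del self._cache[key]
            return entry
        block = entry.pop(block_id, None)
        if not entry:
            # Drop the key once its dict is empty.
            del self._cache[key]
        return block
```

For simplicity the sketch never downgrades a dict back to a bare block once it shrinks to one entry; whether the real change does so is not stated here.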

Test Plan & Test Result

With #24829 patched in locally for a breakdown analysis, we can see that the distribution of GC cost shifts left, as expected.

E2E Metrics

Model: facebook/opt-125m
Prefill-heavy work: prefill 2000 decode 48
Decode-heavy work: prefill 48 decode 2000
max-concurrency:

| Request-Rate | Before | After |
|---|---|---|
| Prefill-heavy | 212.98 | 217.40 (+2%) |
| Decode-heavy | 13.89 | 14.10 (+1.5%) |

facebook/opt-125m decode heavy workload: prefill 48 decode 2000

GC cost reduced by 17% (per histogram of GC elapsed time)
[Screenshot: histogram of GC elapsed time]


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Sep 14, 2025
@Jialin Jialin marked this pull request as ready for review September 14, 2025 13:53
@Jialin Jialin requested a review from heheda12345 as a code owner September 14, 2025 13:53
Collaborator

@heheda12345 heheda12345 left a comment

Can you explain the metric in your figure and table? And what is the e2e speedup?

@Jialin
Contributor Author

Jialin commented Sep 14, 2025

> Can you explain the metric in your figure and table? And what is the e2e speedup?

Currently away from keyboard, thanks for your reviews. Will address them later in the day.

The metrics are distributions of GC elapsed time. I will also provide more data on the e2e speedup.

@Jialin
Contributor Author

Jialin commented Sep 15, 2025

> Can you explain the metric in your figure and table? And what is the e2e speedup?

Done. Added more metric descriptions in the summary and also added e2e speedup measurements.

@Jialin
Contributor Author

Jialin commented Sep 16, 2025

Resolve #24321

Collaborator

@heheda12345 heheda12345 left a comment

LGTM. Only a small nit.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@Jialin Jialin requested a review from ApostaC as a code owner September 19, 2025 12:24
@njhill njhill added the ready label Sep 19, 2025
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Collaborator

@heheda12345 heheda12345 left a comment

LGTM! Thanks very much.

@heheda12345 heheda12345 merged commit 4f8c4b8 into vllm-project:main Sep 23, 2025
40 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
… KVCacheBlock] (vllm-project#24830)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
… KVCacheBlock] (#24830)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>