[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] #24830

Jialin · 2025-09-14T13:44:23Z

Purpose

dict[block_id, KVCacheBlock] is currently the top GC object type. However, most of the time each BlockHashWithGroupId simply maps to a single KVCacheBlock. So we replace dict[block_id, KVCacheBlock] with Union[KVCacheBlock, dict[block_id, KVCacheBlock]] in the block cache, and store a bare KVCacheBlock whenever possible to reduce GC overhead.
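To make the idea concrete, here is a minimal sketch of a cache that stores a single block directly and only falls back to a dict when a second block arrives under the same key. It is not the code from this PR: `Block`, `SingleOrDictCache`, and the method names are hypothetical stand-ins, and the key is assumed to be plain `bytes`.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Block:
    """Hypothetical stand-in for KVCacheBlock; only the id matters here."""
    block_id: int


class SingleOrDictCache:
    """Maps a key to one Block directly, or to a dict of Blocks on duplicates."""

    def __init__(self) -> None:
        self._cache: dict[bytes, Union[Block, dict[int, Block]]] = {}

    def insert(self, key: bytes, block: Block) -> None:
        existing = self._cache.get(key)
        if existing is None:
            # Common case: store the block itself, so no per-key dict is allocated.
            self._cache[key] = block
        elif isinstance(existing, Block):
            # Second block under the same key: upgrade the entry to a dict.
            self._cache[key] = {existing.block_id: existing, block.block_id: block}
        else:
            existing[block.block_id] = block

    def get_one(self, key: bytes) -> Optional[Block]:
        entry = self._cache.get(key)
        if entry is None or isinstance(entry, Block):
            return entry
        # Any block under this key will do; return the first one.
        return next(iter(entry.values()))

    def pop(self, key: bytes, block_id: int) -> Optional[Block]:
        entry = self._cache.pop(key, None)
        if entry is None:
            return None
        if isinstance(entry, Block):
            if entry.block_id == block_id:
                return entry
            # Not the requested block: put the entry back untouched.
            self._cache[key] = entry
            return None
        block = entry.pop(block_id, None)
        if entry:
            # Other blocks remain under this key: reinsert the dict.
            self._cache[key] = entry
        return block
```

In this sketch an entry that has been upgraded to a dict is never downgraded back to a bare block; whether that is worthwhile depends on how often duplicates occur.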

Test Plan & Test Result

With #24829 patched in locally for a breakdown analysis, we can see that the GC cost distribution shifts left, as expected.

E2E Metrics

Model: facebook/opt-125m
Prefill-heavy work: prefill 2000 decode 48
Decode-heavy work: prefill 48 decode 2000
max-concurrency:

|Request-Rate|Before|After|
|---|---|---|
|Prefill-heavy|212.98|217.40 (+2%)|
|Decode-heavy|13.89|14.10 (+1.5%)|

facebook/opt-125m decode heavy workload: prefill 48 decode 2000

GC cost reduced by 17% (per histogram of GC elapsed time)
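For readers who want to collect a similar distribution of GC elapsed time, one possible approach is Python's standard `gc.callbacks` hook. The sketch below is an assumption about how such a measurement could be taken, not the instrumentation used in #24829; `GCTimer` is a hypothetical helper name.

```python
import gc
import time
from typing import Optional


class GCTimer:
    """Records the wall-clock duration of each GC pass while active."""

    def __init__(self) -> None:
        self.durations_ms: list[float] = []
        self._start: Optional[float] = None

    def _callback(self, phase: str, info: dict) -> None:
        # The interpreter calls this with phase "start" before a collection
        # and "stop" after it; info carries the generation and object counts.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.durations_ms.append((time.perf_counter() - self._start) * 1e3)
            self._start = None

    def __enter__(self) -> "GCTimer":
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc) -> None:
        gc.callbacks.remove(self._callback)


if __name__ == "__main__":
    with GCTimer() as timer:
        # Allocate container objects so the collector has work to do.
        junk = [{i: [i]} for i in range(200_000)]
        del junk
        gc.collect()
    print(f"{len(timer.durations_ms)} GC passes, "
          f"total {sum(timer.durations_ms):.2f} ms")
```

The recorded `durations_ms` list can then be bucketed into a histogram and compared before and after a change.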

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

heheda12345

Can you explain the metric in your figure and table? And what is the e2e speedup?

vllm/v1/core/block_pool.py

Jialin · 2025-09-14T22:37:49Z

> Can you explain the metric in your figure and table? And what is the e2e speedup?

Currently away from keyboard, thanks for your reviews. Will address them later in the day.

The metrics are distributions of GC elapsed time. I will also provide more data on the e2e speedup.

Jialin · 2025-09-15T18:51:00Z

> Can you explain the metric in your figure and table? And what is the e2e speedup?

Done. Added more metric descriptions in the summary and also added e2e speedup measurements.

vllm/v1/core/block_pool.py

Jialin · 2025-09-16T16:18:42Z

Resolve #24321

vllm/v1/core/block_pool.py

heheda12345

LGTM. Only a small nit.

vllm/v1/core/block_pool.py

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

heheda12345

LGTM! Thanks very much.

mergify bot added the v1 label Sep 14, 2025

Jialin marked this pull request as ready for review September 14, 2025 13:53

Jialin requested a review from heheda12345 as a code owner September 14, 2025 13:53

heheda12345 reviewed Sep 14, 2025

Jialin force-pushed the single_block branch from c21f4b4 to 04f3b45 Compare September 15, 2025 17:48

njhill reviewed Sep 15, 2025

heheda12345 reviewed Sep 16, 2025

njhill reviewed Sep 16, 2025

Jialin mentioned this pull request Sep 16, 2025

[Core] Replace empty list with None in KVCacheBlocks for GC optimization #24964

Open

5 tasks

njhill reviewed Sep 16, 2025

Jialin requested review from WoosukKwon, robertgshaw2-redhat, ywang96, comaniac and alexm-redhat as code owners September 16, 2025 22:38

heheda12345 reviewed Sep 18, 2025

Jialin added 8 commits September 19, 2025 05:15

BlockLookupCache

6e21377

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Add unit test

d7be251

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Address comments

a11c930

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Fix unit tests

f7bfdaa

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Address leftover comments

07fcabf

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

pop then insert back if needed

277ed13

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Cleaner pop approach

a63e86c

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Add assertion followup

018644e

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Jialin force-pushed the single_block branch from 225100b to 018644e Compare September 19, 2025 12:24

Jialin requested a review from ApostaC as a code owner September 19, 2025 12:24

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 19, 2025

Fix unit tests

d7d5b88

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

heheda12345 approved these changes Sep 23, 2025

heheda12345 merged commit 4f8c4b8 into vllm-project:main Sep 23, 2025
40 checks passed

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Core] Use KVCacheBlock as much as possible instead of dict[block_id,…

6cb9682

… KVCacheBlock] (vllm-project#24830) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[Core] Use KVCacheBlock as much as possible instead of dict[block_id,…

0c11617

… KVCacheBlock] (#24830) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
