Conversation

Collaborator

@Jialin Jialin commented Nov 18, 2025

Purpose

Reuse the spec token lists created in InputBatch to mitigate GC costs.

In the current implementation, InputBatch.spec_token_ids is frequently replaced by lists newly created from the scheduler output, so we keep introducing medium/long-lived lists into the system, which drives up GC costs.
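The reuse pattern can be sketched as follows (a hypothetical minimal example, not the actual vLLM code; the name update_spec_tokens is made up for illustration). Instead of rebinding the per-request slot to a fresh list, the existing list is emptied and refilled in place, so the long-lived object's identity is preserved:

```python
def update_spec_tokens(spec_token_ids, req_index, new_token_ids):
    # Refill the preallocated list in place instead of rebinding the slot,
    # so no new long-lived list object is handed to the GC.
    slot = spec_token_ids[req_index]
    slot.clear()
    slot.extend(new_token_ids)

# Preallocated per-request lists (these are the long-lived objects).
spec_token_ids = [[] for _ in range(4)]
before = id(spec_token_ids[0])
update_spec_tokens(spec_token_ids, 0, [7, 8, 9])
assert id(spec_token_ids[0]) == before   # same list object reused
assert spec_token_ids[0] == [7, 8, 9]
```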

Test Plan & Test Result

CI signals


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@Jialin Jialin requested review from 22quinn and zhuohan123 November 18, 2025 07:13
@mergify mergify bot added the v1 label Nov 18, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to reduce garbage collection costs by reusing spec_token_ids lists in InputBatch instead of creating new ones. The changes correctly replace list re-assignments with clear() and extend() operations in most places. However, I've found a critical bug in the condense method where the list reuse logic leads to data loss. My review includes a fix for this issue.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@22quinn 22quinn enabled auto-merge (squash) November 19, 2025 05:40
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
@22quinn 22quinn merged commit 3319a49 into vllm-project:main Nov 19, 2025
43 checks passed
Member

@njhill njhill left a comment

@Jialin sorry missed this before it was merged

Comment on lines +895 to +896
self.input_batch.spec_token_ids[req_index].clear()
self.input_batch.spec_token_ids[req_index].extend(spec_token_ids)
Member

I don't understand this one: how is it better to repopulate an existing list and then discard the other one? Surely it's better to avoid that extra work and just use the other one. Either way, a list is going to get garbage collected.

Collaborator Author

@Jialin Jialin Nov 19, 2025

Great question! The magic is hidden behind gc.freeze.

If we preallocate the spec_token_ids lists and then call gc.freeze(), the preallocated lists are moved to the permanent generation and will NOT be scanned by gc.collect(). Meanwhile, the spec_token_ids list passed into this function is short-lived: in the majority of cases it is freed by reference counting, so it never needs to be handled by gc.collect() at all.

Let me know if you want more clarification :)
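A minimal runnable sketch of the mechanism described above (hypothetical demo code, not code from the PR): after gc.freeze(), the preallocated lists sit in the permanent generation and are skipped by collection cycles, while short-lived lists die promptly via reference counting.

```python
import gc

# Preallocate the long-lived per-request lists at startup.
spec_token_ids = [[] for _ in range(8)]

gc.collect()   # clear out any existing garbage first
gc.freeze()    # move every currently tracked object (including the
               # preallocated lists) into the permanent generation,
               # which gc.collect() does not scan
assert gc.get_freeze_count() > 0

# A short-lived list from a scheduler step is freed promptly by
# reference counting when its last reference drops, so it typically
# never has to be examined by a collection cycle at all.
tmp_tokens = [1, 2, 3]
del tmp_tokens

gc.unfreeze()  # undo for this demo; a long-running server would stay frozen
```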

Member

Thanks @Jialin, my general concern about these changes is that they sometimes come at the cost of added complexity and can negatively affect readability/maintainability.

There is a balance where I think it's reasonable to give up minor perf benefits to keep the code simpler (weighing, case by case, the magnitude of the complexity against the magnitude of the perf benefit).

This particular change, for example, I think is quite fragile, especially without comments explaining why we are expending extra cycles here to optimize the object lifecycle in relation to GC. It's very likely someone could change this in the future without being aware. I'm not sure whether it's realistic to enforce this (actually, these lists will go away soon anyhow with MRV2).

Collaborator Author

@njhill Totally agree with you on the tradeoff. @zhuohan123 actually raised similar concerns on my other PRs as well, and I fully align with them.

We had internal TPGS runs to justify the impact, but moving forward I should provide data that is openly accessible to avoid such confusion. I'll hold off on landing other GC changes on the OSS side until I come up with an e2e benchmark showing e2e wins (in both latency and GC costs) to justify such changes.

Jialin added a commit to Jialin/vllm that referenced this pull request Nov 19, 2025
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Collaborator Author

Jialin commented Nov 19, 2025

@Jialin sorry missed this before it was merged

No worries. I really appreciate your input!

Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
…ect#28917)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025
…ect#28917)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: LuminolT <lumischen01@gmail.com>
bigPYJ1151 pushed a commit that referenced this pull request Nov 25, 2025
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
…ect#28917)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

4 participants