Conversation

Collaborator

@Jialin Jialin commented Nov 18, 2025

Purpose

Reuse the spec token lists created in InputBatch to mitigate GC costs.

In the current implementation, InputBatch.spec_token_ids is frequently replaced by lists newly created from the scheduler output, so we keep introducing medium/long-lived lists into the system, which drives up GC costs.
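The reuse pattern can be sketched as follows (a hypothetical minimal example, not the actual vLLM code; the name update_spec_tokens is made up for illustration). Instead of rebinding the per-request slot to a fresh list, the existing list is emptied and refilled in place, so the long-lived object's identity is preserved:

```python
def update_spec_tokens(spec_token_ids, req_index, new_token_ids):
    # Refill the preallocated list in place instead of rebinding the slot,
    # so no new long-lived list object is handed to the GC.
    slot = spec_token_ids[req_index]
    slot.clear()
    slot.extend(new_token_ids)

# Preallocated per-request lists (these are the long-lived objects).
spec_token_ids = [[] for _ in range(4)]
before = id(spec_token_ids[0])
update_spec_tokens(spec_token_ids, 0, [7, 8, 9])
assert id(spec_token_ids[0]) == before   # same list object reused
assert spec_token_ids[0] == [7, 8, 9]
```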

Test Plan & Test Result

CI signals


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@Jialin Jialin requested review from 22quinn and zhuohan123 November 18, 2025 07:13
@mergify mergify bot added the v1 label Nov 18, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to reduce garbage collection costs by reusing spec_token_ids lists in InputBatch instead of creating new ones. The changes correctly replace list re-assignments with clear() and extend() operations in most places. However, I've found a critical bug in the condense method where the list reuse logic leads to data loss. My review includes a fix for this issue.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@22quinn 22quinn enabled auto-merge (squash) November 19, 2025 05:40
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
@22quinn 22quinn merged commit 3319a49 into vllm-project:main Nov 19, 2025
43 checks passed
Member

@njhill njhill left a comment

@Jialin sorry missed this before it was merged

Comment on lines +895 to +896
self.input_batch.spec_token_ids[req_index].clear()
self.input_batch.spec_token_ids[req_index].extend(spec_token_ids)
Member

I don't understand this one: how is it better to repopulate an existing list and then discard the other one? Surely it's better to avoid that extra work and just use the other one. Either way, a list is going to get garbage collected.

Collaborator Author

@Jialin Jialin Nov 19, 2025

Great question! The magic is hidden behind gc.freeze.

If we preallocate the spec_token_ids lists and then call gc.freeze(), the preallocated lists are moved to the permanent generation and will NOT be scanned by gc.collect(). Meanwhile, the spec_token_ids list passed into this function is short-lived: in the majority of cases it is freed by reference counting, so it never needs to be handled by gc.collect() at all.

Let me know if you want more clarification :)
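A minimal runnable sketch of the mechanism described above (hypothetical demo code, not code from the PR): after gc.freeze(), the preallocated lists sit in the permanent generation and are skipped by collection cycles, while short-lived lists die promptly via reference counting.

```python
import gc

# Preallocate the long-lived per-request lists at startup.
spec_token_ids = [[] for _ in range(8)]

gc.collect()   # clear out any existing garbage first
gc.freeze()    # move every currently tracked object (including the
               # preallocated lists) into the permanent generation,
               # which gc.collect() does not scan
assert gc.get_freeze_count() > 0

# A short-lived list from a scheduler step is freed promptly by
# reference counting when its last reference drops, so it typically
# never has to be examined by a collection cycle at all.
tmp_tokens = [1, 2, 3]
del tmp_tokens

gc.unfreeze()  # undo for this demo; a long-running server would stay frozen
```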

Member

Thanks @Jialin, my general concern about these changes is that they sometimes come at the cost of added complexity and can negatively affect readability/maintainability.

There is a balance where I think it's reasonable to give up minor perf benefits to keep the code simpler (weighing, case by case, the magnitude of the complexity against the magnitude of the perf benefit).

This particular change, for example, I think is quite fragile, especially without comments explaining why we are expending extra cycles here to optimize the object lifecycle in relation to GC. It's very likely someone could change this in the future without being aware. I'm not sure whether it's realistic to enforce this (actually, these lists will go away soon anyhow with MRV2).

Collaborator Author

@njhill Totally agree with you on the tradeoff. @zhuohan123 actually raised similar concerns on my other PRs as well, and I fully align with them.

We had internal TPGS runs to justify the impact, but moving forward I should provide data that is openly accessible to avoid such confusion. I'll hold off on landing other GC changes on the OSS side until I come up with an e2e benchmark showing e2e wins (in both latency and GC costs) to justify such changes.

Jialin added a commit to Jialin/vllm that referenced this pull request Nov 19, 2025
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Collaborator Author

Jialin commented Nov 19, 2025

@Jialin sorry missed this before it was merged

No worries. I really appreciate your input!

Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
…ect#28917)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025
…ect#28917)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: LuminolT <lumischen01@gmail.com>
bigPYJ1151 pushed a commit that referenced this pull request Nov 25, 2025
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025
…ect#28917)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

4 participants