[Core] Reuse created spec tokens lists to mitigate GC cost #28917
Conversation
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Code Review
This pull request aims to reduce garbage-collection costs by reusing spec_token_ids lists in InputBatch instead of creating new ones. The changes correctly replace list re-assignments with clear() and extend() operations in most places. However, I found a critical bug in the condense method, where the list-reuse logic leads to data loss. My review includes a fix for this issue.
💡 Codex Review
Here are some automated review suggestions for this pull request.
njhill
left a comment
@Jialin sorry, I missed this before it was merged.
self.input_batch.spec_token_ids[req_index].clear()
self.input_batch.spec_token_ids[req_index].extend(spec_token_ids)
I don't understand this one: how is it better to repopulate an existing list and then discard the other one? Surely it's better to avoid that extra work and just use the other one. Either way, a list is going to get garbage collected.
Great question! The magic is hidden behind gc.freeze.
If we preallocate the spec_token_ids lists and then run gc.freeze, the preallocated lists will NOT be GC-tracked. The spec_token_ids list passed into this function is short-lived: in the majority of cases it is freed by reference counting alone, so it will not need to be handled by gc.collect.
Let me know if you want more clarification :)
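A minimal sketch of the interplay described above. The variable names here (spec_token_ids, new_tokens) are illustrative, not vLLM's exact code; the point is that frozen, preallocated lists are exempt from cyclic-GC passes yet can still be mutated in place, while the short-lived incoming list dies by reference counting:

```python
import gc

# Preallocate long-lived lists, then freeze: gc.freeze() moves all
# currently tracked objects into the permanent generation, so future
# gc.collect() calls never scan them again.
spec_token_ids = [[] for _ in range(4)]
gc.collect()
gc.freeze()

# Frozen lists can still be mutated in place. The incoming list is
# short-lived and is normally freed by refcounting alone.
new_tokens = [1, 2, 3]
spec_token_ids[0].clear()
spec_token_ids[0].extend(new_tokens)

print(spec_token_ids[0])           # [1, 2, 3]
print(gc.get_freeze_count() > 0)   # True: frozen objects exist
gc.unfreeze()  # undo the freeze for this demo
```

Rebinding `spec_token_ids[0]` to a fresh list instead would create a new tracked, medium-lived object on every step, which is exactly the churn the PR avoids.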
Thanks @Jialin. My general concern about these changes is that they sometimes come at the cost of added complexity and can negatively affect readability/maintainability.
There is a balance where I think it's reasonable to give up minor perf benefits to keep the code simpler (of course, case by case, weighing the magnitude of the complexity against the magnitude of the perf benefit).
This particular change, for example, I think is quite fragile, especially without comments explaining why we are spending extra cycles here to optimize the object lifecycle in relation to GC. It's very likely someone could change this in the future without being aware. I'm not sure whether it's realistic to enforce this (actually, these lists will go away soon anyhow with MRV2).
@njhill I totally agree with you on the tradeoff. @zhuohan123 actually raised similar concerns on my other PRs as well, and I'm fully aligned with it.
We had internal TPGS runs to justify the impact, but moving forward I should provide data that is openly accessible to avoid such confusion. I'll hold off on landing other GC changes on the OSS side until I come up with an e2e benchmark showing e2e wins (both latency and GC costs) to justify such changes.
No worries. I really appreciate your input!
Purpose
Reuse the created spec token lists in InputBatch to mitigate GC costs.
In the current implementation, the entries of InputBatch.spec_token_ids are frequently replaced by lists newly created from the scheduler output, so we keep introducing medium/long-lived lists into the system, which means high GC costs.
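The reuse pattern can be sketched as follows. This is a simplified, hypothetical stand-in for the real InputBatch (the class here only models the one field the PR touches); the key property is that the preallocated list object's identity is preserved, so no new long-lived list enters GC tracking:

```python
class InputBatch:
    """Hypothetical sketch of the list-reuse pattern from this PR."""

    def __init__(self, max_reqs: int) -> None:
        # Preallocated once; intended to live for the process lifetime.
        self.spec_token_ids: list[list[int]] = [[] for _ in range(max_reqs)]

    def update_spec_tokens(self, req_index: int, new_tokens: list[int]) -> None:
        # Before the PR: self.spec_token_ids[req_index] = new_tokens
        # (rebinding creates a new medium/long-lived list every step).
        # After the PR: mutate the existing list in place instead.
        lst = self.spec_token_ids[req_index]
        lst.clear()
        lst.extend(new_tokens)


batch = InputBatch(2)
before = id(batch.spec_token_ids[0])
batch.update_spec_tokens(0, [7, 8, 9])
assert id(batch.spec_token_ids[0]) == before  # same object, reused in place
print(batch.spec_token_ids[0])  # [7, 8, 9]
```

Combined with gc.freeze at startup, the surviving lists are never rescanned by the collector, which is where the GC savings come from.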
Test Plan & Test Result
CI signals