-
-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse #24561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse #24561
Conversation
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@meta.com>
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Andrew Xia <axia@meta.com>
1fe6099
to
1718d05
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the test plan! cc: @heheda12345 @houseroad
# TODO: implement the other reason for incomplete_details, | ||
# which is content_filter | ||
# incomplete_details = IncompleteDetails(reason='content_filter') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's missing from current logic btw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think VLLM baseline implementation supports content filter as an abort reason currently: https://github.com/vllm-project/vllm/blob/main/vllm/v1/request.py#L206
vllm/entrypoints/harmony_utils.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the parser still has messages (ie if the generator got cut abruptly, this should be incomplete and not completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense please move your comment to the code :)
# TODO: implement the other reason for incomplete_details, | ||
# which is content_filter | ||
# incomplete_details = IncompleteDetails(reason='content_filter') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think VLLM baseline implementation supports content filter as an abort reason currently: https://github.com/vllm-project/vllm/blob/main/vllm/v1/request.py#L206
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew @houseroad this needs a rebase over the structured output disable commit |
Signed-off-by: bbartels <benjamin@bartels.dev> [gpt-oss] Add IncompleteDetails to ResponsesRepsonse (vllm-project#24561) Signed-off-by: Andrew Xia <axia@meta.com> [gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (vllm-project#24759) Signed-off-by: Andrew Xia <axia@meta.com> [Performance] Remove redundant clone() calls in cutlass_mla (vllm-project#24891) [Bug] Fix Cutlass Scaled MM Compilation Error (vllm-project#24887) Signed-off-by: yewentao256 <zhyanwentao@126.com> [ci] fix wheel names for arm wheels (vllm-project#24898) Signed-off-by: simon-mo <simon.mo@hey.com> [Tests] fix initialization of kv hash in tests (vllm-project#24273) Signed-off-by: Mickael Seznec <mickael@mistral.ai> [Compile] Fix noop_elimination pass and add tests for noop_elimination (vllm-project#24880) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Propagate entire tokens to connector for resumed preemptions Signed-off-by: Qier Li <kevin44036@gmail.com> Fix pre-commit Signed-off-by: Qier Li <kevin44036@gmail.com> Rename field and nullify empty lists Signed-off-by: Qier Li <kevin44036@gmail.com> Update vllm/v1/core/sched/scheduler.py Co-authored-by: Nick Hill <nhill@redhat.com> Signed-off-by: Qier Li <kevin44036@gmail.com> Add unit test for preemption resumption Signed-off-by: Qier Li <kevin44036@gmail.com>
Purpose
I do realize there are other things we need to fix for IncompleteDetails; for now we'll guarantee that IF incomplete details is outputted, it is true. We'll still need to test flows like if the generator was interrupted abruptly, and output the reason why. And I don't think GPT-OSS does content filters for now but that is another IncompleteDetails implementation.
Test Plan
Server
Client
Client, where we don't hit max token limit
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.