-
-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Improve output when failing json.loads() on structured output test #25483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve output when failing json.loads() on structured output test #25483
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to resolve a flaky test in structured output generation by introducing a _load_json
helper for string cleaning, adding more robust error handling, and making test prompts and schemas more specific. The approach is sound and the changes are mostly consistent. However, I've identified one instance where the new helper function was not used, which contradicts the PR's goal of consistent and robust JSON parsing. My feedback focuses on correcting this inconsistency to ensure the flakiness is fully resolved.
@@ -446,7 +461,13 @@ def test_structured_output( | |||
generated_text = output.outputs[0].text | |||
assert generated_text is not None | |||
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") | |||
output_json = json.loads(generated_text) | |||
try: | |||
output_json = json.loads(generated_text.strip()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with the changes in Test 1 (line 161) and Test 9 (line 415), and to fully resolve the flakiness this PR targets, this should be updated to use the _load_json
helper function. The current implementation calls json.loads()
directly, bypassing the new character cleaning logic in _load_json
which is essential for handling edge cases with certain backends like lm-format-enforcer
.
output_json = json.loads(generated_text.strip()) | |
output_json = _load_json(generated_text, backend) |
4f58643
to
92a0120
Compare
s = re.sub(r'[\x00-\x1F\x7F-\xFF]', '', s) | ||
if backend == "xgrammar": | ||
# xgrammar specific workarounds | ||
# https://github.com/mlc-ai/xgrammar/issues/286 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug is fixed now, so we shouldn't need this hack anymore. I'd actually prefer removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed instead!
# We parse the first JSON value because sometimes you get trailing garbage from xgrammar. | ||
# Example: '{"description": "A green string beanerneser green olive."}PRINT(\'B "}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to file a bug against xgrammar if this is happening.
Is it always xgrammar? Is it always the same test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SO! Here's the fun part. I had like 5-6 reproductions in a row, now... I can't get a reproduction 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it was always xgrammar.
My initial inspiration test to look into it was here: https://buildkite.com/vllm/ci/builds/31698#019969c7-4605-4d11-b894-973260ba0898/203-3827
And then, here's one of the tests where I saw that trailing garbage: https://buildkite.com/vllm/ci/builds/32069#0199772b-d466-42a4-aca9-fbdbe4dd0f92/165-1122
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it was always xgrammar.
My initial inspiration test to look into it was here: https://buildkite.com/vllm/ci/builds/31698#019969c7-4605-4d11-b894-973260ba0898/203-3827
This one is lm-format-enforcer instead of xgrammar
v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-lm-format-enforcer-auto-None] - json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 25 (char 24)
And then, here's one of the tests where I saw that trailing garbage: https://buildkite.com/vllm/ci/builds/32069#0199772b-d466-42a4-aca9-fbdbe4dd0f92/165-1122
not good! definitely a bug -- either in xgrammar or vllm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually: Pivoting on this pull request, going to just improve the output.
92a0120
to
1d9191e
Compare
see also: #24402 |
1f9aed8
to
77f3191
Compare
e3437b7
to
62e2afc
Compare
@@ -85,9 +85,6 @@ def _load_json(s: str, backend: str) -> str: | |||
if backend != "xgrammar": | |||
return json.loads(s) | |||
|
|||
# xgrammar specific workarounds | |||
# https://github.com/mlc-ai/xgrammar/issues/286 | |||
s = re.sub(r'[\x00-\x1F\x7F-\xFF]', '', s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole method can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
62e2afc
to
26d5f6e
Compare
Head branch was pushed to by a user without write access
26d5f6e
to
e640dfe
Compare
pushed fixes for pre-commit |
To help output in diagnosing flakey PR in issue vllm-project#24402 Signed-off-by: dougbtv <dosmith@redhat.com>
e640dfe
to
fc83ed4
Compare
and deleted my own sign-off in the commit message, re-added. |
…llm-project#25483) Signed-off-by: dougbtv <dosmith@redhat.com>
…25483) Signed-off-by: dougbtv <dosmith@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Improve output when failing json.loads() on structured output
Purpose
Adds further output in diagnosing #24402
Test Plan
Expect a clean test of "v1 test entrypoints" -- but expect better output if we hit the failure mode.
Test Result
Logging output only build: https://buildkite.com/vllm/ci/builds/32148