Skip to content

Conversation

dougbtv
Copy link
Contributor

@dougbtv dougbtv commented Sep 23, 2025

Improve output when failing json.loads() on structured output

Purpose

Adds further output in diagnosing #24402

Test Plan

Expect a clean test of "v1 test entrypoints" -- but expect better output if we hit the failure mode.

Test Result

Logging output only build: https://buildkite.com/vllm/ci/builds/32148

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to resolve a flaky test in structured output generation by introducing a _load_json helper for string cleaning, adding more robust error handling, and making test prompts and schemas more specific. The approach is sound and the changes are mostly consistent. However, I've identified one instance where the new helper function was not used, which contradicts the PR's goal of consistent and robust JSON parsing. My feedback focuses on correcting this inconsistency to ensure the flakiness is fully resolved.

@@ -446,7 +461,13 @@ def test_structured_output(
generated_text = output.outputs[0].text
assert generated_text is not None
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
output_json = json.loads(generated_text)
try:
output_json = json.loads(generated_text.strip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

For consistency with the changes in Test 1 (line 161) and Test 9 (line 415), and to fully resolve the flakiness this PR targets, this should be updated to use the _load_json helper function. The current implementation calls json.loads() directly, bypassing the new character cleaning logic in _load_json which is essential for handling edge cases with certain backends like lm-format-enforcer.

Suggested change
output_json = json.loads(generated_text.strip())
output_json = _load_json(generated_text, backend)

@dougbtv dougbtv force-pushed the investigate/struct-out-unterminated branch 3 times, most recently from 4f58643 to 92a0120 Compare September 23, 2025 18:57
s = re.sub(r'[\x00-\x1F\x7F-\xFF]', '', s)
if backend == "xgrammar":
# xgrammar specific workarounds
# https://github.com/mlc-ai/xgrammar/issues/286
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug is fixed now, so we shouldn't need this hack anymore. I'd actually prefer removing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed instead!

Comment on lines 91 to 92
# We parse the first JSON value because sometimes you get trailing garbage from xgrammar.
# Example: '{"description": "A green string beanerneser green olive."}PRINT(\'B "}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to file a bug against xgrammar if this is happening.

Is it always xgrammar? Is it always the same test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SO! Here's the fun part. I had like 5-6 reproductions in a row, now... I can't get a reproduction 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it was always xgrammar.

My initial inspiration test to look into it was here: https://buildkite.com/vllm/ci/builds/31698#019969c7-4605-4d11-b894-973260ba0898/203-3827

And then, here's one of the tests where I saw that trailing garbage: https://buildkite.com/vllm/ci/builds/32069#0199772b-d466-42a4-aca9-fbdbe4dd0f92/165-1122

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it was always xgrammar.

My initial inspiration test to look into it was here: https://buildkite.com/vllm/ci/builds/31698#019969c7-4605-4d11-b894-973260ba0898/203-3827

This one is lm-format-enforcer instead of xgrammar

v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-lm-format-enforcer-auto-None] - json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 25 (char 24)  

And then, here's one of the tests where I saw that trailing garbage: https://buildkite.com/vllm/ci/builds/32069#0199772b-d466-42a4-aca9-fbdbe4dd0f92/165-1122

not good! definitely a bug -- either in xgrammar or vllm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually: Pivoting on this pull request, going to just improve the output.

@dougbtv dougbtv force-pushed the investigate/struct-out-unterminated branch from 92a0120 to 1d9191e Compare September 23, 2025 19:29
@dougbtv
Copy link
Contributor Author

dougbtv commented Sep 23, 2025

see also: #24402

@dougbtv dougbtv force-pushed the investigate/struct-out-unterminated branch 2 times, most recently from 1f9aed8 to 77f3191 Compare September 23, 2025 19:51
@dougbtv dougbtv changed the title [prototype] fix(tests): resolve JSON parsing flake in structured output tests Improve output when failing json.loads() on structured output Sep 23, 2025
@dougbtv dougbtv force-pushed the investigate/struct-out-unterminated branch 3 times, most recently from e3437b7 to 62e2afc Compare September 23, 2025 19:58
@@ -85,9 +85,6 @@ def _load_json(s: str, backend: str) -> str:
if backend != "xgrammar":
return json.loads(s)

# xgrammar specific workarounds
# https://github.com/mlc-ai/xgrammar/issues/286
s = re.sub(r'[\x00-\x1F\x7F-\xFF]', '', s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole method can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@dougbtv dougbtv changed the title Improve output when failing json.loads() on structured output Improve output when failing json.loads() on structured output test Sep 23, 2025
@dougbtv dougbtv force-pushed the investigate/struct-out-unterminated branch from 62e2afc to 26d5f6e Compare September 23, 2025 20:10
@russellb russellb enabled auto-merge (squash) September 23, 2025 20:12
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025
auto-merge was automatically disabled September 23, 2025 20:18

Head branch was pushed to by a user without write access

@dougbtv dougbtv force-pushed the investigate/struct-out-unterminated branch from 26d5f6e to e640dfe Compare September 23, 2025 20:18
@dougbtv
Copy link
Contributor Author

dougbtv commented Sep 23, 2025

pushed fixes for pre-commit

To help output in diagnosing flakey PR in issue vllm-project#24402

Signed-off-by: dougbtv <dosmith@redhat.com>
@dougbtv dougbtv force-pushed the investigate/struct-out-unterminated branch from e640dfe to fc83ed4 Compare September 23, 2025 20:20
@dougbtv
Copy link
Contributor Author

dougbtv commented Sep 23, 2025

and deleted my own sign-off in the commit message, re-added.

@mgoin mgoin merged commit 7ad5e50 into vllm-project:main Sep 24, 2025
19 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…25483)

Signed-off-by: dougbtv <dosmith@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed structured-output v1
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants