Improve output when failing json.loads() on structured output test #25483

dougbtv · 2025-09-23T15:20:52Z

Improve output when failing json.loads() on structured output

Purpose

Adds more diagnostic output to help investigate #24402.

Test Plan

Expect the "v1 test entrypoints" suite to pass cleanly; if the failure mode recurs, the output should now be more informative.

Test Result

Build with only the logging changes: https://buildkite.com/vllm/ci/builds/32148
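The goal here is for a `json.loads()` failure in the structured-output test to report more than the bare exception. Below is a minimal sketch of that idea. The helper name `parse_generated_json` and the message format are hypothetical and are not the code merged in this PR. The sketch relies only on the standard `json.JSONDecodeError` attributes (`msg`, `pos`, `lineno`, `colno`).

```python
import json


def parse_generated_json(generated_text: str, prompt: str, backend: str,
                         context: int = 20):
    """Parse model output as JSON, failing with a diagnostic message.

    Hypothetical helper for illustration; not the code merged in this PR.
    """
    try:
        return json.loads(generated_text)
    except json.JSONDecodeError as e:
        # Show a window of characters on either side of the failure offset.
        start = max(e.pos - context, 0)
        snippet = generated_text[start:e.pos + context]
        raise AssertionError(
            f"json.loads() failed for backend={backend!r}: {e.msg} "
            f"at line {e.lineno} column {e.colno} (char {e.pos})\n"
            f"  prompt: {prompt!r}\n"
            f"  output length: {len(generated_text)}\n"
            f"  near failure: {snippet!r}\n"
            f"  full output: {generated_text!r}") from e


# Usage: a valid document parses normally...
print(parse_generated_json('{"a": 1}', prompt="p", backend="xgrammar"))
# ...while a truncated one raises with the backend, prompt, and position.
try:
    parse_generated_json('{"a": "b', prompt="p", backend="xgrammar")
except AssertionError as err:
    print(err)
```

Formatting with `!r` makes control characters and trailing garbage visible in CI logs, and `from e` preserves the original traceback. Inside a pytest test, `pytest.fail(message)` would work equally well for reporting the failure.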

gemini-code-assist

Code Review

This pull request aims to resolve a flaky test in structured output generation by introducing a _load_json helper for string cleaning, adding more robust error handling, and making test prompts and schemas more specific. The approach is sound and the changes are mostly consistent. However, I've identified one instance where the new helper function was not used, which contradicts the PR's goal of consistent and robust JSON parsing. My feedback focuses on correcting this inconsistency to ensure the flakiness is fully resolved.

gemini-code-assist · 2025-09-23T15:22:16Z

tests/v1/entrypoints/llm/test_struct_output_generate.py

@@ -446,7 +461,13 @@ def test_structured_output(
        generated_text = output.outputs[0].text
        assert generated_text is not None
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
-        output_json = json.loads(generated_text)
+        try:
+            output_json = json.loads(generated_text.strip())


For consistency with the changes in Test 1 (line 161) and Test 9 (line 415), and to fully resolve the flakiness this PR targets, this should be updated to use the _load_json helper function. The current implementation calls json.loads() directly, bypassing the new character cleaning logic in _load_json which is essential for handling edge cases with certain backends like lm-format-enforcer.

Suggested change

-            output_json = json.loads(generated_text.strip())
+            output_json = _load_json(generated_text, backend)
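The diff under review wraps the parse as `json.loads(generated_text.strip())`. The check below shows what `.strip()` actually adds. `json.loads()` already skips leading and trailing JSON whitespace (space, tab, newline, carriage return). As a result, `.strip()` only matters for other characters that Python treats as whitespace, such as a form feed. It does nothing for trailing non-whitespace garbage. The inputs are made up for illustration.

```python
import json

# JSON whitespace (space, tab, newline, carriage return) around a value is
# already accepted by json.loads(), with or without .strip().
assert json.loads(' \n{"a": 1}\r\n') == {"a": 1}

# A form feed counts as whitespace for str.strip() but not for the JSON grammar.
text = '\x0c{"a": 1}'
try:
    json.loads(text)
except json.JSONDecodeError as e:
    print("without strip:", e.msg)
print("with strip:", json.loads(text.strip()))

# Trailing non-whitespace garbage survives .strip() and still fails to parse.
garbage = '{"a": 1}PRINT'
try:
    json.loads(garbage.strip())
except json.JSONDecodeError as e:
    print("with garbage:", e.msg)
```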

russellb · 2025-09-23T19:08:26Z

tests/v1/entrypoints/llm/test_struct_output_generate.py

-    s = re.sub(r'[\x00-\x1F\x7F-\xFF]', '', s)
+    if backend == "xgrammar":
+        # xgrammar specific workarounds
+        # https://github.com/mlc-ai/xgrammar/issues/286


This bug is fixed now, so we shouldn't need this hack anymore. I'd actually prefer removing it.

Removed instead!
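For reference, here is what the removed workaround did to a string. The pattern `[\x00-\x1F\x7F-\xFF]` deletes ASCII control characters. Its `\x7F-\xFF` half, however, also matches Latin-1 letters such as `é`. The function name and input below are hypothetical; only the regex comes from the diff.

```python
import re

# The character class from the removed xgrammar workaround.
CONTROL_AND_LATIN1 = re.compile(r'[\x00-\x1F\x7F-\xFF]')


def strip_like_old_workaround(s: str) -> str:
    """Hypothetical helper reproducing the removed re.sub() call."""
    return CONTROL_AND_LATIN1.sub('', s)


# A raw tab inside a JSON string is invalid, so stripping it makes the text
# parseable; but the same pass also deletes the legitimate "é".
raw = '{"name": "caf\u00e9",\n "note": "tab\there"}'
print(strip_like_old_workaround(raw))
```

The workaround can turn invalid output into parseable JSON, but it does so by silently deleting characters. This supports removing it once the upstream bug is fixed.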

russellb · 2025-09-23T19:09:01Z

tests/v1/entrypoints/llm/test_struct_output_generate.py

+        # We parse the first JSON value because sometimes you get trailing garbage from xgrammar.
+        # Example: '{"description": "A green string beanerneser green olive."}PRINT(\'B "}'


We need to file a bug against xgrammar if this is happening.

Is it always xgrammar? Is it always the same test?
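The diff comment describes parsing only the first JSON value so that trailing output is ignored. In Python's standard library, `json.JSONDecoder.raw_decode()` provides this behavior: it returns the decoded object and the index where parsing stopped. By contrast, `json.loads()` rejects the same input with "Extra data". The sketch below uses the example string quoted in the diff. Parsing this way hides the symptom rather than fixing it, which is why the review asks for an upstream bug report instead.

```python
import json

# Example output quoted in the diff: a valid object followed by trailing garbage.
text = '{"description": "A green string beanerneser green olive."}PRINT(\'B "}'

# json.loads() requires the whole string to be a single JSON value.
try:
    json.loads(text)
except json.JSONDecodeError as e:
    print(f"json.loads: {e.msg} at char {e.pos}")

# raw_decode() parses the first value and reports where it ended.
# It does not skip leading whitespace, so strip that first.
obj, end = json.JSONDecoder().raw_decode(text.lstrip())
print(obj)
print("trailing:", repr(text.lstrip()[end:]))
```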

SO! Here's the fun part. I had like 5-6 reproductions in a row, and now... I can't reproduce it at all 😅

But it was always xgrammar.

My initial inspiration test to look into it was here: https://buildkite.com/vllm/ci/builds/31698#019969c7-4605-4d11-b894-973260ba0898/203-3827

And then, here's one of the tests where I saw that trailing garbage: https://buildkite.com/vllm/ci/builds/32069#0199772b-d466-42a4-aca9-fbdbe4dd0f92/165-1122

> But it was always xgrammar.

> My initial inspiration test to look into it was here: https://buildkite.com/vllm/ci/builds/31698#019969c7-4605-4d11-b894-973260ba0898/203-3827

This one is lm-format-enforcer instead of xgrammar

v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-lm-format-enforcer-auto-None] - json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 25 (char 24)
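This `lm-format-enforcer` failure is a different error from trailing garbage. "Unterminated string" means the text ended inside a JSON string literal, so the output was cut short rather than over-long. The reported `char 24` is the offset of the opening quote of that unterminated string. One plausible cause is generation stopping early, but that is an assumption the thread does not establish. The sketch below reproduces the same error class with a made-up truncated output and reads the position fields that `json.JSONDecodeError` exposes.

```python
import json

# Hypothetical truncated model output: the string value is never closed.
truncated = '{"description": "A green'

try:
    json.loads(truncated)
except json.JSONDecodeError as e:
    # e.pos is the 0-based offset of the unterminated string's opening quote;
    # lineno/colno give the same position, 1-based.
    print(e.msg, e.lineno, e.colno, e.pos)
    print("string starts with:", repr(truncated[e.pos:]))
```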

> And then, here's one of the tests where I saw that trailing garbage: https://buildkite.com/vllm/ci/builds/32069#0199772b-d466-42a4-aca9-fbdbe4dd0f92/165-1122

Not good! Definitely a bug, either in xgrammar or vLLM.

Actually, I'm pivoting this pull request: it will now just improve the output when the JSON parse fails.

dougbtv · 2025-09-23T19:40:48Z

see also: #24402

russellb · 2025-09-23T20:06:44Z

tests/v1/entrypoints/llm/test_struct_output_generate.py

@@ -85,9 +85,6 @@ def _load_json(s: str, backend: str) -> str:
    if backend != "xgrammar":
        return json.loads(s)

-    # xgrammar specific workarounds
-    # https://github.com/mlc-ai/xgrammar/issues/286
-    s = re.sub(r'[\x00-\x1F\x7F-\xFF]', '', s)


This whole method can be removed

dougbtv · 2025-09-23T20:18:24Z

pushed fixes for pre-commit


dougbtv · 2025-09-23T20:21:00Z

Also, I had deleted my own sign-off from the commit message; it's re-added now.


dougbtv requested review from mgoin, russellb and aarnphm as code owners September 23, 2025 15:20

mergify bot added structured-output v1 labels Sep 23, 2025

github-project-automation bot added this to Structured Output Sep 23, 2025

gemini-code-assist bot reviewed Sep 23, 2025


dougbtv force-pushed the investigate/struct-out-unterminated branch 3 times, most recently from 4f58643 to 92a0120 September 23, 2025 18:57

russellb requested changes Sep 23, 2025


dougbtv force-pushed the investigate/struct-out-unterminated branch from 92a0120 to 1d9191e September 23, 2025 19:29

dougbtv force-pushed the investigate/struct-out-unterminated branch 2 times, most recently from 1f9aed8 to 77f3191 September 23, 2025 19:51

dougbtv changed the title ~~[prototype] fix(tests): resolve JSON parsing flake in structured output tests~~ Improve output when failing json.loads() on structured output Sep 23, 2025

dougbtv force-pushed the investigate/struct-out-unterminated branch 3 times, most recently from e3437b7 to 62e2afc September 23, 2025 19:58

russellb requested changes Sep 23, 2025


dougbtv changed the title ~~Improve output when failing json.loads() on structured output~~ Improve output when failing json.loads() on structured output test Sep 23, 2025

dougbtv force-pushed the investigate/struct-out-unterminated branch from 62e2afc to 26d5f6e September 23, 2025 20:10

russellb approved these changes Sep 23, 2025


russellb enabled auto-merge (squash) September 23, 2025 20:12

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025

auto-merge was automatically disabled September 23, 2025 20:18
Head branch was pushed to by a user without write access

dougbtv force-pushed the investigate/struct-out-unterminated branch from 26d5f6e to e640dfe September 23, 2025 20:18

Improve output when failing json.loads() on structured output

fc83ed4

To help output in diagnosing flakey PR in issue vllm-project#24402 Signed-off-by: dougbtv <dosmith@redhat.com>

dougbtv force-pushed the investigate/struct-out-unterminated branch from e640dfe to fc83ed4 September 23, 2025 20:20

mgoin approved these changes Sep 24, 2025


mgoin merged commit 7ad5e50 into vllm-project:main Sep 24, 2025
19 checks passed

github-project-automation bot moved this to Done in Structured Output Sep 24, 2025

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Improve output when failing json.loads() on structured output test (v…

95b8d95

…llm-project#25483) Signed-off-by: dougbtv <dosmith@redhat.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

Improve output when failing json.loads() on structured output test (#…

b50fa00

…25483) Signed-off-by: dougbtv <dosmith@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
