Your current environment
A detailed environment report should not be needed for this issue.
vLLM 0.9.0
RTX A6000
Arguments:
--device cuda --served-model-name Qwen3-30B-A3B --quantization gptq_marlin --host 0.0.0.0 --port 8888 --max-model-len 32768 --gpu-memory-utilization 0.85 --disable-log-stats --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser qwen3
🐛 Describe the bug
When serving a Qwen3-series model in OpenAI-compatible server mode with enable_thinking set to false and a guided_json schema specified, the output is most likely not valid JSON. It can have an extra '{' or '[', start with "```", or even be complete gibberish in some cases.
However, if we switch enable_thinking to true, the model thinks and the output JSON is valid.
Furthermore, if we leave enable_thinking as true and manually append "/no_think" to the user prompt, the model doesn't think and the output JSON is also valid.
If we don't use any reasoning parser at all, the output JSON is valid regardless of the enable_thinking setting.
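For illustration, the "/no_think" workaround is purely a prompt-level change; a minimal sketch, assuming the same request shape as the repro below but without the chat_template_kwargs override (text is a hypothetical input variable):

from pydantic import TypeAdapter

# Workaround sketch: leave enable_thinking at its default (True) and append
# Qwen3's "/no_think" soft switch to the user prompt instead.
text = "Write a hello world program in C language."  # example input
messages = [{"role": "user", "content": f"{text} /no_think"}]
extra_body = {
    "guided_json": TypeAdapter(list[str]).json_schema(),
    # note: no "chat_template_kwargs": {"enable_thinking": False} here
}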
Reproducible on both Qwen3-32B-INT8 and Qwen3-30B-A3B-INT4 models. Both the xgrammar and guidance backends were tested.
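For reference, one way to exercise both backends, as a minimal sketch assuming vLLM's OpenAI-compatible server accepts a per-request guided_decoding_backend extra parameter (the server-level --guided-decoding-backend flag is an alternative):

import openai
from pydantic import TypeAdapter

client = openai.OpenAI(base_url="http://something:someport/v1", api_key="nope")

for backend in ("xgrammar", "guidance"):
    answer = client.chat.completions.create(
        model="Qwen3-30B-A3B",
        messages=[{"role": "user", "content": "Extract keywords: hello world in C"}],
        extra_body={
            "guided_json": TypeAdapter(list[str]).json_schema(),
            "guided_decoding_backend": backend,  # assumption: per-request override
            "chat_template_kwargs": {"enable_thinking": False},
        },
    )
    print(backend, answer.choices[0].message.content)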
Minimum code to reproduce:
import openai
from pydantic import TypeAdapter


def reproduce_qwen3_parser_bug(text: str):
    client = openai.OpenAI(
        base_url="http://something:someport/v1",
        api_key="nope",
        timeout=8888,
    )
    # plain OpenAI-style message dicts (the openai client does not accept
    # framework-specific message objects)
    message_list = [
        {
            "role": "system",
            "content": "You'll need to extract keywords from input text. Output a JSON array of strings.",
        },
        {"role": "user", "content": text},
    ]
    answer = client.chat.completions.create(
        model="Qwen3-30B-A3B",
        messages=message_list,
        max_completion_tokens=512,
        temperature=0.7,
        top_p=0.8,
        presence_penalty=-0.05,
        frequency_penalty=0,
        extra_body={
            "guided_json": TypeAdapter(list[str]).json_schema(),
            "chat_template_kwargs": {"enable_thinking": False},
        },
    )
    print(f"Output: {answer.choices[0].message.content}")


if __name__ == "__main__":
    reproduce_qwen3_parser_bug(
        "Write a hello world program in C language. Give detailed explanation as well."
    )
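For clarity, the guided_json constraint above is just the JSON Schema pydantic generates for list[str]; printing it yields roughly the following (exact key order may differ):

from pydantic import TypeAdapter

# The schema passed as guided_json above: a plain JSON array of strings.
print(TypeAdapter(list[str]).json_schema())
# -> {'items': {'type': 'string'}, 'type': 'array'}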
Output of enable_thinking=False:
["[", "]
Output of enable_thinking=True (an extra '\n' is generated but it's still valid JSON; also had to increase max_completion_tokens for this one):
[
"hello world",
"C language",
"program",
"detailed explanation"
]
Output of enable_thinking=True with "/no_think" appended (an extra '\n' is generated but it's still valid JSON):
[
"C language",
"hello world program",
"programming",
"code example",
"syntax",
"main function",
"printf function",
"compilation",
"execution",
"programming concepts"
]
Related?: #17393