OpenAI filter sends 3 consecutive system messages → 400 on strict chat templates (e.g. Qwen3 via llama.cpp), silently dropping all messages #45

Description

@joryirving

Summary

OpenAIFilterer.Filter builds its request with three consecutive system messages followed by the user message. Many chat templates only allow a single, leading system message and raise on anything else. When the backend enforces this (e.g. Qwen3 served by llama.cpp with --jinja), every request returns HTTP 400 before the model runs. Because the filter returns FilterOnFailure on any error, and FilterOnFailure defaults to true, every message is silently filtered out. For an OpenAI-filter → Discord pipeline, this looks like the integration has simply stopped forwarding anything.

Version: v1.3.2 (latest). Backend: llama.cpp server (ghcr.io/ggml-org/llama.cpp:server-*) with --jinja, Qwen3-family GGUF.

Error returned by the backend

POST ".../chat/completions": 400 Bad Request
{"error":{"code":400,"message":"Unable to generate parser for this template. Automatic parser generation failed:
------------
While executing CallExpression at line 85, column 32 in source:
...first %}\n            {{- raise_exception('System message must be at the beginnin...
                                           ^
Error: Jinja Exception: System message must be at the beginning.","type":"invalid_request_error"}}

Followed by:

WARN  error filtering with OpenAIFilterer in step number 5: ... 400 Bad Request ...
INFO  message ... was filtered in step 5 by OpenAIFilterer

Root cause

In filter_openai.go, the request is assembled as:

chatCompletion, err := client.Chat.Completions.New(context.TODO(),
    openai.ChatCompletionNewParams{
        Messages: openai.F([]openai.ChatCompletionMessageParamUnion{
            openai.SystemMessage(OpenAISystemPrompt),     // system #1
            openai.SystemMessage(o.UserPrompt),           // system #2
            openai.SystemMessage(OpenAIFinalInstructions), // system #3
            openai.UserMessage(ms),
        }),
        ...

Three system messages in a row. Strict templates (Qwen3, and others) reject non-leading/repeated system messages via raise_exception(...), so the call 400s. On that error the function returns o.FilterOnFailure, which defaults to true, so the message is dropped.
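
To make the silent-drop path concrete, here is a minimal sketch of the fail-closed behavior described above. It is not the project's code: the `filterer` type, the `filter` method, and `failingCall` are hypothetical stand-ins, and only the FilterOnFailure field name comes from this report.

```go
package main

import (
	"errors"
	"fmt"
)

// filterer is a hypothetical stand-in for OpenAIFilterer; only the
// FilterOnFailure field name is taken from the issue.
type filterer struct {
	FilterOnFailure bool
}

// failingCall simulates a backend that rejects the request, as a strict
// chat template does when it sees repeated system messages.
func failingCall() (bool, error) {
	return false, errors.New("400 Bad Request")
}

// filter reports whether a message should be dropped. call stands in for
// the chat-completion request and yields the model's verdict or an error.
func (f filterer) filter(call func() (bool, error)) bool {
	verdict, err := call()
	if err != nil {
		// Any backend error, including a template 400, ends up here,
		// so the outcome depends only on FilterOnFailure.
		return f.FilterOnFailure
	}
	return verdict
}

func main() {
	failClosed := filterer{FilterOnFailure: true}
	failOpen := filterer{FilterOnFailure: false}
	fmt.Println(failClosed.filter(failingCall)) // true: message is dropped
	fmt.Println(failOpen.filter(failingCall))   // false: message passes through
}
```

With FilterOnFailure set to true, a backend that always errors makes every message look filtered, which matches the INFO log line above.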

Suggested fix

The sibling annotator already does this correctly — it concatenates everything into a single system message, see annotator_openai.go:125:

openai.SystemMessage(OpenAIAnnotatorFirstInstructions + a.UserPrompt + OpenAIAnnotatorFinalInstructions),
openai.UserMessage("Here is the message to evaluate:\n" + msg),

The filter should mirror this: collapse OpenAISystemPrompt + o.UserPrompt + OpenAIFinalInstructions into one SystemMessage (or move the instructions into the user turn). That keeps a single leading system message and works across both lenient and strict templates.
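
A rough sketch of the collapsed message list follows, written with plain local types so it runs without the OpenAI client. The `message` struct and `buildMessages` function are hypothetical names for illustration. Joining the parts with a blank line is also an assumption: the annotator concatenates with `+` directly, so the real constants may already carry their own separators.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a hypothetical, simplified chat message; the real code would
// use the OpenAI client's message constructors instead.
type message struct {
	Role    string
	Content string
}

// buildMessages merges the three prompt parts into one leading system
// message and places the text to evaluate in the single user turn.
func buildMessages(systemPrompt, userPrompt, finalInstructions, text string) []message {
	var parts []string
	for _, p := range []string{systemPrompt, userPrompt, finalInstructions} {
		// Skip empty parts so the prompt has no stray blank sections.
		if s := strings.TrimSpace(p); s != "" {
			parts = append(parts, s)
		}
	}
	return []message{
		{Role: "system", Content: strings.Join(parts, "\n\n")},
		{Role: "user", Content: text},
	}
}

func main() {
	msgs := buildMessages("You are a filter.", "Drop spam.", "Answer yes or no.", "hello")
	for _, m := range msgs {
		fmt.Printf("%s: %q\n", m.Role, m.Content)
	}
}
```

The result always has exactly one system message at index 0, which is the shape strict templates accept.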

Impact / severity

  • Silent: with FilterOnFailure: true (the default), there is no user-facing error — messages just stop flowing, easily mistaken for "no matching traffic."
  • Affects any OpenAI-compatible backend that enforces single-leading-system templates; notably llama.cpp --jinja with Qwen3 models, a common self-hosted setup that the README's http://llama-server:8080/v1 examples point at.

Repro

  1. Run llama.cpp server with a Qwen3 GGUF and --jinja.
  2. Configure an OpenAI filter step pointing at it (URL: http://.../v1, any UserPrompt).
  3. Send any message with text. Backend returns 400 (System message must be at the beginning); the message is filtered out.
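
Without a running backend, the same condition can be approximated by checking the role sequence of the outgoing request. The sketch below is an assumption based on the error text ("System message must be at the beginning"), not a port of the Qwen3 template. `checkRoles` is a hypothetical helper that could back a regression test.

```go
package main

import "fmt"

// checkRoles approximates the strict-template rule implied by the backend
// error: a system message is only allowed as the very first message.
func checkRoles(roles []string) error {
	for i, r := range roles {
		if r == "system" && i > 0 {
			return fmt.Errorf("system message at index %d must be at the beginning", i)
		}
	}
	return nil
}

func main() {
	// Role sequence the filter currently sends, per the root cause above.
	current := []string{"system", "system", "system", "user"}
	// Role sequence after collapsing the prompts into one system message.
	fixed := []string{"system", "user"}

	fmt.Println(checkRoles(current) != nil) // true: rejected
	fmt.Println(checkRoles(fixed) == nil)   // true: accepted
}
```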

Workaround (until fixed)

Override the model's chat template to a lenient one (e.g. --chat-template chatml) so repeated system messages are accepted — at the cost of the model's native (thinking/tool-call) template — or set FilterOnFailure: false to fail open. Neither is a real fix; the message construction above is the bug.
