Skip to content

fix(filter-openai): collapse 3 system messages into one#46

Open
joryirving wants to merge 1 commit into
tyzbit:mainfrom
joryirving:fix/openai-filter-single-system-message
Open

fix(filter-openai): collapse 3 system messages into one#46
joryirving wants to merge 1 commit into
tyzbit:mainfrom
joryirving:fix/openai-filter-single-system-message

Conversation

@joryirving

Copy link
Copy Markdown
Contributor

Summary

OpenAIFilterer.Filter builds its chat-completions request with three consecutive system messages (framing prompt, user prompt, closing instructions) followed by the user message. Strict chat templates (e.g. Qwen3 served by llama.cpp with --jinja, hardened Llama-3 instruct templates, and others) only allow a single leading system message and raise on anything else. The backend returns HTTP 400 before the model runs, and because OpenAIFilterer.Filter returns o.FilterOnFailure on any error — and FilterOnFailure defaults to trueevery message is silently dropped from the pipeline.

The sibling OpenAIAnnotator already does this correctly: it concatenates the framing prompt, user prompt, and closing instructions into a single system message. This PR makes the filter mirror that pattern.

Fixes #45

Changes

  • filter_openai.go — collapse OpenAISystemPrompt + o.UserPrompt + OpenAIFinalInstructions into one SystemMessage, matching annotator_openai.go.

  • filter_openai_test.go (new) — regression test that captures the outgoing request body via httptest.NewServer and asserts:

    1. Exactly one system message + one user message are sent (this is the direct regression guard for the issue).
    2. The system message contains the framing prompt, user prompt, and closing instructions in that order.
    3. The user message contains the original message text, unchanged.
    4. Authorization header is Bearer <apiKey>.
    5. Request path is /v1/chat/completions.

    Also pins the existing behavior called out in the issue:

    • FilterOnFailure struct-tag default is true (verified via mcuadros/go-defaults).
    • Backend error + FilterOnFailure: true → message is dropped (the silent-drop path).

Impact

  • Fixes silent message loss for any OpenAI-compatible backend that enforces a single leading system message: llama.cpp --jinja with Qwen3 (the example in the issue), hardened Llama-3 instruct templates, and others.
  • No behavior change for lenient backends (e.g. ChatML) — they accept both shapes, so collapsing three system messages into one is a no-op.
  • No config changeo.UserPrompt, OpenAISystemPrompt, and OpenAIFinalInstructions remain user-overridable; they just get concatenated.

Verification

go test -count=1 ./...   # all tests pass
go build ./...            # clean
go vet ./...              # clean (pre-existing IPv6 warnings in source_acarshub.go, unchanged)

The new test was also confirmed to fail when run against the un-patched code: it sees 3 system + 1 user messages and fails the count assertion.

Out of scope (intentionally)

  • Changing the FilterOnFailure: true default or making the silent-drop behavior more visible (e.g. emitting a metric/alert) — separate concern, happy to file a follow-up issue.
  • Refactoring the annotator/filter to share a "build system prompt" helper — out of scope for a bug-fix PR; trivial follow-up if maintainers want it.
  • Auditing other filters/annotators for the same pattern — should be a separate sweep.

Many strict chat templates (e.g. Qwen3 via llama.cpp --jinja) reject
requests with multiple consecutive system messages and raise
'System message must be at the beginning', returning HTTP 400. Because
OpenAIFilterer.Filter returns FilterOnFailure (default true) on any
error, every message was silently dropped from the pipeline.

Mirror the annotator pattern: concatenate the framing prompt, the
user prompt, and the closing instructions into a single leading
system message. The user message remains unchanged.

Fixes tyzbit#45

Adds a regression test (filter_openai_test.go) that captures the
outgoing request body via httptest and asserts exactly one system
message plus one user message, in the correct order. Also pins the
existing default behavior (FilterOnFailure=true) and the
backend-error -> silent drop path described in the issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenAI filter sends 3 consecutive system messages → 400 on strict chat templates (e.g. Qwen3 via llama.cpp), silently dropping all messages

1 participant