Skip to content

fix(agents): Agent Chat streaming fails on Ollama (400 'invalid message content type: <nil>') #342

@w7-mgfcode

Description

@w7-mgfcode

Summary

Interactive Agent Chat (Chat page → WebSocket /agents/streamAgentService.stream_chatagent.run_stream) fails on a local-Ollama agent. Both the primary (ollama:qwen3:8b) and fallback (ollama:llama3.1:8b) return the same Ollama-side 400, so FallbackModel raises a FallbackExceptionGroup and the UI shows "Stream error: All models from FallbackModel failed (2 sub-exceptions)".

Exact error (both sub-exceptions)

openai.BadRequestError: 400 - {'message': 'invalid message content type: <nil>', 'type': 'invalid_request_error'}
pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: qwen3:8b   (and llama3.1:8b)

Raised from pydantic_ai/models/openai.py:request_stream → _completions_create (stream=True) against Ollama's OpenAI-compatible /v1/chat/completions. Fast fail (~1s), first turn (history_length=0).

Root cause

Streaming-path incompatibility between PydanticAI's OpenAIChatModel.request_stream (OpenAI client + OllamaProvider, base_url …/v1) and Ollama's OpenAI-compat endpoint: a message in the streamed request is serialized with content: null, which Ollama rejects (stricter than the real OpenAI API, which tolerates it).

Key distinction (verified)

Proposed fix

Add a non-streaming fallback for the ollama provider in AgentService.stream_chat: when agent_default_model (and/or fallback) is an ollama: model, run the turn with agent.run() and emit the result through the existing event path — one text_delta with the full text, then the existing deps.pending_actionapproval_required handling, then complete. This sidesteps the broken streamed request while preserving the WS contract and the #336 HITL approval flow.

  • Cloud providers keep the true streaming path (must remain unaffected).
  • Alternative considered: sanitize the null-content message before the streamed request — rejected as brittle (it lives in PydanticAI/openai-client serialization). A PydanticAI version bump is out of scope (stop-and-ask per AGENTS.md).

Tests

  • stream_chat with an ollama:* agent_default_model emits text_delta + complete (and approval_required when a gated tool fires) without calling run_stream.
  • stream_chat with a cloud agent_default_model still uses run_stream (regression guard).

Notes

Found 2026-06-01 testing the #336 HITL approval card via Chat on a local-Ollama stack. Compounding local limitation: even on the working non-streaming path, qwen3:8b (8B) often doesn't emit tool calls (see related work).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions