Summary
Interactive Agent Chat (Chat page → WebSocket /agents/stream → AgentService.stream_chat → agent.run_stream) fails on a local-Ollama agent. Both the primary (ollama:qwen3:8b) and fallback (ollama:llama3.1:8b) return the same Ollama-side 400, so FallbackModel raises a FallbackExceptionGroup and the UI shows "Stream error: All models from FallbackModel failed (2 sub-exceptions)".
Exact error (both sub-exceptions)
openai.BadRequestError: 400 - {'message': 'invalid message content type: <nil>', 'type': 'invalid_request_error'}
pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: qwen3:8b (and llama3.1:8b)
Raised from pydantic_ai/models/openai.py:request_stream → _completions_create (stream=True) against Ollama's OpenAI-compatible /v1/chat/completions. Fast fail (~1s), first turn (history_length=0).
Root cause
Streaming-path incompatibility between PydanticAI's OpenAIChatModel.request_stream (OpenAI client + OllamaProvider, base_url …/v1) and Ollama's OpenAI-compat endpoint: a message in the streamed request is serialized with content: null, which Ollama rejects (stricter than the real OpenAI API, which tolerates it).
Key distinction (verified)
Proposed fix
Add a non-streaming fallback for the ollama provider in AgentService.stream_chat: when agent_default_model (and/or fallback) is an ollama: model, run the turn with agent.run() and emit the result through the existing event path — one text_delta with the full text, then the existing deps.pending_action → approval_required handling, then complete. This sidesteps the broken streamed request while preserving the WS contract and the #336 HITL approval flow.
- Cloud providers keep the true streaming path (must remain unaffected).
- Alternative considered: sanitize the null-content message before the streamed request — rejected as brittle (it lives in PydanticAI/openai-client serialization). A PydanticAI version bump is out of scope (stop-and-ask per AGENTS.md).
Tests
stream_chat with an ollama:* agent_default_model emits text_delta + complete (and approval_required when a gated tool fires) without calling run_stream.
stream_chat with a cloud agent_default_model still uses run_stream (regression guard).
Notes
Found 2026-06-01 testing the #336 HITL approval card via Chat on a local-Ollama stack. Compounding local limitation: even on the working non-streaming path, qwen3:8b (8B) often doesn't emit tool calls (see related work).
Summary
Interactive Agent Chat (Chat page → WebSocket
/agents/stream→AgentService.stream_chat→agent.run_stream) fails on a local-Ollama agent. Both the primary (ollama:qwen3:8b) and fallback (ollama:llama3.1:8b) return the same Ollama-side 400, soFallbackModelraises aFallbackExceptionGroupand the UI shows "Stream error: All models from FallbackModel failed (2 sub-exceptions)".Exact error (both sub-exceptions)
Raised from
pydantic_ai/models/openai.py:request_stream → _completions_create(stream=True) against Ollama's OpenAI-compatible/v1/chat/completions. Fast fail (~1s), first turn (history_length=0).Root cause
Streaming-path incompatibility between PydanticAI's
OpenAIChatModel.request_stream(OpenAI client +OllamaProvider, base_url…/v1) and Ollama's OpenAI-compat endpoint: a message in the streamed request is serialized withcontent: null, which Ollama rejects (stricter than the real OpenAI API, which tolerates it).Key distinction (verified)
agent.run()inAgentService.chat, used by the showcaseagent_hitl_flowstep) works on Ollama (ran ~15 s, HTTP 200, tokens returned).agent.run_streaminAgentService.stream_chat) hits the 400.Proposed fix
Add a non-streaming fallback for the
ollamaprovider inAgentService.stream_chat: whenagent_default_model(and/or fallback) is anollama:model, run the turn withagent.run()and emit the result through the existing event path — onetext_deltawith the full text, then the existingdeps.pending_action→approval_requiredhandling, thencomplete. This sidesteps the broken streamed request while preserving the WS contract and the #336 HITL approval flow.Tests
stream_chatwith anollama:*agent_default_modelemitstext_delta+complete(andapproval_requiredwhen a gated tool fires) without callingrun_stream.stream_chatwith a cloudagent_default_modelstill usesrun_stream(regression guard).Notes
Found 2026-06-01 testing the #336 HITL approval card via Chat on a local-Ollama stack. Compounding local limitation: even on the working non-streaming path, qwen3:8b (8B) often doesn't emit tool calls (see related work).