Summary
The experiment agent can invoke a gated mutation tool (e.g. save_scenario), and the backend logs confirm the tool returned requires_approval=true, but the HITL (human-in-the-loop) approval is never persisted or surfaced. As a result, the Chat UI shows only the assistant's prose and no Approve/Reject card.
Observed
- Experiment agent calls a gated tool such as save_scenario.
- Backend logs: save_scenario returned requires_approval=true.
- However, agent_session.pending_action remains null and status remains active (it never becomes awaiting_approval).
- No approval_required WebSocket event is emitted.
- Frontend therefore renders only the assistant prose bubble and no Approve/Reject card.
- No scenario_plan is persisted (correct), but there is also no pending approval to act on (incorrect).
Root cause
- Gated tools (save_scenario, create_alias, …) return {"status": "approval_required", ...} to the model, but they do not write a machine-readable pending_action into AgentDeps or session state.
- service.chat() and service.stream_chat() decide approval by inspecting final_result.pending_action (primary) and final_result.approval_required (fallback).
- The experiment agent's output_type is ExperimentReport, which defines only run_id, status, summary, metrics, and recommendations. It does not define pending_action or approval_required.
- Net: the approval signal is visible only to the model, which echoes it as prose in summary. The structured output carries no approval field, so the service's check never fires. As a result, pending_action stays null, status stays active, and the approval_required event is never emitted.
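The failure mode can be reproduced in isolation. The following is a simplified stand-in, not the project's service code.

- ExperimentReport mirrors the five fields listed above; their types are assumed.
- service_sees_approval is a hypothetical approximation of the check that chat() and stream_chat() perform on final_result. It looks at pending_action first and approval_required as the fallback.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExperimentReport:
    """Mirror of the agent's output_type: only these five fields exist (types assumed)."""
    run_id: str
    status: str
    summary: str
    metrics: dict[str, Any] = field(default_factory=dict)
    recommendations: list[str] = field(default_factory=list)


def service_sees_approval(final_result: Any) -> bool:
    """Hypothetical approximation of the current check in chat()/stream_chat().

    Primary signal: final_result.pending_action.
    Fallback: final_result.approval_required.
    An output type that defines neither attribute can never trigger approval.
    """
    if getattr(final_result, "pending_action", None) is not None:
        return True
    return bool(getattr(final_result, "approval_required", False))


# The tool returned {"status": "approval_required", ...}, but the model can only
# echo that inside summary; the structured output has no approval field.
report = ExperimentReport(
    run_id="run-1",          # illustrative values only
    status="completed",
    summary="The scenario is ready but needs your approval before saving.",
)
print(service_sees_approval(report))  # → False
```

The approval intent survives only as free text in summary. No attribute lookup on the structured output can recover it.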
Recommended fix
- Add a pending_action slot to AgentDeps.
- Gated tools set ctx.deps.pending_action (action_type, arguments, description) when requires_approval(<tool>) is true.
- Service layer (chat() and stream_chat()): check deps.pending_action first, since it is deterministic and not model-dependent. If it is set, persist it to the session, flip status to awaiting_approval, and emit the approval_required event. Otherwise, fall back to the existing final_result.pending_action check.
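The proposed fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation.

- AgentDeps and requires_approval reuse names from this report, but their bodies here are hypothetical.
- PendingAction, ToolContext (a stand-in for whatever object the agent framework passes as ctx), AgentSession, GATED_TOOLS, finalize_turn and the emit callback are invented for illustration.
- The real service presumably emits over the WebSocket rather than through a callback.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

GATED_TOOLS = {"save_scenario", "create_alias"}  # hypothetical stand-in for real gating config


def requires_approval(tool_name: str) -> bool:
    """Hypothetical gate; the project's requires_approval(<tool>) may differ."""
    return tool_name in GATED_TOOLS


@dataclass
class PendingAction:
    action_type: str
    arguments: dict[str, Any]
    description: str


@dataclass
class AgentDeps:
    # New slot: written by gated tools, read by the service after the run.
    pending_action: Optional[PendingAction] = None


@dataclass
class ToolContext:
    """Hypothetical stand-in for the framework's ctx object; only .deps is used."""
    deps: AgentDeps


@dataclass
class AgentSession:
    status: str = "active"
    pending_action: Optional[PendingAction] = None


def save_scenario(ctx: ToolContext, name: str, plan: dict[str, Any]) -> dict[str, Any]:
    if requires_approval("save_scenario"):
        # Record the action in machine-readable form instead of relying on the
        # model to carry it through its structured output.
        ctx.deps.pending_action = PendingAction(
            action_type="save_scenario",
            arguments={"name": name, "plan": plan},
            description=f"Save scenario {name!r}",
        )
        return {"status": "approval_required"}  # other fields elided, as in the report
    return {"status": "saved"}  # ungated path: the real tool would persist here


def finalize_turn(
    session: AgentSession,
    deps: AgentDeps,
    final_result: Any,
    emit: Callable[[str, Any], None],
) -> None:
    """Hypothetical post-run step that both chat() and stream_chat() would call."""
    pending = deps.pending_action  # deterministic: set by the tool itself
    if pending is None:
        # Fallback: the existing model-dependent check on the structured output.
        pending = getattr(final_result, "pending_action", None)
    if pending is None:
        return
    session.pending_action = pending
    session.status = "awaiting_approval"
    emit("approval_required", pending)


# Usage: one turn in which the agent calls the gated tool.
events: list[str] = []
deps, session = AgentDeps(), AgentSession()
save_scenario(ToolContext(deps), "baseline", {"steps": 3})
finalize_turn(session, deps, final_result=object(), emit=lambda name, _p: events.append(name))
print(session.status, events)  # → awaiting_approval ['approval_required']
```

Because the tool writes to deps, the approval path no longer depends on whether the model reproduces the approval in its output type. The final_result check remains only as a fallback. Sharing one finalize step between chat() and stream_chat() keeps the two paths from drifting apart.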
Tests
- Regression test for the full chain: gated tool call → deps.pending_action set → session pending_action persisted → status=awaiting_approval → approval_required event emitted (both chat() and stream_chat() paths).
- Note: the existing test_chat_awaiting_approval_returns_pending only uses a pre-seeded awaiting_approval session fixture, so it does not cover the tool → pending_action propagation.
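A rough shape for the regression tests follows. It starts from a fresh active session and drives an actual gated-tool call, rather than pre-seeding the session, and it checks both the chat() and stream_chat() paths. This is a sketch under assumptions.

- Deps, Session, fake_gated_tool and FakeService are hypothetical fakes, not project fixtures.
- Events are collected in a list instead of going over a WebSocket.
- stream_chat is modeled as an async generator that finalizes after its last chunk.
- The test functions are plain pytest-style functions with bare asserts.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Optional


@dataclass
class Deps:
    pending_action: Optional[dict[str, Any]] = None


@dataclass
class Session:
    status: str = "active"
    pending_action: Optional[dict[str, Any]] = None


def fake_gated_tool(deps: Deps) -> dict[str, Any]:
    """Simulates save_scenario when requires_approval(...) is true."""
    deps.pending_action = {
        "action_type": "save_scenario",
        "arguments": {"name": "baseline"},
        "description": "Save scenario 'baseline'",
    }
    return {"status": "approval_required"}


class FakeService:
    """Hypothetical service exposing chat() and stream_chat() over one finalize step."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []

    def _run_agent(self, deps: Deps) -> str:
        fake_gated_tool(deps)        # the model chose to call the gated tool
        return "Prose summary only"  # structured output has no approval field

    def _finalize(self, session: Session, deps: Deps) -> None:
        if deps.pending_action is not None:
            session.pending_action = deps.pending_action
            session.status = "awaiting_approval"
            self.events.append(("approval_required", deps.pending_action))

    def chat(self, session: Session) -> str:
        deps = Deps()
        reply = self._run_agent(deps)
        self._finalize(session, deps)
        return reply

    async def stream_chat(self, session: Session) -> AsyncIterator[str]:
        deps = Deps()
        yield self._run_agent(deps)
        self._finalize(session, deps)  # runs once the consumer exhausts the stream


def _assert_full_chain(service: FakeService, session: Session) -> None:
    assert session.pending_action is not None
    assert session.pending_action["action_type"] == "save_scenario"
    assert session.status == "awaiting_approval"
    assert [name for name, _ in service.events] == ["approval_required"]


def test_gated_tool_sets_pending_action_via_chat() -> None:
    service, session = FakeService(), Session()  # fresh session, not pre-seeded
    service.chat(session)
    _assert_full_chain(service, session)


def test_gated_tool_sets_pending_action_via_stream_chat() -> None:
    service, session = FakeService(), Session()

    async def consume() -> None:
        async for _chunk in service.stream_chat(session):
            pass

    asyncio.run(consume())
    _assert_full_chain(service, session)
```

In the real suite the fakes would be replaced by the project's agent (with a scripted model that calls save_scenario) and its session store. The important property is that each test begins in status active, so it fails if the tool → pending_action propagation breaks.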
Notes