fix(agents): persist pending_action for gated tool calls #336

@w7-mgfcode

Summary

The experiment agent can invoke a gated mutation tool (e.g. save_scenario), and the backend logs confirm the tool ran with requires_approval=true — but the HITL approval is never persisted or surfaced, so the Chat UI shows only the assistant's prose and no Approve/Reject card.

Observed

  • Experiment agent calls a gated tool such as save_scenario.
  • Backend logs: save_scenario returned requires_approval=true.
  • However, agent_session.pending_action remains null and status remains active (it never becomes awaiting_approval).
  • No approval_required WebSocket event is emitted.
  • Frontend therefore renders only the assistant prose bubble and no Approve/Reject card.
  • No scenario_plan is persisted (correct), but there is also no pending approval to act on (incorrect).

Root cause

  • Gated tools (save_scenario, create_alias, …) return {"status": "approval_required", ...} to the model but do not write a machine-readable pending_action into AgentDeps / session state.
  • service.chat() and service.stream_chat() decide approval by inspecting final_result.pending_action (primary) / final_result.approval_required (fallback).
  • The experiment agent's output_type is ExperimentReport, which defines only run_id, status, summary, metrics, recommendations — it does not define pending_action or approval_required.
  • Net: the approval signal is visible only to the model, which echoes it as prose in summary; the structured output carries no approval field, so the service's check never fires → pending_action stays null, status stays active, and the approval_required event is never emitted.
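
The gap described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `ExperimentReport` is modelled as a plain dataclass with only the fields listed in this issue, and `detect_pending_action` is a hypothetical stand-in for the check that service.chat() and service.stream_chat() perform on final_result.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExperimentReport:
    # Only the fields named in the issue; there is no pending_action
    # or approval_required attribute on this output type.
    run_id: str
    status: str
    summary: str
    metrics: dict[str, float] = field(default_factory=dict)
    recommendations: list[str] = field(default_factory=list)


def detect_pending_action(final_result: Any) -> Optional[dict]:
    """Hypothetical stand-in for the service-side approval check."""
    # Primary signal: a structured pending_action on the final result.
    pending = getattr(final_result, "pending_action", None)
    if pending is not None:
        return pending
    # Fallback signal: a bare approval_required flag.
    if getattr(final_result, "approval_required", False):
        return {"action_type": "unknown"}
    return None


# The model can only echo the approval request as prose in `summary`.
report = ExperimentReport(
    run_id="run-1",
    status="completed",
    summary="The scenario is ready; it needs your approval before saving.",
)

print(detect_pending_action(report))  # prints None
```

Because neither attribute exists on the output type, both lookups fall through and the check returns nothing, regardless of what the gated tool returned to the model.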

Recommended fix

  • Add a pending_action slot to AgentDeps.
  • Gated tools set ctx.deps.pending_action (action_type, arguments, description) when requires_approval(<tool>) is true.
  • Service layer (chat() and stream_chat()) checks deps.pending_action first (deterministic, not model-dependent), then persists it to the session, flips status to awaiting_approval, and emits the approval_required event — falling back to the existing final_result.pending_action check.
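
The three steps above could look roughly like the sketch below. It is written under stated assumptions rather than against the real codebase: `PendingAction`, `RunContext`, `Session`, `GATED_TOOLS`, `resolve_pending_action`, and `finalize_turn` are hypothetical names, the real `requires_approval` and `AgentDeps` will have different bodies, and the event is appended to a list instead of being sent over a WebSocket.

```python
from dataclasses import dataclass
from typing import Any, Optional

GATED_TOOLS = {"save_scenario", "create_alias"}  # assumed registry


def requires_approval(tool_name: str) -> bool:
    # Stand-in for the project's requires_approval(<tool>) helper.
    return tool_name in GATED_TOOLS


@dataclass
class PendingAction:
    action_type: str
    arguments: dict[str, Any]
    description: str


@dataclass
class AgentDeps:
    # New slot: written by gated tools, read by the service layer.
    pending_action: Optional[PendingAction] = None


@dataclass
class RunContext:
    # Simplified stand-in for the framework's tool context object.
    deps: AgentDeps


@dataclass
class Session:
    status: str = "active"
    pending_action: Optional[Any] = None


def save_scenario(ctx: RunContext, name: str) -> dict[str, Any]:
    """Gated tool: records the approval request instead of mutating."""
    if requires_approval("save_scenario"):
        ctx.deps.pending_action = PendingAction(
            action_type="save_scenario",
            arguments={"name": name},
            description=f"Save scenario '{name}'",
        )
        return {"status": "approval_required", "requires_approval": True}
    return {"status": "saved"}


def resolve_pending_action(deps: AgentDeps, final_result: Any) -> Optional[Any]:
    # Deterministic signal first, model-dependent output as fallback.
    if deps.pending_action is not None:
        return deps.pending_action
    return getattr(final_result, "pending_action", None)


def finalize_turn(session: Session, deps: AgentDeps,
                  final_result: Any, events: list[dict[str, Any]]) -> None:
    """Shared tail for chat() and stream_chat() in this sketch."""
    pending = resolve_pending_action(deps, final_result)
    if pending is not None:
        session.pending_action = pending
        session.status = "awaiting_approval"
        events.append({"type": "approval_required", "pending_action": pending})


deps, session, events = AgentDeps(), Session(), []
save_scenario(RunContext(deps), name="q3-forecast")
finalize_turn(session, deps, final_result=None, events=events)
```

Routing both chat() and stream_chat() through one helper such as `finalize_turn` keeps the two paths from drifting apart, which is one plausible way to satisfy the "both paths" requirement in the tests below.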

Tests

  • Regression test for the full chain: gated tool call → deps.pending_action set → session pending_action persisted → status=awaiting_approval → approval_required event emitted (both chat() and stream_chat() paths).
  • Note: the existing test_chat_awaiting_approval_returns_pending only uses a pre-seeded awaiting_approval session fixture, so it does not cover the tool → pending_action propagation.
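
A possible shape for that regression test is sketched below. `FakeService`, `FakeDeps`, and `FakeSession` are hypothetical synchronous stand-ins so the example runs on its own; a real test would drive the actual service (presumably async) with a scripted model that calls the gated tool, and would start from a fresh active session rather than a pre-seeded one.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class FakeDeps:
    pending_action: Optional[dict[str, Any]] = None


@dataclass
class FakeSession:
    status: str = "active"  # fresh session, not pre-seeded
    pending_action: Optional[dict[str, Any]] = None


@dataclass
class FakeService:
    """Hypothetical stand-in for the agent service under test."""
    session: FakeSession = field(default_factory=FakeSession)
    events: list[dict[str, Any]] = field(default_factory=list)

    def _run_agent(self, deps: FakeDeps) -> str:
        # Simulates the agent invoking the gated save_scenario tool.
        deps.pending_action = {
            "action_type": "save_scenario",
            "arguments": {"name": "q3"},
            "description": "Save scenario 'q3'",
        }
        return "The scenario is ready and awaiting your approval."

    def _finalize(self, deps: FakeDeps) -> None:
        if deps.pending_action is not None:
            self.session.pending_action = deps.pending_action
            self.session.status = "awaiting_approval"
            self.events.append({"type": "approval_required"})

    def chat(self, message: str) -> str:
        deps = FakeDeps()
        text = self._run_agent(deps)
        self._finalize(deps)
        return text

    def stream_chat(self, message: str) -> Iterator[str]:
        deps = FakeDeps()
        for token in self._run_agent(deps).split():
            yield token
        self._finalize(deps)  # runs once the stream is exhausted


def assert_approval_chain(service: FakeService) -> None:
    # One assertion per link in the chain listed above.
    assert service.session.pending_action is not None
    assert service.session.pending_action["action_type"] == "save_scenario"
    assert service.session.status == "awaiting_approval"
    assert [e["type"] for e in service.events] == ["approval_required"]


def test_chat_gated_tool_persists_pending_action() -> None:
    service = FakeService()
    service.chat("save this scenario")
    assert_approval_chain(service)


def test_stream_chat_gated_tool_persists_pending_action() -> None:
    service = FakeService()
    list(service.stream_chat("save this scenario"))  # drain the stream
    assert_approval_chain(service)
```

Sharing `assert_approval_chain` between the two tests makes it harder for one path to regress silently while the other stays covered.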
