fix(agents): persist pending_action for gated tool calls #336

## Summary

The experiment agent can invoke a gated mutation tool (e.g. `save_scenario`), and the backend logs confirm the tool ran with `requires_approval=true`. However, the human-in-the-loop (HITL) approval is never persisted or surfaced, so the Chat UI shows only the assistant's prose and **no Approve/Reject card**.

## Observed

- Experiment agent calls a gated tool such as `save_scenario`.
- Backend logs: `save_scenario` returned `requires_approval=true`.
- However, `agent_session.pending_action` remains **null** and `status` remains **active** (it never becomes `awaiting_approval`).
- No `approval_required` WebSocket event is emitted.
- Frontend therefore renders only the assistant prose bubble and no Approve/Reject card.
- No `scenario_plan` is persisted (correct), but there is also no pending approval to act on (incorrect).

## Root cause

- Gated tools (`save_scenario`, `create_alias`, …) return `{"status": "approval_required", ...}` **to the model** but do **not** write a machine-readable `pending_action` into `AgentDeps` / session state.
- `service.chat()` and `service.stream_chat()` decide approval by inspecting `final_result.pending_action` (primary) / `final_result.approval_required` (fallback).
- The experiment agent's `output_type` is `ExperimentReport`, which defines only `run_id, status, summary, metrics, recommendations` — it does **not** define `pending_action` or `approval_required`.
- Net: the approval signal is visible only to the model, which echoes it as prose in `summary`; the structured output carries no approval field, so the service's check never fires → `pending_action` stays null, status stays `active`, and the `approval_required` event is never emitted.
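
A minimal sketch of why the check never fires, assuming the service reads the two attributes off the structured output roughly as described above. The field types on `ExperimentReport` and the helper `approval_detected` are hypothetical stand-ins for illustration, not the actual service code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExperimentReport:
    # Only the fields named in this issue; the types are assumptions.
    # Note that pending_action and approval_required are absent.
    run_id: str
    status: str
    summary: str
    metrics: dict[str, float] = field(default_factory=dict)
    recommendations: list[str] = field(default_factory=list)


def approval_detected(final_result: Any) -> bool:
    """Hypothetical stand-in for the service check: primary, then fallback."""
    if getattr(final_result, "pending_action", None) is not None:
        return True
    return bool(getattr(final_result, "approval_required", False))


# The model can only echo the tool's approval signal as prose in `summary`.
report = ExperimentReport(
    run_id="run-1",
    status="ok",
    summary="I need your approval before saving this scenario.",
)
result = approval_detected(report)
print(result)  # False
```

Because neither attribute exists on the output type, both lookups fall through regardless of what the tool returned to the model.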

## Recommended fix

- Add a `pending_action` slot to `AgentDeps`.
- Gated tools set `ctx.deps.pending_action` (action_type, arguments, description) when `requires_approval(<tool>)` is true.
- Service layer (`chat()` and `stream_chat()`) checks `deps.pending_action` **first** (deterministic, not model-dependent), then persists it to the session, flips `status` to `awaiting_approval`, and emits the `approval_required` event — falling back to the existing `final_result.pending_action` check.
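
The three steps above could be sketched as follows. This is an illustration under stated assumptions, not the proposed patch: `PendingAction`, `RunContext`, `Session`, `GATED_TOOLS`, and `finalize_turn` are hypothetical names, and the event list stands in for the WebSocket emitter.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PendingAction:
    action_type: str
    arguments: dict[str, Any]
    description: str


@dataclass
class AgentDeps:
    # New slot: written by gated tools, read by the service layer.
    pending_action: Optional[PendingAction] = None


@dataclass
class RunContext:
    # Minimal stand-in for the agent framework's run context.
    deps: AgentDeps


@dataclass
class Session:
    status: str = "active"
    pending_action: Optional[PendingAction] = None
    events: list[str] = field(default_factory=list)  # stand-in for WebSocket events


GATED_TOOLS = {"save_scenario", "create_alias"}  # assumed registry


def requires_approval(tool_name: str) -> bool:
    return tool_name in GATED_TOOLS


def save_scenario(ctx: RunContext, name: str) -> dict[str, Any]:
    """Gated tool: records the action on deps instead of relying on the model."""
    if requires_approval("save_scenario"):
        ctx.deps.pending_action = PendingAction(
            action_type="save_scenario",
            arguments={"name": name},
            description=f"Save scenario '{name}'",
        )
        return {"status": "approval_required", "requires_approval": True}
    return {"status": "saved"}


def finalize_turn(session: Session, deps: AgentDeps, final_result: Any) -> None:
    """Shared by chat() and stream_chat(): deps first, structured output second."""
    pending = deps.pending_action
    if pending is None:
        pending = getattr(final_result, "pending_action", None)
    if pending is None:
        return
    session.pending_action = pending
    session.status = "awaiting_approval"
    session.events.append("approval_required")


deps = AgentDeps()
session = Session()
tool_reply = save_scenario(RunContext(deps=deps), name="baseline")
finalize_turn(session, deps, final_result=object())
```

Routing both `chat()` and `stream_chat()` through one helper keeps the two paths from drifting apart, and reading `deps` first means the outcome no longer depends on the agent's `output_type`.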

## Tests

- Regression test for the full chain: gated tool call → `deps.pending_action` set → session `pending_action` persisted → `status=awaiting_approval` → `approval_required` event emitted (both `chat()` and `stream_chat()` paths).
- Note: the existing `test_chat_awaiting_approval_returns_pending` only uses a pre-seeded `awaiting_approval` session fixture, so it does not cover the tool → pending_action propagation.
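
A possible shape for the regression test, shown against a hypothetical `FakeService` so the sketch runs on its own. The real test would drive the actual service with a stubbed model that calls the gated tool; only the assertions in `assert_chain` are meant to carry over.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Optional


@dataclass
class FakeSession:
    status: str = "active"
    pending_action: Optional[dict[str, Any]] = None


@dataclass
class FakeService:
    """Hypothetical stand-in for the agent service under test."""
    session: FakeSession = field(default_factory=FakeSession)
    events: list[str] = field(default_factory=list)

    def _run_gated_tool(self) -> dict[str, Any]:
        # Simulates the agent calling save_scenario and the tool recording
        # the action on deps.
        return {"action_type": "save_scenario", "arguments": {}, "description": "Save scenario"}

    def _apply(self, pending: dict[str, Any]) -> None:
        self.session.pending_action = pending
        self.session.status = "awaiting_approval"
        self.events.append("approval_required")

    def chat(self, message: str) -> str:
        self._apply(self._run_gated_tool())
        return "Approval needed."

    async def stream_chat(self, message: str) -> AsyncIterator[str]:
        self._apply(self._run_gated_tool())
        yield "Approval needed."


def assert_chain(service: FakeService) -> None:
    # The full chain this issue asks to cover.
    assert service.session.pending_action is not None
    assert service.session.pending_action["action_type"] == "save_scenario"
    assert service.session.status == "awaiting_approval"
    assert service.events == ["approval_required"]


def test_chat_sets_pending_action() -> None:
    service = FakeService()
    service.chat("save this scenario")
    assert_chain(service)


def test_stream_chat_sets_pending_action() -> None:
    service = FakeService()

    async def drain() -> list[str]:
        return [chunk async for chunk in service.stream_chat("save this scenario")]

    chunks = asyncio.run(drain())
    assert chunks == ["Approval needed."]
    assert_chain(service)
```

Starting each test from a fresh `active` session is the key difference from `test_chat_awaiting_approval_returns_pending`, which begins from a pre-seeded `awaiting_approval` fixture.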

## Notes

- Companion to #335 (surfacing fallback model failures) — both are agents-slice HITL/observability gaps.
- Do not include API keys, session secrets, or raw provider logs in any fix, test fixture, or surfaced error.
