
Agent-state monitor over the experimental control-mode engine#692

Open
tony wants to merge 50 commits into engine-ops from engine-ops-supatui


tony commented Jun 27, 2026


Summary

  • Add libtmux.experimental.agents — a headless, self-healing monitor that reports, per pane, which coding agent is RUNNING, AWAITING_INPUT, IDLE, EXITED, or UNKNOWN, without polling or scraping pane output. Answers the orchestration question "which of my parallel agents needs me right now?"
  • Ingest two cooperative signal channels: a local tmux option subscription (@agent_state → %subscription-changed) and a remote OSC 3008 escape carried in control-mode %output. They are reconciled by a lock-free last-writer-wins merge, so the two channels can race freely.
  • Extend AsyncControlModeEngine with a supervised reconnect loop, a death-sentinel that closes subscribers cleanly, and per-subscriber broadcast queues, so a tmux restart or socket blip self-heals and concurrent consumers never steal each other's events.
  • Add typed refresh-client -B/-C to the RefreshClient operation — the substrate the monitor subscribes through (debounced, server-side change detection) instead of raw %output.
  • Expose the monitor over MCP as the list_agents, watch_agents, and install_agent_hooks tools, plus a lifespan that starts and stops the monitor and gates the tools behind an engine capability check.
  • Ship non-clobbering shell hooks for Claude Code (~/.claude/settings.json) and Codex (~/.codex/config.toml), installable into a running session at any time.
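The five per-pane states and the tolerant parse can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the contents of state.py: the enum values are assumed to be the lowercase names used in the commit log (running/awaiting_input/idle/exited/unknown), and the Agent field types are guesses; only the field names come from this description.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass
from typing import Optional


class AgentState(enum.Enum):
    RUNNING = "running"
    AWAITING_INPUT = "awaiting_input"
    IDLE = "idle"
    EXITED = "exited"
    UNKNOWN = "unknown"

    @classmethod
    def from_signal(cls, value: Optional[str]) -> AgentState:
        # Unknown or missing values degrade to UNKNOWN instead of raising,
        # so a value written by a newer version cannot crash this reader.
        try:
            return cls(value)
        except ValueError:
            return cls.UNKNOWN


@dataclass(frozen=True)
class Agent:
    # Field names follow the PR description; the types are assumptions.
    name: Optional[str]
    state: AgentState
    since: float
    source: str
    pid: Optional[int] = None
    alive: bool = True

    @property
    def is_running(self) -> bool:
        return self.state is AgentState.RUNNING

    @property
    def is_awaiting(self) -> bool:
        return self.state is AgentState.AWAITING_INPUT
```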

Stacks on #690. Everything is additive under libtmux.experimental: no existing public API is touched, and the package is mypy-strict clean.

Changes by area

src/libtmux/experimental/agents/ (new package)

  • state.py: the AgentState enum (with from_signal() that maps unknown values to UNKNOWN rather than raising) and the frozen Agent record (name, state, since, source, pid, alive, plus is_running/is_awaiting).
  • merge.py: Stamp(counter, writer) ordering and latest() — the convergent, idempotent, out-of-order-tolerant merge rule. The clock is a pluggable callable (MonotonicCounter now, HLC later).
  • store.py: the durable value tier — a frozen AgentStore, the pure apply(store, event, *, now) reducer, Observed/Vanished events, a Storage protocol, and JsonFile (atomic temp-write + rename). Only apply() mutates state.
  • signals.py: OptionSignal (local) and OscSignal (remote) — the latter reassembles the byte-fragmented OSC 3008 stream tmux delivers in %output.
  • health.py: is_alive(pid) via os.kill(pid, 0); None (a PID-less remote pane) is treated as alive so a remote agent is never falsely expired.
  • tree.py: panes_of() / diff_panes() derived from models snapshots — the live session→window→pane projection.
  • monitor.py: AgentMonitor — the supervised start/stop/reconcile/status/agents contract, the reducer pipeline, and the reconcile sweep that self-attaches (excluding its own tmux -C session) and self-heals across reconnects.
  • hooks/: the AgentHook protocol (detect/install/uninstall/status), the shared emit() (a local set-option when possible, otherwise a remote OSC 3008 escape written to /dev/tty), ClaudeCodeHook, CodexHook, and a registry().
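Because tmux can split one OSC 3008 escape across several %output notifications, the receiver has to buffer partial sequences. The sketch below shows one way to do that; `OscReassembler`, `Reading.fields`, and the `key=value;key=value` payload layout are hypothetical (only the `state=idle` example and the BEL/ST terminators come from the commit log), so treat it as an illustration of the technique rather than OscSignal's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# ESC ] 3008 ; <payload> terminated by BEL (\x07) or ST (ESC \).
_OSC_3008 = re.compile(rb"\x1b\]3008;([^\x07\x1b]*)(?:\x07|\x1b\\)")
_MAX_CARRY = 4096  # bound the carried-over bytes if a sequence never terminates


@dataclass(frozen=True)
class Reading:
    fields: Dict[str, str] = field(default_factory=dict)

    @property
    def state(self) -> Optional[str]:
        return self.fields.get("state")


class OscReassembler:
    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> List[Reading]:
        self._buf += chunk
        readings: List[Reading] = []
        end = 0
        for match in _OSC_3008.finditer(self._buf):
            fields: Dict[str, str] = {}
            for item in match.group(1).decode("utf-8", "replace").split(";"):
                key, sep, value = item.partition("=")
                if sep:
                    fields[key] = value
            readings.append(Reading(fields))
            end = match.end()
        rest = self._buf[end:]
        # Carry over only a possibly incomplete sequence: from the last "ESC ]",
        # or a lone trailing ESC whose "]" has not arrived yet.
        start = rest.rfind(b"\x1b]")
        if start == -1 and rest.endswith(b"\x1b"):
            start = len(rest) - 1
        self._buf = rest[start:][-_MAX_CARRY:] if start != -1 else b""
        return readings
```

Keeping only the bytes from the last `ESC ]` (or a lone trailing `ESC`) means a sequence split anywhere, even between the two bytes of an ST terminator, is completed by the next chunk.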

src/libtmux/experimental/engines/async_control_mode.py

  • Supervised reconnect (_supervisor, deterministic jittered _backoff, backoff reset on a healthy connect); desired-state replay (recorded via add_subscription/set_attach_targets, replayed by _replay_subscriptions/_replay_attach); a _STREAM_END death-sentinel broadcast to per-subscriber queues; and a subscribe() that no longer hangs when called after aclose().
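The backoff behaviour described here (deterministic jitter, escalation only on consecutive failures, a reset once a connect is healthy) can be sketched as follows. The 5 s cap comes from the commit log; the 0.1 s base, the "equal jitter" formula, and the `ReconnectPolicy` class are assumptions for illustration.

```python
from __future__ import annotations

import random


def backoff(attempt: int, *, base: float = 0.1, cap: float = 5.0, seed: int = 0) -> float:
    """Capped exponential backoff with reproducible ("deterministic") jitter."""
    attempt = max(0, min(attempt, 32))  # bound the exponent so 2**attempt stays small
    ceiling = min(cap, base * 2**attempt)
    # Seeding per (seed, attempt) makes the jitter repeatable in tests.
    rng = random.Random(seed * 1_000_003 + attempt)
    # "Equal jitter": wait at least half the ceiling, plus a random share of the rest.
    return ceiling / 2 + rng.uniform(0, ceiling / 2)


class ReconnectPolicy:
    """Tracks consecutive failures; a healthy connect resets the escalation."""

    def __init__(self) -> None:
        self.attempt = 0

    def next_delay(self) -> float:
        delay = backoff(self.attempt)
        self.attempt += 1
        return delay

    def connected(self) -> None:
        self.attempt = 0
```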

src/libtmux/experimental/ops/_ops/refresh_client.py

  • Typed subscribe (-B) and size (-C) parameters, version-gated and serializable like every other operation.
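At the tmux level, the two typed parameters wrap these argument shapes from the tmux manual: `-B name:what:format` (where `what` may be `%*` for every pane in the attached session) and `-C widthxheight`. The helper below is hypothetical and is not RefreshClient's real signature; it only shows what the resulting argv could look like for the monitor's subscription.

```python
from __future__ import annotations

from typing import List, Optional, Tuple


def refresh_client_argv(
    *,
    subscribe: Optional[Tuple[str, str, str]] = None,
    size: Optional[Tuple[int, int]] = None,
) -> List[str]:
    argv = ["refresh-client"]
    if subscribe is not None:
        name, what, fmt = subscribe
        if ":" in name or ":" in what:
            # A colon here would shift the name:what:format split tmux performs.
            raise ValueError("subscription name/target must not contain ':'")
        argv += ["-B", f"{name}:{what}:{fmt}"]
    if size is not None:
        width, height = size
        argv += ["-C", f"{width}x{height}"]
    return argv
```

According to the tmux manual, subscription changes are reported through %subscription-changed at most once a second, consistent with the ~1 s latency listed for the option channel.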

src/libtmux/experimental/mcp/

  • vocabulary/agents.py: register_agents() registers list_agents / watch_agents / install_agent_hooks, behind a supports_monitor() capability gate.
  • _lifespan.py, events.py, fastmcp_adapter.py: start/stop the monitor in the server lifespan and skip the agent tools when the lifespan won't bring a monitor up.

CHANGES

  • An Agent-state monitor entry under ### What's new.

Design decisions

  • Source of truth is split by kind. tmux is authoritative for the observed tree (derive it, never store it); the monitor is authoritative for intent/run-state (agent identity + state), which tmux cannot hold and never persists. Deltas drive the fast path; a periodic full list-* snapshot diff is the correctness backstop, because tmux's change feed has blind spots (pane-died, window-resized, and title changes emit no notification).
  • Both signal channels, by necessity. The option path is lossless and re-queryable (show-options -p -v self-heals a dropped notification) but slow (~1 s debounce); the OSC path is instant and the only one that survives SSH (a remote set-option can't reach the local socket) but rides the lossy %output stream. Neither alone is sufficient.
| Channel | Transport | Latency | Loss model | Reach |
|---|---|---|---|---|
| OptionSignal | @agent_state → %subscription-changed | ~1 s | lossless, re-queryable | local socket |
| OscSignal | OSC 3008 → %output (byte-fragmented) | instant | lossy (stream) | survives SSH |
  • Never infer death from silence. A missing notification never marks an agent EXITED. Local panes expire only on a failed PID probe; PID-less remote panes stay at their last-known state.
  • Lock-free convergence. The (counter, writer) latest() guard runs before the coalescing value write, so the two channels merge deterministically without locks regardless of arrival order.
  • Decoupled from the classic ORM. The package depends only on the AsyncTmuxEngine protocol, the models snapshots, and a Storage sink — not on Server/Session/Window/Pane.
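The lock-free convergence rule is essentially a last-writer-wins register. In this sketch `Stamp` orders by `(counter, writer)` and `latest()` admits only strictly newer stamps, so a replayed update is a no-op and every arrival order ends in the same value. The dict-shaped store and the `merge` helper are illustrative assumptions, not AgentStore or apply().

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True, order=True)
class Stamp:
    counter: int
    writer: str  # breaks ties between equal counters, deterministically


def latest(current: Optional[Stamp], incoming: Stamp) -> bool:
    """True when ``incoming`` is strictly newer than what is held."""
    return current is None or incoming > current


class MonotonicCounter:
    """A pluggable clock: any zero-argument callable returning an int would do."""

    def __init__(self) -> None:
        self._next = itertools.count(1)

    def __call__(self) -> int:
        return next(self._next)


Store = Dict[str, Tuple[Stamp, str]]


def merge(store: Store, pane: str, stamp: Stamp, value: str) -> Store:
    held = store.get(pane)
    if not latest(held[0] if held else None, stamp):
        return store  # stale or replayed: unchanged (idempotent)
    return {**store, pane: (stamp, value)}  # new value, old store untouched
```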
| MCP tool | Purpose |
|---|---|
| list_agents | snapshot of every known pane's agent + state |
| watch_agents | live stream of agent-state transitions |
| install_agent_hooks | install Claude Code / Codex hooks into the running session |

Verification

The package is decoupled from the classic ORM (it depends only on the engine protocol and models), so this search should return no matches:

$ rg -n "from libtmux\.(server|session|window|pane|common) import|import libtmux\b" src/libtmux/experimental/agents

The remote signal is written to the pane tty, not stdout (which agent hooks capture):

$ rg -n "/dev/tty" src/libtmux/experimental/agents/hooks/emit.py

Async tests follow the repo convention (asyncio.run, no pytest-asyncio), so this search should return no matches:

$ rg -n "pytest_asyncio|pytest\.mark\.asyncio" tests/experimental/agents tests/experimental/engines/test_async_control_mode_supervisor.py

MCP agent tools are gated behind an engine capability check:

$ rg -n "supports_monitor" src/libtmux/experimental/mcp

Test plan

  • Pure tier, no tmux: state / merge / store / signals / tree / health unit tests + doctests (reducer idempotence, byte-fragmented OSC reassembly, last-writer-wins).
  • Engine resilience: test_async_control_mode_supervisor.py and test_async_control_mode_sentinel.py cover supervised reconnect, attach replay on reconnect, broadcast to concurrent subscribers, and subscribe-after-close.
  • Live monitor: test_live_monitor.py validates self-attach, reconcile, and survival across a reconnect against a real tmux server via the libtmux fixtures.
  • Hooks: non-clobbering install/uninstall/status for Claude Code and Codex, including the legacy-Codex notify fallback.
  • MCP: test_agents_tools.py / test_attach_reset.py cover tool registration and the capability gate.
  • Full repo gate: ruff check, ruff format --check, mypy src tests, pytest, and just build-docs (the CHANGES entry + catalog directive) all green.

Refs #688, #689. Builds on #690.


codecov bot commented Jun 27, 2026


Codecov Report

❌ Patch coverage is 89.64174% with 133 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.74%. Comparing base (8f2c6a1) to head (6a0a06a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/libtmux/experimental/agents/monitor.py | 76.07% | 33 Missing and 17 partials ⚠️ |
| ...libtmux/experimental/engines/async_control_mode.py | 86.18% | 14 Missing and 7 partials ⚠️ |
| src/libtmux/experimental/agents/hooks/codex.py | 78.37% | 10 Missing and 6 partials ⚠️ |
| tests/experimental/agents/test_monitor.py | 81.25% | 6 Missing and 3 partials ⚠️ |
| tests/experimental/mcp/test_agents_tools.py | 86.53% | 5 Missing and 2 partials ⚠️ |
| src/libtmux/experimental/agents/hooks/claude.py | 90.90% | 5 Missing and 1 partial ⚠️ |
| src/libtmux/experimental/agents/store.py | 91.52% | 4 Missing and 1 partial ⚠️ |
| src/libtmux/experimental/agents/hooks/emit.py | 84.00% | 2 Missing and 2 partials ⚠️ |
| tests/experimental/mcp/test_events.py | 88.23% | 4 Missing ⚠️ |
| tests/experimental/agents/test_live_monitor.py | 95.08% | 1 Missing and 2 partials ⚠️ |

... and 5 more
Additional details and impacted files
@@              Coverage Diff               @@
##           engine-ops     #692      +/-   ##
==============================================
+ Coverage       74.22%   75.74%   +1.51%     
==============================================
  Files             214      246      +32     
  Lines           12563    13819    +1256     
  Branches         1671     1794     +123     
==============================================
+ Hits             9325    10467    +1142     
- Misses           2586     2661      +75     
- Partials          652      691      +39     


tony force-pushed the engine-ops-supatui branch from cd305cf to 68e6a55 on June 27, 2026 22:17
tony added 26 commits June 28, 2026 17:37
why: The shared vocabulary every agents module reads/writes.

what:
- AgentState (running/awaiting_input/idle/exited/unknown) with from_signal
- frozen Agent record + is_running/is_awaiting helpers
why: Out-of-order/replayed agent-state updates must converge to newest.

what:
- Stamp(counter, writer) with deterministic tie-break
- latest() guard + pluggable Clock + MonotonicCounter default
why: A dead stream left consumers hanging on queue.get(), so settle
reported a false 'settled' (success-shaped) instead of stream_end.

what:
- broadcast a _STREAM_END sentinel to subscriber queues on death/close
- subscribe() ends its async for on the sentinel
why: A reconnect left _attached_session set, so the next _ensure_attached
call skipped re-attaching and %output was silently missing.

what:
- Pin contract with test_reset_attach_clears_flag (Task 10)
- Add _attached_session to _StreamEngine Protocol; drop type: ignore
why: Tasks 14/15 need a stable protocol + registry to import from;
the monitor needs the canonical event→state map to translate hook names.

what:
- Add EVENT_STATE canonical lifecycle-event→state dict in base.py
- Add AgentHook runtime_checkable Protocol (name/detect/install/uninstall/status)
- Add registry() + get() with lazy imports to avoid import cycles
- Add stub ClaudeCodeHook (claude.py) + CodexHook (codex.py) for Tasks 14/15
- Add test_registry.py (3 tests: canonical map, registry members, KeyError)
why: Per-pane %subscription-changed only flows to an *attached*
control client, but AgentMonitor.start() never attached — so the
monitor was silent against a live server. The spec's "re-attach
declared sessions" step was deferred in Tasks 9/10 and never wired.
This closes that gap and proves the whole pipeline end to end.

what:
- monitor.start(): after add_subscription, pick a real session via
  list-sessions, set_attach_targets([id]), and perform the initial
  attach-session through the engine (set _attached_session). A
  tmux -C client creates its own throwaway session on connect, which
  sorts first in list-sessions; _own_session_id (display-message -p
  '#{session_id}') identifies it so _primary_session_id skips it and
  attaches to a real session. Single-session v1 limit documented.
- async_control_mode: add _replay_attach(), called after
  _replay_subscriptions on every (re)connect — mirrors the
  subscription replay (direct stdin write, swallowed pending future,
  _write_lock/FIFO discipline) so the engine re-attaches across
  reconnects. No-op when _desired_attach is empty.
- tests/experimental/agents/test_live_monitor.py: live tests against
  real tmux. test_monitor_observes_running sets @agent_state running
  and polls (no manual attach — asserts start() self-attached).
  test_reconcile_parses_live_panes proves _parse_pane_rows handles
  real list-panes -F output and the Vanished->EXITED path.
why: Complete the feature branch with user-facing documentation and a
changelog entry so the agent-state monitor is discoverable.

what:
- Add superpowers/** to exclude_patterns in docs/conf.py so the spec
  and plan are tracked in git without generating toctree warnings
- Add ## Agents section to docs/experimental.md with prose intro and
  MyST-role cross-refs for AgentMonitor, AgentState, Agent, and the
  three MCP tools
- Add Agent-state monitor deliverable to CHANGES under ### What's new
  in prose (not bullets), linking {class} roles for AgentMonitor,
  AgentState, and AgentHook
- Stage docs/superpowers/ plans and specs into version control
why: AgentMonitor._drain ran a single subscribe() that ended on the
engine's death-sentinel; the supervisor reconnected but nothing
re-subscribed and reconcile ran only once in start(), so after any
blip list_agents served a stale snapshot forever (broke acceptance
#3 / D2). health.is_alive was dead code and the docs described
behavior the code did not do.

what:
- monitor: replace _drain with a supervised _run loop that reconciles
  FIRST each iteration (so subscribe() only runs against a live
  engine), then drains until the stream ends; on disconnect it retries
  reconcile with a bounded _reconnect_poll until the supervisor
  reconnects (replaying subs + attach), then re-subscribes. Split
  reconcile() into a defensive public wrapper + a raising
  _reconcile_once so the loop can wait for the engine to revive.
  stop() sets _stopping then cancels (no hang on a closed engine).
- monitor: wire health in reconcile via _apply_health — refresh each
  tracked agent's pid/alive from the pane tree; mark a LOCAL pane
  (pid set) EXITED when its process is dead; never auto-EXIT a
  PID-less remote pane (D5). Note the receive-time clock seam (D1).
- tests: add live test_monitor_survives_engine_reconnect (kill the
  control proc, confirm a NEW state is observed after reconnect) and
  unit tests for _apply_health (dead-local→EXITED, live refresh,
  pidless-never-exits).
- docs/experimental.md: correct the liveness, reconcile-on-reconnect,
  and hook-install (settings.json / config.toml + libtmux-agent-emit,
  not set-hook) descriptions; add an AgentMonitor usage snippet.
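The liveness rule from this commit can be sketched with the `os.kill(pid, 0)` probe named in health.py. The `Tracked` record, the plain-string states, and the injectable `probe` are simplifications for the example; the rule itself (refresh the pid from the pane tree, expire only a local pane whose probe fails, never a PID-less one) follows the commit message.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


def is_alive(pid: Optional[int]) -> bool:
    if pid is None:
        return True  # PID-less (remote) pane: this probe never declares it dead
    if pid <= 0:
        return False  # kill() with 0 or a negative pid addresses process groups
    try:
        os.kill(pid, 0)  # signal 0 checks existence/permission; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


@dataclass(frozen=True)
class Tracked:
    state: str
    pid: Optional[int]
    alive: bool = True


def apply_health(
    agents: Dict[str, Tracked],
    pane_pids: Dict[str, Optional[int]],
    probe: Callable[[Optional[int]], bool] = is_alive,
) -> Dict[str, Tracked]:
    refreshed: Dict[str, Tracked] = {}
    for pane_id, agent in agents.items():
        pid = pane_pids.get(pane_id, agent.pid)  # refresh pid from the live pane tree
        alive = probe(pid)
        # Only a LOCAL pane (pid known) whose probe fails is marked exited.
        state = "exited" if pid is not None and not alive else agent.state
        refreshed[pane_id] = replace(agent, pid=pid, alive=alive, state=state)
    return refreshed
```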
why: The "keepalive TTL" / "age out" wording described a mechanism
that does not exist in v1 — there is no TTL, keepalive, or staleness
timer. Remote PID-less panes are simply never auto-expired; they are
left at last-known state and become EXITED only via the Vanished diff
when their tmux pane actually disappears.

what:
- docs/experimental.md: reword the reconcile paragraph to drop the
  keepalive-TTL claim and state the real behavior.
- monitor._apply_health docstring: remote pid-less panes expire only
  via the Vanished/pane-gone path, not a TTL.
- health.py module docstring: this probe never declares remote panes
  dead; a keepalive/TTL is a possible future enhancement, not v1.
why: If monitor.start() raised, the existing finally block was skipped
because start() ran between the two try blocks, leaking the drain task.

what:
- Move monitor.start() + yield inside try/finally within the
  if monitor is not None: branch so stop() always runs on exit
- Add else: yield branch to cover the monitor=None path
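The fix amounts to putting start() and the yield under one try/finally. A minimal sketch with `contextlib.asynccontextmanager`; the `Startable` protocol and the `lifespan` function are hypothetical stand-ins for the MCP server's lifespan.

```python
from __future__ import annotations

import contextlib
from typing import AsyncIterator, Optional, Protocol


class Startable(Protocol):
    async def start(self) -> None: ...

    async def stop(self) -> None: ...


@contextlib.asynccontextmanager
async def lifespan(monitor: Optional[Startable]) -> AsyncIterator[None]:
    if monitor is not None:
        try:
            await monitor.start()  # if this raises, the finally still runs stop()
            yield
        finally:
            await monitor.stop()
    else:
        yield  # no monitor: nothing to start or stop
```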
why: With lifespan=False the monitor is never started, so registering
agent tools against it yields a list_agents that silently returns [].

what:
- Guard register_agents call with `monitor_enabled and lifespan`
  so tools are only wired when the lifespan will actually start them
why: _BLOCK_RE.sub(new_block, content) replaces every match, so a
manually-malformed config with two marker blocks produced two copies.

what:
- Rewrite the existing-block branch to strip ALL marker blocks
  (via _BLOCK_WITH_SEP_RE then _BLOCK_RE for start-of-file), then
  append exactly one fresh block
- Add test seeding a two-block config and asserting install collapses
  to one block with status "installed"
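The collapse-then-append strategy can be sketched with a single regex that also consumes the blank lines in front of each block. The marker strings are hypothetical (the real hooks' markers are not shown here), and the sketch folds _BLOCK_WITH_SEP_RE and _BLOCK_RE into one pattern.

```python
from __future__ import annotations

import re

# Hypothetical markers; the real hooks use their own.
START = "# >>> libtmux-agent-hooks >>>"
END = "# <<< libtmux-agent-hooks <<<"
# Leading newlines are optional, so this also matches a block at start-of-file.
_BLOCK_RE = re.compile(rf"\n*{re.escape(START)}.*?{re.escape(END)}\n?", re.DOTALL)


def upsert_block(content: str, body: str) -> str:
    # Strip EVERY marker block (even a malformed duplicate), then append exactly one.
    stripped = _BLOCK_RE.sub("", content).rstrip("\n")
    block = f"{START}\n{body.rstrip()}\n{END}\n"
    return f"{stripped}\n\n{block}" if stripped else block
```

Running it twice yields the same text, so re-installing is idempotent.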
why: The OscSignal regex accepts both ST and BEL terminators but only
the ST path had a working doctest example.

what:
- Add BEL-terminated doctest to OscSignal.feed showing
  b"\033]3008;state=idle\007" yields a Reading with state idle
tony added 23 commits June 28, 2026 17:37
why: The status() == "outdated" branch (marker present, content stale)
had no test coverage.

what:
- Add test that installs, mutates the emit command in the block,
  and asserts status() returns "outdated"
why: The class docstring and parse() method docstring carried identical
doctests; --doctest-modules ran both, which is pure noise.

what:
- Replace class-level Examples block with prose description
- Keep the method-level doctest as the single runnable example
why: AgentStore was imported at runtime and again under TYPE_CHECKING;
the runtime import already satisfies the annotation, so the second
import is dead.

what:
- Remove the redundant TYPE_CHECKING import of AgentStore
why: When the display-message own-id probe fails, own is None, so the
`sid != own` guard is always true and _primary_session_id falls through
to return ids[0] — tmux's phantom `tmux -C` session (it sorts first),
which holds no agent panes. Attaching there leaves the option channel
effectively silent.

what:
- Return None from _primary_session_id when the own-session probe fails,
  so start() skips attach instead of binding to the phantom session
- Cover the case with a fake engine whose display-message probe raises
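The decision reduces to a small pure function. This is a hypothetical sketch of the rule, not _primary_session_id itself; session ids are plain strings such as tmux's `$0`.

```python
from __future__ import annotations

from typing import Optional, Sequence


def primary_session_id(session_ids: Sequence[str], own_id: Optional[str]) -> Optional[str]:
    if own_id is None:
        # Probe failed: our own throwaway `tmux -C` session (which sorts first)
        # can't be told apart, so attach nowhere rather than to the wrong session.
        return None
    for sid in session_ids:
        if sid != own_id:
            return sid
    return None  # only our own session exists
```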
…close

why: aclose() sets _closing, broadcasts the stream-end sentinel, and
clears _subscribers. A subscribe() call afterwards registers a fresh
queue no broadcast will ever touch, hanging the consumer forever on
queue.get().

what:
- Gate subscribe() on _closing at the top: a permanently-closing engine
  yields nothing and ends at once. A merely _dead (reconnecting) engine
  still keeps the subscriber so the post-reconnect reader feeds it
- Cover with a regression test asserting the async for ends within a
  timeout after aclose()
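The two sentinel commits (broadcast `_STREAM_END` on death or close, and end new subscriptions at once when closing) combine into the pattern below. `Broadcaster` and its methods are hypothetical names for illustration; the reconnecting (`_dead`) state is deliberately left out.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from typing import List, cast

_STREAM_END = object()  # unique sentinel; compared by identity


class Broadcaster:
    def __init__(self) -> None:
        self._subscribers: List[asyncio.Queue[object]] = []
        self._closing = False

    def publish(self, event: str) -> None:
        # Each subscriber has its own queue, so consumers never steal each other's events.
        for queue in self._subscribers:
            queue.put_nowait(event)

    def close(self) -> None:
        self._closing = True
        for queue in self._subscribers:
            queue.put_nowait(_STREAM_END)  # wake every consumer blocked on get()
        self._subscribers.clear()

    async def subscribe(self) -> AsyncIterator[str]:
        if self._closing:
            return  # permanently closed: end immediately instead of hanging on get()
        queue: asyncio.Queue[object] = asyncio.Queue()
        self._subscribers.append(queue)
        try:
            while True:
                item = await queue.get()
                if item is _STREAM_END:
                    return
                yield cast(str, item)
        finally:
            if queue in self._subscribers:
                self._subscribers.remove(queue)
```

Because `subscribe()` is an async generator, a subscriber's queue is registered when iteration starts, not when `subscribe()` is called.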
…connect

why: The supervisor backoff counter climbed for the engine's whole life,
so a reconnect after a long healthy session waited near the 5s cap.

what:
- Reset the attempt counter to 0 once a spawn succeeds (its startup ACK
  was consumed = a healthy connection); only consecutive connect failures
  now escalate the backoff. Verified by inspection + the existing
  reconnect tests (a counter-reset assertion would require exposing the
  internal attempt, deferred).
why: tmux reports a failed attach (e.g. stale session id) as a non-zero
returncode, not an exception, so the monitor recorded _attached_session
even when the attach failed -- silencing the option channel.

what:
- monitor.start() now records the sticky attach only when attach-session
  returns returncode 0; logs the stderr on failure
- document _replay_attach's optimistic fire-and-forget attach and that a
  failed re-attach self-corrects on the monitor's next reconcile
why: The agent monitor needs to tell a floating overlay (e.g. a status
HUD) apart from a real agent pane; the snapshot had no floating flag and
the monitor's pane format did not request it.

what:
- Add PaneSnapshot.floating, parsed from #{pane_floating_flag} (tmux
  3.7+; renders empty -> False on older tmux, so no version gate)
- Request pane_floating_flag in the monitor's PANE_FORMAT
- Cover the snapshot floating flag and the format request

Foundation for the floating HUD; the renderer + monitor self-exclusion
land next.
why: The agent monitor was observation-only; on tmux 3.7 a floating
overlay can surface live agent state in the session itself, with no
external UI.

what:
- Add HudRenderer: a pure AgentStore -> text frame plus the typed
  RespawnPane paint op (the frame is shell-quoted and held open with
  `tail -f /dev/null` so it persists between repaints)
- AgentMonitor gains opt-in `hud=True`: start() creates one floating
  NewPane over the primary session and captures its id; the supervised
  drain repaints on every store change (dirty flag set in _observe and
  reconcile); stop() kills the HUD pane
- Exclude the HUD's own pane from _reconcile_once so it never enters the
  diff or the health sweep (tracking is signal-driven, so this is the
  only exclusion needed); best-effort throughout (no session, engine
  error, or tmux < 3.7 silently skips the HUD)
- Cover the renderer, the repaint op, HUD create/teardown, and the
  reconcile self-exclusion
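The render/paint split can be sketched as a pure frame function plus a command builder. The row layout and function names are invented; the shell-quoting and the `tail -f /dev/null` hold-open follow the commit message. Whether the real RespawnPane op passes `-k` is an assumption.

```python
from __future__ import annotations

import shlex
from typing import List, Mapping


def render_frame(states: Mapping[str, str]) -> str:
    """Pure: pane id -> state text, one sorted row per pane."""
    if not states:
        return "no agents"
    return "\n".join(f"{pane:<6} {state}" for pane, state in sorted(states.items()))


def hud_command(frame: str) -> str:
    # printf prints the quoted frame verbatim; `tail -f /dev/null` then keeps the
    # pane's process (and so the pane) alive until the next repaint replaces it.
    return f"printf '%s\\n' {shlex.quote(frame)}; tail -f /dev/null"


def repaint_argv(hud_pane_id: str, frame: str) -> List[str]:
    # respawn-pane -k kills the current command and starts the new one in place.
    return ["respawn-pane", "-k", "-t", hud_pane_id, hud_command(frame)]
```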
why: _replay_attach sent attach-session fire-and-forget then cached
self._attached_session optimistically, without confirming success. If a
session was killed during the disconnect, the events layer's
_ensure_attached trusted the stale cache, skipped the re-attach, and a
wait_for_output caller got a silently-empty capture (its docstring
promises to raise instead).

what:
- Stop setting _attached_session in _replay_attach; the events layer
  owns that cache (set on a confirmed attach, re-attached on a miss).
  The fire-and-forget re-attach itself is unchanged.
- Update the now-stale docstrings on _replay_attach / set_attach_targets
- Rewrite the reconnect test to assert the cache stays unset across a
  reconnect (replay no longer caches optimistically)
why: On engine death the supervisor's _broadcast_stream_end ends the
subscribe stream, so _EventRing._drain completes. _ensure_started's bare
`if self._task is None` guard never restarts a *completed* task, so after
the first reconnect poll_events froze on a stale cursor.

what:
- Restart the drain when the task is None OR done, and clear any stale
  error so a healthy restart isn't masked
- Cover restart-when-done vs keep-when-running (parametrized)
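The restart guard treats a finished task like a missing one. `DrainHandle` is a hypothetical stand-in for _EventRing._ensure_started, sketched under that assumption.

```python
from __future__ import annotations

import asyncio
from typing import Any, Callable, Coroutine, Optional


class DrainHandle:
    def __init__(self, drain: Callable[[], Coroutine[Any, Any, None]]) -> None:
        self._drain = drain
        self._task: Optional[asyncio.Task[None]] = None
        self.error: Optional[BaseException] = None

    async def _run(self) -> None:
        try:
            await self._drain()
        except Exception as exc:  # record it for callers instead of losing it in the task
            self.error = exc

    def ensure_started(self) -> asyncio.Task[None]:
        # A *completed* drain (its stream ended when the engine died) must be
        # restarted too; checking only `is None` leaves readers on a stale cursor.
        if self._task is None or self._task.done():
            self.error = None  # a stale error must not mask a healthy restart
            self._task = asyncio.create_task(self._run())
        return self._task
```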
why: watch_agents opened its own engine.subscribe() and re-ingested the
fan-out stream while the monitor's drain already ingests it, so every
event was processed twice -- drifting the MonotonicCounter and `since`
stamps. It was also tagged readOnlyHint=True despite mutating the store.

what:
- Drop the redundant subscribe()+ingest loop; observe the monitor's live
  store over the window (the monitor's drain is the sole ingester), so
  readOnlyHint=True is now accurate
- Cover that watch_agents never calls engine.subscribe()
why: _ensure_hud ran only in start(). After a full tmux restart the HUD
pane id is dead; _repaint_hud ignored the arun result (which reports a
tmux-side failure as data, not a raise), so _hud_pane_id was never
cleared and the HUD stayed dark for the rest of the session.

what:
- _repaint_hud now checks the result: a failed/errored repaint drops
  _hud_pane_id (leaving it dirty) so it can be recreated; the dirty flag
  is cleared only on a successful paint
- _run recreates the HUD (_ensure_hud) after a reconcile when it is
  enabled but has no pane id -- covering both a restart and an initial
  create that had no session yet
- Cover ok-keeps-pane vs fail-drops-pane (parametrized)
why: The module docstring said the HUD "repaints on every store change",
but _hud_dirty is set unconditionally on each notification and reconcile,
so it repaints on those events rather than only on an actual mutation.

what:
- Reword to "repaints after each notification and reconcile"
why: The AgentHook.install/uninstall doctests ran a bare
ClaudeCodeHook(), which defaults settings_path to the real
~/.claude/settings.json and rewrites it on every doctest run. The
"no-op on stub" comment was wrong -- these methods always write.

what:
- Redirect each doctest into a tempfile.TemporaryDirectory(), mirroring
  the already-isolated claude.py/codex.py doctests
why: The agent-monitor entry described the liveness/reconcile sweep as
"a periodic health probe" running "every few seconds". It is actually
event/reconnect-driven: _run reconciles at startup and again each time
the subscribe stream ends (a disconnect/reconnect). The only sleep is a
retry backoff on the failure path, not a timer.

what:
- "refreshed by a periodic health probe" -> "refreshed on each
  reconciliation"
- "runs every few seconds" -> "runs at startup and on each engine
  reconnect"
why: Two monitor test fakes failed strict mypy: a stream-engine fake
missing the _StreamEngine Protocol's _attached_session member, and a
capturing FastMCP stand-in passed where FastMCP is expected.

what:
- Add _attached_session to FakeStreamEngine/_BlockingStreamEngine
- Cast the capturing MCP fake to FastMCP at the register_agents call
why: from_dict raised ValueError on a state string absent from the
current enum (e.g. a store written by a newer version), crashing the
monitor on startup when it loads the store.

what:
- Deserialize state via AgentState.from_signal (unknown -> UNKNOWN),
  mirroring signal ingestion
- Add parametrized tests for known/unknown/garbage states
why: `emit ... --name` with the flag as the final CLI arg raised
IndexError instead of a clean exit, surfacing a traceback to the
agent's hook runner.

what:
- Fall back to name=None when --name has no following value
- Add parametrized tests for main's --name parsing
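The fallback is easy to state as a hypothetical helper; the real `main` argument handling is not shown in the PR.

```python
from __future__ import annotations

from typing import Optional, Sequence


def parse_name(argv: Sequence[str]) -> Optional[str]:
    """Return the value following --name, or None if the flag is absent or dangling."""
    args = list(argv)
    if "--name" not in args:
        return None
    i = args.index("--name")
    # `--name` as the final argument has no value: fall back to None, not IndexError.
    return args[i + 1] if i + 1 < len(args) else None
```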
why: install_agent_hooks awaited blocking file I/O (read, fsync,
atomic replace) directly on the event loop, stalling concurrent
MCP tools during an install.

what:
- Run hook install/status via asyncio.to_thread
- Add parametrized tests for the install tool (known/unknown agent)
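The combination described here, an atomic temp-write + rename moved off the event loop, can be sketched as follows. `write_atomic` and `install_async` are hypothetical names; `asyncio.to_thread` requires Python 3.9+, and the replace is atomic on POSIX filesystems.

```python
from __future__ import annotations

import asyncio
import contextlib
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, text: str) -> None:
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # data on disk before the rename publishes it
        os.replace(tmp, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


async def install_async(path: Path, text: str) -> None:
    # The blocking write/fsync/replace runs on a worker thread, so the event
    # loop (and other MCP tool calls) keeps running during the install.
    await asyncio.to_thread(write_atomic, path, text)
```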
why: A PR does not own its release version; the monitor entry named a
concrete version that also went stale after rebasing onto newer master.

what:
- Open the entry with the package as subject, not "libtmux X.Y.Z ships"
- Add the (#692) PR ref to the deliverable heading
tony force-pushed the engine-ops-supatui branch from 9bc1f82 to 6a0a06a on June 28, 2026 23:52
