feat: RunLivenessWatchdog shadow rule flags implausibly-long Runs by xmap · Pull Request #273 · xmap/cora

xmap · 2026-06-21T05:23:35Z

What

Step 1 of the trust-first observe-stream reframe: a shadow, observe-only rule inside the existing RunSupervisor loop that flags a Run which has been Running past an operator ceiling, i.e. the de-facto-dead scan a human must currently catch by hand. In shadow mode it only logs run_liveness.would_flag -- it records no Decision and issues no command. Off by default.

It is a rule, not a new agent (it runs under the existing RunSupervisor identity), and the per-channel observe-stream feeder is deliberately not built here (the 102-agent stress test's headline: decouple the source-agnostic rule from the structurally-expensive feeder).

Why

The one beamtime-stewardship failure CORA still cannot detect autonomously is a silently-hung run burning allocation while status stays Running. This rung delivers that detection now, on a signal that CORA owns and that cannot be spoofed, and it builds the live-run-watch loop without any feeder.

The signal: `running_since` (not `created_at`)

The rule keys on a new running_since column on proj_run_summary, set on RunStarted and RESET on RunResumed, so it measures actual Running duration. created_at over-counts overnight Held intervals and would false-alarm (the failure the design warns against); updated_at is reset by every transition and isn't on the list read surface. (This corrects the design memo's earlier "no migration / use created_at" framing.)
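To make the difference concrete, here is a minimal sketch of the projection behavior described above. It is not the code in summary.py: the `RunSummaryRow` class, the `apply_event` helper, and the `RunHeld` event name are hypothetical stand-ins; only the RunStarted / RunResumed semantics come from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RunSummaryRow:
    """Hypothetical stand-in for one proj_run_summary row."""

    created_at: datetime
    running_since: Optional[datetime] = None  # NULL until the Run first starts


def apply_event(row: RunSummaryRow, event_type: str, occurred_at: datetime) -> None:
    """Set running_since on RunStarted and reset it on RunResumed."""
    if event_type in ("RunStarted", "RunResumed"):
        row.running_since = occurred_at
    # Any other transition (the "RunHeld" name below is an assumption)
    # leaves running_since untouched in this sketch.


t0 = datetime(2026, 6, 20, 18, 0, tzinfo=timezone.utc)
row = RunSummaryRow(created_at=t0)
apply_event(row, "RunStarted", t0)
apply_event(row, "RunHeld", t0 + timedelta(hours=1))      # held overnight
apply_event(row, "RunResumed", t0 + timedelta(hours=13))  # resumed next morning

now = t0 + timedelta(hours=14)
age_by_created_at = now - row.created_at        # includes the 12 h Held interval
age_by_running_since = now - row.running_since  # time Running since the resume
```

With a ceiling of, say, four hours, the created_at age (14 h) would flag this healthy resumed Run, while the running_since age (1 h) would not.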

Changes

- additive nullable running_since migration (no backfill -- NULL never flags, the safe default)
- running_since surfaced on list_runs (RunSummaryItem / SELECT)
- is_run_stale pure rule (inclusive >= ceiling) + the shadow pass, run before the beam read so it's independent of beam I/O
- run_liveness_ceiling_seconds setting (default None = off) + >0-when-set validator
- edge-trigger memory (liveness) walled off from the beam-Hold FSM (memory)
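The rule and its edge-trigger memory can be sketched as follows. This is an illustration under assumptions, not the code in _run_supervisor.py: the `is_run_stale` signature, the `shadow_pass` helper, and the use of a plain set for the liveness memory are all hypothetical. Only the inclusive `>=` boundary, the NULL-never-flags default, and the flag-once-per-episode behavior are taken from the list above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Set


def is_run_stale(
    running_since: Optional[datetime], now: datetime, ceiling_seconds: float
) -> bool:
    """Pure rule: True once the Run has been Running for >= the ceiling."""
    if running_since is None:  # un-backfilled row: never flags
        return False
    return (now - running_since).total_seconds() >= ceiling_seconds


def shadow_pass(
    runs: Dict[str, Optional[datetime]],
    now: datetime,
    ceiling_seconds: float,
    liveness: Set[str],
) -> List[str]:
    """Return the run ids newly flagged on this tick (edge-triggered).

    `liveness` is this rule's own memory; it is never shared with the
    beam-Hold state. A real pass would log run_liveness.would_flag for
    each returned id and do nothing else.
    """
    stale = {rid for rid, since in runs.items() if is_run_stale(since, now, ceiling_seconds)}
    newly_flagged = sorted(stale - liveness)
    # Replace the memory with the current stale set: runs that left the list
    # (or are no longer stale) are pruned, so they can re-flag later.
    liveness.clear()
    liveness.update(stale)
    return newly_flagged


now = datetime(2026, 6, 21, 5, 0, tzinfo=timezone.utc)
runs = {
    "run-a": now - timedelta(hours=2),     # past the ceiling
    "run-b": now - timedelta(minutes=10),  # recent
    "run-c": None,                         # NULL running_since
}
liveness: Set[str] = set()
first_tick = shadow_pass(runs, now, 3600, liveness)
second_tick = shadow_pass(runs, now + timedelta(minutes=1), 3600, liveness)
```

Here the first tick flags only `run-a`, and the second tick flags nothing because the breach is still standing.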

Posture

Off + inert by default (two gates: the None ceiling AND run_supervisor_enabled). Observe-only: a flagged Run leaves hold/resume calls empty (proven behaviorally). Honors every stress-test fix: no feeder, no ControlPort hoist, no seeded-Agent grant, own walled memory, own (future) choice -- not the overloaded SupervisionDeferred.
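The two gates and the >0-when-set validation could look roughly like this. It is a sketch using a plain dataclass; the `SupervisorSettings` class and `liveness_rule_active` helper are hypothetical, the real setting lives in config.py and may be built on a different settings library, and the `False` default for run_supervisor_enabled is an assumption read off "off + inert by default".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SupervisorSettings:
    """Hypothetical settings holder for the two gates."""

    run_supervisor_enabled: bool = False  # assumed default
    run_liveness_ceiling_seconds: Optional[float] = None  # None = rule off

    def __post_init__(self) -> None:
        ceiling = self.run_liveness_ceiling_seconds
        # A zero or negative ceiling would flag every Run at once, so reject it.
        if ceiling is not None and ceiling <= 0:
            raise ValueError("run_liveness_ceiling_seconds must be > 0 when set")


def liveness_rule_active(settings: SupervisorSettings) -> bool:
    """Both gates must be open before the shadow pass does anything."""
    return (
        settings.run_supervisor_enabled
        and settings.run_liveness_ceiling_seconds is not None
    )


defaults = SupervisorSettings()
opted_in = SupervisorSettings(run_supervisor_enabled=True, run_liveness_ceiling_seconds=3600)
```

Keeping None as the "off" value means an operator has to choose a ceiling explicitly before the rule logs anything.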

Test plan

ruff, pyright, tach, full architecture fitness suite, and the run/api/decision unit suites green (verified locally + by the pre-push hook). New tests: is_run_stale (old/recent/inclusive-boundary/None), shadow tick (observe-only, off-when-None, recent-not-flagged = the resumed-overnight regression, NULL-never-flags, edge-trigger stable, multi-run only-stale-flagged, liveness-prune-on-leave, discard-then-re-flag), projection (RunStarted writes running_since; RunResumed resets it), config validator.

Gate review

4 agents (3 baseline + migration-safety): 3 APPROVE, 1 APPROVE-WITH-NITS, 0 P0/P1. R2's three coverage nits (multi-run filtering, liveness prune, discard/re-flag arm) were folded before commit. R1's AST command-ban is deferred to advise mode (shadow is log-only; observe-only is proven behaviorally).

Next rungs (deferred)

Advise mode -- a Decision(context=RunSupervision, choice=SupervisionQuieted) per quiet episode, promoted once the ceiling is calibrated from shadow logs. Then, only when staff confirm a real machine-readable signal (SNR-1/PROG-1), the per-channel observe-stream feeder. See project_run_liveness_watchdog_design.md.

🤖 Generated with Claude Code

Step 1 of the trust-first observe-stream reframe: a SHADOW, observe-only rule inside the existing RunSupervisor loop that logs (run_liveness.would_flag) a Run that has been Running past an operator ceiling, the de-facto-dead scan a human must currently catch by hand. It records NO Decision and issues NO command. Off by default (run_liveness_ceiling_seconds None = off, plus run_supervisor_enabled).

It is a RULE, not a new agent (runs under the existing RunSupervisor identity), and keys on a NEW running_since proj_run_summary column, set on RunStarted and RESET on RunResumed, not created_at (which over-counts overnight Held intervals and would false-alarm). The edge-trigger memory (liveness) is walled off from the beam-Hold FSM (memory); NULL running_since never flags.

- additive nullable running_since migration (no backfill: NULL never flags)
- running_since surfaced on list_runs (RunSummaryItem / SELECT)
- is_run_stale pure rule (inclusive >= ceiling) + the shadow pass before the beam read (independent of beam I/O)
- run_liveness_ceiling_seconds setting + >0-when-set validator

Advise mode (a Decision(choice=SupervisionQuieted)) and the per-channel observe-stream feeder are the gated next rungs.

Gate review (4 agents: 3 baseline + migration-safety): 3 APPROVE, 1 APPROVE-WITH-NITS, 0 P0/P1. R2's 3 coverage nits (multi-run filtering, liveness prune, discard/re-flag arm) folded here. AST command-ban deferred to advise mode (shadow is log-only; observe-only proven behaviorally).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-21T05:34:04Z

Coverage report


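| File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing |
|---|---|---|---|---|---|
| apps/api/src/cora/api | | | | | |
| _run_supervisor.py | | | | | 661 |
| apps/api/src/cora/infrastructure | | | | | |
| config.py | | | | | |
| apps/api/src/cora/run/features/list_runs | | | | | |
| handler.py | | | | | |
| apps/api/src/cora/run/projections | | | | | |
| summary.py | | | | | |
| Project Total | | | | | |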

This report was generated by python-coverage-comment-action

…leet (#290)

The recent agent PRs (#233 gated resume, #266 ClearanceWatcher, #273 run-liveness, #288 observation-signal rules) shipped code-only; the hand-authored module docs had drifted. Shape-level catch-up, no internals:

- agent/index.md: five -> six seeded agents (add ClearanceWatcher, a passive flag-only periodic-loop agent recording a ClearanceProgress Decision); note RunSupervisor also carries shadow observe-only rules.
- run/index.md: RunSupervisor now does gated autonomous resume (not wind-down only) and carries shadow run-liveness / signal-quality / signal-stall rules that log-only; add is_simulated to the Observation shape + the entries_run_observations DDL with a one-line rationale.

mkdocs build --strict passes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…l rules (#294)

* feat(decision): add the three RunSupervision advise-rung choices

Slice A of the observation-signal advise rung. Adds SupervisionQuieted (run-age liveness backstop), SupervisionStalled (Rule R rate-dropout), and SupervisionBreached (Rule Q quality-below-limit) to the RunSupervisionChoice Literal + RUN_SUPERVISION_CHOICES frozenset (7 -> 10), with the vocab test updated to the 10-value set + a work-noun guard on the new dispositions.

WHY: promoting the shipped shadow observation-signal + run-liveness rules one rung (observe -> advise) means the supervisor records one Decision per breach edge for a human; that Decision's choice must exist in the closed set first. Decision-only dispositions (never a command).

SupervisionBreached is the naming-r3 rename of the originally-proposed SupervisionDoubted: "Doubted" read as the supervisor's epistemic state; "Breached" names the objective limit-crossing, family-uniform with Deferred / Conflicted / Stalled. This slice adds vocabulary only; the supervisor emission lands next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(api): promote the RunSupervisor shadow rules to the advise rung

Slice B of the observation-signal advise rung. Adds run_supervisor_advise_enabled (default off, a further opt-in above each rule's own enable) and, when on, emits exactly one Decision per breach EDGE from the three shadow rules -- still issuing NO command (advise rung):

- run-liveness backstop -> SupervisionQuieted
- Rule R rate-dropout -> SupervisionStalled
- Rule Q quality breach -> SupervisionBreached

WHY: the shadow rules (#288 / #273) log would_flag but leave no durable record a human can triage. The advise rung climbs exactly one step (observe -> advise), recording one RunSupervision Decision per breach episode for a human while keeping the act rung (auto-Hold) deferred.

Emission is edge-triggered off the already-walled per-rule memory (one Decision per episode; nothing on a standing breach across ticks), beam-free (the liveness rule runs before the beam read), and reuses the existing DecisionRegistered shape under the RunSupervisor identity + Authorize path. Shadow logging is unchanged; advise only adds the Decision. cannot-tell still defers (no Decision).

Tests cover advise-off (no Decision), each disposition under advise-on (one Decision, no command), and edge-triggering (one Decision across two ticks of a standing breach).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(api): cover advise-rung edge-trigger + cannot-tell gates

Gate-review follow-ups (the advise diff drew 2 ship + 1 changes_needed, the last purely a test-coverage gap; the correctness/trust lens passed clean). Adds three tests:

- advise liveness is edge-triggered: two ticks of a standing stale Run record only ONE SupervisionQuieted Decision (parity with the quality + stall edge-trigger tests).
- advise records no Decision when the quality channel has no observation (cannot-tell -> defer; pins that the value-None path never emits, which a reviewer worried about -- the decider returns would_flag=False on None).
- advise records no Decision when the rule is disabled (snr_limit None): advise respects each rule's own enable, not just the global advise flag.

Test-only; no production change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(api): cover the advise-emitter ConcurrencyError no-op branch

The diff-coverage gate (hard 90% on changed lines) flagged _run_supervisor.py at 88.9%: the new _record_supervision_advice except ConcurrencyError branch (lines 490-491) was uncovered. Adds an idempotency test that re-derives the same advise Decision id (via a FixedIdGenerator repeating the id) so the second append collides and is swallowed -- mirrors the existing test_record_decision_is_idempotent_on_repeated_id for the beam-Hold path.

Test-only; covers the cross-restart re-emission no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

xmap merged commit 6bee0cd into main Jun 21, 2026
16 checks passed

xmap deleted the feat/run-liveness-watchdog branch June 21, 2026 05:35

xmap mentioned this pull request Jun 21, 2026

docs(modules): bring agent + run module docs current with the agent fleet #290

Merged

xmap mentioned this pull request Jun 21, 2026

feat(api): RunSupervisor advise rung for the shadow observation-signal rules #294

Merged

xmap merged 1 commit into main from feat/run-liveness-watchdog
