feat: RunLivenessWatchdog shadow rule flags implausibly-long Runs #273

Merged
xmap merged 1 commit into main from feat/run-liveness-watchdog on Jun 21, 2026

xmap commented Jun 21, 2026

What

Step 1 of the trust-first observe-stream reframe: a shadow, observe-only rule inside the existing RunSupervisor loop that flags a Run which has been Running past an operator ceiling, i.e. the de-facto-dead scan a human must currently catch by hand. In shadow mode it only logs run_liveness.would_flag -- it records no Decision and issues no command. Off by default.

It is a rule, not a new agent (it runs under the existing RunSupervisor identity), and the per-channel observe-stream feeder is deliberately not built here (the 102-agent stress test's headline: decouple the source-agnostic rule from the structurally-expensive feeder).

Why

The one beamtime-stewardship failure CORA still cannot detect autonomously is a silently-hung run burning allocation while its status stays Running. This rung delivers that detection now, on a signal that CORA owns and that cannot be spoofed, and it builds the live-run-watch loop without needing any feeder.

The signal: running_since (not created_at)

The rule keys on a new running_since column on proj_run_summary, set on RunStarted and RESET on RunResumed, so it measures the actual Running duration. created_at would over-count overnight Held intervals and false-alarm (the failure the design warns against); updated_at is reset by every transition and is not on the list read surface. (This corrects the design memo's earlier "no migration / use created_at" framing.)
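
To make the created_at vs running_since distinction concrete, here is a minimal sketch of how a projection might maintain the column. It is not the code in summary.py: `RunSummaryRow`, `apply_event`, and the `RunHeld` event name are hypothetical stand-ins, and leaving running_since untouched while Held is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RunSummaryRow:
    """Hypothetical stand-in for one proj_run_summary row."""

    run_id: str
    status: str
    created_at: datetime
    running_since: Optional[datetime] = None  # NULL until the Run first starts


def apply_event(row: RunSummaryRow, event_type: str, occurred_at: datetime) -> RunSummaryRow:
    """Fold one event into the row (illustrative, not the real projection)."""
    if event_type in ("RunStarted", "RunResumed"):
        # Both transitions enter Running, so both (re)start the clock.
        row.status = "Running"
        row.running_since = occurred_at
    elif event_type == "RunHeld":  # assumed event name
        # Assumption: a hold changes status but leaves running_since alone;
        # the rule only looks at Runs that are currently Running.
        row.status = "Held"
    return row


# A Run started in the evening, held overnight, resumed the next morning.
evening = datetime(2026, 6, 20, 18, 0, tzinfo=timezone.utc)
morning = evening + timedelta(hours=14)

row = RunSummaryRow("run-1", "Created", created_at=evening)
apply_event(row, "RunStarted", evening)
apply_event(row, "RunHeld", evening + timedelta(hours=1))
apply_event(row, "RunResumed", morning)

# Age measured from created_at includes the Held night;
# age measured from running_since starts again at the resume.
now = morning + timedelta(minutes=10)
age_from_created = now - row.created_at
age_from_running_since = now - row.running_since
```

With a ceiling of a few hours, the created_at age in this scenario would already be over the limit ten minutes after the resume, which is the overnight false alarm the column is meant to avoid.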

Changes

  • additive nullable running_since migration (no backfill -- NULL never flags, the safe default)
  • running_since surfaced on list_runs (RunSummaryItem / SELECT)
  • is_run_stale pure rule (inclusive >= ceiling) + the shadow pass, run before the beam read so it's independent of beam I/O
  • run_liveness_ceiling_seconds setting (default None = off) + >0-when-set validator
  • edge-trigger memory (liveness) walled off from the beam-Hold FSM (memory)
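
A minimal sketch of the pure rule and the setting validator listed above, assuming a signature that takes the timestamp, the current time, and the ceiling. The real is_run_stale signature and the config validator are not shown in this PR text, so the parameter names and the `validate_ceiling` helper are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_run_stale(
    running_since: Optional[datetime],
    now: datetime,
    ceiling_seconds: Optional[float],
) -> bool:
    """Pure rule: True once the Run has been Running for >= the ceiling."""
    if ceiling_seconds is None:
        return False  # setting unset: the rule is off
    if running_since is None:
        return False  # NULL running_since never flags (the safe default)
    # Inclusive boundary: exactly-at-ceiling counts as stale.
    return (now - running_since).total_seconds() >= ceiling_seconds


def validate_ceiling(value: Optional[float]) -> Optional[float]:
    """Hypothetical validator: None means off, otherwise must be > 0."""
    if value is not None and value <= 0:
        raise ValueError("run_liveness_ceiling_seconds must be > 0 when set")
    return value


now = datetime(2026, 6, 21, 12, 0, tzinfo=timezone.utc)
at_boundary = is_run_stale(now - timedelta(seconds=3600), now, 3600)
just_under = is_run_stale(now - timedelta(seconds=3599), now, 3600)
```

Keeping the rule free of I/O is what lets the boundary cases (old, recent, exactly at the ceiling, None) be tested without a database or a beam read.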

Posture

Off + inert by default (two gates: the None ceiling AND run_supervisor_enabled). Observe-only: a flagged Run leaves hold/resume calls empty (proven behaviorally). Honors every stress-test fix: no feeder, no ControlPort hoist, no seeded-Agent grant, own walled memory, own (future) choice -- not the overloaded SupervisionDeferred.
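
The posture can be sketched as a single pass: two gates, a log line on the rising edge only, and a memory set that belongs to this rule alone. This is an illustration under assumptions, not the code in _run_supervisor.py: `RunView`, `LivenessMemory`, and `shadow_liveness_pass` are hypothetical names, and the input is assumed to be the Runs currently in Running.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

logger = logging.getLogger("run_supervisor")


@dataclass(frozen=True)
class RunView:
    """Hypothetical slice of a list_runs item for a Running Run."""

    run_id: str
    running_since: Optional[datetime]


@dataclass
class LivenessMemory:
    """Edge-trigger memory for this rule only; shares nothing with the beam-Hold FSM."""

    flagged: set[str] = field(default_factory=set)


def shadow_liveness_pass(
    runs: Iterable[RunView],
    now: datetime,
    *,
    supervisor_enabled: bool,
    ceiling_seconds: Optional[float],
    liveness: LivenessMemory,
) -> list[str]:
    """Return the run ids newly flagged on this tick. Logs only; issues no command."""
    # Gate 1 and gate 2: both must be on, otherwise the pass is inert.
    if not supervisor_enabled or ceiling_seconds is None:
        return []
    newly_flagged: list[str] = []
    seen: set[str] = set()
    for run in runs:
        seen.add(run.run_id)
        stale = (
            run.running_since is not None
            and (now - run.running_since).total_seconds() >= ceiling_seconds
        )
        if stale and run.run_id not in liveness.flagged:
            liveness.flagged.add(run.run_id)  # rising edge: log once per episode
            logger.info("run_liveness.would_flag run_id=%s", run.run_id)
            newly_flagged.append(run.run_id)
        elif not stale:
            liveness.flagged.discard(run.run_id)  # episode over; may re-flag later
    liveness.flagged &= seen  # prune Runs that have left Running
    return newly_flagged


now = datetime(2026, 6, 21, 12, 0, tzinfo=timezone.utc)
memory = LivenessMemory()
runs = [
    RunView("old", now - timedelta(hours=10)),
    RunView("fresh", now - timedelta(minutes=5)),
    RunView("null", None),
]
first_tick = shadow_liveness_pass(
    runs, now, supervisor_enabled=True, ceiling_seconds=3600, liveness=memory
)
second_tick = shadow_liveness_pass(
    runs, now, supervisor_enabled=True, ceiling_seconds=3600, liveness=memory
)
```

The function returns ids rather than calling hold or resume, so an observe-only test can assert that the command side stays empty whatever the pass flags.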

Test plan

ruff, pyright, tach, full architecture fitness suite, and the run/api/decision unit suites green (verified locally + by the pre-push hook). New tests: is_run_stale (old/recent/inclusive-boundary/None), shadow tick (observe-only, off-when-None, recent-not-flagged = the resumed-overnight regression, NULL-never-flags, edge-trigger stable, multi-run only-stale-flagged, liveness-prune-on-leave, discard-then-re-flag), projection (RunStarted writes running_since; RunResumed resets it), config validator.

Gate review

4 agents (3 baseline + migration-safety): 3 APPROVE, 1 APPROVE-WITH-NITS, 0 P0/P1. R2's three coverage nits (multi-run filtering, liveness prune, discard/re-flag arm) were folded before commit. R1's AST command-ban is deferred to advise mode (shadow is log-only; observe-only is proven behaviorally).

Next rungs (deferred)

Advise mode -- a Decision(context=RunSupervision, choice=SupervisionQuieted) per quiet episode, promoted once the ceiling is calibrated from shadow logs. Then, only when staff confirm a real machine-readable signal (SNR-1/PROG-1), the per-channel observe-stream feeder. See project_run_liveness_watchdog_design.md.

🤖 Generated with Claude Code

Step 1 of the trust-first observe-stream reframe: a SHADOW, observe-only rule
inside the existing RunSupervisor loop that logs (run_liveness.would_flag) a Run
that has been Running past an operator ceiling, the de-facto-dead scan a human
must currently catch by hand. It records NO Decision and issues NO command. Off
by default (run_liveness_ceiling_seconds None = off, plus run_supervisor_enabled).

It is a RULE, not a new agent (runs under the existing RunSupervisor identity),
and keys on a NEW running_since proj_run_summary column, set on RunStarted and
RESET on RunResumed, not created_at (which over-counts overnight Held intervals
and would false-alarm). The edge-trigger memory (liveness) is walled off from
the beam-Hold FSM (memory); NULL running_since never flags.

- additive nullable running_since migration (no backfill: NULL never flags)
- running_since surfaced on list_runs (RunSummaryItem / SELECT)
- is_run_stale pure rule (inclusive >= ceiling) + the shadow pass before the
  beam read (independent of beam I/O)
- run_liveness_ceiling_seconds setting + >0-when-set validator

Advise mode (a Decision(choice=SupervisionQuieted)) and the per-channel
observe-stream feeder are the gated next rungs.

Gate review (4 agents: 3 baseline + migration-safety): 3 APPROVE, 1
APPROVE-WITH-NITS, 0 P0/P1. R2's 3 coverage nits (multi-run filtering, liveness
prune, discard/re-flag arm) folded here. AST command-ban deferred to advise mode
(shadow is log-only; observe-only proven behaviorally).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Coverage report

Columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing

  apps/api/src/cora/api/_run_supervisor.py  661
  apps/api/src/cora/infrastructure/config.py
  apps/api/src/cora/run/features/list_runs/handler.py
  apps/api/src/cora/run/projections/summary.py
  Project Total

This report was generated by python-coverage-comment-action

xmap merged commit 6bee0cd into main Jun 21, 2026
16 checks passed
xmap deleted the feat/run-liveness-watchdog branch June 21, 2026 05:35
xmap added a commit that referenced this pull request Jun 21, 2026
…leet (#290)

The recent agent PRs (#233 gated resume, #266 ClearanceWatcher, #273
run-liveness, #288 observation-signal rules) shipped code-only; the
hand-authored module docs had drifted. Shape-level catch-up, no internals:

- agent/index.md: five -> six seeded agents (add ClearanceWatcher, a
  passive flag-only periodic-loop agent recording a ClearanceProgress
  Decision); note RunSupervisor also carries shadow observe-only rules.
- run/index.md: RunSupervisor now does gated autonomous resume (not
  wind-down only) and carries shadow run-liveness / signal-quality /
  signal-stall rules that log-only; add is_simulated to the Observation
  shape + the entries_run_observations DDL with a one-line rationale.

mkdocs build --strict passes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
xmap added a commit that referenced this pull request Jun 22, 2026
…l rules (#294)

* feat(decision): add the three RunSupervision advise-rung choices

Slice A of the observation-signal advise rung. Adds SupervisionQuieted
(run-age liveness backstop), SupervisionStalled (Rule R rate-dropout), and
SupervisionBreached (Rule Q quality-below-limit) to the RunSupervisionChoice
Literal + RUN_SUPERVISION_CHOICES frozenset (7 -> 10), with the vocab test
updated to the 10-value set + a work-noun guard on the new dispositions.

WHY: promoting the shipped shadow observation-signal + run-liveness rules
one rung (observe -> advise) means the supervisor records one Decision per
breach edge for a human; that Decision's choice must exist in the closed
set first. Decision-only dispositions (never a command). SupervisionBreached
is the naming-r3 rename of the originally-proposed SupervisionDoubted:
"Doubted" read as the supervisor's epistemic state; "Breached" names the
objective limit-crossing, family-uniform with Deferred / Conflicted /
Stalled. This slice adds vocabulary only; the supervisor emission lands next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(api): promote the RunSupervisor shadow rules to the advise rung

Slice B of the observation-signal advise rung. Adds run_supervisor_advise_enabled
(default off, a further opt-in above each rule's own enable) and, when on, emits
exactly one Decision per breach EDGE from the three shadow rules -- still issuing
NO command (advise rung):
  - run-liveness backstop  -> SupervisionQuieted
  - Rule R rate-dropout    -> SupervisionStalled
  - Rule Q quality breach  -> SupervisionBreached

WHY: the shadow rules (#288 / #273) log would_flag but leave no durable record a
human can triage. The advise rung climbs exactly one step (observe -> advise),
recording one RunSupervision Decision per breach episode for a human while keeping
the act rung (auto-Hold) deferred. Emission is edge-triggered off the already-walled
per-rule memory (one Decision per episode; nothing on a standing breach across
ticks), beam-free (the liveness rule runs before the beam read), and reuses the
existing DecisionRegistered shape under the RunSupervisor identity + Authorize path.
Shadow logging is unchanged; advise only adds the Decision. cannot-tell still
defers (no Decision). Tests cover advise-off (no Decision), each disposition under
advise-on (one Decision, no command), and edge-triggering (one Decision across two
ticks of a standing breach).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
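
A rough sketch of the edge-triggered advise emission described in Slice B, under assumptions: the rule keys, the dict-shaped Decision, and `advise_on_edge` are hypothetical, and only the three choice names come from the commit message. Marking the edge even when advise is off is also an assumption, made so the shadow log and the Decision share one edge.

```python
from typing import Optional

# Rule key (hypothetical) -> advise-rung choice (from the commit message).
ADVISE_CHOICE = {
    "run_liveness": "SupervisionQuieted",
    "rule_r_rate_dropout": "SupervisionStalled",
    "rule_q_quality": "SupervisionBreached",
}


def advise_on_edge(
    rule: str,
    run_id: str,
    would_flag: bool,
    *,
    advise_enabled: bool,
    memory: dict[str, set[str]],
) -> Optional[dict]:
    """Return one Decision-like record per breach edge, else None. Never a command."""
    flagged = memory.setdefault(rule, set())  # per-rule memory, kept separate
    if not would_flag:
        flagged.discard(run_id)  # breach cleared (or cannot-tell): close the episode
        return None
    if run_id in flagged:
        return None  # standing breach across ticks: nothing new to record
    flagged.add(run_id)  # rising edge
    if not advise_enabled:
        return None  # shadow only: the edge is remembered, no Decision is recorded
    return {"context": "RunSupervision", "choice": ADVISE_CHOICE[rule], "run_id": run_id}


memory: dict[str, set[str]] = {}
tick_1 = advise_on_edge("run_liveness", "run-1", True, advise_enabled=True, memory=memory)
tick_2 = advise_on_edge("run_liveness", "run-1", True, advise_enabled=True, memory=memory)
```

Two ticks of the same standing breach yield one record and then None, which is the behavior the edge-triggering tests in this commit series pin.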

* test(api): cover advise-rung edge-trigger + cannot-tell gates

Gate-review follow-ups (the advise diff drew 2 ship + 1 changes_needed, the
last purely a test-coverage gap; the correctness/trust lens passed clean).
Adds three tests:
  - advise liveness is edge-triggered: two ticks of a standing stale Run
    record only ONE SupervisionQuieted Decision (parity with the quality +
    stall edge-trigger tests).
  - advise records no Decision when the quality channel has no observation
    (cannot-tell -> defer; pins that the value-None path never emits, which a
    reviewer worried about -- the decider returns would_flag=False on None).
  - advise records no Decision when the rule is disabled (snr_limit None):
    advise respects each rule's own enable, not just the global advise flag.

Test-only; no production change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(api): cover the advise-emitter ConcurrencyError no-op branch

The diff-coverage gate (hard 90% on changed lines) flagged
_run_supervisor.py at 88.9%: the new _record_supervision_advice except
ConcurrencyError branch (lines 490-491) was uncovered. Adds an idempotency
test that re-derives the same advise Decision id (via a FixedIdGenerator
repeating the id) so the second append collides and is swallowed -- mirrors
the existing test_record_decision_is_idempotent_on_repeated_id for the
beam-Hold path. Test-only; covers the cross-restart re-emission no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
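
The no-op branch that this test covers can be illustrated with a small in-memory store. Everything here is a stand-in: the `ConcurrencyError` class, `InMemoryDecisionStore`, and `record_supervision_advice` are hypothetical and only mirror the idea that re-appending an already-used Decision id is swallowed rather than raised.

```python
class ConcurrencyError(Exception):
    """Stand-in for the store's duplicate-append error."""


class InMemoryDecisionStore:
    """Hypothetical store that rejects a second append under the same id."""

    def __init__(self) -> None:
        self.decisions: dict[str, dict] = {}

    def append(self, decision_id: str, payload: dict) -> None:
        if decision_id in self.decisions:
            raise ConcurrencyError(decision_id)
        self.decisions[decision_id] = payload


def record_supervision_advice(store: InMemoryDecisionStore, decision_id: str, payload: dict) -> bool:
    """Append the advise Decision; treat a repeated id as an idempotent no-op."""
    try:
        store.append(decision_id, payload)
    except ConcurrencyError:
        # Same id derived again (for example after a restart): already recorded.
        return False
    return True


store = InMemoryDecisionStore()
payload = {"context": "RunSupervision", "choice": "SupervisionQuieted"}
first = record_supervision_advice(store, "decision-1", payload)
second = record_supervision_advice(store, "decision-1", payload)
```

Repeating the id in the test is what forces the collision; a generator that returned fresh ids would never reach the except branch.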

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>