feat(run): shadow closed-loop observation-signal rules in the RunSupervisor #288
Merged
…p read seam

Slice 1 of the observation-signal port design lock ([[project_observation_signal_port_design]]). Adds an additive is_simulated boolean column to entries_run_observations (NOT NULL DEFAULT false, forward-only) plus the field on ObservationInput / the REST request / the MCP tool / the Observation row.

WHY: the closed-loop quality + stall rules must never act on simulated data as if it were real. The flag travels WITH the datum (not a route registry: observations key on operator channel_name, not a substrate address, and one channel can carry real data on one Run and sim data on another). A single boolean is the right grain: a row has one origin; the window-level mixed case is an OR-fold on the read side. Default false keeps the route back-compatible and is the safe direction for the gate (a real adapter that omits the flag reads as real). The read-side mirror of the Operation BC ActuationKind "any simulator touch disqualifies" gate.

openapi.json + atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
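The "OR-fold on the read side" can be pictured with a small sketch. `ObservationRow` and `window_touched_by_simulator` are hypothetical names invented for illustration, not identifiers from this PR; only the `is_simulated` flag and its `false` default come from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ObservationRow:
    """Hypothetical in-memory stand-in for one entries_run_observations row."""

    channel_name: str
    value: float
    # Mirrors NOT NULL DEFAULT false: a writer that omits the flag reads as real.
    is_simulated: bool = False


def window_touched_by_simulator(rows: Iterable[ObservationRow]) -> bool:
    """OR-fold: the window counts as simulated if any single row is simulated."""
    return any(row.is_simulated for row in rows)


real_only = [ObservationRow("snr", 12.0), ObservationRow("snr", 11.5)]
mixed = real_only + [ObservationRow("snr", 3.0, is_simulated=True)]
```

Each row keeps exactly one origin, and the mixed-window question is answered only when a window is read, which matches the "single boolean is the right grain" argument above.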
Slice 2 of the observation-signal port design lock ([[project_observation_signal_port_design]]). Adds the read half of the closed-loop seam: a Run-BC-local RunChannelLookup port (run/ports/) with read_run_channel_latest (Rule Q point read) and read_run_channel_window (Rule R windowed count_since), a PostgresRunChannelLookup adapter (run/adapters/) querying the existing entries_run_observations table, an InMemoryRunChannelLookup stub (seedable; unseeded = always-quiet default), and a required (run_id, channel_name, recorded_at DESC) btree index.

WHY: the rules need to read a live Run's channels without a new projection (a projection would cost a permanent fold and read staler than the source on the very freshness signal Rule R depends on). The port is BC-local because its sole consumer is the composition-root supervisor (EnclosureObserver precedent); promote to infrastructure/ports only on a real second cross-BC consumer. Freshness keys on recorded_at (CORA write-time), never the spoofable sampled_at; the read surfaces is_simulated rather than filtering it, so a mislabeled-sim real row still shows (the decider disqualifies sim, the safe direction) and the sim feeder can exercise the rules end to end. The new index is load-bearing: the pre-existing sampled_at indexes carry no channel_name and order by the wrong column.

atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
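A rough sketch of what a seedable in-memory lookup with these two reads might look like. The method names `read_run_channel_latest` and `read_run_channel_window` come from the commit message; the class name `InMemoryLookupSketch`, the `ChannelReading` type, the `seed` helper, and every parameter and return shape are assumptions, since the real signatures are not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ChannelReading:
    """Hypothetical read result; is_simulated is surfaced, never filtered."""

    value: float
    recorded_at: datetime  # write-time stamp, the only field freshness keys on
    is_simulated: bool = False


class InMemoryLookupSketch:
    """Seedable stub; with nothing seeded every channel reads as quiet."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], List[ChannelReading]] = {}

    def seed(self, run_id: str, channel_name: str, reading: ChannelReading) -> None:
        self._rows.setdefault((run_id, channel_name), []).append(reading)

    def read_run_channel_latest(
        self, run_id: str, channel_name: str
    ) -> Optional[ChannelReading]:
        # Point read: newest row by recorded_at, or None when nothing was seen.
        rows = self._rows.get((run_id, channel_name), [])
        return max(rows, key=lambda r: r.recorded_at) if rows else None

    def read_run_channel_window(
        self, run_id: str, channel_name: str, since: datetime
    ) -> int:
        # Windowed count: how many rows were recorded at or after `since`.
        rows = self._rows.get((run_id, channel_name), [])
        return sum(1 for r in rows if r.recorded_at >= since)


t0 = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)
lookup = InMemoryLookupSketch()
lookup.seed("run-1", "snr", ChannelReading(9.0, t0))
lookup.seed("run-1", "snr", ChannelReading(4.0, t0 + timedelta(seconds=30), True))
```

Both reads are scoped by run and channel and ordered by `recorded_at`, which is the access pattern the (run_id, channel_name, recorded_at DESC) index is described as serving.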
Slice 3 of the observation-signal port design lock ([[project_observation_signal_port_design]]). Adds an append-only entries_run_feed_heartbeats table (one row per drain tick; REVOKE UPDATE/DELETE/TRUNCATE per the entries_* append-only contract), a BC-internal FeedHeartbeatStore (Postgres + InMemory, idempotent on event_id), and a read_feed_health method on RunChannelLookup returning the newest heartbeat recorded_at (RunFeedHealth VO).

WHY: Rule R must distinguish a genuinely quiet channel from a DEAD feeder. The feeder pings a heartbeat every tick regardless of data flow; the decider treats a heartbeat older than the operator ceiling (or none at all) as cannot-tell and defers, so a dead feeder disables the stall flag instead of an absent channel masquerading as a stall. Append-only INSERT not UPSERT: an UPSERT needs UPDATE, which the append-only role is REVOKEd from; MAX(recorded_at) answers "newest heartbeat" without mutable state. Freshness keys on recorded_at (CORA write time), not the producer-asserted heartbeat_at. The adapter returns the raw recorded_at; the decider owns the clock + ceiling so liveness derivation stays pure.

atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
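To make the "decider owns the clock + ceiling" split concrete, here is a minimal sketch of a pure liveness check over the raw newest-heartbeat timestamp. The function name `feed_is_alive` and its parameters are hypothetical; treating a heartbeat exactly at the ceiling as still alive is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def feed_is_alive(
    newest_heartbeat_at: Optional[datetime],
    now: datetime,
    ceiling_seconds: float,
) -> bool:
    """Pure check: no heartbeat, or one older than the ceiling, is cannot-tell."""
    if newest_heartbeat_at is None:
        return False  # never-seen feeder: the stall rule must defer
    age_seconds = (now - newest_heartbeat_at).total_seconds()
    return age_seconds <= ceiling_seconds


now = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(seconds=5)
stale = now - timedelta(seconds=120)
```

Because the clock and the ceiling are passed in rather than read inside the function, the check stays deterministic and easy to unit-test.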
Slice 4 of the observation-signal port design lock ([[project_observation_signal_port_design]]). Adds nullable snr_limit + expected_observation_interval_seconds columns to proj_run_summary, derived from effective_parameters on BOTH the RunStarted INSERT arm and the RunAdjusted UPDATE arm, and surfaced on RunSummaryItem / _SELECT_COLUMNS / _row_to_item (the running_since precedent).

WHY: the two rules need a per-Run scalar each, and the supervisor reads RunSummaryItem every tick. Precompute keeps that an O(1) projection-column read instead of a per-Running-run fold (the supervisor does ZERO such folds today; a fold per tick would add stream-replay I/O on the common case). Recompute on RunAdjusted is IN v1, not deferred: RunAdjusted already re-snapshots effective_parameters and is already subscribed, so a mid-run re-cadence must refresh the inputs in the same arm or Rule R evaluates against a stale interval (false alarm or masked stall, the exact stale-baseline failure the watchdog guards against). The inputs are operator-declared keys (not hardwired to n_projections, avoiding the one-observation-per-projection assumption); absent / non-positive / non-finite values land NULL, which disables that rule for the Run (cannot-tell -> defer).

atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
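The "absent / non-positive / non-finite values land NULL" rule could be sketched as one small helper applied to each key. The helper name `positive_finite_or_none` is hypothetical, and rejecting booleans and strings is an assumption this sketch makes that the commit message does not state.

```python
import math
from typing import Mapping, Optional


def positive_finite_or_none(
    effective_parameters: Mapping[str, object], key: str
) -> Optional[float]:
    """Return the value as a float, or None (SQL NULL) when it cannot be trusted."""
    raw = effective_parameters.get(key)
    # Assumption: only real numbers qualify; bool is excluded even though it is an int.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        return None
    try:
        value = float(raw)
    except OverflowError:
        return None  # an int too large for a float is treated as unusable
    if not math.isfinite(value) or value <= 0:
        return None
    return value


params = {"snr_limit": 5, "expected_observation_interval_seconds": float("nan")}
snr_limit = positive_finite_or_none(params, "snr_limit")
interval = positive_finite_or_none(params, "expected_observation_interval_seconds")
```

In this example the quality rule would have a usable limit while the stall rule would be disabled for the Run, since its interval resolves to `None`.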
Slice 5 of the observation-signal port design lock ([[project_observation_signal_port_design]]). Adds two pure deciders -- decide_quality_signal (Rule Q: latest channel value vs the operator-set limit) and decide_signal_stall (Rule R: beam-aware rate-dropout) -- and a shadow OBSERVE-ONLY pass inside _supervise_tick that logs run_quality.would_flag / run_stall.would_flag, records NO Decision and issues NO command (byte-identical to the run-liveness shadow). New config gates both rules off by default (run_quality_channel_name / run_stall_channel_name None) above run_supervisor_enabled.

WHY: this is the FINE per-channel detector the coarse run-age liveness rule cannot be. Each rule keeps its own edge-trigger state (quality, stall + the stall_streak hysteresis counter, feed_dead_warned) walled off from the beam-Hold FSM memory and from each other, so one rule flapping cannot corrupt another. Rule R reuses the tick's single beam read (a gap while beam is down is expected, never a stall), checks feeder health FIRST (a dead/never-seen feeder defers and warns loudly, never reads as a calm run), and requires run_stall_hysteresis_ticks consecutive stall-condition ticks (anti-flap for top-ups). Every cannot-tell path (missing signal, disabled rule, dead feeder, beam down, degenerate interval) defers (Lock 4). RunChannelLookup is constructed BC-local at the composition root, not on the Kernel. Shadow flags on the merits incl. simulated data (observe-only, so the sim feeder exercises it); the is_simulated act-disqualify lands with the advise/act rung. Pure deciders live in api/ (outside the feature-tree decider gates) so they carry explicit unit + Hypothesis property tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
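The ordering described above (feeder health first, then beam, then interval, then hysteresis) can be sketched as two pure functions. The names `decide_quality_signal` and `decide_signal_stall` come from the commit message, but every parameter, the `StallVerdict` type, the reason strings other than `feed_dead`, and the choice to reset the streak on a deferral are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StallVerdict:
    would_flag: bool
    stall_streak: int  # carried by the caller into the next tick
    reason: str


def decide_quality_signal(
    latest_value: Optional[float], snr_limit: Optional[float]
) -> bool:
    """Rule Q sketch: flag only when a real value sits below a configured limit."""
    if latest_value is None or snr_limit is None:
        return False  # cannot-tell defers
    return latest_value < snr_limit


def decide_signal_stall(
    *,
    feed_alive: bool,
    beam_up: bool,
    expected_interval_seconds: Optional[float],
    arrivals_in_window: int,
    stall_streak: int,
    hysteresis_ticks: int,
) -> StallVerdict:
    """Rule R sketch: every cannot-tell path defers before arrivals are judged."""
    if not feed_alive:
        return StallVerdict(False, 0, "feed_dead")  # checked first
    if not beam_up:
        return StallVerdict(False, 0, "beam_down")  # a gap without beam is expected
    if expected_interval_seconds is None or expected_interval_seconds <= 0:
        return StallVerdict(False, 0, "no_interval")
    if arrivals_in_window > 0:
        return StallVerdict(False, 0, "flowing")
    streak = stall_streak + 1
    flagged = streak >= hysteresis_ticks
    return StallVerdict(flagged, streak, "stalled" if flagged else "pending")


streak = 0
flags = []
for arrivals in (0, 0, 3, 0):
    verdict = decide_signal_stall(
        feed_alive=True,
        beam_up=True,
        expected_interval_seconds=2.0,
        arrivals_in_window=arrivals,
        stall_streak=streak,
        hysteresis_ticks=2,
    )
    streak = verdict.stall_streak
    flags.append(verdict.would_flag)
```

With a hysteresis of two ticks, the loop flags only on the second consecutive empty window, and a single arrival clears the streak.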
… to end

Slice 6 (final shadow-v1 slice) of the observation-signal port design lock ([[project_observation_signal_port_design]]). Adds SimObservationFeeder, a replay feeder that drives the rules end to end over the REAL write path: each drain emits the elapsed trace points (is_simulated=True) via the AppendObservations handler and pings a heartbeat, under a distinct sim principal (SIM_OBSERVATION_FEEDER_AGENT_ID). CORA ships ONLY the sim feeder; each deployment writes its real EPICS / tomoStream feeder against the same write path (no new ingest port).

WHY: the rules need to be exercisable without a real IOC, and the gate-review required a dead-feeder integration test. The two integration tests prove (1) sim rows carry is_simulated=True and are attributable to the sim principal (the split that lets authz / actor_id tell sim from real), and (2) the dead-feeder transition: a fresh heartbeat lets a zero-arrival channel stall-flag, while a stale heartbeat (feeder stopped + clock past the ceiling) defers (feed_dead) -- a dead feeder is never read as a calm stall. The real-feeder lifespan + its Authorize grant are deployment-owned (out of the spine for shadow v1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
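A rough sketch of the drain loop described above: emit whichever trace points have elapsed, then ping a heartbeat whether or not anything was emitted. `SimFeederSketch`, the `(offset_seconds, value)` trace shape, and the `append` / `ping` callables are all hypothetical stand-ins; the real SimObservationFeeder goes through the AppendObservations handler, whose interface is not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Sequence, Tuple

Point = Dict[str, object]


class SimFeederSketch:
    """Replays a recorded trace; every emitted point is marked is_simulated."""

    def __init__(
        self,
        trace: Sequence[Tuple[float, float]],  # (offset_seconds, value) pairs
        started_at: datetime,
        append: Callable[[List[Point]], None],
        ping: Callable[[datetime], None],
    ) -> None:
        self._trace = sorted(trace)
        self._started_at = started_at
        self._append = append
        self._ping = ping
        self._cursor = 0  # index of the next trace point not yet emitted

    def drain(self, now: datetime) -> int:
        elapsed = (now - self._started_at).total_seconds()
        batch: List[Point] = []
        while self._cursor < len(self._trace) and self._trace[self._cursor][0] <= elapsed:
            _, value = self._trace[self._cursor]
            batch.append({"value": value, "is_simulated": True})
            self._cursor += 1
        if batch:
            self._append(batch)
        self._ping(now)  # heartbeat on every drain, even when nothing was emitted
        return len(batch)


t0 = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)
written: List[Point] = []
heartbeats: List[datetime] = []
feeder = SimFeederSketch(
    [(0.0, 1.0), (5.0, 2.0), (10.0, 3.0)], t0, written.extend, heartbeats.append
)
emitted = [feeder.drain(t0 + timedelta(seconds=s)) for s in (6, 7, 20)]
```

The second drain emits nothing yet still records a heartbeat, which is the property that lets the stall rule tell a quiet channel from a dead feeder.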
…dit field

Gate-review follow-ups for the observation-signal port. Three reviewers flagged a P1: the is_simulated migration comment claimed "the production read path filters is_simulated = false", which contradicts the implemented design. The read SURFACES is_simulated (it does not hard-filter, so a mislabeled-sim real row is not hidden and the sim feeder can exercise the rules); the DECIDER disqualifies sim in the future act/advise rung, and shadow observes + logs even simulated breaches. The comment now states that accurately and points at the port docstring.

Also adds a unit test asserting run_quality.would_flag carries the is_simulated field, so an operator's forensic log read can tell a real breach from a simulator rehearsal and a refactor cannot silently drop the audit field. (A fourth reviewer's "P0" that the field was missing was a false alarm: it is already logged on both shadow lines; this test locks it in.)

atlas.sum regenerated for the comment change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
xmap added a commit that referenced this pull request Jun 21, 2026
…leet (#290)

The recent agent PRs (#233 gated resume, #266 ClearanceWatcher, #273 run-liveness, #288 observation-signal rules) shipped code-only; the hand-authored module docs had drifted. Shape-level catch-up, no internals:

- agent/index.md: five -> six seeded agents (add ClearanceWatcher, a passive flag-only periodic-loop agent recording a ClearanceProgress Decision); note RunSupervisor also carries shadow observe-only rules.
- run/index.md: RunSupervisor now does gated autonomous resume (not wind-down only) and carries shadow run-liveness / signal-quality / signal-stall rules that log-only; add is_simulated to the Observation shape + the entries_run_observations DDL with a one-line rationale.

mkdocs build --strict passes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
xmap added a commit that referenced this pull request Jun 22, 2026
…l rules (#294)

* feat(decision): add the three RunSupervision advise-rung choices

Slice A of the observation-signal advise rung. Adds SupervisionQuieted (run-age liveness backstop), SupervisionStalled (Rule R rate-dropout), and SupervisionBreached (Rule Q quality-below-limit) to the RunSupervisionChoice Literal + RUN_SUPERVISION_CHOICES frozenset (7 -> 10), with the vocab test updated to the 10-value set + a work-noun guard on the new dispositions.

WHY: promoting the shipped shadow observation-signal + run-liveness rules one rung (observe -> advise) means the supervisor records one Decision per breach edge for a human; that Decision's choice must exist in the closed set first. Decision-only dispositions (never a command). SupervisionBreached is the naming-r3 rename of the originally-proposed SupervisionDoubted: "Doubted" read as the supervisor's epistemic state; "Breached" names the objective limit-crossing, family-uniform with Deferred / Conflicted / Stalled. This slice adds vocabulary only; the supervisor emission lands next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(api): promote the RunSupervisor shadow rules to the advise rung

Slice B of the observation-signal advise rung. Adds run_supervisor_advise_enabled (default off, a further opt-in above each rule's own enable) and, when on, emits exactly one Decision per breach EDGE from the three shadow rules -- still issuing NO command (advise rung):

- run-liveness backstop -> SupervisionQuieted
- Rule R rate-dropout -> SupervisionStalled
- Rule Q quality breach -> SupervisionBreached

WHY: the shadow rules (#288 / #273) log would_flag but leave no durable record a human can triage. The advise rung climbs exactly one step (observe -> advise), recording one RunSupervision Decision per breach episode for a human while keeping the act rung (auto-Hold) deferred. Emission is edge-triggered off the already-walled per-rule memory (one Decision per episode; nothing on a standing breach across ticks), beam-free (the liveness rule runs before the beam read), and reuses the existing DecisionRegistered shape under the RunSupervisor identity + Authorize path. Shadow logging is unchanged; advise only adds the Decision. cannot-tell still defers (no Decision). Tests cover advise-off (no Decision), each disposition under advise-on (one Decision, no command), and edge-triggering (one Decision across two ticks of a standing breach).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(api): cover advise-rung edge-trigger + cannot-tell gates

Gate-review follow-ups (the advise diff drew 2 ship + 1 changes_needed, the last purely a test-coverage gap; the correctness/trust lens passed clean). Adds three tests:

- advise liveness is edge-triggered: two ticks of a standing stale Run record only ONE SupervisionQuieted Decision (parity with the quality + stall edge-trigger tests).
- advise records no Decision when the quality channel has no observation (cannot-tell -> defer; pins that the value-None path never emits, which a reviewer worried about -- the decider returns would_flag=False on None).
- advise records no Decision when the rule is disabled (snr_limit None): advise respects each rule's own enable, not just the global advise flag.

Test-only; no production change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(api): cover the advise-emitter ConcurrencyError no-op branch

The diff-coverage gate (hard 90% on changed lines) flagged _run_supervisor.py at 88.9%: the new _record_supervision_advice except ConcurrencyError branch (lines 490-491) was uncovered. Adds an idempotency test that re-derives the same advise Decision id (via a FixedIdGenerator repeating the id) so the second append collides and is swallowed -- mirrors the existing test_record_decision_is_idempotent_on_repeated_id for the beam-Hold path. Test-only; covers the cross-restart re-emission no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
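The two behaviours these commits describe, one Decision per breach edge and a swallowed collision on a repeated id, can be illustrated with a small sketch. `AdviseEdge`, `InMemoryDecisionLog`, and `record_advice` are hypothetical names, and the `ConcurrencyError` class below is a local stand-in for the project's exception, not its real definition.

```python
from typing import Dict


class ConcurrencyError(Exception):
    """Local stand-in for the project's append-collision error."""


class InMemoryDecisionLog:
    """Hypothetical append-only log that rejects a repeated decision id."""

    def __init__(self) -> None:
        self.by_id: Dict[str, str] = {}

    def append(self, decision_id: str, choice: str) -> None:
        if decision_id in self.by_id:
            raise ConcurrencyError(decision_id)
        self.by_id[decision_id] = choice


class AdviseEdge:
    """Per-rule memory: fires only on the tick where a breach begins."""

    def __init__(self) -> None:
        self._breaching = False

    def rising_edge(self, would_flag: bool) -> bool:
        fired = would_flag and not self._breaching
        self._breaching = would_flag
        return fired


def record_advice(log: InMemoryDecisionLog, decision_id: str, choice: str) -> bool:
    """Append once; a collision on the same id is a no-op rather than an error."""
    try:
        log.append(decision_id, choice)
    except ConcurrencyError:
        return False
    return True


log = InMemoryDecisionLog()
edge = AdviseEdge()
fired = [edge.rising_edge(flag) for flag in (True, True, False, True)]
first = record_advice(log, "decision-1", "SupervisionStalled")
repeat = record_advice(log, "decision-1", "SupervisionStalled")
```

A standing breach across two ticks fires once, a cleared breach re-arms the edge, and re-deriving the same id (for example after a restart) leaves the log unchanged.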
What
Ships the observation-signal closed-loop seam for the RunSupervisor: a generic way for CORA to read a live Run's named scalar channels and react, shaped by two consumer rules, running in shadow mode and off by default. Implements the design lock project_observation_signal_port_design (judge-panel + 4-lens gate review).

CORA owns the contract; each deployment owns the real EPICS/tomoStream adapter. No new ingest port (the existing AppendObservations write path is reused); the read side is a Run-BC-local RunChannelLookup over the existing entries_run_observations table (no new projection).

Why
This is the FINE per-channel watch-a-live-run detector the coarse run-age liveness rule cannot be. It is built trust-first so the watchdog cannot be fooled: every cannot-tell path defers (never acts on missing data), freshness keys on the CORA-owned recorded_at (never the spoofable producer sampled_at), a dead/never-seen feeder defers and warns loudly, and the rules are OBSERVE-ONLY (log would_flag, record no Decision, issue no command). The autonomy ladder climbs observe -> advise -> act; only the observe rung lands here.

Slices (one commit each, each green through the full pre-commit suite)
1. is_simulated provenance column on entries_run_observations + field on ObservationInput / route / MCP tool.
2. RunChannelLookup read port (read_run_channel_latest + read_run_channel_window) + Postgres adapter + in-memory stub + load-bearing (run_id, channel_name, recorded_at DESC) index.
3. entries_run_feed_heartbeats + FeedHeartbeatStore + read_feed_health (the dead-feeder seam).
4. snr_limit + expected_observation_interval_seconds on proj_run_summary (RunStarted and RunAdjusted arms) + RunSummaryItem.
5. Pure deciders + shadow observe-only pass in _supervise_tick + per-rule walled memory + hysteresis + config (off by default) + unit/Hypothesis/shadow tests.
6. SimObservationFeeder (sim-only; distinct sim principal) + dead-feeder integration test.
7. Gate-review follow-ups: correct the is_simulated migration comment to the surface-not-filter design; lock the shadow audit field with a test.

Trust posture
- Off by default, gated above run_supervisor_enabled (rules inert unless run_quality_channel_name / run_stall_channel_name set).
- Advise (SupervisionQuieted / SupervisionDoubted / SupervisionStalled in the Decision BC) and act (reversible Hold) are deferred to later rungs.

Gate review
4-lens code review on the assembled diff: one ship, three ship_with_p1_fixes, no changes_needed. The single agreed P1 (a misleading migration comment) is fixed in commit 7. One reviewer's "P0" (is_simulated missing from logs) was a verified false alarm: it is already logged on both shadow lines, now locked by a test.

Deferred (documented follow-ups, not blockers)
Test plan
Unit (deciders, every cannot-tell branch + Hypothesis PBTs), shadow behavioral (observe-only, walled memory, hysteresis), projection precompute (both arms), and Postgres integration (RunChannelLookup, heartbeat, sim feeder dead-feeder transition). Full suite + architecture fitness green on every feature commit.
🤖 Generated with Claude Code