
feat(run): shadow closed-loop observation-signal rules in the RunSupervisor #288

Merged
xmap merged 7 commits into main from worktree-observation-signal-port
Jun 21, 2026

@xmap xmap commented Jun 21, 2026

Owner

What

Ships the observation-signal closed-loop seam for the RunSupervisor: a generic way for CORA to read a live Run's named scalar channels and react. It is shaped by two consumer rules and runs in shadow mode, off by default. Implements the design lock project_observation_signal_port_design (judge-panel + 4-lens gate review).

  • Rule Q (quality-within-limits): a channel's latest value vs an operator-SET limit.
  • Rule R (rate-dropout / stall): a channel's live arrival rate vs an expected interval, beam-aware + dead-feeder-aware + hysteresis.

CORA owns the contract; each deployment owns the real EPICS/tomoStream adapter. No new ingest port (the existing AppendObservations write path is reused); the read side is a Run-BC-local RunChannelLookup over the existing entries_run_observations table (no new projection).

Why

This is the FINE per-channel watch-a-live-run detector the coarse run-age liveness rule cannot be. It is built trust-first so the watchdog cannot be fooled: every cannot-tell path defers (never acts on missing data), freshness keys on the CORA-owned recorded_at (never the spoofable producer sampled_at), a dead/never-seen feeder defers and warns loudly, and the rules are OBSERVE-ONLY (log would_flag, record no Decision, issue no command). The autonomy ladder climbs observe -> advise -> act; only the observe rung lands here.

Slices (one commit each, each green through the full pre-commit suite)

  1. is_simulated provenance column on entries_run_observations + field on ObservationInput/route/MCP tool.
  2. RunChannelLookup read port (read_run_channel_latest + read_run_channel_window) + Postgres adapter + in-memory stub + load-bearing (run_id, channel_name, recorded_at DESC) index.
  3. Append-only entries_run_feed_heartbeats + FeedHeartbeatStore + read_feed_health (the dead-feeder seam).
  4. Precompute snr_limit + expected_observation_interval_seconds on proj_run_summary (RunStarted and RunAdjusted arms) + RunSummaryItem.
  5. Two pure deciders + shadow pass in _supervise_tick + per-rule walled memory + hysteresis + config (off by default) + unit/Hypothesis/shadow tests.
  6. SimObservationFeeder (sim-only; distinct sim principal) + dead-feeder integration test.
  7. Gate-review fixes: correct the is_simulated migration comment to the surface-not-filter design; lock the shadow audit field with a test.

Trust posture

  • Off by default, a second gate above run_supervisor_enabled (rules inert unless run_quality_channel_name / run_stall_channel_name set).
  • Shadow only: no Decision, no command. Advise (SupervisionQuieted/SupervisionDoubted/SupervisionStalled in the Decision BC) and act (reversible Hold) are deferred to later rungs.
  • Per-rule memory walled off from the beam-Hold FSM and from each other; sim data surfaced (not hidden) and logged so a forensic read can tell a real breach from a rehearsal.
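
The two-level gate described above can be sketched as follows. This is a minimal illustration, not the shipped config: the `SupervisorConfig` class, its defaults, and the two helper functions are hypothetical; only the three setting names come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SupervisorConfig:
    """Hypothetical subset of the settings named in this PR (defaults assumed)."""

    run_supervisor_enabled: bool = False
    run_quality_channel_name: Optional[str] = None
    run_stall_channel_name: Optional[str] = None


def quality_rule_active(cfg: SupervisorConfig) -> bool:
    # Second gate: the supervisor must be on AND a quality channel must be named.
    return cfg.run_supervisor_enabled and cfg.run_quality_channel_name is not None


def stall_rule_active(cfg: SupervisorConfig) -> bool:
    # Same shape for Rule R: inert unless a stall channel is named.
    return cfg.run_supervisor_enabled and cfg.run_stall_channel_name is not None
```

Under these assumptions, enabling the supervisor alone leaves both rules inert, and naming one channel activates only that rule.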

Gate review

4-lens code review on the assembled diff: one ship, three ship_with_p1_fixes, no changes_needed. The single agreed P1 (a misleading migration comment) is fixed in commit 7. One reviewer's "P0" (is_simulated missing from logs) was a verified false alarm: it is already logged on both shadow lines, now locked by a test.

Deferred (documented follow-ups, not blockers)

  • Advise + act rungs (Decision-BC vocab edit; reversible Hold; act-mode sim composition guard).
  • Self-calibrated expected interval; method-schema validation of degenerate rule-input defaults.
  • The real deployment feeder + its lifespan/Authorize grant (deployment-owned).

Test plan

Unit (deciders, every cannot-tell branch + Hypothesis PBTs), shadow behavioral (observe-only, walled memory, hysteresis), projection precompute (both arms), and Postgres integration (RunChannelLookup, heartbeat, sim feeder dead-feeder transition). Full suite + architecture fitness green on every feature commit.

🤖 Generated with Claude Code

xmap and others added 7 commits June 21, 2026 11:43
…p read seam

Slice 1 of the observation-signal port design lock
([[project_observation_signal_port_design]]). Adds an additive
is_simulated boolean column to entries_run_observations (NOT NULL
DEFAULT false, forward-only) plus the field on ObservationInput / the
REST request / the MCP tool / the Observation row.

WHY: the closed-loop quality + stall rules must never act on simulated
data as if it were real. The flag travels WITH the datum (not a route
registry: observations key on operator channel_name, not a substrate
address, and one channel can carry real data on one Run and sim data on
another). A single boolean is the right grain: a row has one origin; the
window-level mixed case is an OR-fold on the read side. Default false
keeps the route back-compatible and is the safe direction for the gate
(a real adapter that omits the flag reads as real). This is the read-side
mirror of the Operation BC ActuationKind "any simulator touch
disqualifies" gate. openapi.json + atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
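
The "one boolean per row, OR-fold per window" grain from Slice 1 might look like the sketch below. `ObservationRow` and `window_is_simulated` are hypothetical names for illustration; only the `is_simulated` field and its false default come from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ObservationRow:
    """Hypothetical row shape; the real Observation row carries more fields."""

    channel_name: str
    value: float
    is_simulated: bool = False  # mirrors the column's DEFAULT false


def window_is_simulated(rows: Iterable[ObservationRow]) -> bool:
    """OR-fold: a window counts as simulated if any row in it is simulated."""
    return any(row.is_simulated for row in rows)
```

An empty window folds to False here, which matches the "omitted flag reads as real" direction described above.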
Slice 2 of the observation-signal port design lock
([[project_observation_signal_port_design]]). Adds the read half of the
closed-loop seam: a Run-BC-local RunChannelLookup port (run/ports/) with
read_run_channel_latest (Rule Q point read) and read_run_channel_window
(Rule R windowed count_since), a PostgresRunChannelLookup adapter
(run/adapters/) querying the existing entries_run_observations table, an
InMemoryRunChannelLookup stub (seedable; unseeded = always-quiet default),
and a required (run_id, channel_name, recorded_at DESC) btree index.

WHY: the rules need to read a live Run's channels without a new
projection (a projection would cost a permanent fold and read staler than
the source on the very freshness signal Rule R depends on). The port is
BC-local because its sole consumer is the composition-root supervisor
(EnclosureObserver precedent); promote to infrastructure/ports only on a
real second cross-BC consumer. Freshness keys on recorded_at (CORA
write-time), never the spoofable sampled_at; the read surfaces
is_simulated rather than filtering it, so a mislabeled-sim real row still
shows (the decider disqualifies sim, the safe direction) and the sim
feeder can exercise the rules end to end. The new index is load-bearing:
the pre-existing sampled_at indexes carry no channel_name and order by
the wrong column. atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
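
A rough sketch of the Slice 2 read port and a seedable in-memory stub follows. The method names come from the commit message; the signatures, the `ChannelSample` / `ChannelWindow` value objects, and the synchronous style are assumptions made for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class ChannelSample:  # hypothetical value object
    value: float
    recorded_at: datetime  # CORA write time, the freshness key
    is_simulated: bool


@dataclass(frozen=True)
class ChannelWindow:  # hypothetical value object
    count: int
    any_simulated: bool


class RunChannelLookup(Protocol):
    def read_run_channel_latest(
        self, run_id: str, channel_name: str
    ) -> Optional[ChannelSample]: ...

    def read_run_channel_window(
        self, run_id: str, channel_name: str, since: datetime
    ) -> ChannelWindow: ...


class InMemoryRunChannelLookup:
    """Seedable stub; an unseeded lookup returns no sample and a zero count."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], List[ChannelSample]] = {}

    def seed(self, run_id: str, channel_name: str, sample: ChannelSample) -> None:
        self._rows.setdefault((run_id, channel_name), []).append(sample)

    def read_run_channel_latest(
        self, run_id: str, channel_name: str
    ) -> Optional[ChannelSample]:
        rows = self._rows.get((run_id, channel_name), [])
        # Newest by recorded_at, never by the producer-asserted sampled_at.
        return max(rows, key=lambda s: s.recorded_at, default=None)

    def read_run_channel_window(
        self, run_id: str, channel_name: str, since: datetime
    ) -> ChannelWindow:
        rows = self._rows.get((run_id, channel_name), [])
        hits = [s for s in rows if s.recorded_at >= since]
        # Surface is_simulated rather than filtering on it.
        return ChannelWindow(len(hits), any(s.is_simulated for s in hits))


# Usage: seed two samples ten seconds apart, then read the newest one.
t0 = datetime(2026, 6, 21, tzinfo=timezone.utc)
stub = InMemoryRunChannelLookup()
stub.seed("run-1", "snr", ChannelSample(1.0, t0, False))
stub.seed("run-1", "snr", ChannelSample(2.0, t0 + timedelta(seconds=10), True))
latest = stub.read_run_channel_latest("run-1", "snr")
```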
Slice 3 of the observation-signal port design lock
([[project_observation_signal_port_design]]). Adds an append-only
entries_run_feed_heartbeats table (one row per drain tick; REVOKE
UPDATE/DELETE/TRUNCATE per the entries_* append-only contract), a
BC-internal FeedHeartbeatStore (Postgres + InMemory, idempotent on
event_id), and a read_feed_health method on RunChannelLookup returning
the newest heartbeat recorded_at (RunFeedHealth VO).

WHY: Rule R must distinguish a genuinely quiet channel from a DEAD
feeder. The feeder pings a heartbeat every tick regardless of data flow;
the decider treats a heartbeat older than the operator ceiling (or none
at all) as cannot-tell and defers, so a dead feeder disables the stall
flag instead of an absent channel masquerading as a stall. Append-only
INSERT not UPSERT: an UPSERT needs UPDATE, which the append-only role is
REVOKEd from; MAX(recorded_at) answers "newest heartbeat" without mutable
state. Freshness keys on recorded_at (CORA write time), not the
producer-asserted heartbeat_at. The adapter returns the raw recorded_at;
the decider owns the clock + ceiling so liveness derivation stays pure.
atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
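
The split in Slice 3 (adapter returns the raw newest recorded_at, the decider owns the clock and ceiling) could be sketched as the pure function below. `feed_is_alive` and its parameter names are hypothetical; the rule that a missing or too-old heartbeat means cannot-tell comes from the commit message.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def feed_is_alive(
    newest_heartbeat_recorded_at: Optional[datetime],
    now: datetime,
    ceiling: timedelta,
) -> bool:
    """Pure liveness check: False means cannot-tell, so the stall rule defers."""
    if newest_heartbeat_recorded_at is None:
        return False  # never-seen feeder
    # A heartbeat older than the operator ceiling is treated as a dead feeder.
    return now - newest_heartbeat_recorded_at <= ceiling


# Usage: a heartbeat 5 s old passes a 30 s ceiling; one 31 s old does not.
now = datetime(2026, 6, 21, 12, 0, 0, tzinfo=timezone.utc)
fresh = feed_is_alive(now - timedelta(seconds=5), now, timedelta(seconds=30))
stale = feed_is_alive(now - timedelta(seconds=31), now, timedelta(seconds=30))
```

Passing `now` in rather than reading a clock inside the function is what keeps the derivation pure and easy to test.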
Slice 4 of the observation-signal port design lock
([[project_observation_signal_port_design]]). Adds nullable snr_limit +
expected_observation_interval_seconds columns to proj_run_summary,
derived from effective_parameters on BOTH the RunStarted INSERT arm and
the RunAdjusted UPDATE arm, and surfaced on RunSummaryItem /
_SELECT_COLUMNS / _row_to_item (the running_since precedent).

WHY: the two rules need a per-Run scalar each, and the supervisor reads
RunSummaryItem every tick. Precompute keeps that an O(1) projection-column
read instead of a per-Running-run fold (the supervisor does ZERO such
folds today; a fold per tick would add stream-replay I/O on the common
case). Recompute on RunAdjusted is IN v1, not deferred: RunAdjusted
already re-snapshots effective_parameters and is already subscribed, so a
mid-run re-cadence must refresh the inputs in the same arm or Rule R
evaluates against a stale interval (false alarm or masked stall, the
exact stale-baseline failure the watchdog guards against). The inputs are
operator-declared keys (not hardwired to n_projections, avoiding the
one-observation-per-projection assumption); absent / non-positive /
non-finite values land NULL, which disables that rule for the Run
(cannot-tell -> defer). atlas.sum regenerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
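
The "absent / non-positive / non-finite lands NULL" rule from Slice 4 might be expressed as below. This is a sketch under assumptions: the helper names are hypothetical, and the keys read from `effective_parameters` are assumed to match the column names, which the commit message does not state.

```python
import math
from typing import Any, Dict, Mapping, Optional


def positive_finite_or_none(raw: Any) -> Optional[float]:
    """Return raw as a float only if it is a positive, finite number."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        return None
    try:
        value = float(raw)
    except OverflowError:  # an int too large to represent as a float
        return None
    if not math.isfinite(value) or value <= 0:
        return None
    return value


def derive_rule_inputs(effective_parameters: Mapping[str, Any]) -> Dict[str, Optional[float]]:
    """Precompute both rule inputs; None disables that rule for the Run."""
    # Key names are an assumption for illustration.
    return {
        "snr_limit": positive_finite_or_none(
            effective_parameters.get("snr_limit")
        ),
        "expected_observation_interval_seconds": positive_finite_or_none(
            effective_parameters.get("expected_observation_interval_seconds")
        ),
    }
```

Running the same derivation on both the RunStarted and RunAdjusted arms is what keeps a mid-run re-cadence from leaving a stale interval behind.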
Slice 5 of the observation-signal port design lock
([[project_observation_signal_port_design]]). Adds two pure deciders --
decide_quality_signal (Rule Q: latest channel value vs the operator-set
limit) and decide_signal_stall (Rule R: beam-aware rate-dropout) -- and a
shadow OBSERVE-ONLY pass inside _supervise_tick that logs
run_quality.would_flag / run_stall.would_flag, records NO Decision and
issues NO command (byte-identical to the run-liveness shadow). New config
gates both rules off by default (run_quality_channel_name /
run_stall_channel_name None) above run_supervisor_enabled.

WHY: this is the FINE per-channel detector the coarse run-age liveness
rule cannot be. Each rule keeps its own edge-trigger state (quality,
stall + the stall_streak hysteresis counter, feed_dead_warned) walled off
from the beam-Hold FSM memory and from each other, so one rule flapping
cannot corrupt another. Rule R reuses the tick's single beam read (a gap
while beam is down is expected, never a stall), checks feeder health
FIRST (a dead/never-seen feeder defers and warns loudly, never reads as a
calm run), and requires run_stall_hysteresis_ticks consecutive
stall-condition ticks (anti-flap for top-ups). Every cannot-tell path
(missing signal, disabled rule, dead feeder, beam down, degenerate
interval) defers (Lock 4). RunChannelLookup is constructed BC-local at
the composition root, not on the Kernel. Shadow flags on the merits incl.
simulated data (observe-only, so the sim feeder exercises it); the
is_simulated act-disqualify lands with the advise/act rung. Pure deciders
live in api/ (outside the feature-tree decider gates) so they carry
explicit unit + Hypothesis property tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
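
A simplified sketch of the Rule R decision order and the hysteresis counter from Slice 5 is shown below. The function and class names, the reason strings other than feed_dead, the "zero arrivals in the window" stall condition, and the choice to reset the streak on any non-stall tick are all assumptions for illustration, not the shipped decide_signal_stall.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StallVerdict:
    stall_condition: bool
    deferred: Optional[str]  # reason when the decider cannot tell


def decide_stall_sketch(
    *,
    feed_alive: bool,
    beam_up: Optional[bool],
    expected_interval_seconds: Optional[float],
    arrivals_in_window: int,
) -> StallVerdict:
    # Feeder health is checked FIRST: a dead feeder defers, never flags.
    if not feed_alive:
        return StallVerdict(False, "feed_dead")
    # A gap while beam is down (or unknown) is expected, never a stall.
    if beam_up is not True:
        return StallVerdict(False, "beam_down")
    if expected_interval_seconds is None or expected_interval_seconds <= 0:
        return StallVerdict(False, "degenerate_interval")
    return StallVerdict(arrivals_in_window == 0, None)


@dataclass
class StallMemory:
    """Per-rule state, kept separate from any other rule's memory."""

    stall_streak: int = 0
    flagged: bool = False


def step_hysteresis(
    memory: StallMemory, verdict: StallVerdict, hysteresis_ticks: int
) -> bool:
    """Return True only on the tick where the stall flag rises."""
    if not verdict.stall_condition:
        memory.stall_streak = 0  # assumption: any non-stall tick resets
        memory.flagged = False
        return False
    memory.stall_streak += 1
    if memory.stall_streak >= hysteresis_ticks and not memory.flagged:
        memory.flagged = True
        return True
    return False
```

With `hysteresis_ticks=3`, a single quiet tick during a top-up never flags; only the third consecutive stall-condition tick does, and a standing stall does not re-flag.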
… to end

Slice 6 (final shadow-v1 slice) of the observation-signal port design lock
([[project_observation_signal_port_design]]). Adds SimObservationFeeder, a
replay feeder that drives the rules end to end over the REAL write path:
each drain emits the elapsed trace points (is_simulated=True) via the
AppendObservations handler and pings a heartbeat, under a distinct sim
principal (SIM_OBSERVATION_FEEDER_AGENT_ID). CORA ships ONLY the sim
feeder; each deployment writes its real EPICS / tomoStream feeder against
the same write path (no new ingest port).

WHY: the rules need to be exercisable without a real IOC, and the
gate-review required a dead-feeder integration test. The two integration
tests prove (1) sim rows carry is_simulated=True and are attributable to
the sim principal (the split that lets authz / actor_id tell sim from
real), and (2) the dead-feeder transition: a fresh heartbeat lets a
zero-arrival channel stall-flag, while a stale heartbeat (feeder stopped +
clock past the ceiling) defers (feed_dead) -- a dead feeder is never read
as a stall. The real-feeder lifespan + its Authorize grant are
deployment-owned (out of the spine for shadow v1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
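
The drain behavior described in Slice 6 (emit the elapsed trace points, then ping a heartbeat regardless of data flow) can be sketched with plain callbacks. `ReplayFeeder`, the trace format, and the callback signatures are hypothetical stand-ins; the real SimObservationFeeder goes through the AppendObservations handler under the sim principal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

TracePoint = Tuple[float, float]  # (offset seconds from start, value)


@dataclass
class ReplayFeeder:
    """Replays a trace sorted by offset; each point is emitted exactly once."""

    trace: Sequence[TracePoint]
    append: Callable[[List[Dict[str, object]]], None]  # stand-in write path
    heartbeat: Callable[[], None]  # stand-in heartbeat ping
    _cursor: int = 0

    def drain(self, elapsed_seconds: float) -> int:
        due: List[Dict[str, object]] = []
        while (
            self._cursor < len(self.trace)
            and self.trace[self._cursor][0] <= elapsed_seconds
        ):
            _, value = self.trace[self._cursor]
            # Every replayed row is stamped as simulated.
            due.append({"value": value, "is_simulated": True})
            self._cursor += 1
        if due:
            self.append(due)
        # Heartbeat on every drain, even when no points were due.
        self.heartbeat()
        return len(due)
```

Because the heartbeat fires even on an empty drain, a quiet trace still looks like a live feeder, which is the distinction the dead-feeder test relies on.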
…dit field

Gate-review follow-ups for the observation-signal port. Three reviewers
flagged a P1: the is_simulated migration comment claimed "the production
read path filters is_simulated = false", which contradicts the implemented
design. The read SURFACES is_simulated (it does not hard-filter, so a
mislabeled-sim real row is not hidden and the sim feeder can exercise the
rules); the DECIDER disqualifies sim in the future act/advise rung, and
shadow observes + logs even simulated breaches. The comment now states
that accurately and points at the port docstring.

Also adds a unit test asserting run_quality.would_flag carries the
is_simulated field, so an operator's forensic log read can tell a real
breach from a simulator rehearsal and a refactor cannot silently drop the
audit field. (A fourth reviewer's "P0" that the field was missing was a
false alarm: it is already logged on both shadow lines; this test locks it
in.) atlas.sum regenerated for the comment change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions


Coverage report

File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing
  apps/api/src/cora/api
  _run_supervisor.py 734, 736-737, 739-740, 742-743, 903
  apps/api/src/cora/infrastructure
  config.py 517-521, 529-533, 542-546
  apps/api/src/cora/infrastructure/ports
  clock.py
  apps/api/src/cora/run/adapters
  __init__.py
  postgres_run_channel_lookup.py
  sim_observation_feeder.py
  apps/api/src/cora/run/aggregates/run
  __init__.py
  entries.py
  feed_heartbeats.py 66
  apps/api/src/cora/run/features/append_observations
  command.py
  route.py
  tool.py
  apps/api/src/cora/run/features/list_runs
  handler.py
  apps/api/src/cora/run/ports
  __init__.py
  run_channel_lookup.py
  apps/api/src/cora/run/projections
  summary.py
Project Total  

This report was generated by python-coverage-comment-action

@xmap xmap merged commit 933b962 into main Jun 21, 2026
16 checks passed
@xmap xmap deleted the worktree-observation-signal-port branch June 21, 2026 18:59
xmap added a commit that referenced this pull request Jun 21, 2026
…leet (#290)

The recent agent PRs (#233 gated resume, #266 ClearanceWatcher, #273
run-liveness, #288 observation-signal rules) shipped code-only; the
hand-authored module docs had drifted. Shape-level catch-up, no internals:

- agent/index.md: five -> six seeded agents (add ClearanceWatcher, a
  passive flag-only periodic-loop agent recording a ClearanceProgress
  Decision); note RunSupervisor also carries shadow observe-only rules.
- run/index.md: RunSupervisor now does gated autonomous resume (not
  wind-down only) and carries shadow run-liveness / signal-quality /
  signal-stall rules that log-only; add is_simulated to the Observation
  shape + the entries_run_observations DDL with a one-line rationale.

mkdocs build --strict passes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
xmap added a commit that referenced this pull request Jun 22, 2026
…l rules (#294)

* feat(decision): add the three RunSupervision advise-rung choices

Slice A of the observation-signal advise rung. Adds SupervisionQuieted
(run-age liveness backstop), SupervisionStalled (Rule R rate-dropout), and
SupervisionBreached (Rule Q quality-below-limit) to the RunSupervisionChoice
Literal + RUN_SUPERVISION_CHOICES frozenset (7 -> 10), with the vocab test
updated to the 10-value set + a work-noun guard on the new dispositions.

WHY: promoting the shipped shadow observation-signal + run-liveness rules
one rung (observe -> advise) means the supervisor records one Decision per
breach edge for a human; that Decision's choice must exist in the closed
set first. Decision-only dispositions (never a command). SupervisionBreached
is the naming-r3 rename of the originally-proposed SupervisionDoubted:
"Doubted" read as the supervisor's epistemic state; "Breached" names the
objective limit-crossing, family-uniform with Deferred / Conflicted /
Stalled. This slice adds vocabulary only; the supervisor emission lands next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(api): promote the RunSupervisor shadow rules to the advise rung

Slice B of the observation-signal advise rung. Adds run_supervisor_advise_enabled
(default off, a further opt-in above each rule's own enable) and, when on, emits
exactly one Decision per breach EDGE from the three shadow rules -- still issuing
NO command (advise rung):
  - run-liveness backstop  -> SupervisionQuieted
  - Rule R rate-dropout    -> SupervisionStalled
  - Rule Q quality breach  -> SupervisionBreached

WHY: the shadow rules (#288 / #273) log would_flag but leave no durable record a
human can triage. The advise rung climbs exactly one step (observe -> advise),
recording one RunSupervision Decision per breach episode for a human while keeping
the act rung (auto-Hold) deferred. Emission is edge-triggered off the already-walled
per-rule memory (one Decision per episode; nothing on a standing breach across
ticks), beam-free (the liveness rule runs before the beam read), and reuses the
existing DecisionRegistered shape under the RunSupervisor identity + Authorize path.
Shadow logging is unchanged; advise only adds the Decision. cannot-tell still
defers (no Decision). Tests cover advise-off (no Decision), each disposition under
advise-on (one Decision, no command), and edge-triggering (one Decision across two
ticks of a standing breach).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
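
Edge-triggered advise emission (one Decision per breach episode, nothing on a standing breach) might be sketched as below. The rule-to-choice mapping comes from this commit message; `AdviseMemory`, `advise_on_edge`, the rule keys, and recording choices as plain strings in a list are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Mapping taken from the commit message; the rule keys are hypothetical.
CHOICE_BY_RULE: Dict[str, str] = {
    "liveness": "SupervisionQuieted",
    "stall": "SupervisionStalled",
    "quality": "SupervisionBreached",
}


@dataclass
class AdviseMemory:
    """Tracks whether each (run, rule) pair was flagging on the previous tick."""

    flagging: Dict[Tuple[str, str], bool] = field(default_factory=dict)


def advise_on_edge(
    memory: AdviseMemory,
    run_id: str,
    rule: str,
    would_flag: bool,
    advise_enabled: bool,
    decisions: List[str],
) -> None:
    key = (run_id, rule)
    was_flagging = memory.flagging.get(key, False)
    memory.flagging[key] = would_flag
    # Emit only on the rising edge, and only when advise is switched on.
    if advise_enabled and would_flag and not was_flagging:
        decisions.append(CHOICE_BY_RULE[rule])
```

In this sketch a breach that clears and later returns counts as a new episode and produces a second entry.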

* test(api): cover advise-rung edge-trigger + cannot-tell gates

Gate-review follow-ups (the advise diff drew 2 ship + 1 changes_needed, the
last purely a test-coverage gap; the correctness/trust lens passed clean).
Adds three tests:
  - advise liveness is edge-triggered: two ticks of a standing stale Run
    record only ONE SupervisionQuieted Decision (parity with the quality +
    stall edge-trigger tests).
  - advise records no Decision when the quality channel has no observation
    (cannot-tell -> defer; pins that the value-None path never emits, which a
    reviewer worried about -- the decider returns would_flag=False on None).
  - advise records no Decision when the rule is disabled (snr_limit None):
    advise respects each rule's own enable, not just the global advise flag.

Test-only; no production change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
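
The two cannot-tell gates pinned by these tests (no observation, rule disabled) can be illustrated with a simplified Rule Q decider. The names, the reason strings, and the non-finite guard are assumptions; the below-limit direction and the "None never flags" behavior come from the commit messages above.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityVerdict:
    would_flag: bool
    deferred: Optional[str]  # reason when the decider cannot tell


def decide_quality_sketch(
    latest_value: Optional[float], snr_limit: Optional[float]
) -> QualityVerdict:
    if snr_limit is None:
        return QualityVerdict(False, "rule_disabled")
    if latest_value is None:
        return QualityVerdict(False, "no_observation")
    if not math.isfinite(latest_value):
        # Assumption: a NaN or infinite reading is treated as cannot-tell.
        return QualityVerdict(False, "non_finite_value")
    # Rule Q flags when quality falls below the operator-set limit.
    return QualityVerdict(latest_value < snr_limit, None)
```

Since every deferral returns `would_flag=False`, an advise layer that emits only on a flag edge records nothing on these paths.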

* test(api): cover the advise-emitter ConcurrencyError no-op branch

The diff-coverage gate (hard 90% on changed lines) flagged
_run_supervisor.py at 88.9%: the new _record_supervision_advice except
ConcurrencyError branch (lines 490-491) was uncovered. Adds an idempotency
test that re-derives the same advise Decision id (via a FixedIdGenerator
repeating the id) so the second append collides and is swallowed -- mirrors
the existing test_record_decision_is_idempotent_on_repeated_id for the
beam-Hold path. Test-only; covers the cross-restart re-emission no-op.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
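
The no-op branch this test covers follows a common idempotent-append pattern, sketched below. `ConcurrencyError` here is a local stand-in class, and `InMemoryDecisionLog` and `record_advice_sketch` are hypothetical; the real _record_supervision_advice and its store are not shown in this PR page.

```python
from typing import List, Set, Tuple


class ConcurrencyError(Exception):
    """Local stand-in for the store's conflict error."""


class InMemoryDecisionLog:
    """Hypothetical append-only log that rejects a repeated decision id."""

    def __init__(self) -> None:
        self._ids: Set[str] = set()
        self.entries: List[Tuple[str, str]] = []

    def append(self, decision_id: str, choice: str) -> None:
        if decision_id in self._ids:
            raise ConcurrencyError(decision_id)
        self._ids.add(decision_id)
        self.entries.append((decision_id, choice))


def record_advice_sketch(
    log: InMemoryDecisionLog, decision_id: str, choice: str
) -> bool:
    """Return True if the Decision was recorded, False on a repeated id."""
    try:
        log.append(decision_id, choice)
    except ConcurrencyError:
        # Re-emission of the same id (for example after a restart) is a no-op.
        return False
    return True
```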

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>