feat(loops): opt-in session-continuation + checkpoint-fork lineage for runLoop (backend-blind) #150
…ackend-blind)

Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox session across iterations (same box + sessionId, no prompt-text replay) and FORK fanout branches from a parent checkpoint (shared context prefix). Both sit behind a capability probe, so the kernel asks 'can I fork?' (client.criuStatus) and never names Docker/Firecracker, degrading to fresh boxes when CRIU is absent.

- sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}.
- sandbox-lineage.ts: createSandboxLineage owns box+session handles with start/continue/fork/teardown; reuses the kernel's acquireSandbox / buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but the box has no fork().
- run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine continues, fanout forks-once, else fresh-through-lineage. Default OFF is byte-identical to today, so random@k stays N independent fresh boxes (the compute-control invariant). Rejects lineage + onWorkerBox (both own boxes).
- 7 new unit tests (continuation reuses session; fork when canFork; fresh fallback; default-off invariant). Full suite 621 pass, typecheck clean.
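The memoized, fail-closed probe listed for sandbox-capabilities.ts could be sketched roughly as follows. This is an illustration under assumptions, not the file's actual contents: the `SandboxClientLike`, `CriuStatus`, and `SandboxCapabilities` shapes and the `probeCapabilities` name are hypothetical, and per-client caching through a `WeakMap` is one possible way to memoize.

```typescript
// Hypothetical minimal shapes; the real SDK types may differ.
interface CriuStatus {
  available: boolean;
}
interface SandboxClientLike {
  criuStatus?: () => Promise<CriuStatus>;
}
interface SandboxCapabilities {
  canFork: boolean;
}

// One cached probe per client object; an entry disappears when its client is collected.
const probeCache = new WeakMap<object, Promise<SandboxCapabilities>>();

function probeCapabilities(client: SandboxClientLike): Promise<SandboxCapabilities> {
  const cached = probeCache.get(client);
  if (cached) return cached;

  const probe = (async (): Promise<SandboxCapabilities> => {
    // Fail closed: a missing method, a throw, or available:false all mean "cannot fork".
    if (typeof client.criuStatus !== "function") return { canFork: false };
    try {
      const status = await client.criuStatus();
      return { canFork: status.available === true };
    } catch {
      return { canFork: false };
    }
  })();

  // Cache the promise itself so concurrent callers share a single in-flight probe.
  probeCache.set(client, probe);
  return probe;
}
```

Caching the promise rather than the resolved value means two callers racing on the same client trigger only one `criuStatus()` call, and a probe that throws is remembered as `canFork: false` instead of being retried.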
🔍 Reviewing
| Pass | Status | ETA |
|---|---|---|
| opencode DeepSeek v4 Pro | Running (4 min) | ~5-15 min |
| opencode GLM 5.1 | Running (4 min) | ~5-15 min |
Agent review running. Reads the actual code. This comment updates in place.
tangletools · #150 · model: kimi-for-coding · started 2026-06-04T12:56:01Z
tangletools
left a comment
✅ Approved — 7 non-blocking findings — 5c3c9cde
Full multi-shot audit completed 2/2 planned shots over 6 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Latest PR review status: sticky summary
tangletools · 2026-06-04T11:47:25Z · immutable trace
drewstone
left a comment
Review
Verdict: approve to land. @experimental + default-OFF, and the byte-identical-when-off invariant is pinned by a test, so the blast radius on the existing kernel is ~zero. Clean architecture: backend-blind via the criuStatus() capability bit (never "Docker or Firecracker?"), fail-closed probe, fail-loud on the probe-says-fork / box-can't-fork contract violation, reuses acquireSandbox / buildBackendOptions / deleteBoxSafe instead of reinventing them, and the lineage owns every box it starts or forks.
Six findings — one is a silent-correctness landmine to clear before the flag is ever enabled.
1. (Blocker-before-enabling) Client-minted sessionId may be a silent no-op
start() mints loop-sess-<uuid> and passes it on the first streamPrompt({ sessionId }); continue() re-passes it. If the platform ignores a client-supplied id and mints its own (or scopes context per-box rather than per-session), then continue silently runs a contextless turn while the loop believes it is continuing — the "faithful context, no truncation tax" property quietly inverts and nothing fails. This is the one real risk in the PR. Raise it from "worth confirming" to must verify against a live platform, with a fail-loud assertion that the returned session matches the minted id, before sessionContinuity is trusted.
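A minimal sketch of the fail-loud check this finding asks for. It assumes a hypothetical `box.session(id).status()` lookup that resolves to `null` when the sandbox does not know the session and otherwise reports the session's id; the `BoxLike`, `SessionHandle`, and `SessionStatus` shapes and the `assertSessionHonored` name are illustrative, not the SDK's real API.

```typescript
// Hypothetical shapes for illustration; not the SDK's real types.
interface SessionStatus {
  id: string;
}
interface SessionHandle {
  status(): Promise<SessionStatus | null>;
}
interface BoxLike {
  session(id: string): SessionHandle;
}

// Throws unless the sandbox reports a session whose id is the one the client minted.
async function assertSessionHonored(box: BoxLike, mintedId: string): Promise<void> {
  const status = await box.session(mintedId).status();
  if (status === null) {
    // The platform ignored the client-supplied id, or the session was reaped.
    throw new Error(`session ${mintedId} is unknown to the sandbox; refusing to continue`);
  }
  if (status.id !== mintedId) {
    // The platform substituted its own id, so a "continue" would be a contextless turn.
    throw new Error(`sandbox returned session ${status.id}, expected ${mintedId}`);
  }
}
```

Calling a check like this before every continued turn turns the silent no-op into a hard error, which is the property the finding wants in place before `sessionContinuity` is trusted.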
2. (Major) Fork box-creation is not bounded by maxConcurrency
fork() does Promise.all(prompts.map(...)) — the first branch's acquire() triggers all N checkpoint+forks (or N fresh acquires on the degraded path) at once. The kernel bounds the streaming by concurrency, but box creation now fires unbounded, where today's fresh fanout is rate-limited. A 20-way fanout under maxConcurrency: 2 provisions 20 boxes simultaneously. Bound the fork/acquire fan with the same pool.
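One way to bound the fan is a small worker-pool mapper in place of a bare `Promise.all`. The sketch below is an assumption-laden illustration: `mapBounded` is a hypothetical helper name, and the kernel's own concurrency pool may work differently.

```typescript
// Hypothetical helper: runs fn over items with at most `limit` calls in flight.
// Results keep input order. If one call rejects, the returned promise rejects,
// but calls already started are not cancelled.
async function mapBounded<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until none remain.
  // Claiming (next++) is synchronous, so two workers never take the same index.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  };

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a helper like this, a 20-way fanout under `maxConcurrency: 2` would provision at most two boxes at a time, since only two workers ever call the provisioning function.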
3. (Major) forkFanout ignores per-branch specs under a real fork
A real fork inherits the parent's image; specs[i] is honored only on the degraded fresh path. So a heterogeneous fanout (different model/profile per branch) silently runs every branch on the parent's profile when CRIU is present. Fine for homogeneous diverse@k, wrong for heterogeneous arms. Document this in LoopLineageOptions.forkFanout (it is in the PR body, not the type doc).
4. (Major) The lineage holds every box alive until loop end
handles keeps each iteration's box so a later round can descend from it, but never prunes non-descended branches — so the live-box ceiling becomes total iterations across all rounds, not the active frontier. Today each box dies after its iteration. A 5-round × 3-fanout steered loop holds ~15 live boxes to the end. Prune handles that can no longer be descended from, or at least document the ceiling.
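The pruning suggested here can be sketched as "delete every handle outside the frontier". The example assumes the lineage keeps a map from iteration index to box handle and that the caller knows which indices a later round can still descend from; `BoxHandle`, `pruneToFrontier`, and the best-effort delete are hypothetical names and choices.

```typescript
// Hypothetical handle shape; the real box type has more surface.
interface BoxHandle {
  id: string;
  delete(): Promise<void>;
}

// Deletes every held box whose iteration index is not in `frontier`
// and returns the pruned indices.
async function pruneToFrontier(
  handles: Map<number, BoxHandle>,
  frontier: ReadonlySet<number>,
): Promise<number[]> {
  const pruned: number[] = [];
  // Snapshot the entries so the map can be mutated while looping.
  for (const [index, box] of Array.from(handles.entries())) {
    if (frontier.has(index)) continue;
    handles.delete(index);
    // Best-effort: a failed delete should not abort the loop.
    await box.delete().catch(() => undefined);
    pruned.push(index);
  }
  return pruned;
}
```

For the 5-round × 3-fanout case in the finding, pruning to the latest round's three branches would leave 3 live boxes instead of 15.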
5. (Minor) Abort is not responsive during fork
forkFromCheckpoint calls fork.call(box, checkpointId) with no signal; an abort mid-Promise.all will not cancel in-flight forks (they complete, get owned, get reaped at teardown — no leak, just unresponsive). No test covers abort-under-lineage.
6. (Nit) Export order
In index.ts, LoopLineageOptions is inserted before LoopIterationDispatchPayload — not alphabetical; Biome may flag it.
None of 1–6 block landing (default-off). #1 blocks enabling; #3 and #4 should be documented now. The test suite is solid on the happy paths + the independence invariant + teardown, but has no abort-under-lineage and (necessarily) no live-CRIU pass.
Context for whoever picks up the follow-ups: this is the leaf-level continued-session/fork dial that the recursive-execution-atom work composes on top of — the sandbox executor there forwards this lineage passthrough rather than reinventing checkpoint/fork, so the items above (esp. #1 and #3) are load-bearing for that layer too.
…-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated enabling, #3/#4 wanted documenting). Lineage remains default-OFF and byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is still known to the sandbox via `box.session(id).status()` before streaming. A `null` (platform never honored the client-minted id, or it was reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of degrading to a soft empty iteration, so a non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point; pruning is skipped when the driver authors its own `parentIndex`. The unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent, abort-under-lineage). 627 tests pass, typecheck + biome clean.
tangletools
left a comment
✅ Refreshed approval after new commits — 9e953884
A previous trusted approval on this PR was invalidated by new commits.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: stale_approval_refresh · 2026-06-04T12:50:18Z
Addressed all six review findings in 9e95388 (lineage stays default-OFF; the no-lineage path is unchanged). 627 tests pass, typecheck + biome clean.
New tests: session-liveness pass/fail, bounded-fork peak ≤ maxConcurrency, mid-loop prune frees non-frontier boxes, no-prune under an authored parent, abort-under-lineage teardown.
…ervisor keystone (#151)

* docs(research): track RSI driver architecture research

Capture the design thread as tracked research docs under docs/research:
- recursive-execution-atom: the next generation (one recursive Agent atom run as a durable, observable supervision tree; analyst-as-agent-with-runtime; async dynamic spawning), the proposed surface, the file-grounded gap, and the open forks. Plane B contains the flat harness.
- flat-harness-design: the assumption-free experiment harness synthesis (profiles x steer x executionMode x allocation). Plane A.
- long-horizon-benchmark-survey: adversarially-verified survey; Commit0 and tau2-bench as the multi-turn picks.

Index them under the new Research track in docs/README.md.

* docs(research): fold the wnrxtvdta reconcile into the recursive-atom spec

Freeze the contract: the budget-conserving reactive Scope + Supervisor keystone, the event-sourced SpawnJournal + ResultBlobStore (outRef replay), the LeafExecutor per-harness model (harness:null = Router inference, sandbox, cli), and the 8-step v1 build order. Records the 4 resolved forks and the operator override (build the LLM meta-driver now as the treatment on top of the conserved reservation pool, so the equal-k gate stays valid by construction).

* docs(research): open LeafExecutor interface + compose PR #150 lineage

Operator refinement: the runtime is ONE open interface with an execute that returns a promise or an async stream, not a closed inline|sandbox|cli union. Built-ins are implementations; a user agent (mastra/agno/HTTP/custom) is first-class by implementing it; no per-vendor adapters. The sandbox executor composes runLoop and forwards PR #150's lineage passthrough rather than reinventing checkpoint/fork. Records the #150 review (approve-to-land; verify client-minted sessionId, bound fork acquisition, document parent-image fork).

* feat(loops): recursive execution atom - budget-conserving Scope + Supervisor

The v1 keystone for the recursive-agent-atom (drivers-of-drivers, async, observable). One self-similar atom whose act spawns child agents through a Scope; the flat experiment harness is recovered as the simplest act.
- src/loops/supervise/types.ts: the frozen contract: Agent, an OPEN LeafExecutor interface (execute returns a promise or an async stream; router/inline + sandbox + cli are implementations, BYO is first-class via the registry; no per-vendor adapters), Scope, Supervisor, Settled, Budget.
- supervise/budget.ts: a conserved reservation pool with atomic reserve-on-spawn, fail-closed admission, and refund-on-settle, so equal-k holds by construction (the invariant that keeps a steered arm from silently out-computing blind).
- durable/spawn-journal.ts: event-sourced SpawnJournal + content-addressed ResultBlobStore + seq-ordered replay (resumable, queryable, reproducible).
- supervise/scope.ts: a ray.wait cursor over an in-memory nursery; spawn reserves budget and resolves the executor through the open registry; a Settled-to-Iteration adapter keeps defaultSelectWinner single-sourced.
- supervise/runtime.ts: the open executor registry; the sandbox executor composes runLoop (forwarding an optional lineage passthrough) rather than reinventing checkpoint/fork.
- supervise/supervisor.ts: nursery join barrier, abort cascade (incl. the acquire lifecycle), OTP intensity breaker, typed SupervisedResult, RootHandle (view/signal/abort) as the observability substrate.
- bench/src/drivers: flat-harness control (with the equal-k assertion), progressive-widening control, and the LLM-meta-driver treatment. The WidenGate defaults to flat so the selector-not-judge firewall stays dormant; widening reads trace findings, never the raw verdict, unless judgeExempt.
- program.ts: mapPool one-for-one failure semantics (a down child is excluded from the merge, an all-down batch re-throws the first original error so a maxDepth guard still propagates loud).

Verified: typecheck, lint (204 files), build, full suite (642 tests incl. 28 keystone property tests), bench typecheck.

Caveat: the live executor paths (real router HTTP, sandbox runLoop, cli subprocess) are exercised only through the offline mock LeafExecutor; no live-backend run yet.
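The reserve-on-spawn / refund-on-settle idea behind the budget pool can be illustrated with a toy version. This is a hypothetical sketch under stated assumptions, not the code in supervise/budget.ts: the `ReservationPool` class, its method names, and the single numeric budget unit are all invented for illustration.

```typescript
// Hypothetical sketch of a conserved reservation pool.
// Invariant: available + outstanding reservations + spent === total.
class ReservationPool {
  private available: number;
  private spent = 0;

  constructor(total: number) {
    if (!Number.isFinite(total) || total < 0) {
      throw new RangeError("total must be a non-negative number");
    }
    this.available = total;
  }

  // Reserve-on-spawn: a synchronous check-and-decrement, so no await can interleave.
  // Fail-closed admission: an invalid or oversized request is refused, never clamped.
  reserve(amount: number): boolean {
    if (!Number.isFinite(amount) || amount <= 0 || amount > this.available) return false;
    this.available -= amount;
    return true;
  }

  // Refund-on-settle: charge what the child used (capped at its reservation)
  // and return the remainder to the pool.
  settle(reserved: number, used: number): void {
    const charged = Math.min(Math.max(used, 0), reserved);
    this.spent += charged;
    this.available += reserved - charged;
  }

  get remaining(): number {
    return this.available;
  }

  get totalSpent(): number {
    return this.spent;
  }
}
```

Because a spawn is admitted only if its full reservation fits, total spend can never exceed the pool's starting budget. That is the sense in which equal-k would hold by construction in a design like this.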
What

Two `@experimental`, default-OFF seams on `runLoop` so a loop can do more than N-fresh-boxes-per-round:

- Session continuation: a refine continues the same box + session via `streamPrompt({ sessionId })`, instead of a fresh box + re-injecting prior context as prompt text. Faithful context, no truncation tax.
- Checkpoint-fork: fanout branches are created with `checkpoint({ leaveRunning: true })` + `fork(checkpointId)` so they inherit a shared context prefix, instead of N from-scratch boxes.

Composed, these are continue-for-depth + fork-for-breadth = the continue-in-fork tree, mapping onto the Driver's existing `refine | fanout | select` moves.

Backend-blind by construction

The kernel never asks "Docker or Firecracker?"; it asks only `client.criuStatus()` → `{ canFork }`. Forking degrades to fresh boxes when CRIU is unavailable, so identical code runs on any backend.

- `src/loops/sandbox-capabilities.ts`: memoized (per-client `WeakMap`), fail-closed probe: no `criuStatus` method or a throwing/`available:false` probe ⇒ `canFork=false`.
- `src/loops/sandbox-lineage.ts`: `createSandboxLineage(client, caps)` owns box+session handles: `start` / `continue` / `fork` / `teardown`. Reuses the kernel's own `acquireSandbox` / `buildBackendOptions` / `deleteBoxSafe` (no re-implementation). Fail-loud: if the probe says `canFork` but the box has no `fork()`, it throws `ValidationError` rather than silently degrading.
- `src/loops/run-loop.ts`: new `RunLoopOptions.lineage` (`sessionContinuity` / `forkFanout`). Per round: refine+continuity+live-parent ⇒ continue; fanout+fork+live-parent ⇒ fork-once (one shared checkpoint across branches); else fresh-through-lineage. Round 0 always fresh. Rejects `lineage` + `onWorkerBox` together (both own boxes).

The invariant

Default OFF ⇒ behavior is byte-identical to today: fresh box per iteration, `streamPrompt(msg, { signal })` with no sessionId. So `random@k` stays N independent fresh boxes (the compute-control independence is sacred and must never be forked/continued). This is pinned by a test.

Tests / verification

- `tests/loops/sandbox-lineage.test.ts` (7, all pass): (a) invariant: continuity OFF ⇒ 2 distinct fresh boxes, no sessionId; (b) ON ⇒ refine reuses the same box+session; (c) fanout forks when `criuStatus.available`, degrades to fresh when not (and when the client has no `criuStatus`); plus guardrails (lineage+onWorkerBox rejected; teardown reaps every owned box incl. forks).
- `pnpm typecheck` (strict) → 0 errors.
- `pnpm vitest run` → 621 passed, 0 failed.

Reviewer eyes-on (honest gaps)

- `streamPrompt` returns an `AsyncGenerator` (no result object), so the lineage mints the id and passes it on the first `streamPrompt({ sessionId })`, matching the SDK's `dispatchPrompt({ sessionId })` / `session(id)` idempotency contract. Worth confirming the platform accepts a client-supplied id on `streamPrompt`.
- The degraded path uses `specs` for fresh boxes; the real-fork path inherits the parent image (specs ignored for box creation, per `box.fork()` semantics).

Opt-in + default-off means this is safe to land and adopt incrementally (e.g. refine→continuation first, fork behind `criuStatus` second).
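The per-round rule in the description (round 0 always fresh; refine + continuity + live parent continues; fanout + fork + live parent forks once; otherwise fresh) can be written as a small pure function. This is a sketch under assumptions: `chooseDispatch`, `Dispatch`, and `LineageFlags` are hypothetical names, only the refine and fanout moves are modeled, and the degradation from fork to fresh boxes when CRIU is absent is assumed to happen inside the lineage's fork rather than here.

```typescript
// Hypothetical names; only the decision rule itself comes from the PR description.
type Move = "refine" | "fanout";
type Dispatch = "continue" | "fork-once" | "fresh";

interface LineageFlags {
  sessionContinuity?: boolean;
  forkFanout?: boolean;
}

function chooseDispatch(
  round: number,
  move: Move,
  hasLiveParent: boolean,
  lineage?: LineageFlags,
): Dispatch {
  // Default OFF, round 0, or no live parent box: always a fresh box.
  if (!lineage || round === 0 || !hasLiveParent) return "fresh";
  if (move === "refine" && lineage.sessionContinuity === true) return "continue";
  if (move === "fanout" && lineage.forkFanout === true) return "fork-once";
  return "fresh";
}

// Example: a refine in round 1 with continuity enabled continues the parent's session.
const example = chooseDispatch(1, "refine", true, { sessionContinuity: true });
console.log(example); // "continue"
```

Keeping the rule as a pure function makes the default-off invariant easy to pin: with `lineage` undefined, every input maps to `"fresh"`.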