
feat(loops): opt-in session-continuation + checkpoint-fork lineage for runLoop (backend-blind) #150

Merged: drewstone merged 2 commits into main from feat/runloop-session-continuation-and-fork on Jun 4, 2026

Conversation

@drewstone

What

Two @experimental, default-OFF seams on runLoop so a loop can do more than N-fresh-boxes-per-round:

  1. Session continuation (any path) — continue a sandbox session across iterations: same box, streamPrompt({ sessionId }), instead of a fresh box + re-injecting prior context as prompt text. Faithful context, no truncation tax.
  2. Checkpoint + fork (fanout breadth) — branch fanout children from a parent's checkpoint({ leaveRunning: true }) + fork(checkpointId) so they inherit a shared context prefix, instead of N from-scratch boxes.

Composed, these are continue-for-depth + fork-for-breadth = the continue-in-fork tree, mapping onto the Driver's existing refine | fanout | select moves.

Backend-blind by construction

The kernel never asks "Docker or Firecracker?"; it only probes client.criuStatus() and derives { canFork } from the result. Forking degrades to fresh boxes when CRIU is unavailable, so identical code runs on any backend.

  • src/loops/sandbox-capabilities.ts: memoized (per-client WeakMap), fail-closed probe. A missing criuStatus method, a throwing probe, or available: false ⇒ canFork=false.
  • src/loops/sandbox-lineage.ts: createSandboxLineage(client, caps) owns the box and session handles and exposes start / continue / fork / teardown. Reuses the kernel's own acquireSandbox / buildBackendOptions / deleteBoxSafe (no re-implementation). Fail-loud: if the probe says canFork but the box has no fork(), it throws ValidationError rather than silently degrading.
  • src/loops/run-loop.ts: new RunLoopOptions.lineage (sessionContinuity / forkFanout). Per round: refine+continuity+live-parent ⇒ continue; fanout+fork+live-parent ⇒ fork-once (one shared checkpoint across branches); else fresh-through-lineage. Round 0 is always fresh. Rejects lineage + onWorkerBox together (both own boxes).
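
A minimal sketch of the memoized, fail-closed probe described in the first bullet above. The interface shapes and the function name probeCapabilities are assumptions for illustration; the PR only names client.criuStatus(), its available field, and the derived canFork bit.

```typescript
// Hypothetical shapes: only criuStatus(), `available`, and `canFork` come from the PR text.
interface CriuStatus { available: boolean }
interface SandboxClientLike { criuStatus?: () => Promise<CriuStatus> }
interface SandboxCapabilities { canFork: boolean }

// One cached probe per client object; a WeakMap lets the entry be collected with the client.
const probeCache = new WeakMap<object, Promise<SandboxCapabilities>>();

function probeCapabilities(client: SandboxClientLike): Promise<SandboxCapabilities> {
  const cached = probeCache.get(client);
  if (cached) return cached;

  const probe = (async (): Promise<SandboxCapabilities> => {
    // Fail-closed: a client without criuStatus cannot fork.
    if (typeof client.criuStatus !== "function") return { canFork: false };
    try {
      const status = await client.criuStatus();
      return { canFork: status.available === true };
    } catch {
      // Fail-closed: a throwing probe is treated as "cannot fork".
      return { canFork: false };
    }
  })();

  // Cache the promise itself so concurrent callers share a single probe.
  probeCache.set(client, probe);
  return probe;
}
```

Caching the promise rather than the resolved value means two branches that probe at the same time still issue one criuStatus() call.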

The invariant

Default OFF ⇒ behavior is byte-identical to today — fresh box per iteration, streamPrompt(msg, { signal }) with no sessionId. So random@k stays N independent fresh boxes (the compute-control independence is sacred and must never be forked/continued). This is pinned by a test.
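
The per-round rule and the default-OFF invariant can be sketched as a pure decision function. This is an illustration under assumptions, not the kernel's code: chooseMode, Mode, and the hasLiveParent parameter are hypothetical names; sessionContinuity, forkFanout, and the refine | fanout | select moves come from the PR text.

```typescript
type Move = "refine" | "fanout" | "select";
type Mode = "continue" | "fork" | "fresh";

// Both flags default to unset, which is the default-OFF state.
interface LineageFlags { sessionContinuity?: boolean; forkFanout?: boolean }

function chooseMode(
  round: number,
  move: Move,
  hasLiveParent: boolean,
  flags: LineageFlags = {},
): Mode {
  // Round 0 is always fresh, and nothing can continue or fork without a live parent.
  if (round === 0 || !hasLiveParent) return "fresh";
  if (move === "refine" && flags.sessionContinuity) return "continue";
  // "fork" here means "attempt a fork"; it can still degrade to fresh boxes when canFork is false.
  if (move === "fanout" && flags.forkFanout) return "fork";
  return "fresh";
}
```

With no flags set, every branch falls through to "fresh", which is the property the invariant test pins.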

Tests / verification

  • tests/loops/sandbox-lineage.test.ts (7, all pass): (a) invariant — continuity OFF ⇒ 2 distinct fresh boxes, no sessionId; (b) ON ⇒ refine reuses the same box+session; (c) fanout forks when criuStatus.available, degrades to fresh when not (and when the client has no criuStatus); + guardrails (lineage+onWorkerBox rejected; teardown reaps every owned box incl. forks).
  • pnpm typecheck (strict) → 0 errors. pnpm vitest run → 621 passed, 0 failed.

Reviewer eyes-on (honest gaps)

  • Caller-minted sessionId: streamPrompt returns an AsyncGenerator (no result object), so the lineage mints the id and passes it on the first streamPrompt({ sessionId }) — matching the SDK's dispatchPrompt({ sessionId }) / session(id) idempotency contract. Worth confirming the platform accepts a client-supplied id on streamPrompt.
  • Fork degraded path round-robins specs for fresh boxes; the real-fork path inherits the parent image (specs ignored for box creation, per box.fork() semantics).
  • Not run against a live CRIU host — verification is unit-level with fakes. Wants one integration pass on a real Firecracker/CRIU sandbox before anyone flips the flags in production.
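
The fork-or-degrade behavior from the bullets above (one shared checkpoint on the real-fork path, round-robin specs on the degraded path, fail-loud on a probe/box mismatch) might look roughly like the sketch below. BoxLike, branchFanout, acquireFresh, and the { checkpointId } return shape of checkpoint() are assumptions; the PR only states checkpoint({ leaveRunning: true }) followed by fork(checkpointId). Branches are created sequentially here for brevity.

```typescript
class ValidationError extends Error {}

// Hypothetical box shape; the checkpoint() return type is an assumption.
interface BoxLike {
  id: string;
  checkpoint?: (opts: { leaveRunning: boolean }) => Promise<{ checkpointId: string }>;
  fork?: (checkpointId: string) => Promise<BoxLike>;
}

async function branchFanout<S>(
  parent: BoxLike,
  canFork: boolean,
  branches: number,
  specs: readonly S[],
  acquireFresh: (spec: S) => Promise<BoxLike>,
): Promise<BoxLike[]> {
  if (!canFork) {
    // Degraded path: fresh boxes, cycling through the specs round-robin.
    if (specs.length === 0) throw new ValidationError("degraded fanout needs at least one spec");
    const fresh: BoxLike[] = [];
    for (let i = 0; i < branches; i++) {
      fresh.push(await acquireFresh(specs[i % specs.length] as S));
    }
    return fresh;
  }

  const checkpoint = parent.checkpoint;
  const fork = parent.fork;
  // Fail-loud: the probe promised fork support, so a box without it is a contract violation.
  if (typeof checkpoint !== "function" || typeof fork !== "function") {
    throw new ValidationError("probe reported canFork but the box cannot checkpoint/fork");
  }

  // Fork-once: a single checkpoint is shared by every branch; specs are ignored on this path.
  const { checkpointId } = await checkpoint.call(parent, { leaveRunning: true });
  const children: BoxLike[] = [];
  for (let i = 0; i < branches; i++) {
    children.push(await fork.call(parent, checkpointId));
  }
  return children;
}
```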

Opt-in + default-off means this is safe to land and adopt incrementally (e.g. refine→continuation first, fork behind criuStatus second).

…ackend-blind)

Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox
session across iterations (same box + sessionId, no prompt-text replay) and FORK
fanout branches from a parent checkpoint (shared context prefix) — both behind a
capability probe so the kernel asks 'can I fork?' (client.criuStatus) and never
names Docker/Firecracker, degrading to fresh boxes when CRIU is absent.

- sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}.
- sandbox-lineage.ts: createSandboxLineage owns box+session handles with
  start/continue/fork/teardown; reuses the kernel's acquireSandbox /
  buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but
  the box has no fork().
- run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine
  continues, fanout forks-once, else fresh-through-lineage. Default OFF is
  byte-identical to today, so random@k stays N independent fresh boxes (the
  compute-control invariant). Rejects lineage + onWorkerBox (both own boxes).
- 7 new unit tests (continuation reuses session; fork when canFork; fresh
  fallback; default-off invariant). Full suite 621 pass, typecheck clean.
@tangletools

tangletools commented Jun 4, 2026

🔍 Reviewing 9e953884

| Pass | Status | ETA |
|---|---|---|
| opencode DeepSeek v4 Pro | Running (4 min) | ~5-15 min |
| opencode GLM 5.1 | Running (4 min) | ~5-15 min |

Agent review running. Reads the actual code. This comment updates in place.

tangletools · #150 · model: kimi-for-coding · started 2026-06-04T12:56:01Z

tangletools previously approved these changes Jun 4, 2026

@tangletools tangletools left a comment


✅ Approved — 7 non-blocking findings — 5c3c9cde

Full multi-shot audit completed 2/2 planned shots over 6 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Latest PR review status: sticky summary


tangletools · 2026-06-04T11:47:25Z · immutable trace


@drewstone drewstone left a comment


Review

Verdict: approve to land. @experimental + default-OFF, and the byte-identical-when-off invariant is pinned by a test, so the blast radius on the existing kernel is ~zero. Clean architecture: backend-blind via the criuStatus() capability bit (never "Docker or Firecracker?"), fail-closed probe, fail-loud on the probe-says-fork / box-can't-fork contract violation, reuses acquireSandbox / buildBackendOptions / deleteBoxSafe instead of reinventing them, and the lineage owns every box it starts or forks.

Six findings — one is a silent-correctness landmine to clear before the flag is ever enabled.

1. (Blocker-before-enabling) Client-minted sessionId may be a silent no-op

start() mints loop-sess-<uuid> and passes it on the first streamPrompt({ sessionId }); continue() re-passes it. If the platform ignores a client-supplied id and mints its own (or scopes context per-box rather than per-session), then continue silently runs a contextless turn while the loop believes it is continuing — the "faithful context, no truncation tax" property quietly inverts and nothing fails. This is the one real risk in the PR. Raise it from "worth confirming" to must verify against a live platform, with a fail-loud assertion that the returned session matches the minted id, before sessionContinuity is trusted.

2. (Major) Fork box-creation is not bounded by maxConcurrency

fork() does Promise.all(prompts.map(...)) — the first branch's acquire() triggers all N checkpoint+forks (or N fresh acquires on the degraded path) at once. The kernel bounds the streaming by concurrency, but box creation now fires unbounded, where today's fresh fanout is rate-limited. A 20-way fanout under maxConcurrency: 2 provisions 20 boxes simultaneously. Bound the fork/acquire fan with the same pool.
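
One way to bound the fan is a small worker-pool map. The follow-up commit names its helper mapWithConcurrency; the body below is an assumed implementation for illustration, not the one in the repository.

```typescript
// Runs fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JS runs each worker step to its next await

  const worker = async (): Promise<void> => {
    while (true) {
      const i = next++;
      if (i >= items.length) return;
      results[i] = await fn(items[i] as T, i);
    }
  };

  // Never start more workers than there are items, and always at least one.
  const width = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: width }, () => worker()));
  return results;
}
```

Under this sketch, a 20-way fanout with a limit of 2 has at most two fork or acquire calls outstanding at any moment, instead of all 20 at once. A rejection from fn propagates to the caller through Promise.all.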

3. (Major) forkFanout ignores per-branch specs under a real fork

A real fork inherits the parent's image; specs[i] is honored only on the degraded fresh path. So a heterogeneous fanout (different model/profile per branch) silently runs every branch on the parent's profile when CRIU is present. Fine for homogeneous diverse@k, wrong for heterogeneous arms. Document this in LoopLineageOptions.forkFanout (it is in the PR body, not the type doc).

4. (Major) The lineage holds every box alive until loop end

handles keeps each iteration's box so a later round can descend from it, but never prunes non-descended branches — so the live-box ceiling becomes total iterations across all rounds, not the active frontier. Today each box dies after its iteration. A 5-round × 3-fanout steered loop holds ~15 live boxes to the end. Prune handles that can no longer be descended from, or at least document the ceiling.
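
A pruning pass could be sketched as below, assuming handles is keyed by iteration index and the caller can name the frontier (the iterations a later round may still descend from). BoxHandle, pruneHandles, and the deleteBox callback are hypothetical names; how the kernel infers the frontier is not shown here.

```typescript
interface BoxHandle { id: string }

// Deletes every owned box whose iteration index is not on the frontier.
// Returns how many boxes were reaped.
async function pruneHandles(
  handles: Map<number, BoxHandle>,
  frontier: ReadonlySet<number>,
  deleteBox: (box: BoxHandle) => Promise<void>,
): Promise<number> {
  const doomed = Array.from(handles.entries()).filter(([index]) => !frontier.has(index));
  for (const [index, box] of doomed) {
    // Drop the handle first so a failed delete is not retried against a stale entry.
    handles.delete(index);
    await deleteBox(box);
  }
  return doomed.length;
}
```

Called after each round with only the chosen parent on the frontier, the live-box count tracks the active frontier plus the current round's branches rather than growing with total iterations.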

5. (Minor) Abort is not responsive during fork

forkFromCheckpoint calls fork.call(box, checkpointId) with no signal; an abort mid-Promise.all will not cancel in-flight forks (they complete, get owned, get reaped at teardown — no leak, just unresponsive). No test covers abort-under-lineage.
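
Because the fork call itself takes no signal, the most a caller can do is stop launching new forks once an abort is observed. A minimal sketch of that check, using a hypothetical AbortLike shape (anything with an aborted flag, such as an AbortSignal) and a hypothetical forkBranches helper:

```typescript
// Minimal structural stand-in for AbortSignal, so the sketch has no platform dependency.
interface AbortLike { readonly aborted: boolean }

// Launches forks one at a time and checks for abort before each launch.
// Every box that was created is returned so teardown can still reap it.
async function forkBranches<B>(
  count: number,
  forkOne: (index: number) => Promise<B>,
  signal: AbortLike,
): Promise<B[]> {
  const owned: B[] = [];
  for (let i = 0; i < count; i++) {
    // An in-flight fork cannot be cancelled, but no further fork is started after abort.
    if (signal.aborted) break;
    owned.push(await forkOne(i));
  }
  return owned;
}
```

The same check can sit at the top of each worker iteration in a bounded-concurrency pool, which gives abort responsiveness at the granularity of one fork.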

6. (Nit) Export order

In index.ts, LoopLineageOptions is inserted before LoopIterationDispatchPayload — not alphabetical; Biome may flag it.


None of 1–6 block landing (default-off). #1 blocks enabling; #3 and #4 should be documented now. The test suite is solid on the happy paths + the independence invariant + teardown, but has no abort-under-lineage and (necessarily) no live-CRIU pass.

Context for whoever picks up the follow-ups: this is the leaf-level continued-session/fork dial that the recursive-execution-atom work composes on top of — the sandbox executor there forwards this lineage passthrough rather than reinventing checkpoint/fork, so the items above (esp. #1 and #3) are load-bearing for that layer too.

…-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated
enabling, #3/#4 wanted documenting). Lineage remains default-OFF and
byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is
  still known to the sandbox via `box.session(id).status()` before streaming.
  A `null` (platform never honored the client-minted id, or it was reaped)
  raises a ValidationError, which executeIteration now propagates as a hard
  structural failure instead of degrading to a soft empty iteration — so a
  non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through
  `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single
  `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and
  `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent
  image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round
  can descend from after each round, gated on a kernel-inferred (monotonic)
  branch point — skipped when the driver authors its own `parentIndex`. The
  unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now
  checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/
fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent,
abort-under-lineage). 627 tests pass, typecheck + biome clean.
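
The #1 resolution can be sketched as a guard that runs before each continued turn. box.session(id).status() and ValidationError are named in the commit message; the interface shapes, the { state } status payload, and the function name assertSessionLive are assumptions for illustration.

```typescript
class ValidationError extends Error {}

// Hypothetical shapes: status() resolves to null when the sandbox does not know the session.
interface SessionHandleLike { status(): Promise<{ state: string } | null> }
interface BoxLike { session(id: string): SessionHandleLike }

// Throws instead of letting the loop run a contextless turn it believes is a continuation.
async function assertSessionLive(box: BoxLike, sessionId: string): Promise<void> {
  const status = await box.session(sessionId).status();
  if (status === null || status === undefined) {
    throw new ValidationError(
      `session ${sessionId} is unknown to the sandbox (client-minted id not honored, or session reaped)`,
    );
  }
}
```

As the resolution notes, a passing check shows the session exists; it does not show that earlier turns are replayed into the next prompt, so a live-platform check is still needed before enabling the flag.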

@tangletools tangletools left a comment


✅ Refreshed approval after new commits — 9e953884

A previous trusted approval on this PR was invalidated by new commits.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: stale_approval_refresh · 2026-06-04T12:50:18Z

@drewstone

Addressed all six review findings in 9e95388 (lineage stays default-OFF; the no-lineage path is unchanged). 627 tests pass, typecheck + biome clean.

| # | Finding | Resolution |
|---|---|---|
| 1 | client-minted sessionId silent no-op | continue asserts box.session(id).status() before streaming; a null (id not honored / reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of a soft empty iteration. A non-honoring platform errors loudly. Still gate enabling on a live-platform check: the assertion proves the session exists, not that turns replay. |
| 2 | unbounded fork box creation | fork provisions children via new mapWithConcurrency bounded by the loop's maxConcurrency, not one Promise.all over all N. |
| 3 | per-branch specs ignored under a real fork | Documented on fork + LoopLineageOptions.forkFanout: a CRIU fork inherits the parent image/profile; per-branch specs apply only on the degraded fresh path. |
| 4 | lineage holds every box to loop end | Kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point; pruning is skipped when the driver authors its own parentIndex (that ceiling is documented). |
| 5 | abort not responsive during fork | SDK ForkOptions has no signal (documented); abort is now checked per branch between bounded waves + an abort-under-lineage test. |
| 6 | export order | Alphabetized the loops barrel (biome clean). |

New tests: session-liveness pass/fail, bounded-fork peak ≤ maxConcurrency, mid-loop prune frees non-frontier boxes, no-prune under an authored parent, abort-under-lineage teardown.

@drewstone drewstone merged commit 4f38515 into main Jun 4, 2026
1 check passed
drewstone added a commit that referenced this pull request Jun 4, 2026
…ervisor keystone (#151)

* docs(research): track RSI driver architecture research

Capture the design thread as tracked research docs under docs/research:
- recursive-execution-atom: the next generation (one recursive Agent atom
  run as a durable, observable supervision tree; analyst-as-agent-with-runtime;
  async dynamic spawning), the proposed surface, the file-grounded gap, and the
  open forks. Plane B contains the flat harness.
- flat-harness-design: the assumption-free experiment harness synthesis
  (profiles x steer x executionMode x allocation). Plane A.
- long-horizon-benchmark-survey: adversarially-verified survey; Commit0 and
  tau2-bench as the multi-turn picks.
Index them under the new Research track in docs/README.md.

* docs(research): fold the wnrxtvdta reconcile into the recursive-atom spec

Freeze the contract: the budget-conserving reactive Scope + Supervisor keystone,
the event-sourced SpawnJournal + ResultBlobStore (outRef replay), the LeafExecutor
per-harness model (harness:null = Router inference, sandbox, cli), and the 8-step
v1 build order. Records the 4 resolved forks and the operator override (build the
LLM meta-driver now as the treatment on top of the conserved reservation pool, so
the equal-k gate stays valid by construction).

* docs(research): open LeafExecutor interface + compose PR #150 lineage

Operator refinement: the runtime is ONE open interface with an execute that
returns a promise or an async stream, not a closed inline|sandbox|cli union.
Built-ins are implementations; a user agent (mastra/agno/HTTP/custom) is
first-class by implementing it; no per-vendor adapters. The sandbox executor
composes runLoop and forwards PR #150's lineage passthrough rather than
reinventing checkpoint/fork. Records the #150 review (approve-to-land; verify
client-minted sessionId, bound fork acquisition, document parent-image fork).

* feat(loops): recursive execution atom — budget-conserving Scope + Supervisor

The v1 keystone for the recursive-agent-atom (drivers-of-drivers, async,
observable). One self-similar atom whose act spawns child agents through a
Scope; the flat experiment harness is recovered as the simplest act.

- src/loops/supervise/types.ts: the frozen contract — Agent, an OPEN
  LeafExecutor interface (execute returns a promise or an async stream;
  router/inline + sandbox + cli are implementations, BYO is first-class via
  the registry; no per-vendor adapters), Scope, Supervisor, Settled, Budget.
- supervise/budget.ts: a conserved reservation pool — atomic reserve-on-spawn,
  fail-closed admission, refund-on-settle, so equal-k holds by construction
  (the invariant that keeps a steered arm from silently out-computing blind).
- durable/spawn-journal.ts: event-sourced SpawnJournal + content-addressed
  ResultBlobStore + seq-ordered replay (resumable, queryable, reproducible).
- supervise/scope.ts: a ray.wait cursor over an in-memory nursery; spawn
  reserves budget and resolves the executor through the open registry; a
  Settled-to-Iteration adapter keeps defaultSelectWinner single-sourced.
- supervise/runtime.ts: the open executor registry; the sandbox executor
  composes runLoop (forwarding an optional lineage passthrough) rather than
  reinventing checkpoint/fork.
- supervise/supervisor.ts: nursery join barrier, abort cascade (incl. the
  acquire lifecycle), OTP intensity breaker, typed SupervisedResult, RootHandle
  (view/signal/abort) as the observability substrate.
- bench/src/drivers: flat-harness control (with the equal-k assertion),
  progressive-widening control, and the LLM-meta-driver treatment. The
  WidenGate defaults to flat so the selector-not-judge firewall stays dormant;
  widening reads trace findings, never the raw verdict, unless judgeExempt.
- program.ts: mapPool one-for-one failure semantics (a down child is excluded
  from the merge, an all-down batch re-throws the first original error so a
  maxDepth guard still propagates loud).

Verified: typecheck, lint (204 files), build, full suite (642 tests incl. 28
keystone property tests), bench typecheck. Caveat: the live executor paths
(real router HTTP, sandbox runLoop, cli subprocess) are exercised only through
the offline mock LeafExecutor; no live-backend run yet.
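
The conserved reservation pool from the supervise/budget.ts bullet (reserve-on-spawn, fail-closed admission, refund-on-settle) can be illustrated with a small class. BudgetPool and its method names are hypothetical, and refunding only the unused part of a reservation is an assumption about what "refund-on-settle" means here.

```typescript
// A fixed budget that spawns draw from; it never exceeds its initial total.
class BudgetPool {
  private readonly total: number;
  private available: number;

  constructor(total: number) {
    this.total = total;
    this.available = total;
  }

  // Reserve-on-spawn with fail-closed admission: an invalid or unaffordable cost is refused.
  reserve(cost: number): boolean {
    if (!Number.isFinite(cost) || cost <= 0 || cost > this.available) return false;
    this.available -= cost;
    return true;
  }

  // Refund-on-settle: return whatever part of the reservation was not used.
  settle(reserved: number, used: number): void {
    const refund = Math.max(0, reserved - Math.min(used, reserved));
    this.available = Math.min(this.total, this.available + refund);
  }

  get remaining(): number {
    return this.available;
  }
}
```

Because check and decrement happen in one synchronous step, two spawns cannot both be admitted against the same remaining budget, which is what keeps total compute fixed across arms in this sketch.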