feat(loops): opt-in session-continuation + checkpoint-fork lineage for runLoop (backend-blind) by drewstone · Pull Request #150 · tangle-network/agent-runtime

drewstone · 2026-06-04T11:33:52Z

What

Two @experimental, default-OFF seams on runLoop so a loop can do more than N-fresh-boxes-per-round:

Session continuation (any path) — continue a sandbox session across iterations: same box, streamPrompt({ sessionId }), instead of a fresh box + re-injecting prior context as prompt text. Faithful context, no truncation tax.
Checkpoint + fork (fanout breadth) — branch fanout children from a parent's checkpoint({ leaveRunning: true }) + fork(checkpointId) so they inherit a shared context prefix, instead of N from-scratch boxes.
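Here is a minimal sketch of the continuation seam. It assumes a box whose `streamPrompt` yields text chunks as an `AsyncGenerator` and accepts an optional `sessionId`. The `Box`, `PromptOptions`, and `LineageHandle` shapes and the injected `mintId` function are illustrative assumptions, not the SDK's types. The first turn mints the id, and later turns re-pass it on the same box instead of re-injecting prior output as prompt text.

```typescript
// Illustrative shapes only (assumed); the real SDK types may differ.
interface PromptOptions {
  signal?: { readonly aborted: boolean };
  sessionId?: string;
}

interface Box {
  readonly id: string;
  streamPrompt(message: string, options: PromptOptions): AsyncGenerator<string>;
}

interface LineageHandle {
  box: Box;
  sessionId: string;
}

// Drain one streamed turn into a single string.
async function collect(stream: AsyncGenerator<string>): Promise<string> {
  let out = "";
  for await (const chunk of stream) out += chunk;
  return out;
}

// First turn: mint the session id client-side and pass it on the first call,
// because streamPrompt returns a generator, not a result carrying an id.
async function startTurn(
  box: Box,
  message: string,
  mintId: () => string, // hypothetical id source, e.g. a UUID generator
): Promise<{ handle: LineageHandle; output: string }> {
  const sessionId = `loop-sess-${mintId()}`;
  const output = await collect(box.streamPrompt(message, { sessionId }));
  return { handle: { box, sessionId }, output };
}

// Later turns: same box, same session id; no prior output is replayed as text.
async function continueTurn(handle: LineageHandle, message: string): Promise<string> {
  return collect(handle.box.streamPrompt(message, { sessionId: handle.sessionId }));
}
```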

Composed, these are continue-for-depth + fork-for-breadth = the continue-in-fork tree, mapping onto the Driver's existing refine | fanout | select moves.

Backend-blind by construction

The kernel never asks "Docker or Firecracker?" — only client.criuStatus() → { canFork }. Forking degrades to fresh boxes when CRIU is unavailable, so identical code runs on any backend.
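The probe can be sketched as a per-client memo over a fail-closed check. This sketch assumes `criuStatus()` resolves to an object with an `available` flag, as the PR's `available:false` wording suggests. `SandboxClientLike` and `getSandboxCapabilities` are hypothetical names, not the module's exports.

```typescript
// Assumed probe result shape; only `available` is consulted.
interface CriuStatus {
  available: boolean;
}

// Any client object; `criuStatus` is optional because not every backend has it.
interface SandboxClientLike {
  criuStatus?: () => Promise<CriuStatus>;
}

interface SandboxCapabilities {
  canFork: boolean;
}

// Per-client memo keyed weakly, so a discarded client does not pin its entry.
const probeCache = new WeakMap<object, Promise<SandboxCapabilities>>();

async function probe(client: SandboxClientLike): Promise<SandboxCapabilities> {
  // Fail closed: no probe method means no forking.
  if (typeof client.criuStatus !== "function") return { canFork: false };
  try {
    const status = await client.criuStatus();
    return { canFork: status?.available === true };
  } catch {
    // A throwing probe is treated the same as "unavailable".
    return { canFork: false };
  }
}

function getSandboxCapabilities(client: SandboxClientLike): Promise<SandboxCapabilities> {
  let cached = probeCache.get(client);
  if (!cached) {
    cached = probe(client);
    probeCache.set(client, cached);
  }
  return cached;
}
```

Caching the promise rather than the resolved value means concurrent first callers share a single probe call.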

src/loops/sandbox-capabilities.ts — memoized (per-client WeakMap), fail-closed probe: no criuStatus method or a throwing/available:false probe ⇒ canFork=false.
src/loops/sandbox-lineage.ts — createSandboxLineage(client, caps) owns box+session handles: start / continue / fork / teardown. Reuses the kernel's own acquireSandbox / buildBackendOptions / deleteBoxSafe (no re-implementation). Fail-loud: if the probe says canFork but the box has no fork(), it throws ValidationError rather than silently degrading.
src/loops/run-loop.ts — new RunLoopOptions.lineage (sessionContinuity / forkFanout). Per round: refine+continuity+live-parent ⇒ continue; fanout+fork+live-parent ⇒ fork-once (one shared checkpoint across branches); else fresh-through-lineage. Round 0 always fresh. Rejects lineage + onWorkerBox together (both own boxes).
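The per-round rule can be read as a small pure function. This is a hypothetical flattening of the inputs `run-loop.ts` consults. For brevity it folds the capability check in here, whereas in the PR the degraded path runs fresh boxes inside the lineage's own `fork`. The option-compatibility guard is also sketched.

```typescript
type Move = "refine" | "fanout" | "select";
type LineageAction = "continue" | "fork" | "fresh";

// Hypothetical flattening of the per-round inputs.
interface RoundContext {
  round: number; // 0-based round index
  move: Move;
  sessionContinuity: boolean; // lineage option flag
  forkFanout: boolean; // lineage option flag
  canFork: boolean; // result of the capability probe
  parentLive: boolean; // the parent's box is still owned by the lineage
}

function chooseAction(ctx: RoundContext): LineageAction {
  if (ctx.round === 0 || !ctx.parentLive) return "fresh"; // round 0 is always fresh
  if (ctx.move === "refine" && ctx.sessionContinuity) return "continue";
  // "fork" means one parent checkpoint shared by every branch (fork-once).
  if (ctx.move === "fanout" && ctx.forkFanout && ctx.canFork) return "fork";
  return "fresh"; // includes fanout without CRIU (the degraded path)
}

// Both options own sandbox boxes, so they cannot be combined.
function assertCompatible(opts: { lineage?: unknown; onWorkerBox?: unknown }): void {
  if (opts.lineage !== undefined && opts.onWorkerBox !== undefined) {
    throw new Error("lineage and onWorkerBox both own boxes; pass at most one");
  }
}
```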

The invariant

Default OFF ⇒ behavior is byte-identical to today — fresh box per iteration, streamPrompt(msg, { signal }) with no sessionId. So random@k stays N independent fresh boxes (the compute-control independence is sacred and must never be forked/continued). This is pinned by a test.
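One way to keep the OFF path byte-identical is to never create a `sessionId` key unless continuity is in use. `{ signal, sessionId: undefined }` is observably different from `{ signal }`, for example via `Object.keys` or an `in` check. Below is a sketch with a hypothetical `buildStreamOptions` helper.

```typescript
// Minimal signal shape; a real AbortSignal satisfies it structurally.
interface SignalLike {
  readonly aborted: boolean;
}

interface StreamOptions {
  signal: SignalLike;
  sessionId?: string;
}

// Attach sessionId only when continuity is in use, so the OFF path yields
// exactly `{ signal }` rather than `{ signal, sessionId: undefined }`.
function buildStreamOptions(signal: SignalLike, sessionId?: string): StreamOptions {
  return sessionId === undefined ? { signal } : { signal, sessionId };
}
```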

Tests / verification

tests/loops/sandbox-lineage.test.ts (7, all pass): (a) invariant — continuity OFF ⇒ 2 distinct fresh boxes, no sessionId; (b) ON ⇒ refine reuses the same box+session; (c) fanout forks when criuStatus.available, degrades to fresh when not (and when the client has no criuStatus); + guardrails (lineage+onWorkerBox rejected; teardown reaps every owned box incl. forks).
pnpm typecheck (strict) → 0 errors. pnpm vitest run → 621 passed, 0 failed.

Reviewer eyes-on (honest gaps)

Caller-minted sessionId: streamPrompt returns an AsyncGenerator (no result object), so the lineage mints the id and passes it on the first streamPrompt({ sessionId }) — matching the SDK's dispatchPrompt({ sessionId }) / session(id) idempotency contract. Worth confirming the platform accepts a client-supplied id on streamPrompt.
The degraded fork path round-robins specs across fresh boxes; the real-fork path inherits the parent image (specs are ignored for box creation, per box.fork() semantics).
Not run against a live CRIU host — verification is unit-level with fakes. Wants one integration pass on a real Firecracker/CRIU sandbox before anyone flips the flags in production.
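A hypothetical branch planner makes the asymmetry between the two fork paths explicit. With a real fork every branch inherits the parent's image, so `specs` goes unused. The degraded path round-robins `specs` across fresh boxes. `BranchSpec` and `planBranches` are illustrative names.

```typescript
// Hypothetical per-branch spec; the field is illustrative.
interface BranchSpec {
  profile: string;
}

type BranchPlan =
  | { kind: "fork"; branch: number } // inherits the parent's image; no spec applies
  | { kind: "fresh"; branch: number; spec: BranchSpec };

function planBranches(count: number, canFork: boolean, specs: readonly BranchSpec[]): BranchPlan[] {
  if (!canFork && specs.length === 0) {
    throw new Error("the degraded fresh path needs at least one spec");
  }
  return Array.from({ length: count }, (_, branch): BranchPlan => {
    if (canFork) return { kind: "fork", branch };
    // Degraded path: round-robin specs across fresh boxes.
    const spec = specs[branch % specs.length] as BranchSpec;
    return { kind: "fresh", branch, spec };
  });
}
```

This is why a heterogeneous fanout (a different profile per branch) only behaves as configured on the degraded path.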

Opt-in + default-off means this is safe to land and adopt incrementally (e.g. refine→continuation first, fork behind criuStatus second).

@experimental

…ackend-blind)

Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox session across iterations (same box + sessionId, no prompt-text replay) and FORK fanout branches from a parent checkpoint (shared context prefix). Both sit behind a capability probe, so the kernel asks 'can I fork?' (client.criuStatus) and never names Docker/Firecracker, degrading to fresh boxes when CRIU is absent.

- sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}.
- sandbox-lineage.ts: createSandboxLineage owns box+session handles with start/continue/fork/teardown; reuses the kernel's acquireSandbox / buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but the box has no fork().
- run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine continues, fanout forks-once, else fresh-through-lineage. Default OFF is byte-identical to today, so random@k stays N independent fresh boxes (the compute-control invariant). Rejects lineage + onWorkerBox (both own boxes).
- 7 new unit tests (continuation reuses session; fork when canFork; fresh fallback; default-off invariant). Full suite 621 pass, typecheck clean.

tangletools · 2026-06-04T11:35:34Z

🔍 Reviewing `9e953884`

Pass	Status	ETA
opencode DeepSeek v4 Pro	Running (4 min)	~5-15 min
opencode GLM 5.1	Running (4 min)	~5-15 min

Agent review running. Reads the actual code. This comment updates in place.

(tangletools · #150 · model: kimi-for-coding · started 2026-06-04T12:56:01Z)

tangletools

✅ Approved — 7 non-blocking findings — `5c3c9cde`

Full multi-shot audit completed 2/2 planned shots over 6 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Latest PR review status: sticky summary

(tangletools · 2026-06-04T11:47:25Z · immutable trace)

drewstone

Review

Verdict: approve to land. @experimental + default-OFF, and the byte-identical-when-off invariant is pinned by a test, so the blast radius on the existing kernel is ~zero. Clean architecture: backend-blind via the criuStatus() capability bit (never "Docker or Firecracker?"), fail-closed probe, fail-loud on the probe-says-fork / box-can't-fork contract violation, reuses acquireSandbox / buildBackendOptions / deleteBoxSafe instead of reinventing them, and the lineage owns every box it starts or forks.

Six findings — one is a silent-correctness landmine to clear before the flag is ever enabled.

1. (Blocker-before-enabling) Client-minted `sessionId` may be a silent no-op

start() mints loop-sess-<uuid> and passes it on the first streamPrompt({ sessionId }); continue() re-passes it. If the platform ignores a client-supplied id and mints its own (or scopes context per-box rather than per-session), then continue silently runs a contextless turn while the loop believes it is continuing: the "faithful context, no truncation tax" property quietly inverts and nothing fails. This is the one real risk in the PR. Raise it from "worth confirming" to "must verify against a live platform", with a fail-loud assertion that the returned session matches the minted id, before sessionContinuity is trusted.
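The follow-up commit turns this into a fail-loud check. Before streaming, `continue` asks the box for the session's status and raises a `ValidationError` on `null`. A minimal sketch, assuming `session(id).status()` resolves to `null` for an unknown id; the `SessionBox`/`SessionRef` shapes and the local `ValidationError` are stand-ins. As the resolution notes, this proves the session exists, not that earlier turns are replayed into context, so a live-platform check is still needed.

```typescript
// Stand-in for the kernel's ValidationError.
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
  }
}

// Assumed shapes: status() resolves to null when the sandbox does not know the id.
interface SessionRef {
  status(): Promise<object | null>;
}

interface SessionBox {
  session(id: string): SessionRef;
}

// Check liveness before a continue turn, so an ignored or reaped session fails
// loudly instead of running a contextless turn.
async function assertSessionLive(box: SessionBox, sessionId: string): Promise<void> {
  const status = await box.session(sessionId).status();
  if (status === null) {
    throw new ValidationError(
      `session ${sessionId} is unknown to the sandbox; refusing a contextless continue`,
    );
  }
}
```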

2. (Major) Fork box-creation is not bounded by `maxConcurrency`

fork() does Promise.all(prompts.map(...)) — the first branch's acquire() triggers all N checkpoint+forks (or N fresh acquires on the degraded path) at once. The kernel bounds the streaming by concurrency, but box creation now fires unbounded, where today's fresh fanout is rate-limited. A 20-way fanout under maxConcurrency: 2 provisions 20 boxes simultaneously. Bound the fork/acquire fan with the same pool.
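The follow-up commit bounds box creation with a `mapWithConcurrency` util. Below is a minimal worker-pool version of such a helper, with an assumed signature (the real util may differ). At most `limit` calls to `fn` are in flight, and results keep input order.

```typescript
// Worker-pool map: at most `limit` calls to `fn` run at once; results keep input order.
// If `fn` rejects, the returned promise rejects; the remaining workers are not cancelled.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    // Claiming an index is synchronous, so two workers never take the same one.
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index] as T, index);
    }
  };
  const poolSize = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

Applied to box provisioning with `limit` set to a `maxConcurrency` of 2, a 20-way fanout would have at most two fork or acquire calls outstanding at any time.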

3. (Major) `forkFanout` ignores per-branch `specs` under a real fork

A real fork inherits the parent's image; specs[i] is honored only on the degraded fresh path. So a heterogeneous fanout (different model/profile per branch) silently runs every branch on the parent's profile when CRIU is present. Fine for homogeneous diverse@k, wrong for heterogeneous arms. Document this in LoopLineageOptions.forkFanout (it is in the PR body, not the type doc).

4. (Major) The lineage holds every box alive until loop end

handles keeps each iteration's box so a later round can descend from it, but never prunes non-descended branches — so the live-box ceiling becomes total iterations across all rounds, not the active frontier. Today each box dies after its iteration. A 5-round × 3-fanout steered loop holds ~15 live boxes to the end. Prune handles that can no longer be descended from, or at least document the ceiling.
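Pruning can be sketched as reaping every owned box outside the frontier, meaning the set of iterations a future round may still descend from. `pruneToFrontier` is hypothetical, and `deleteBox` is assumed not to throw, in the spirit of `deleteBoxSafe`. When a driver authors its own `parentIndex`, any earlier iteration may be chosen, so the frontier is every handle and nothing is pruned. That is the ceiling the follow-up documents.

```typescript
// Hypothetical: `handles` maps iteration index -> owned box id; `frontier` holds the
// iteration indices a future round may still descend from. Assumes deleteBox never throws.
async function pruneToFrontier(
  handles: Map<number, string>,
  frontier: ReadonlySet<number>,
  deleteBox: (boxId: string) => Promise<void>,
): Promise<number[]> {
  const pruned: number[] = [];
  for (const [index, boxId] of handles) {
    if (frontier.has(index)) continue;
    await deleteBox(boxId);
    handles.delete(index); // deleting the current Map entry during for-of is safe
    pruned.push(index);
  }
  return pruned;
}
```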

5. (Minor) Abort is not responsive during fork

forkFromCheckpoint calls fork.call(box, checkpointId) with no signal; an abort mid-Promise.all will not cancel in-flight forks (they complete, get owned, get reaped at teardown — no leak, just unresponsive). No test covers abort-under-lineage.
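Since the fork call takes no signal, abort can only be observed between units of work. The sketch below forks in waves under that assumption; `forkInWaves` and `forkOne` are hypothetical. The signal is checked before each wave, and forks already started in a wave run to completion and are returned so teardown can still reap them.

```typescript
interface SignalLike {
  readonly aborted: boolean;
}

// Fork `count` children in waves of at most `limit`. The fork call itself takes
// no signal (assumed), so abort is only checked before each wave; forks already
// started in a wave finish and are returned so teardown can reap them.
// A rejected fork rejects the whole call; reaping on that path is elided here.
async function forkInWaves<B>(
  count: number,
  limit: number,
  signal: SignalLike,
  forkOne: (branch: number) => Promise<B>,
): Promise<{ boxes: B[]; aborted: boolean }> {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be a positive integer");
  const boxes: B[] = [];
  for (let start = 0; start < count; start += limit) {
    if (signal.aborted) return { boxes, aborted: true };
    const wave: Promise<B>[] = [];
    for (let branch = start; branch < Math.min(start + limit, count); branch++) {
      wave.push(forkOne(branch));
    }
    boxes.push(...(await Promise.all(wave)));
  }
  return { boxes, aborted: false };
}
```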

6. (Nit) Export order

In index.ts, LoopLineageOptions is inserted before LoopIterationDispatchPayload — not alphabetical; Biome may flag it.

None of 1–6 block landing (default-off). #1 blocks enabling; #3 and #4 should be documented now. The test suite is solid on the happy paths + the independence invariant + teardown, but has no abort-under-lineage and (necessarily) no live-CRIU pass.

Context for whoever picks up the follow-ups: this is the leaf-level continued-session/fork dial that the recursive-execution-atom work composes on top of — the sandbox executor there forwards this lineage passthrough rather than reinventing checkpoint/fork, so the items above (esp. #1 and #3) are load-bearing for that layer too.

…-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated enabling, #3/#4 wanted documenting). Lineage remains default-OFF and byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is still known to the sandbox via `box.session(id).status()` before streaming. A `null` (platform never honored the client-minted id, or it was reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of degrading to a soft empty iteration, so a non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point; pruning is skipped when the driver authors its own `parentIndex`. The unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent, abort-under-lineage). 627 tests pass, typecheck + biome clean.

tangletools

✅ Refreshed approval after new commits — `9e953884`

A previous trusted approval on this PR was invalidated by new commits.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

(tangletools · auto-approval · reason: stale_approval_refresh · 2026-06-04T12:50:18Z)

drewstone · 2026-06-04T12:50:32Z

Addressed all six review findings in 9e95388 (lineage stays default-OFF; the no-lineage path is unchanged). 627 tests pass, typecheck + biome clean.

#	Finding	Resolution
1	client-minted `sessionId` silent no-op	`continue` asserts `box.session(id).status()` before streaming; a `null` (id not honored / reaped) raises a `ValidationError`, which `executeIteration` now propagates as a hard structural failure instead of a soft empty iteration. A non-honoring platform errors loudly. Still gate enabling on a live-platform check — the assertion proves the session exists, not that turns replay.
2	unbounded fork box creation	`fork` provisions children via new `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not one `Promise.all` over all N.
3	per-branch specs ignored under a real fork	Documented on `fork` + `LoopLineageOptions.forkFanout`: a CRIU fork inherits the parent image/profile; per-branch specs apply only on the degraded fresh path.
4	lineage holds every box to loop end	Kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point — skipped when the driver authors its own `parentIndex` (that ceiling is documented).
5	abort not responsive during fork	SDK `ForkOptions` has no signal (documented); abort is now checked per branch between bounded waves + an abort-under-lineage test.
6	export order	Alphabetized the loops barrel (biome clean).

New tests: session-liveness pass/fail, bounded-fork peak ≤ maxConcurrency, mid-loop prune frees non-frontier boxes, no-prune under an authored parent, abort-under-lineage teardown.

…ervisor keystone (#151)

* docs(research): track RSI driver architecture research

  Capture the design thread as tracked research docs under docs/research:
  - recursive-execution-atom: the next generation (one recursive Agent atom run as a durable, observable supervision tree; analyst-as-agent-with-runtime; async dynamic spawning), the proposed surface, the file-grounded gap, and the open forks. Plane B contains the flat harness.
  - flat-harness-design: the assumption-free experiment harness synthesis (profiles x steer x executionMode x allocation). Plane A.
  - long-horizon-benchmark-survey: adversarially-verified survey; Commit0 and tau2-bench as the multi-turn picks.

  Index them under the new Research track in docs/README.md.

* docs(research): fold the wnrxtvdta reconcile into the recursive-atom spec

  Freeze the contract: the budget-conserving reactive Scope + Supervisor keystone, the event-sourced SpawnJournal + ResultBlobStore (outRef replay), the LeafExecutor per-harness model (harness:null = Router inference, sandbox, cli), and the 8-step v1 build order. Records the 4 resolved forks and the operator override (build the LLM meta-driver now as the treatment on top of the conserved reservation pool, so the equal-k gate stays valid by construction).

* docs(research): open LeafExecutor interface + compose PR #150 lineage

  Operator refinement: the runtime is ONE open interface with an execute that returns a promise or an async stream, not a closed inline|sandbox|cli union. Built-ins are implementations; a user agent (mastra/agno/HTTP/custom) is first-class by implementing it; no per-vendor adapters. The sandbox executor composes runLoop and forwards PR #150's lineage passthrough rather than reinventing checkpoint/fork. Records the #150 review (approve-to-land; verify client-minted sessionId, bound fork acquisition, document parent-image fork).

* feat(loops): recursive execution atom — budget-conserving Scope + Supervisor

  The v1 keystone for the recursive-agent-atom (drivers-of-drivers, async, observable). One self-similar atom whose act spawns child agents through a Scope; the flat experiment harness is recovered as the simplest act.
  - src/loops/supervise/types.ts: the frozen contract: Agent, an OPEN LeafExecutor interface (execute returns a promise or an async stream; router/inline + sandbox + cli are implementations, BYO is first-class via the registry; no per-vendor adapters), Scope, Supervisor, Settled, Budget.
  - supervise/budget.ts: a conserved reservation pool (atomic reserve-on-spawn, fail-closed admission, refund-on-settle), so equal-k holds by construction (the invariant that keeps a steered arm from silently out-computing blind).
  - durable/spawn-journal.ts: event-sourced SpawnJournal + content-addressed ResultBlobStore + seq-ordered replay (resumable, queryable, reproducible).
  - supervise/scope.ts: a ray.wait cursor over an in-memory nursery; spawn reserves budget and resolves the executor through the open registry; a Settled-to-Iteration adapter keeps defaultSelectWinner single-sourced.
  - supervise/runtime.ts: the open executor registry; the sandbox executor composes runLoop (forwarding an optional lineage passthrough) rather than reinventing checkpoint/fork.
  - supervise/supervisor.ts: nursery join barrier, abort cascade (incl. the acquire lifecycle), OTP intensity breaker, typed SupervisedResult, RootHandle (view/signal/abort) as the observability substrate.
  - bench/src/drivers: flat-harness control (with the equal-k assertion), progressive-widening control, and the LLM-meta-driver treatment. The WidenGate defaults to flat so the selector-not-judge firewall stays dormant; widening reads trace findings, never the raw verdict, unless judgeExempt.
  - program.ts: mapPool one-for-one failure semantics (a down child is excluded from the merge, an all-down batch re-throws the first original error so a maxDepth guard still propagates loud).

  Verified: typecheck, lint (204 files), build, full suite (642 tests incl. 28 keystone property tests), bench typecheck.

  Caveat: the live executor paths (real router HTTP, sandbox runLoop, cli subprocess) are exercised only through the offline mock LeafExecutor; no live-backend run yet.
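The conserved reservation pool behind the equal-k guarantee can be sketched as follows. The class and method names are hypothetical (not the `supervise/budget.ts` API), and budget is modeled as a single number. The point is the three properties the commit names: atomic reserve-on-spawn, fail-closed admission, and refund-on-settle.

```typescript
// Hypothetical conserved budget pool; budget is modeled as one number.
// Invariant: available + outstanding reservations + spent === total.
class ReservationPool {
  private readonly total: number;
  private available: number;
  private spent = 0;
  private readonly reservations = new Map<string, number>();

  constructor(total: number) {
    if (!(total >= 0)) throw new RangeError("total must be non-negative");
    this.total = total;
    this.available = total;
  }

  // Admission fails closed: a spawn the pool cannot fully cover is refused.
  // Check and debit happen with no await in between, so concurrent spawns
  // cannot both pass against the same remaining budget.
  reserve(id: string, amount: number): boolean {
    if (this.reservations.has(id) || !(amount > 0) || amount > this.available) return false;
    this.available -= amount;
    this.reservations.set(id, amount);
    return true;
  }

  // On settle, charge what was used (clamped to the reservation) and refund the rest.
  settle(id: string, used: number): void {
    const reserved = this.reservations.get(id);
    if (reserved === undefined) throw new Error(`unknown reservation: ${id}`);
    const charged = Math.min(Math.max(used, 0), reserved);
    this.spent += charged;
    this.available += reserved - charged;
    this.reservations.delete(id);
  }

  get remaining(): number {
    return this.available;
  }

  isConserved(): boolean {
    let outstanding = 0;
    for (const amount of this.reservations.values()) outstanding += amount;
    return this.available + outstanding + this.spent === this.total;
  }
}
```

A refused `reserve` spends nothing, and every `settle` returns the unused share. So the compute any arm can draw is capped by `total`, and a steered arm cannot silently draw more than a blind one given the same pool.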

tangletools previously approved these changes Jun 4, 2026


drewstone commented Jun 4, 2026


drewstone mentioned this pull request Jun 4, 2026

feat(loops): recursive execution atom — budget-conserving Scope + Supervisor keystone #151

Merged

tangletools dismissed their stale review via 9e95388 June 4, 2026 12:50

tangletools approved these changes Jun 4, 2026


drewstone merged commit 4f38515 into main Jun 4, 2026
1 check passed

drewstone merged 2 commits into main from feat/runloop-session-continuation-and-fork
