feat(loops): recursive execution atom - budget-conserving Scope + Supervisor keystone #151
Merged
Capture the design thread as tracked research docs under docs/research:
- recursive-execution-atom: the next generation (one recursive Agent atom run as a durable, observable supervision tree; analyst-as-agent-with-runtime; async dynamic spawning), the proposed surface, the file-grounded gap, and the open forks. Plane B contains the flat harness.
- flat-harness-design: the assumption-free experiment harness synthesis (profiles x steer x executionMode x allocation). Plane A.
- long-horizon-benchmark-survey: adversarially verified survey; Commit0 and tau2-bench as the multi-turn picks.
Index them under the new Research track in docs/README.md.
…spec Freeze the contract: the budget-conserving reactive Scope + Supervisor keystone, the event-sourced SpawnJournal + ResultBlobStore (outRef replay), the LeafExecutor per-harness model (harness:null = Router inference, sandbox, cli), and the 8-step v1 build order. Records the 4 resolved forks and the operator override (build the LLM meta-driver now as the treatment on top of the conserved reservation pool, so the equal-k gate stays valid by construction).
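A minimal sketch of what an event-sourced journal with `outRef` replay can look like, assuming simplified event shapes: the log records only spawn/settle events with a monotonic `seq`, each settled payload is stored once in a content-addressed blob store and referenced by `outRef`, and replay folds the events in `seq` order. The event fields, the method names (`recordSpawn`, `recordSettle`, `replay`), and the FNV-1a hash (a stand-in for a real digest such as SHA-256) are illustrative assumptions, not the code in `src/durable/spawn-journal.ts`.

```typescript
// Hypothetical event shapes; the real SpawnJournal types may differ.
type JournalEvent =
  | { seq: number; kind: "spawned"; childId: string; k: number }
  | { seq: number; kind: "settled"; childId: string; outRef: string };

// FNV-1a (32-bit): a non-cryptographic stand-in for a real content digest.
function contentHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

class ResultBlobStore {
  private readonly blobs = new Map<string, string>();

  // Content-addressed: equal serialized payloads get the same ref, so re-puts are idempotent.
  put(value: unknown): string {
    const body = JSON.stringify(value);
    const ref = contentHash(body);
    this.blobs.set(ref, body);
    return ref;
  }

  get(ref: string): unknown {
    const body = this.blobs.get(ref);
    if (body === undefined) throw new Error(`missing blob ${ref}`);
    return JSON.parse(body);
  }
}

class SpawnJournal {
  private readonly events: JournalEvent[] = [];
  private nextSeq = 0;
  private readonly store: ResultBlobStore;

  constructor(store: ResultBlobStore) {
    this.store = store;
  }

  recordSpawn(childId: string, k: number): void {
    this.events.push({ seq: this.nextSeq++, kind: "spawned", childId, k });
  }

  // The log keeps only the outRef; the payload lives in the blob store.
  recordSettle(childId: string, result: unknown): void {
    const outRef = this.store.put(result);
    this.events.push({ seq: this.nextSeq++, kind: "settled", childId, outRef });
  }

  entries(): readonly JournalEvent[] {
    return this.events;
  }
}

// Rebuild state by folding events in seq order and resolving each outRef.
function replay(events: readonly JournalEvent[], store: ResultBlobStore) {
  const running = new Set<string>();
  const results = new Map<string, unknown>();
  for (const e of [...events].sort((a, b) => a.seq - b.seq)) {
    if (e.kind === "spawned") {
      running.add(e.childId);
    } else {
      running.delete(e.childId);
      results.set(e.childId, store.get(e.outRef));
    }
  }
  return { running: [...running], results };
}

// Usage: replaying an out-of-order copy still yields the seq-ordered state.
const store = new ResultBlobStore();
const journal = new SpawnJournal(store);
journal.recordSpawn("a", 2);
journal.recordSpawn("b", 2);
journal.recordSettle("a", { score: 0.8 });
const state = replay([...journal.entries()].reverse(), store);
```

Because the log holds references rather than payloads, the same log plus the same store always replays to the same state; a production version would additionally need persistence and a collision-resistant digest.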
Operator refinement: the runtime is ONE open interface with an execute that returns a promise or an async stream, not a closed inline|sandbox|cli union. Built-ins are implementations; a user agent (mastra/agno/HTTP/custom) is first-class by implementing it; no per-vendor adapters. The sandbox executor composes runLoop and forwards PR #150's lineage passthrough rather than reinventing checkpoint/fork. Records the #150 review (approve-to-land; verify client-minted sessionId, bound fork acquisition, document parent-image fork).
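A sketch of the "one open interface" idea under assumed names (`LeafTask`, `LeafChunk`, `register`, and `runLeaf` are hypothetical): any object whose `execute` returns either a `Promise` or an `AsyncIterable` is a valid executor, built-ins and a user's own agent register the same way, and callers normalize the two output shapes in one place instead of through per-vendor adapters.

```typescript
// Hypothetical shapes; the frozen contract in supervise/types.ts may differ.
interface LeafTask {
  prompt: string;
}
interface LeafChunk {
  text: string;
}
type LeafOutput = Promise<string> | AsyncIterable<LeafChunk>;

// One open interface: built-ins and BYO agents implement the same thing.
interface LeafExecutor {
  readonly name: string;
  execute(task: LeafTask): LeafOutput;
}

const registry = new Map<string, LeafExecutor>();

// Assumption: duplicate names are rejected rather than silently replaced.
function register(executor: LeafExecutor): void {
  if (registry.has(executor.name)) throw new Error(`executor "${executor.name}" already registered`);
  registry.set(executor.name, executor);
}

function isAsyncIterable(x: unknown): x is AsyncIterable<LeafChunk> {
  return (
    typeof x === "object" &&
    x !== null &&
    typeof (x as { [Symbol.asyncIterator]?: unknown })[Symbol.asyncIterator] === "function"
  );
}

// Callers see one shape: a stream is drained and concatenated, a promise is awaited.
async function runLeaf(name: string, task: LeafTask): Promise<string> {
  const executor = registry.get(name);
  if (!executor) throw new Error(`no executor named "${name}"`);
  const out = executor.execute(task);
  if (isAsyncIterable(out)) {
    let text = "";
    for await (const chunk of out) text += chunk.text;
    return text;
  }
  return out;
}

// Usage: a promise-returning stub (standing in for a router call) and a streaming BYO agent.
register({ name: "router-stub", execute: (t) => Promise.resolve(`answer:${t.prompt}`) });
register({
  name: "byo-stream",
  execute: (t) =>
    (async function* () {
      yield { text: "chunk1|" };
      yield { text: t.prompt };
    })(),
});
```

Rejecting duplicate names is a choice made for this sketch; the real registry may overwrite or namespace instead.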
…ervisor
The v1 keystone for the recursive-agent-atom (drivers-of-drivers, async, observable). One self-similar atom whose act spawns child agents through a Scope; the flat experiment harness is recovered as the simplest act.
- src/loops/supervise/types.ts: the frozen contract. Agent, an OPEN LeafExecutor interface (execute returns a promise or an async stream; router/inline + sandbox + cli are implementations, BYO is first-class via the registry; no per-vendor adapters), Scope, Supervisor, Settled, Budget.
- supervise/budget.ts: a conserved reservation pool with atomic reserve-on-spawn, fail-closed admission, and refund-on-settle, so equal-k holds by construction (the invariant that keeps a steered arm from silently out-computing blind).
- durable/spawn-journal.ts: event-sourced SpawnJournal + content-addressed ResultBlobStore + seq-ordered replay (resumable, queryable, reproducible).
- supervise/scope.ts: a ray.wait cursor over an in-memory nursery; spawn reserves budget and resolves the executor through the open registry; a Settled-to-Iteration adapter keeps defaultSelectWinner single-sourced.
- supervise/runtime.ts: the open executor registry; the sandbox executor composes runLoop (forwarding an optional lineage passthrough) rather than reinventing checkpoint/fork.
- supervise/supervisor.ts: nursery join barrier, abort cascade (incl. the acquire lifecycle), OTP intensity breaker, typed SupervisedResult, RootHandle (view/signal/abort) as the observability substrate.
- bench/src/drivers: flat-harness control (with the equal-k assertion), progressive-widening control, and the LLM-meta-driver treatment. The WidenGate defaults to flat so the selector-not-judge firewall stays dormant; widening reads trace findings, never the raw verdict, unless judgeExempt.
- program.ts: mapPool one-for-one failure semantics (a down child is excluded from the merge; an all-down batch re-throws the first original error so a maxDepth guard still propagates loud).
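A minimal sketch of a conserved reservation pool, assuming integer budget units and the hypothetical names `BudgetPool`, `reserve`, and `settle`. Admission checks and debits in one synchronous call (there is no `await` between check and debit, so no other task can interleave), an over-budget request is refused rather than partially granted, and settling refunds the unspent remainder. If the steered and blind arms each draw from a pool with the same total, neither can consume more than that total, which is the property the equal-k gate relies on.

```typescript
// Hypothetical sketch; the real supervise/budget.ts API may differ.
class BudgetPool {
  readonly total: number;
  private available: number;
  private consumed = 0;
  private readonly reservations = new Map<string, number>();

  constructor(total: number) {
    if (!Number.isInteger(total) || total < 0) throw new Error("total must be a non-negative integer");
    this.total = total;
    this.available = total;
  }

  // Reserve-on-spawn. Check and debit happen in one synchronous call with no await,
  // so no other task can interleave between them.
  reserve(childId: string, k: number): boolean {
    if (this.reservations.has(childId)) throw new Error(`duplicate reservation for ${childId}`);
    // Fail closed: refuse (never partially grant) anything invalid or over budget.
    if (!Number.isInteger(k) || k <= 0 || k > this.available) return false;
    this.available -= k;
    this.reservations.set(childId, k);
    return true;
  }

  // Refund-on-settle: unspent units return to the pool.
  settle(childId: string, spent: number): void {
    const k = this.reservations.get(childId);
    if (k === undefined) throw new Error(`no reservation for ${childId}`);
    if (!Number.isInteger(spent) || spent < 0 || spent > k) {
      throw new Error(`spent ${spent} outside reservation ${k}`);
    }
    this.reservations.delete(childId);
    this.consumed += spent;
    this.available += k - spent;
  }

  get remaining(): number {
    return this.available;
  }

  // Conservation: available + outstanding reservations + consumed always equals total.
  isConserved(): boolean {
    let outstanding = 0;
    for (const k of this.reservations.values()) outstanding += k;
    return this.available + outstanding + this.consumed === this.total;
  }
}

// Usage
const pool = new BudgetPool(8);
const admittedA = pool.reserve("a", 5); // true: 3 left
const admittedB1 = pool.reserve("b", 4); // false: only 3 left, refused outright
pool.settle("a", 2); // a spent 2 of 5, so 3 are refunded (6 available)
const admittedB2 = pool.reserve("b", 4); // true: 2 left
```

Refusing rather than partially granting keeps the accounting simple: a child either holds its full reservation or never started, so there is no half-funded state to reason about.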
Verified: typecheck, lint (204 files), build, full suite (642 tests incl. 28 keystone property tests), bench typecheck. Caveat: the live executor paths (real router HTTP, sandbox runLoop, cli subprocess) are exercised only through the offline mock LeafExecutor; no live-backend run yet.
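For context on what the offline mock does and does not prove, here is a sketch of a scripted mock executor, assuming a promise-only `execute` (the `MockLeafExecutor` below is hypothetical, not the one in the test suite). It answers from a fixed script, records every call for assertions, and rejects unscripted prompts, so it can verify wiring and call order but says nothing about router-HTTP latency, sandbox behavior, or subprocess failures.

```typescript
// Hypothetical, simplified shapes (promise path only); the real LeafExecutor also allows streams.
interface LeafTask {
  id: string;
  prompt: string;
}
interface LeafExecutor {
  execute(task: LeafTask): Promise<string>;
}

// Scripted, deterministic mock: no network, no sandbox, no subprocess.
class MockLeafExecutor implements LeafExecutor {
  readonly calls: string[] = [];
  private readonly script: Map<string, string>;

  constructor(script: Record<string, string>) {
    this.script = new Map(Object.entries(script));
  }

  async execute(task: LeafTask): Promise<string> {
    this.calls.push(task.id); // recorded so tests can assert call order
    const answer = this.script.get(task.prompt);
    // Reject anything unscripted so a test cannot pass by accident.
    if (answer === undefined) throw new Error(`unscripted prompt: ${task.prompt}`);
    return answer;
  }
}

const mock = new MockLeafExecutor({ "2+2?": "4" });
```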
🔍 Reviewing
| Pass | Status | ETA |
|---|---|---|
| opencode DeepSeek v4 Pro | Running (2 min) | ~5-15 min |
| opencode GLM 5.1 | Running (2 min) | ~5-15 min |
Agent review running. Reads the actual code. This comment updates in place.
tangletools · #151 · model: kimi-for-coding · started 2026-06-04T13:05:48Z
…ion-atom
# Conflicts:
#   src/loops/index.ts
drewstone added a commit that referenced this pull request Jun 6, 2026
Cuts the 58-commit backlog on main into a published release. Headline surface:
- runToolLoop / streamToolLoop: bounded turn-level tool-dispatch loop (#137)
- RSI agent tree: recursive Agent.act, Supervisor keystone, runProgram, the adaptive-driver channel (#139/#151/#165)
- optimization API collapsed onto agent-eval selfImprove; the runtime keeps the CODE-surface ImprovementDriver you pass as driver (#172)
- deployable benchmark adapters: AppWorld, commit0, aec-bench, EnterpriseOps-Gym; runBenchmarks over one ADAPTERS registry (#153/#156/#157)
- agent-eval floor raised to >=0.83.0 (#175)
What
The v1 recursive execution atom, the keystone for drivers-of-drivers: one self-similar `Agent` whose `act` spawns child agents through a `Scope`, run by a `Supervisor` that owns a conserved budget pool, an event-sourced journal, and an observability/conversation handle. The flat experiment harness is recovered as the simplest `act` (it does not compete with this; it is one program over it).
Design + decision record: `docs/research/recursive-execution-atom.md` (frozen contract, build order, the 4 resolved forks, the adversarial critique that shaped the surface).

The pieces
- `src/loops/supervise/types.ts`: the frozen contract. `Agent`, an open `LeafExecutor` interface (`execute` returns a promise or an async stream; `router`/`inline` + `sandbox` + `cli` are implementations, and a user's own agent is first-class via the registry/BYO, with no per-vendor adapters), `Scope`, `Supervisor`, `Settled`, `Budget`.
- `supervise/budget.ts`: a conserved reservation pool with atomic reserve-on-spawn, fail-closed admission, and refund-on-settle. This is the load-bearing invariant: `Σk(treatment) ≡ Σk(blind)` holds by construction, so a steered arm can never silently out-compute blind (the confound that burned the earlier "+20pp steering" result).
- `src/durable/spawn-journal.ts`: event-sourced `SpawnJournal` + content-addressed `ResultBlobStore` + seq-ordered replay. Resumable, queryable, reproducible from one log.
- `supervise/scope.ts`: a `ray.wait` cursor over an in-memory nursery; `spawn` reserves budget and resolves the executor through the open registry; a `Settled → Iteration` adapter keeps `defaultSelectWinner` single-sourced.
- `supervise/runtime.ts`: the open executor registry; the `sandbox` executor composes `runLoop` (forwarding an optional `lineage` passthrough) rather than reinventing checkpoint/fork.
- `supervise/supervisor.ts`: nursery join barrier, abort cascade (incl. the acquire lifecycle), OTP intensity breaker, typed `SupervisedResult`, `RootHandle` (view/signal/abort) as the observability substrate.
- `bench/src/drivers/`: `flat-harness` control (with the equal-k assertion), `progressive-widening` control, and the LLM-meta-driver treatment. `WidenGate` defaults to flat so the selector≠judge firewall stays dormant; widening reads trace findings, never the raw verdict, unless `judgeExempt`.
- `program.ts`: `mapPool` one-for-one failure semantics (a down child is excluded from the merge; an all-down batch re-throws the first original error so a `maxDepth` guard still propagates loud).

Verification
Independently re-run on this branch:
`typecheck` ✓, `lint` ✓ (204 files), `build` ✓, full suite 642/642 (incl. 28 keystone property tests: conserved-budget fail-closed + refund, equal-k by construction, monotonic-seq cursor, abort/teardown, replay determinism), bench `typecheck` ✓.

Honest gaps (why draft)
- `router`/`sandbox`/`cli` executors are exercised only through the offline mock `LeafExecutor`: they typecheck and the wiring is proven, but no real router-HTTP / sandbox-`runLoop` / cli-subprocess run has happened.
- `runProgram`'s loop-layer `parallel` op.

Relationship to #150
#150 adds the leaf-level continued-session/fork `lineage` on `runLoop`. This is the driver layer on top: the `sandbox` executor forwards that `lineage` passthrough rather than duplicating it. #150's findings (esp. verify the client-minted `sessionId`) are load-bearing here too.
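A sketch of the forwarding relationship, assuming hypothetical `Lineage` and `RunLoop` shapes (the real `runLoop` signature and #150's lineage fields may differ): the sandbox executor copies no session state and mints no ids; it passes the caller's `lineage` object through to the injected `runLoop` unchanged, and omits it when absent.

```typescript
// Hypothetical shapes; #150's real lineage type and runLoop signature may differ.
interface Lineage {
  sessionId: string; // assumed field name; the #150 review asks to verify client-minted ids
  forkOf?: string; // hypothetical field, for illustration only
}
interface RunLoopOptions {
  prompt: string;
  lineage?: Lineage;
}
type RunLoop = (opts: RunLoopOptions) => Promise<{ output: string }>;

// The sandbox executor owns no checkpoint/fork logic: it forwards the caller's
// lineage to the injected runLoop unchanged and omits it when absent.
function makeSandboxExecutor(runLoop: RunLoop) {
  return {
    name: "sandbox",
    async execute(task: { prompt: string; lineage?: Lineage }): Promise<string> {
      const opts: RunLoopOptions = { prompt: task.prompt };
      if (task.lineage !== undefined) opts.lineage = task.lineage; // same object, not a copy
      const { output } = await runLoop(opts);
      return output;
    },
  };
}

// Usage with a fake runLoop that records what it receives.
const seen: RunLoopOptions[] = [];
const fakeRunLoop: RunLoop = async (opts) => {
  seen.push(opts);
  return { output: `ran:${opts.prompt}` };
};
const sandbox = makeSandboxExecutor(fakeRunLoop);
const lineage: Lineage = { sessionId: "s-1", forkOf: "s-0" };
```

Injecting `runLoop` keeps the executor testable offline and makes the "forward, don't reinvent" boundary explicit: any checkpoint or fork semantics live entirely on the leaf side.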