chore: agent-runtime 0.5.5 — unify agent-eval / agent-knowledge dep tree #6
Merged
…se 0.5.5 The agent-eval ^0.23.0 pin already landed in #3, but downstream lockfiles still resolve agent-runtime@0.5.4 → a transitive agent-eval@0.20.12 entry. Cutting 0.5.5 invalidates that lockfile entry so every consumer collapses to a single agent-eval 0.23.x copy after their next pnpm install. agent-runtime does not directly depend on agent-knowledge, so no agent-knowledge pin change is required here — consumers pick up ^1.2.0 via their own package.json after this release lands.
drewstone added a commit that referenced this pull request on May 10, 2026
…5.6) (#7)

Add createRuntimeStreamEventCollector, a sibling of createRuntimeEventCollector typed for RuntimeStreamEvent. Honors the same RuntimeTelemetryOptions redaction flags (includeInputs, includeUserAnswers, includeControlPayloads, includeEvidenceIds, includeMetadata, includeRequirementDescriptions, includeEvalDetails) and returns the same {events, onEvent} interface plus a summary() function that rolls up event counts, session id, final status, and concatenated text_delta.text.

Sibling factory rather than overload because stream and non-stream events have different field shapes (timestamps, sessions, text/tool deltas) and overlapping type literals (task_start, readiness_end, …), so a unified dispatcher would silently misroute events.

Adds the streaming-collector example mirror at examples/sanitized-telemetry-streaming/.

Documents in README that task.intent flows through sanitized telemetry by default and must never carry user input; route user-visible intent through inputs (redacted by default) instead.

Bumps 0.5.4 → 0.5.6 (intentionally skipping 0.5.5; PR #6 currently holds 0.5.5 and is expected to land in series).
tangletools pushed a commit that referenced this pull request on Jun 4, 2026
…-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated enabling, #3/#4 wanted documenting). Lineage remains default-OFF and byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is still known to the sandbox via `box.session(id).status()` before streaming. A `null` (platform never honored the client-minted id, or it was reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of degrading to a soft empty iteration, so a non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point (skipped when the driver authors its own `parentIndex`). The unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent, abort-under-lineage). 627 tests pass, typecheck + biome clean.
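The bounded fan-out from finding #2 can be sketched as a small worker pool. This is a minimal illustration, not the repository's `mapWithConcurrency`: the signature, the argument order, and the worker-pool strategy are all assumptions, and the "bounded waves" abort check mentioned in #5 is not reproduced here.

```typescript
// Hypothetical sketch: run `fn` over `items` with at most `limit` calls in flight.
// Results keep the order of `items`, not the order of completion.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims the next index until none remain. JavaScript is
  // single-threaded, so the claim (`next++`) cannot race between workers.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  };

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo: six branches, at most two in flight, while tracking the observed peak.
async function demo(): Promise<{ results: number[]; peak: number }> {
  let inFlight = 0;
  let peak = 0;
  const results = await mapWithConcurrency([1, 2, 3, 4, 5, 6], 2, async (n) => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await Promise.resolve(); // yield once so the other worker can start
    inFlight--;
    return n * 10;
  });
  return { results, peak };
}

demo().then(({ results, peak }) => console.log(results, "peak in flight:", peak));
```

Compared with a single `Promise.all` over all N branches, the peak number of concurrent calls is capped at `limit` regardless of N, which is the property the bounded-fork peak test in the commit message appears to check.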
drewstone added a commit that referenced this pull request on Jun 4, 2026
…r runLoop (backend-blind) (#150)

* feat(loops): opt-in session continuation + checkpoint-fork lineage (backend-blind)

Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox session across iterations (same box + sessionId, no prompt-text replay) and FORK fanout branches from a parent checkpoint (shared context prefix), both behind a capability probe so the kernel asks 'can I fork?' (client.criuStatus) and never names Docker/Firecracker, degrading to fresh boxes when CRIU is absent.

- sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}.
- sandbox-lineage.ts: createSandboxLineage owns box+session handles with start/continue/fork/teardown; reuses the kernel's acquireSandbox / buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but the box has no fork().
- run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine continues, fanout forks-once, else fresh-through-lineage. Default OFF is byte-identical to today, so random@k stays N independent fresh boxes (the compute-control invariant). Rejects lineage + onWorkerBox (both own boxes).
- 7 new unit tests (continuation reuses session; fork when canFork; fresh fallback; default-off invariant). Full suite 621 pass, typecheck clean.

* fix(loops): address PR #150 review — bound forks, prune lineage, fail-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated enabling, #3/#4 wanted documenting). Lineage remains default-OFF and byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is still known to the sandbox via `box.session(id).status()` before streaming. A `null` (platform never honored the client-minted id, or it was reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of degrading to a soft empty iteration, so a non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point (skipped when the driver authors its own `parentIndex`). The unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent, abort-under-lineage). 627 tests pass, typecheck + biome clean.
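The "memoized, fail-closed criuStatus probe -> {canFork}" from the feat commit can be illustrated roughly as follows. This is a sketch under stated assumptions: the source names `client.criuStatus` but not its return type, so the `available` field, the `CriuClient` interface, and `createCapabilityProbe` are hypothetical names invented for the example.

```typescript
// Hypothetical client shape; `available` is an assumption, not the SDK's API.
interface CriuClient {
  criuStatus(): Promise<{ available: boolean }>;
}

interface SandboxCapabilities {
  canFork: boolean;
}

// Returns a probe that asks the client at most once and caches the answer.
// Any error (sync throw or rejection) maps to canFork: false, i.e. fail closed.
function createCapabilityProbe(client: CriuClient): () => Promise<SandboxCapabilities> {
  let cached: Promise<SandboxCapabilities> | undefined;
  return () => {
    if (cached === undefined) {
      cached = Promise.resolve()
        .then(() => client.criuStatus())
        .then(
          (status) => ({ canFork: status.available === true }),
          () => ({ canFork: false }),
        );
    }
    // Note: a failed probe is cached too. Whether the real probe retries
    // after a failure is not stated in the source.
    return cached;
  };
}

// Demo: a failing client is probed twice but called once; a healthy one can fork.
async function probeDemo(): Promise<{ calls: number; broken: boolean; healthy: boolean }> {
  let calls = 0;
  const brokenProbe = createCapabilityProbe({
    criuStatus: async () => {
      calls++;
      throw new Error("CRIU endpoint unreachable");
    },
  });
  const first = await brokenProbe();
  await brokenProbe(); // served from the cache, no second call
  const healthyProbe = createCapabilityProbe({
    criuStatus: async () => ({ available: true }),
  });
  const healthy = await healthyProbe();
  return { calls, broken: first.canFork, healthy: healthy.canFork };
}

probeDemo().then((r) => console.log(r));
```

Failing closed means an unreachable or erroring probe selects the fresh-box path, which matches the commit's stated behavior of degrading to fresh boxes when CRIU is absent.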
Why
After the 0.23 fleet bumps, every downstream consumer's lockfile still carries a transitive `@tangle-network/agent-eval@0.20.12` because `agent-runtime@0.5.4` (published with the `^0.23.0` pin from #3) is still the latest version on the registry, and a stale lockfile entry for `agent-runtime@0.5.4 → agent-eval@0.20.12` persists in consumers that last resolved before #3 landed.

Cutting 0.5.5 is the minimal version bump that invalidates that lockfile entry so every consumer collapses to a single `agent-eval` copy on their next `pnpm install`.

Bumps
| Package | From | To |
| --- | --- | --- |
| `@tangle-network/agent-runtime` (this pkg) | `0.5.4` | `0.5.5` |
| `@tangle-network/agent-eval` (dep) | `^0.23.0` | `^0.23.0` (unchanged, already at target since #3) |
| `@tangle-network/agent-knowledge` | not a direct dependency | consumers pick up `^1.2.0` via their own `package.json` |

The only file change is `package.json`'s `version` field. The `pnpm-lock.yaml` already resolves `agent-eval@0.23.0` cleanly, so no lockfile delta is required.

API adaptations
None. The `agent-eval` `^0.23.0` pin landed in #3 and the source compiles clean against the 0.23 surface: `pnpm typecheck` is green, and no drift from 0.21 capture-integrity, 0.22 EvalCampaign, or 0.23 RL primitives reached anything `agent-runtime` re-exports.

Test plan
- `pnpm install`: lockfile resolves cleanly (no spec churn)
- `pnpm typecheck`: clean
- `pnpm build`: `dist/index.js` 34.31 KB, `dist/index.d.ts` 18.40 KB
- `pnpm test`: 16/16 passed (`tests/runtime.test.ts`)

Follow-up
npm publish of `0.5.5` is a separate step after this merges.
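For consumers who want to confirm the collapse described under "Why", the check can be sketched as a text scan over a lockfile plus a caret-range test. This is an illustrative heuristic only: the helper names are hypothetical, the sample text is not a real `pnpm-lock.yaml`, the real file layout differs between lockfile versions, and prerelease tags are not handled. `pnpm why @tangle-network/agent-eval` remains the supported way to inspect the resolved tree.

```typescript
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) throw new Error(`unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Caret semantics: allow changes that do not modify the left-most non-zero
// component, so ^0.23.0 means >=0.23.0 <0.24.0 and ^1.2.0 means >=1.2.0 <2.0.0.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const b = parseVersion(base);
  if (compare(v, b) < 0) return false;
  let upper: Version;
  if (b[0] > 0) upper = [b[0] + 1, 0, 0];
  else if (b[1] > 0) upper = [0, b[1] + 1, 0];
  else upper = [0, 0, b[2] + 1];
  return compare(v, upper) < 0;
}

// Collect every distinct `<pkg>@x.y.z` occurrence in the lockfile text.
function findResolvedVersions(lockfileText: string, pkg: string): string[] {
  const escaped = pkg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`${escaped}@(\\d+\\.\\d+\\.\\d+)`, "g");
  const versions = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(lockfileText)) !== null) {
    versions.add(m[1] as string);
  }
  return Array.from(versions).sort((x, y) => compare(parseVersion(x), parseVersion(y)));
}

// Illustrative text only, shaped loosely like lockfile package keys.
const sampleLockfile = `
packages:
  '@tangle-network/agent-eval@0.20.12':
  '@tangle-network/agent-eval@0.23.0':
`;

const resolved = findResolvedVersions(sampleLockfile, "@tangle-network/agent-eval");
const stale = resolved.filter((v) => !satisfiesCaret(v, "0.23.0"));
console.log(resolved); // [ '0.20.12', '0.23.0' ]
console.log(stale); // [ '0.20.12' ]
```

Because `^0.23.0` on a 0.x package only admits 0.23.x releases, a lingering `0.20.12` entry can never be satisfied by the new pin, which is why a consumer with more than one resolved version still has the stale entry the 0.5.5 release is meant to invalidate.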