chore: agent-runtime 0.5.5 — unify agent-eval / agent-knowledge dep tree by drewstone · Pull Request #6 · tangle-network/agent-runtime

drewstone · 2026-05-10T21:44:09Z

Why

After the 0.23 fleet bumps, every downstream consumer's lockfile still
carries a transitive @tangle-network/agent-eval@0.20.12 because
agent-runtime@0.5.4 (published with the ^0.23.0 pin from #3) is still
the latest version on the registry, and a stale lockfile entry for
agent-runtime@0.5.4 → agent-eval@0.20.12 persists in consumers that
last resolved before #3 landed.

Cutting 0.5.5 is the minimal version bump that invalidates that
lockfile entry so every consumer collapses to a single agent-eval copy
on their next pnpm install.
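One way to confirm the collapse described above is to group the resolved package ids in a consumer's lockfile by name and look for packages that still resolve to more than one version. The sketch below is a hypothetical helper, not part of agent-runtime. It assumes the ids have already been extracted as `name@version` strings; parsing `pnpm-lock.yaml` itself is left out because the key format differs between lockfile versions.

```typescript
// Hypothetical helper (not from agent-runtime): groups resolved
// "name@version" ids by package name and reports any package that
// resolves to more than one version.

function splitId(id: string): { name: string; version: string } | null {
  // lastIndexOf keeps scoped names such as "@scope/pkg@1.0.0" intact.
  const at = id.lastIndexOf("@");
  if (at <= 0) return null; // no version separator, or only the scope "@"
  return { name: id.slice(0, at), version: id.slice(at + 1) };
}

function findDuplicateVersions(ids: string[]): Record<string, string[]> {
  const byName: Record<string, string[]> = {};
  for (const id of ids) {
    const parsed = splitId(id);
    if (parsed === null) continue;
    const versions = byName[parsed.name] ?? [];
    if (versions.indexOf(parsed.version) === -1) versions.push(parsed.version);
    byName[parsed.name] = versions;
  }

  // Keep only packages with two or more distinct versions.
  const duplicates: Record<string, string[]> = {};
  for (const name of Object.keys(byName)) {
    const versions = byName[name] ?? [];
    if (versions.length > 1) duplicates[name] = versions.slice().sort();
  }
  return duplicates;
}

// Illustrative input mirroring the stale-lockfile situation described above.
const resolvedIds = [
  "@tangle-network/agent-runtime@0.5.4",
  "@tangle-network/agent-eval@0.20.12",
  "@tangle-network/agent-eval@0.23.0",
];
const duplicateCopies = findDuplicateVersions(resolvedIds);
```

With the illustrative input, `duplicateCopies` has a single key, `@tangle-network/agent-eval`. After a consumer re-resolves against 0.5.5 the same check should come back empty for that package, assuming nothing else in their tree pins an older agent-eval.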

Bumps

| Package | From | To |
| --- | --- | --- |
| `@tangle-network/agent-runtime` (this pkg) | `0.5.4` | `0.5.5` |
| `@tangle-network/agent-eval` (dep) | `^0.23.0` | `^0.23.0` (unchanged; already at target since #3) |
| `@tangle-network/agent-knowledge` | n/a | n/a (agent-runtime does not depend on agent-knowledge directly; consumers pick up `^1.2.0` via their own `package.json`) |

The only file change is package.json's version field. The pnpm-lock.yaml
already resolves agent-eval@0.23.0 cleanly — no lockfile delta required.
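For readers less familiar with caret ranges on 0.x versions: under the node-semver caret rule, `^0.23.0` accepts `>=0.23.0 <0.24.0`, so a 0.20.12 copy can never satisfy it. The sketch below is a simplified illustration of that rule for plain `x.y.z` versions only (no prerelease or build metadata). All function names are hypothetical, and it is not a substitute for the `semver` package.

```typescript
// Simplified caret-range check for plain "x.y.z" versions. The leftmost
// non-zero component is held fixed; components to its right may increase.

interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) throw new Error("unsupported version: " + v);
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  if (a.major !== b.major) return a.major - b.major;
  if (a.minor !== b.minor) return a.minor - b.minor;
  return a.patch - b.patch;
}

// Exclusive upper bound of the caret range anchored at `base`.
function caretUpperBound(base: Version): Version {
  if (base.major > 0) return { major: base.major + 1, minor: 0, patch: 0 };
  if (base.minor > 0) return { major: 0, minor: base.minor + 1, patch: 0 };
  return { major: 0, minor: 0, patch: base.patch + 1 };
}

function satisfiesCaret(version: string, range: string): boolean {
  if (range.charAt(0) !== "^") throw new Error("expected a caret range: " + range);
  const base = parseVersion(range.slice(1));
  const candidate = parseVersion(version);
  return (
    compareVersions(candidate, base) >= 0 &&
    compareVersions(candidate, caretUpperBound(base)) < 0
  );
}

const staleCopyMatches = satisfiesCaret("0.20.12", "^0.23.0"); // false
const currentCopyMatches = satisfiesCaret("0.23.0", "^0.23.0"); // true
```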

API adaptations

None. The agent-eval ^0.23.0 pin landed in #3 and source compiles
clean against the 0.23 surface — pnpm typecheck is green, no drift
from 0.21 capture-integrity, 0.22 EvalCampaign, or 0.23 RL primitives
reached anything agent-runtime re-exports.

Test plan

- pnpm install: lockfile resolves cleanly (no spec churn)
- pnpm typecheck: clean
- pnpm build: dist/index.js 34.31 KB, dist/index.d.ts 18.40 KB
- pnpm test: 16/16 passed (tests/runtime.test.ts)

Follow-up

npm publish of 0.5.5 is a separate step after this merges.

…se 0.5.5

The agent-eval ^0.23.0 pin already landed in #3, but downstream lockfiles still resolve agent-runtime@0.5.4 → a transitive agent-eval@0.20.12 entry. Cutting 0.5.5 invalidates that lockfile entry so every consumer collapses to a single agent-eval 0.23.x copy after their next pnpm install.

agent-runtime does not directly depend on agent-knowledge, so no agent-knowledge pin change is required here; consumers pick up ^1.2.0 via their own package.json after this release lands.

…5.6) (#7)

Add createRuntimeStreamEventCollector, a sibling of createRuntimeEventCollector typed for RuntimeStreamEvent. Honors the same RuntimeTelemetryOptions redaction flags (includeInputs, includeUserAnswers, includeControlPayloads, includeEvidenceIds, includeMetadata, includeRequirementDescriptions, includeEvalDetails) and returns the same {events, onEvent} interface plus a summary() function that rolls up event counts, session id, final status, and concatenated text_delta.text.

Sibling factory rather than overload because stream and non-stream events have different field shapes (timestamps, sessions, text/tool deltas) and overlapping type literals (task_start, readiness_end, …); a unified dispatcher would silently misroute events.

Adds the streaming-collector example mirror at examples/sanitized-telemetry-streaming/. Documents in README that task.intent flows through sanitized telemetry by default and must never carry user input; route user-visible intent through inputs (redacted by default) instead.

Bumps 0.5.4 → 0.5.6 (intentionally skipping 0.5.5; PR #6 currently holds 0.5.5 and is expected to land in series).
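The collector shape described in the #7 commit message (an `{events, onEvent}` pair plus a `summary()` roll-up) can be illustrated with a minimal sketch. This is not the actual agent-runtime implementation. The types below are hypothetical, simplified stand-ins for `RuntimeStreamEvent` and `RuntimeTelemetryOptions`, only one redaction flag (`includeMetadata`) is modelled, and every field name other than `text_delta` and its `text` is an assumption.

```typescript
// Hypothetical stand-in for RuntimeStreamEvent (the real shape is not shown
// in the source).
interface SketchStreamEvent {
  type: string;
  sessionId?: string;
  status?: string;
  text?: string;
  metadata?: Record<string, unknown>;
}

// Hypothetical stand-in for RuntimeTelemetryOptions; one flag only.
interface SketchTelemetryOptions {
  includeMetadata?: boolean;
}

interface SketchSummary {
  counts: Record<string, number>;
  sessionId: string | undefined;
  finalStatus: string | undefined;
  text: string;
}

function createSketchStreamCollector(options: SketchTelemetryOptions = {}) {
  const events: SketchStreamEvent[] = [];

  function onEvent(event: SketchStreamEvent): void {
    // Copy field by field so redacted data never reaches the stored record.
    const stored: SketchStreamEvent = { type: event.type };
    if (event.sessionId !== undefined) stored.sessionId = event.sessionId;
    if (event.status !== undefined) stored.status = event.status;
    if (event.text !== undefined) stored.text = event.text;
    if (options.includeMetadata === true && event.metadata !== undefined) {
      stored.metadata = event.metadata;
    }
    events.push(stored);
  }

  function summary(): SketchSummary {
    const counts: Record<string, number> = {};
    let sessionId: string | undefined;
    let finalStatus: string | undefined;
    let text = "";
    for (const e of events) {
      counts[e.type] = (counts[e.type] ?? 0) + 1;
      if (e.sessionId !== undefined) sessionId = e.sessionId;
      if (e.status !== undefined) finalStatus = e.status; // last status wins
      if (e.type === "text_delta" && e.text !== undefined) text += e.text;
    }
    return { counts, sessionId, finalStatus, text };
  }

  return { events, onEvent, summary };
}
```

Keeping this as its own factory, rather than branching inside a shared one, mirrors the design argument in the commit message: events whose `type` literals overlap but whose fields differ are never funnelled through a single dispatcher.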



…r runLoop (backend-blind) (#150)

* feat(loops): opt-in session continuation + checkpoint-fork lineage (backend-blind)

Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox session across iterations (same box + sessionId, no prompt-text replay) and FORK fanout branches from a parent checkpoint (shared context prefix). Both sit behind a capability probe, so the kernel asks 'can I fork?' (client.criuStatus) and never names Docker/Firecracker, degrading to fresh boxes when CRIU is absent.

- sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}.
- sandbox-lineage.ts: createSandboxLineage owns box+session handles with start/continue/fork/teardown; reuses the kernel's acquireSandbox / buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but the box has no fork().
- run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine continues, fanout forks-once, else fresh-through-lineage. Default OFF is byte-identical to today, so random@k stays N independent fresh boxes (the compute-control invariant). Rejects lineage + onWorkerBox (both own boxes).
- 7 new unit tests (continuation reuses session; fork when canFork; fresh fallback; default-off invariant). Full suite 621 pass, typecheck clean.

* fix(loops): address PR #150 review: bound forks, prune lineage, fail-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated enabling, #3/#4 wanted documenting). Lineage remains default-OFF and byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is still known to the sandbox via `box.session(id).status()` before streaming. A `null` (platform never honored the client-minted id, or it was reaped) raises a ValidationError, which executeIteration now propagates as a hard structural failure instead of degrading to a soft empty iteration, so a non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round can descend from after each round, gated on a kernel-inferred (monotonic) branch point; skipped when the driver authors its own `parentIndex`. The unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent, abort-under-lineage). 627 tests pass, typecheck + biome clean.
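The change described in finding #2 (bounded provisioning instead of a single `Promise.all` over all N branches) follows a common worker-pool pattern. The sketch below shows that general pattern under stated assumptions: the name `mapWithConcurrencySketch` and its signature are hypothetical, and the actual `mapWithConcurrency` util in the repository may differ, for example in how it handles abort signals or errors.

```typescript
// Hypothetical sketch of a bounded-concurrency map: at most `limit` calls to
// `fn` are in flight at once, and results keep the input order.
async function mapWithConcurrencySketch<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!(limit >= 1)) throw new Error("limit must be >= 1");

  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  // Each worker claims the next index synchronously, so no item is
  // processed twice even though several workers share `next`.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index] as T, index);
    }
  }

  const workerCount = Math.min(Math.floor(limit), items.length);
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());

  // Rejects on the first failure; workers already running are not
  // cancelled in this sketch.
  await Promise.all(workers);
  return results;
}
```

Compared with `Promise.all(items.map(fn))`, which starts every call immediately, this keeps the peak number of simultaneous calls at `limit`. That is the property the bounded-fork peak test mentioned above is described as checking.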

drewstone mentioned this pull request May 10, 2026

feat: streaming-event telemetry collector + task.intent directive (0.5.6) #7

Merged

drewstone merged commit 2fb4cd5 into main May 10, 2026

drewstone merged 1 commit into main from chore/runtime-0.5.5
