
chore: agent-runtime 0.5.5 — unify agent-eval / agent-knowledge dep tree #6

Merged
drewstone merged 1 commit into main from chore/runtime-0.5.5 on May 10, 2026

@drewstone
Contributor

Why

After the 0.23 fleet bumps, every downstream consumer's lockfile still
carries a transitive @tangle-network/agent-eval@0.20.12.
agent-runtime@0.5.4 (published with the ^0.23.0 pin from #3) is still
the latest version on the registry, so consumers that last resolved
before #3 landed keep a stale lockfile entry for
agent-runtime@0.5.4 → agent-eval@0.20.12.

Cutting 0.5.5 is the minimal version bump that invalidates that
lockfile entry so every consumer collapses to a single agent-eval copy
on their next pnpm install.
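
To check whether a consumer has actually collapsed to one copy, something like the following could be run over the resolved package ids taken from its lockfile. This is a minimal sketch: `findDuplicateVersions` is a hypothetical helper (not part of pnpm or agent-runtime), and it assumes the ids have already been reduced to plain `name@version` strings.

```typescript
// Hypothetical helper, for illustration only.
// Input: resolved package ids in "name@version" form.
// Output: every package name that resolves to more than one version.
function findDuplicateVersions(resolved: string[]): Record<string, string[]> {
  const versionsByName: Record<string, string[]> = {};
  for (const id of resolved) {
    // Split on the last "@" so scoped names such as "@scope/pkg@1.0.0" parse correctly.
    const at = id.lastIndexOf("@");
    if (at <= 0) continue; // no version part, skip
    const name = id.slice(0, at);
    const version = id.slice(at + 1);
    const seen = versionsByName[name] ?? (versionsByName[name] = []);
    if (seen.indexOf(version) === -1) seen.push(version);
  }
  const duplicates: Record<string, string[]> = {};
  for (const name of Object.keys(versionsByName)) {
    const versions = versionsByName[name] as string[];
    if (versions.length > 1) duplicates[name] = versions;
  }
  return duplicates;
}

// A stale consumer: agent-eval appears twice, once via the old transitive entry.
const staleReport = findDuplicateVersions([
  "@tangle-network/agent-runtime@0.5.4",
  "@tangle-network/agent-eval@0.20.12",
  "@tangle-network/agent-eval@0.23.0",
]);
console.log(staleReport); // lists agent-eval with both versions
```

An empty result after the next `pnpm install` would indicate that the duplicate agent-eval entry is gone.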

Bumps

| Package | From | To |
| --- | --- | --- |
| @tangle-network/agent-runtime (this pkg) | 0.5.4 | 0.5.5 |
| @tangle-network/agent-eval (dep) | ^0.23.0 | ^0.23.0 (unchanged; already at target since #3) |
| @tangle-network/agent-knowledge | n/a | n/a (agent-runtime does not depend on agent-knowledge directly; consumers pick up ^1.2.0 via their own package.json) |

The only file change is package.json's version field. The pnpm-lock.yaml
already resolves agent-eval@0.23.0 cleanly — no lockfile delta required.

API adaptations

None. The agent-eval ^0.23.0 pin landed in #3 and source compiles
clean against the 0.23 surface — pnpm typecheck is green, no drift
from 0.21 capture-integrity, 0.22 EvalCampaign, or 0.23 RL primitives
reached anything agent-runtime re-exports.

Test plan

  • pnpm install — lockfile resolves cleanly (no spec churn)
  • pnpm typecheck — clean
  • pnpm build: dist/index.js 34.31 KB, dist/index.d.ts 18.40 KB
  • pnpm test — 16/16 passed (tests/runtime.test.ts)

Follow-up

npm publish of 0.5.5 is a separate step after this merges.

…se 0.5.5

The agent-eval ^0.23.0 pin already landed in #3, but downstream lockfiles
still resolve agent-runtime@0.5.4 → a transitive agent-eval@0.20.12 entry.
Cutting 0.5.5 invalidates that lockfile entry so every consumer collapses
to a single agent-eval 0.23.x copy after their next pnpm install.

agent-runtime does not directly depend on agent-knowledge, so no
agent-knowledge pin change is required here — consumers pick up
^1.2.0 via their own package.json after this release lands.
drewstone merged commit 2fb4cd5 into main on May 10, 2026
drewstone added a commit that referenced this pull request May 10, 2026
…5.6) (#7)

Add createRuntimeStreamEventCollector — a sibling of
createRuntimeEventCollector typed for RuntimeStreamEvent. Honors the
same RuntimeTelemetryOptions redaction flags (includeInputs,
includeUserAnswers, includeControlPayloads, includeEvidenceIds,
includeMetadata, includeRequirementDescriptions, includeEvalDetails)
and returns the same {events, onEvent} interface plus a summary()
function that rolls up event counts, session id, final status, and
concatenated text_delta.text.

Sibling factory rather than overload because stream and non-stream
events have different field shapes (timestamps, sessions, text/tool
deltas) and overlapping type literals (task_start, readiness_end, …) —
a unified dispatcher would silently misroute events.

Adds the streaming-collector example mirror at
examples/sanitized-telemetry-streaming/. Documents in README that
task.intent flows through sanitized telemetry by default and must
never carry user input; route user-visible intent through inputs
(redacted by default) instead.

Bumps 0.5.4 → 0.5.6 (intentionally skipping 0.5.5; PR #6 currently
holds 0.5.5 and is expected to land in series).
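
The collector described above could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Sketch*` types, the `task_end` event and its `status` field, and the default of dropping `inputs` are stand-ins rather than the real agent-runtime definitions. Only the `{events, onEvent}` plus `summary()` shape and the `includeInputs` flag name come from the commit message.

```typescript
// Illustrative stand-ins, NOT the real RuntimeStreamEvent / RuntimeTelemetryOptions.
type SketchStreamEvent =
  | { type: "task_start"; sessionId: string; timestamp: number; inputs?: Record<string, unknown> }
  | { type: "text_delta"; sessionId: string; timestamp: number; text: string }
  | { type: "task_end"; sessionId: string; timestamp: number; status: string }; // hypothetical event

interface SketchTelemetryOptions {
  includeInputs?: boolean; // flag name from the commit message; default assumed to be false
}

interface SketchSummary {
  counts: Record<string, number>;
  sessionId: string | undefined;
  finalStatus: string | undefined;
  text: string;
}

function createSketchStreamCollector(options: SketchTelemetryOptions = {}) {
  const events: SketchStreamEvent[] = [];

  function onEvent(event: SketchStreamEvent): void {
    if (event.type === "task_start" && !options.includeInputs) {
      // Redact by rebuilding the event without the inputs field.
      events.push({ type: "task_start", sessionId: event.sessionId, timestamp: event.timestamp });
      return;
    }
    events.push(event);
  }

  function summary(): SketchSummary {
    const counts: Record<string, number> = {};
    let sessionId: string | undefined;
    let finalStatus: string | undefined;
    let text = "";
    for (const event of events) {
      counts[event.type] = (counts[event.type] ?? 0) + 1;
      sessionId = sessionId ?? event.sessionId; // first session id seen
      if (event.type === "text_delta") text += event.text; // concatenate deltas in order
      if (event.type === "task_end") finalStatus = event.status;
    }
    return { counts, sessionId, finalStatus, text };
  }

  return { events, onEvent, summary };
}

const collector = createSketchStreamCollector();
collector.onEvent({ type: "task_start", sessionId: "s1", timestamp: 1, inputs: { prompt: "hi" } });
collector.onEvent({ type: "text_delta", sessionId: "s1", timestamp: 2, text: "Hello" });
console.log(collector.summary());
```

Keeping the stream union separate from the non-stream one lets the `event.type` checks narrow correctly, which is the same concern the commit message gives for choosing a sibling factory over an overload.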
tangletools pushed a commit that referenced this pull request Jun 4, 2026
…-loud session continuity

Resolve all six findings from the review (none blocked landing; #1 gated
enabling, #3/#4 wanted documenting). Lineage remains default-OFF and
byte-identical to the fresh-box path when both flags are unset.

- #1 sessionContinuity silent no-op: `continue` now asserts the session is
  still known to the sandbox via `box.session(id).status()` before streaming.
  A `null` (platform never honored the client-minted id, or it was reaped)
  raises a ValidationError, which executeIteration now propagates as a hard
  structural failure instead of degrading to a soft empty iteration — so a
  non-honoring platform errors loudly rather than running contextless turns.
- #2 unbounded fork creation: `fork` provisions child boxes through
  `mapWithConcurrency` bounded by the loop's `maxConcurrency`, not a single
  `Promise.all` over all N branches.
- #3 fork ignores per-branch specs: documented on `fork` and
  `LoopLineageOptions.forkFanout` that a real CRIU fork inherits the parent
  image/profile (per-branch specs apply only on the degraded fresh path).
- #4 lineage holds every box to loop end: kernel prunes boxes no future round
  can descend from after each round, gated on a kernel-inferred (monotonic)
  branch point — skipped when the driver authors its own `parentIndex`. The
  unprunable case is documented as the box ceiling.
- #5 abort during fork: documented the SDK's signal-less fork; abort is now
  checked per branch (between bounded waves) + an abort-under-lineage test.
- #6 export order: alphabetized the loops barrel.

Adds `mapWithConcurrency` util and six lineage tests (session-liveness pass/
fail, bounded-fork peak, mid-loop prune, no-prune-under-authored-parent,
abort-under-lineage). 627 tests pass, typecheck + biome clean.
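
For readers unfamiliar with the pattern in finding #2, a bounded map can be sketched as below. This is an assumed worker-pool variant written for illustration. The real `mapWithConcurrency` util is not shown in the commit message, and the mention of "bounded waves" suggests it may batch differently.

```typescript
// Illustrative sketch: run fn over items with at most `limit` calls in flight.
// Results keep the input order. If fn rejects, the returned promise rejects,
// but workers that are already running continue until they next fail or drain.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function worker(): Promise<void> {
    // No await sits between the check and the increment, so each index is claimed once.
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index] as T, index);
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

Compared with a single `Promise.all(items.map(fn))`, which starts every call immediately, this keeps the number of in-flight provisions at or below `limit`.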
drewstone added a commit that referenced this pull request Jun 4, 2026
…r runLoop (backend-blind) (#150)

* feat(loops): opt-in session continuation + checkpoint-fork lineage (backend-blind)

Two @experimental, default-OFF seams on runLoop so a loop can CONTINUE a sandbox
session across iterations (same box + sessionId, no prompt-text replay) and FORK
fanout branches from a parent checkpoint (shared context prefix) — both behind a
capability probe so the kernel asks 'can I fork?' (client.criuStatus) and never
names Docker/Firecracker, degrading to fresh boxes when CRIU is absent.

- sandbox-capabilities.ts: memoized, fail-closed criuStatus probe -> {canFork}.
- sandbox-lineage.ts: createSandboxLineage owns box+session handles with
  start/continue/fork/teardown; reuses the kernel's acquireSandbox /
  buildBackendOptions / deleteBoxSafe; fail-loud if the probe says canFork but
  the box has no fork().
- run-loop.ts: RunLoopOptions.lineage (sessionContinuity / forkFanout); refine
  continues, fanout forks-once, else fresh-through-lineage. Default OFF is
  byte-identical to today, so random@k stays N independent fresh boxes (the
  compute-control invariant). Rejects lineage + onWorkerBox (both own boxes).
- 7 new unit tests (continuation reuses session; fork when canFork; fresh
  fallback; default-off invariant). Full suite 621 pass, typecheck clean.
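
A memoized, fail-closed probe of the kind described for sandbox-capabilities.ts might be sketched as follows. The `SketchSandboxClient` interface and the `{ available }` return shape are assumptions. The commit message names `client.criuStatus` and `{canFork}` but does not show their signatures.

```typescript
// Assumed client shape, for illustration only.
interface SketchSandboxClient {
  criuStatus(): Promise<{ available: boolean }>;
}

interface SketchCapabilities {
  canFork: boolean;
}

// One cached probe per client object, so repeated calls do not hit the platform again.
const probeCache = new WeakMap<object, Promise<SketchCapabilities>>();

function probeCapabilities(client: SketchSandboxClient): Promise<SketchCapabilities> {
  const cached = probeCache.get(client);
  if (cached) return cached;

  const probe = Promise.resolve()
    .then(() => client.criuStatus())
    .then((status) => ({ canFork: status.available === true }))
    // Fail closed: any probe error is treated as "cannot fork",
    // so the caller degrades to fresh boxes instead of attempting a fork.
    .catch(() => ({ canFork: false }));

  probeCache.set(client, probe);
  return probe;
}
```

Note that this sketch also caches the fail-closed result, so a transient probe failure stays "cannot fork" for the lifetime of that client object. Whether the real probe behaves the same way is not stated in the source.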

* fix(loops): address PR #150 review — bound forks, prune lineage, fail-loud session continuity
