Why this audit
The 2026-05-22 agent-eval cross-repo audit (synthesis) catalogued @tangle-network/agent-eval adoption across consumers and produced 8 issues. Since then the substrate has moved significantly, so this audit body has been re-baselined to match the current state.
Substrate snapshot as of 2026-05-23 16:00 UTC:
| Package | Then (audit ref) | Now |
| --- | --- | --- |
| @tangle-network/agent-eval | 0.31.1 | 0.34.1 (#86 shipped AGENT_PROFILE_KINDS, toAgentProfileJson, buildSandboxAgentProfileCell; published 2026-05-23) |
| @tangle-network/agent-runtime | (not pinned) | 0.17.1; significant surface change via #38 (yanked chat-turn, intent-router, model-resolution, profile-conformance, run, trace-bridge: 1737 LOC of unused exports) and #39 (sandbox 0.2 peer-range fix) |
| @tangle-network/sandbox | (not pinned) | 0.0.3 (Blueprint) / 0.2.1 (this runtime dev dep); peer range now >=0.1.2 <0.3.0 |
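The peer range in the sandbox row can be sanity-checked by hand or with a few lines of code. The following is a minimal sketch, assuming plain MAJOR.MINOR.PATCH versions with no pre-release tags. `parseVersion`, `compareVersions`, and `satisfiesPeerRange` are hypothetical helper names for illustration, not part of any @tangle-network package; real tooling would normally rely on the `semver` package instead.

```typescript
// Hypothetical helpers for illustration only.
type Version = [number, number, number];

// Parse "MAJOR.MINOR.PATCH"; pre-release and build suffixes are out of scope for this sketch.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text.trim());
  if (match === null) {
    throw new Error(`unsupported version string: ${text}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  const [aMajor, aMinor, aPatch] = a;
  const [bMajor, bMinor, bPatch] = b;
  if (aMajor !== bMajor) return aMajor - bMajor;
  if (aMinor !== bMinor) return aMinor - bMinor;
  return aPatch - bPatch;
}

// True when `version` falls inside the half-open interval [min, max).
function satisfiesPeerRange(version: string, min: string, max: string): boolean {
  const v = parseVersion(version);
  return compareVersions(v, parseVersion(min)) >= 0 && compareVersions(v, parseVersion(max)) < 0;
}

// The range from the table: >=0.1.2 <0.3.0
console.log(satisfiesPeerRange("0.2.1", "0.1.2", "0.3.0")); // true
console.log(satisfiesPeerRange("0.3.0", "0.1.2", "0.3.0")); // false
```

The upper bound is exclusive, so a future 0.3.0 release of the sandbox would fall outside the range until the peer range is widened.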
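Consumer list as of 2026-05-23
Six consumers now, not five (Blueprint joined via tangle-network/blueprint-agent#1758, merged 2026-05-23 13:13 UTC):
tax-agent
legal-agent
creative-agent
gtm-agent
agent-builder
blueprint-agent ← new; uses createSandboxPromptBackend + runAgentTaskStream + RuntimeStreamEvent (canonical event flow). Pinned at 0.16.1; the bump to 0.17.1 is mechanically clean now that Blueprint's dead createBlueprintTraceBridge wrapper is being deleted (filed against Blueprint as a follow-up after the false-alarm #40 closed).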
Audit shape — unchanged
Spawn seven parallel sub-agents:
1. Substrate catalog: enumerate every public export from @tangle-network/agent-runtime@0.17.1 at /Users/drew/webb/agent-runtime (or /home/drew/code/agent-runtime), group by capability area, flag post-0.15.0 additions, identify deprecations, and identify surface still kept that has zero consumers. The #38 yank cut 1737 LOC of unused exports; repeat the same exercise on the 0.17.1 surface to catch the next round of dead code (e.g. analyst-loop if no consumer wires it).
2-7. Per-consumer integration audit: tax-agent, legal-agent, creative-agent, gtm-agent, agent-builder, blueprint-agent. Each inventories imports, traces the integration shape, identifies gaps vs the current surface, identifies drift / staleness, and produces a verdict plus the 5 highest-leverage upgrades.
Each report writes to /tmp/audit/agent-runtime/<repo>-integration.md and the catalog writes to /tmp/audit/agent-runtime/catalog.md.
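The zero-consumer check in the catalog step reduces to a set difference between the catalog and the union of all consumer imports. The following is a minimal sketch, assuming each sub-agent report can be reduced to a list of imported symbol names. The `ConsumerImports` shape, the `findZeroConsumerExports` helper, the `example-consumer` repo, and the `unusedExample` export are all hypothetical; the sample data is illustrative and not an audit finding.

```typescript
// Shape of one consumer's import inventory (hypothetical; the real reports are markdown files).
interface ConsumerImports {
  repo: string;
  imports: string[];
}

// Return every catalog export that no consumer imports, sorted for stable output.
function findZeroConsumerExports(catalog: string[], consumers: ConsumerImports[]): string[] {
  const used = new Set<string>();
  for (const consumer of consumers) {
    for (const symbol of consumer.imports) {
      used.add(symbol);
    }
  }
  return catalog.filter((symbol) => !used.has(symbol)).sort();
}

// Illustrative data only. `unusedExample` and `example-consumer` are made-up names.
const sampleCatalog: string[] = ["createSandboxPromptBackend", "runAgentTaskStream", "unusedExample"];
const sampleConsumers: ConsumerImports[] = [
  { repo: "blueprint-agent", imports: ["createSandboxPromptBackend", "runAgentTaskStream"] },
  { repo: "example-consumer", imports: ["runAgentTaskStream"] },
];

console.log(findZeroConsumerExports(sampleCatalog, sampleConsumers)); // ["unusedExample"]
```

An export appearing in this list is only a yank candidate; the wrapper-callsites lesson from #40 still applies before anything is deleted.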
Lessons baked in from the false-alarm cycle on #40
The #40 false alarm showed that the audit's value isn't just "find dead exports to yank" but also "distinguish dead wrappers in consumers from active wrappers." Blueprint had a createBlueprintTraceBridge wrapper that imported the yanked createTraceBridge. It looked like a regression but was actually dead scaffolding that the consumer should also delete.
Every per-consumer audit must explicitly call out:
Wrapper-around-runtime classes that are themselves never called. A consumer wrapping createXxx and exposing createConsumerXxx is only a real consumer if createConsumerXxx has live callsites. Otherwise the wrapper is dead, the runtime export it wraps is dead, and the right call is to delete BOTH (consumer follow-up issue) rather than restore the runtime export.
Update the spec template's §2 to include a "live callsites of the wrapper" check, not just "exported by the consumer".
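The live-callsites rule can be written down as a small decision function. The following is a minimal sketch, assuming callsite counts have already been gathered per repo (for example by searching the source tree). The `WrapperRecord` shape and the verdict labels are hypothetical, and the `delete-wrapper-only` branch is an added assumption covering the case where something else uses the runtime export directly; the rule as stated above only covers the first and last verdicts.

```typescript
// Hypothetical record describing one consumer wrapper around a runtime export.
interface WrapperRecord {
  consumerWrapper: string;       // e.g. createConsumerXxx
  runtimeExport: string;         // e.g. createXxx
  wrapperCallsites: number;      // live calls to the wrapper inside the consumer
  otherRuntimeCallsites: number; // direct uses of the runtime export outside the wrapper
}

type WrapperVerdict = "keep-both" | "delete-wrapper-only" | "delete-both";

function classifyWrapper(record: WrapperRecord): WrapperVerdict {
  // A wrapper with live callsites makes the consumer a real consumer.
  if (record.wrapperCallsites > 0) {
    return "keep-both";
  }
  // The wrapper is dead. The runtime export survives only if something else uses it directly.
  return record.otherRuntimeCallsites > 0 ? "delete-wrapper-only" : "delete-both";
}

// The #40 case as described above: a dead wrapper around an export that was already yanked as unused.
const blueprintTraceBridge: WrapperRecord = {
  consumerWrapper: "createBlueprintTraceBridge",
  runtimeExport: "createTraceBridge",
  wrapperCallsites: 0,
  otherRuntimeCallsites: 0,
};

console.log(classifyWrapper(blueprintTraceBridge)); // "delete-both"
```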
Synthesis output
Synthesize the catalog and the six consumer reports into a CTO-level cross-repo report following the exact shape of the agent-eval audit:
Execution gaps in the substrate itself (shipped + unused — second-round yank candidates)
Scaffold-template gaps in agent-builder
Wrapper-deletion candidates per consumer (the lesson from #40)
Verdict on substrate usefulness
Ranked concrete actions
Per-repo CTO specs
Produce seven execution specs — six per consumer plus one for agent-runtime — at /tmp/audit/agent-runtime/spec-<repo>.md following the exact 10-section shape used by the agent-eval audit:
§0 Read-first context · §1 Executive summary · §2 Current state inventory (incl. live-callsites check on wrappers) · §3 Target architecture · §4 File-by-file migration tasks (T0X with file:line / current / target / why / test impact / completion check) · §5 Completion checklist (25-50 boxes) · §6 Test plan · §7 Rollout · §8 Risks + non-goals · §9 Citations · §10 Coordination
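A spec's conformance to this section shape can be checked mechanically before it is filed. The following is a minimal sketch, assuming each section appears in the spec as a line containing its "§N" marker followed by the title. That heading convention, the `## ` prefix in the usage example, and the `missingSections` helper are assumptions for illustration, not something the agent-eval audit defines.

```typescript
// Section titles taken from the shape listed above.
const REQUIRED_SECTIONS: string[] = [
  "§0 Read-first context",
  "§1 Executive summary",
  "§2 Current state inventory",
  "§3 Target architecture",
  "§4 File-by-file migration tasks",
  "§5 Completion checklist",
  "§6 Test plan",
  "§7 Rollout",
  "§8 Risks + non-goals",
  "§9 Citations",
  "§10 Coordination",
];

// Return the required sections that do not appear on any line of the spec.
function missingSections(specText: string): string[] {
  const lines = specText.split("\n");
  return REQUIRED_SECTIONS.filter(
    (section) => !lines.some((line) => line.includes(section)),
  );
}

// Usage: a draft that forgot its citations section.
const draftSpec = REQUIRED_SECTIONS
  .filter((section) => section !== "§9 Citations")
  .map((section) => `## ${section}`)
  .join("\n\n");

console.log(missingSections(draftSpec)); // ["§9 Citations"]
```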
File the synthesis + per-repo specs to a new branch in agent-runtime: chore/cross-repo-runtime-audit-2026q2, mirroring the agent-eval audit branch structure.
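File the resulting issues
Once specs land, file one issue per repo:
agent-runtime / [N+1.0]: substrate spec (second-round yank, absorb hand-rolled patterns, etc.)
agent-builder / [meta-spec]: scaffold updates relevant to agent-runtime adoption
tax-agent, legal-agent, creative-agent, gtm-agent, blueprint-agent: per-consumer execution specs
Issue body shape: executive summary + completion checklist + cross-spec coordination + raw link to the canonical spec on the audit branch. The full spec lives in the branch because 1200-1800-line specs do not fit within GitHub's 65 KB issue body limit.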
Optionally file an [N+2.0] triage issue for unused agent-runtime surface, mirroring agent-eval#77.
Read-first context for the sub-agent
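The agent-eval audit's outputs are the exact template; re-read them before starting:
*-integration.md siblings of the synthesis
spec-*.md siblings of the synthesis (use these as exemplars for output shape + rigor)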
#38 — the first-round yank PR; its diff is the empirical baseline for what "dead export" looked like before the audit. Treat as a worked example.
#40 — the false-alarm cycle that validated the substrate yank and motivated the wrapper-callsites check.
Acceptance criteria
Branch chore/cross-repo-runtime-audit-2026q2 exists on tangle-network/agent-runtime with docs/audits/2026-MM-DD-cross-repo/ carrying the synthesis + catalog + 6 consumer audits + 7 specs
One issue per repo filed: agent-runtime (substrate), agent-builder (meta), tax-agent, legal-agent, creative-agent, gtm-agent, blueprint-agent
Optional triage issue if speculative surface exists
Each spec follows the 10-section CTO shape: 800-1800 lines, real file:line citations, real code snippets, 25-50 completion boxes, every task carries a test impact statement
Every consumer audit explicitly checks live-callsites of any wrappers around runtime exports (lesson from #40)
No source files modified in any repo during the audit — only spec docs landed
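The numeric criteria above (800-1800 lines, 25-50 completion boxes) lend themselves to an automated pre-flight check. The following is a minimal sketch, assuming completion boxes are written as GitHub task-list items (`- [ ]` or `- [x]`). The `checkSpecLimits` helper and its result shape are hypothetical.

```typescript
interface SpecLimitsReport {
  lineCount: number;
  boxCount: number;
  problems: string[];
}

// Count lines and task-list boxes, and report any value outside the acceptance thresholds.
function checkSpecLimits(specText: string): SpecLimitsReport {
  const lines = specText.split("\n");
  const boxCount = lines.filter((line) => /^\s*[-*] \[[ xX]\] /.test(line)).length;
  const problems: string[] = [];
  if (lines.length < 800 || lines.length > 1800) {
    problems.push(`line count ${lines.length} is outside 800-1800`);
  }
  if (boxCount < 25 || boxCount > 50) {
    problems.push(`completion box count ${boxCount} is outside 25-50`);
  }
  return { lineCount: lines.length, boxCount, problems };
}

// Usage: a synthetic 1000-line spec with 30 completion boxes passes both checks.
const sampleBoxes = Array.from({ length: 30 }, (_, i) => `- [ ] task ${i + 1}`);
const sampleFiller = Array.from({ length: 970 }, (_, i) => `line ${i + 1}`);
const limitsReport = checkSpecLimits(sampleBoxes.concat(sampleFiller).join("\n"));

console.log(limitsReport.problems.length); // 0
```

The check only covers the countable criteria; real file:line citations and test impact statements still need a human or sub-agent review.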
Why this is worth doing
Layered substrate work compounds. #38 shaved 1737 LOC of unused exports based on a single-pass inspection; a structured audit will catch what that pass missed (e.g. analyst-loop is still 100+ LOC of public surface: has it ever shipped?) and also the converse: consumer wrappers whose existence is the only justification for a runtime export. Without this audit, agent-runtime drift accumulates silently and we pay the same audit cost again in a year.
Edit history: Body refreshed 2026-05-23 16:45 UTC to update the substrate version baseline (agent-eval 0.31.1 → 0.34.1; agent-runtime → 0.17.1; consumer count 5 → 6 incl. Blueprint), and to bake in the wrapper-callsites lesson from the #40 false-alarm cycle.