feat(0.11.0): defineAgent + surfaces-driven adapters + outcome measurement by drewstone · Pull Request #16 · tangle-network/agent-runtime

drewstone · 2026-05-20T09:38:42Z

Summary

The substrate redesign. Closes every universal failure flagged by the audit of the four per-agent draft PRs (now closed):

1. Subject-routing dead code → typed `FindingSubject` (agent-eval 0.30.0) + `resolveSubjectPath(subject, surfaces, repoRoot)` resolves to a real file path. No `startsWith(...)` prose matching.
2. ImprovementAdapter has no apply → `createSurfaceImprovementAdapter` ships first-class apply with three modes:
   - `write`: `git apply -p0` in-place; operator reviews via `git diff`
   - `open-pr`: branch + commit + push + `gh pr create`
   - `none`: report-only

   Race-checked via SHA-256 of the file content the patch was drafted against.
3. No outcome measurement → `measureOutcome(result, opts)` re-runs the cohort after apply, computes composite delta, optionally rolls back applied paths on regression.
4. Fabricated file paths → `validateSurfaces` runs at `defineAgent` time and throws `AgentManifestError` listing every missing surface. Broken manifests can't ship.
5. Per-vertical ImprovementAdapter duplication → `createSurfaceImprovementAdapter(opts)` is the ONE adapter every vertical uses. Customization moves to the manifest's `surfaces` + `autoApply`, not new adapter code.
6. Knowledge proposals lacking lint → `createSurfaceKnowledgeAdapter` wraps `applyKnowledgeWriteBlocks` with an optional `lintAfterApply` hook.
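
The SHA-256 race check mentioned for the apply step can be pictured roughly as follows. This is a minimal sketch, not the adapter's actual code: `DraftedPatch`, `ApplyResult`, and `applyIfUnchanged` are hypothetical names, and `nextContent` stands in for whatever `git apply` would produce. The only idea taken from the description is that a patch records the digest of the content it was drafted against and is refused if the file has changed since.

```typescript
import { createHash } from "node:crypto";

/** Hex SHA-256 of file content: the digest recorded when a patch is drafted. */
function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical shape for illustration; not the adapter's real proposal type.
interface DraftedPatch {
  path: string;
  baseSha256: string; // digest of the content the patch was drafted against
  nextContent: string; // stand-in for the result of applying the diff
}

type ApplyResult =
  | { applied: true; content: string }
  | { applied: false; reason: string };

/** Apply only if the file is still byte-identical to what the drafter saw. */
function applyIfUnchanged(current: string, patch: DraftedPatch): ApplyResult {
  const currentSha = sha256(current);
  if (currentSha !== patch.baseSha256) {
    // Someone edited the file between drafting and applying: refuse.
    return {
      applied: false,
      reason: `stale base for ${patch.path}: expected ${patch.baseSha256.slice(0, 8)}, found ${currentSha.slice(0, 8)}`,
    };
  }
  return { applied: true, content: patch.nextContent };
}
```

Comparing digests rather than timestamps keeps the check independent of filesystem metadata, which is presumably why a content hash was chosen here.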

API surface

New sub-export `@tangle-network/agent-runtime/agent` ships:

`defineAgent<TPersona, TRunOutput>(manifest)` — typed, validated factory
`AgentSurfaces`, `validateSurfaces`, `resolveSubjectPath` — surface map + Subject→Path resolver
`createSurfaceImprovementAdapter(opts)` — LLM-drafted patches + `git apply` / `gh pr create` apply
`createSurfaceKnowledgeAdapter(opts, deps)` — agent-knowledge integration with post-apply lint
`measureOutcome(result, opts)` — before/after cohort delta + rollback
`AgentManifestError` — fail-loud manifest validation
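
To illustrate the fail-loud validation idea, here is a rough sketch of a check that collects every missing surface before throwing, rather than stopping at the first. The names `SurfaceMap`, `ManifestError`, and `checkSurfaces` are hypothetical stand-ins; the real `AgentSurfaces` shape and `validateSurfaces` signature are not shown in this PR text. The existence check is injected so the sketch stays independent of the filesystem.

```typescript
// Hypothetical surface map: surface name -> repo-relative path.
type SurfaceMap = Record<string, string>;

// Stand-in for AgentManifestError; carries the full list of problems.
class ManifestError extends Error {
  readonly missing: string[];

  constructor(missing: string[]) {
    super(`manifest references ${missing.length} missing surface(s): ${missing.join(", ")}`);
    this.name = "ManifestError";
    this.missing = missing;
  }
}

/**
 * Throws once, listing every surface whose path does not exist.
 * `exists` is injected (e.g. fs.existsSync in real use) to keep this testable.
 */
function checkSurfaces(surfaces: SurfaceMap, exists: (path: string) => boolean): void {
  const missing = Object.entries(surfaces)
    .filter(([, path]) => !exists(path))
    .map(([name, path]) => `${name} -> ${path}`);
  if (missing.length > 0) {
    throw new ManifestError(missing);
  }
}
```

Reporting all missing surfaces in one error means a manifest author fixes the whole manifest in one pass instead of rerunning after each failure.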

Why the redesign

The audit of the four draft PRs found the same structural failures across every one: subject grammar unenforced, no apply on the improvement side, no outcome metric, fabricated paths, per-vertical glue that doesn't scale past 4 verticals — let alone the 1000s the substrate needs to serve. Fixing those in-place across 4 PRs would have shipped the same theater; fixing the substrate fixes them by construction for every future agent.

After this lands, per-vertical PRs collapse from ~700 lines of glue to a ~50-line `defineAgent({...})` call + the substrate's default adapters.

Test plan

`pnpm test` — 124/124 pass (27 new under `tests/agent.test.ts`)
`pnpm typecheck`
`pnpm build` — emits new `agent.{js,d.ts}` sub-export
CI install gated on agent-eval 0.30.0 publish — draft until then.

Gates

agent-eval#61 (`feat(0.30.0): FindingSubject typed grammar`) merges + 0.30.0 publishes
Then this PR's `@tangle-network/agent-eval ^0.30.0` dep resolves and CI unblocks

…ement

The substrate redesign. Closes every universal failure flagged by the audit of the 4 per-agent draft PRs:

1. Subject-routing dead code → `parseFindingSubject` (agent-eval 0.30.0) + `resolveSubjectPath(subject, surfaces, repoRoot)` resolve a typed FindingSubject to a real file path. No `startsWith(...)` prose matching.
2. ImprovementAdapter has no apply → `createSurfaceImprovementAdapter` ships a first-class apply with two modes:
   - `write`: `git apply -p0` in-place; operator reviews via git diff
   - `open-pr`: branch + commit + push + `gh pr create`

   plus a `none` mode for report-only runs. Race-checked via SHA-256 of the file content the patch was drafted against.
3. No outcome measurement → `measureOutcome(result, opts)` re-runs the cohort after apply, computes composite delta, optionally rolls back applied paths on regression. Process-counts-only reporting is gone.
4. Fabricated file paths → `validateSurfaces` runs at `defineAgent` time and throws `AgentManifestError` listing every missing surface. A manifest that ships broken can't get past `pnpm typecheck`.
5. Per-vertical ImprovementAdapter code (~150 lines × N agents) → `createSurfaceImprovementAdapter(opts)` is the ONE adapter every vertical uses. Per-agent customization happens at the manifest level (`surfaces` + `autoApply`), not by writing a new adapter.
6. Knowledge proposals lacking lint → `createSurfaceKnowledgeAdapter` wraps agent-knowledge's `applyKnowledgeWriteBlocks` with an optional `lintAfterApply` hook. Wiki drift surfaces in `warnings` immediately.

API surface (new sub-export `@tangle-network/agent-runtime/agent`):

- `defineAgent<TPersona, TRunOutput>(manifest)`: typed, validated factory
- `AgentSurfaces`, `validateSurfaces`, `resolveSubjectPath`: surface map + Subject→Path resolver
- `createSurfaceImprovementAdapter(opts)`: LLM-drafted patches + `git apply` / `gh pr create` apply
- `createSurfaceKnowledgeAdapter(opts, deps)`: agent-knowledge integration with post-apply lint
- `measureOutcome(result, opts)`: before/after cohort delta + rollback
- `AgentManifestError`: fail-loud manifest validation

Bumps agent-eval dep to ^0.30.0 (FindingSubject lives there).

Tests: 124/124 pass (27 new under `tests/agent.test.ts`) covering manifest validation, subject-resolution for every surface variant, propose/apply error paths, race-detection via SHA mismatch, and outcome rollback on regression.

Per-vertical PRs now collapse from ~700 lines of glue to a ~50-line `defineAgent({...})` call + the substrate's default adapters. Tracking that cascade in follow-up PRs per repo.
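
The `measureOutcome` behavior described above (re-run the cohort, compute a composite delta, optionally roll back on regression) might look something like this in miniature. Everything here is an assumption made for illustration: the option names, the synchronous cohort runner, and especially the choice of a plain mean as the "composite" score. The real function takes `(result, opts)` and its internals are not shown in this PR text.

```typescript
// Hypothetical option and result shapes; not the real measureOutcome API.
interface OutcomeOptions {
  runCohort: () => number[]; // re-runs the cohort, returns per-case scores
  rollback: (paths: string[]) => void; // restores the listed paths
  rollbackOnRegression: boolean;
  tolerance?: number; // allowed score drop before it counts as a regression
}

interface Outcome {
  before: number;
  after: number;
  delta: number;
  rolledBack: boolean;
}

// Assumption: the composite score is the mean of per-case scores.
const mean = (xs: number[]): number =>
  xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;

function measure(beforeScores: number[], appliedPaths: string[], opts: OutcomeOptions): Outcome {
  const before = mean(beforeScores);
  const after = mean(opts.runCohort()); // cohort runs against the patched files
  const delta = after - before;
  const regressed = delta < -(opts.tolerance ?? 0);
  const rolledBack = regressed && opts.rollbackOnRegression && appliedPaths.length > 0;
  if (rolledBack) {
    opts.rollback(appliedPaths); // undo only the paths this run applied
  }
  return { before, after, delta, rolledBack };
}
```

A production version would presumably be asynchronous, since re-running a cohort involves model calls; the synchronous form is used only to keep the control flow visible.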

Two follow-ups missed by the #16 merge window, both required by the per-agent manifest PRs (tax #69, legal #70, gtm #122, creative #98) to edit existing system-prompt skill directories rather than fabricating flat `<section>.md` siblings.

Multi-candidate path probing (surfaces.ts): `resolveSubjectPath` now probes candidates in order rather than picking a single path.

`system-prompt:<section>` resolves to:

- `<surfaces.systemPrompt>/<section>.md` (canonical create-new)
- `<surfaces.systemPrompt>/<section>/SKILL.md` (tax/legal/gtm/creative)
- `<surfaces.systemPrompt>/<section>/index.md` (flat-md repos)

`tool-doc:<tool>` probes `<tool>/README.md` then `<tool>.md`. First hit wins for edit-existing; first candidate is the canonical create-new target.

Streaming-first AgentRuntime.act (define-agent.ts): Returns `AgentRunInvocation { events, output }` instead of `Promise<T>`. Preserves the chat-centric product's streaming surface through the substrate:

- `events: AsyncIterable<RuntimeStreamEvent>`: chat UX consumes verbatim (SSE / WebSocket / inline render). `runChatTurn` plugs in directly.
- `output: Promise<TRunOutput>`: resolves after stream drains; eval substrate awaits this; chat UX ignores (already rendered).

Helpers:

- `unimplementedAgentRun(reason?)`: stub manifests use until their eval path is fully wired; yields no events, rejects output loudly.
- `collectAgentRun(invocation)`: eval-path drain helper; chat UX MUST NOT call this (defeats streaming).

133/133 tests pass (6 new: streaming contract + multi-candidate probing). Bumps to 0.11.1.
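
The multi-candidate probing rule described above (first hit wins for edit-existing, first candidate is the create-new target) can be sketched as below. The candidate orders are taken from the commit message; the `Subject` and `Surfaces` types, the `toolDocs` key, and the function names are hypothetical, since the message does not say which surface root `tool-doc` paths hang off. The existence check is injected so the sketch does not touch the filesystem.

```typescript
// Hypothetical typed subject; the real FindingSubject lives in agent-eval.
type Subject =
  | { kind: "system-prompt"; section: string }
  | { kind: "tool-doc"; tool: string };

// `systemPrompt` appears in the commit message; `toolDocs` is an assumed key.
interface Surfaces {
  systemPrompt: string;
  toolDocs: string;
}

/** Ordered candidates; index 0 is the canonical create-new target. */
function candidatePaths(subject: Subject, surfaces: Surfaces): [string, ...string[]] {
  switch (subject.kind) {
    case "system-prompt": {
      const base = `${surfaces.systemPrompt}/${subject.section}`;
      return [`${base}.md`, `${base}/SKILL.md`, `${base}/index.md`];
    }
    case "tool-doc": {
      const base = `${surfaces.toolDocs}/${subject.tool}`;
      return [`${base}/README.md`, `${base}.md`];
    }
  }
}

/** First existing candidate wins; otherwise fall back to the create-new target. */
function resolvePath(
  subject: Subject,
  surfaces: Surfaces,
  exists: (path: string) => boolean,
): { path: string; existing: boolean } {
  const candidates = candidatePaths(subject, surfaces);
  const hit = candidates.find((p) => exists(p));
  return hit !== undefined
    ? { path: hit, existing: true }
    : { path: candidates[0], existing: false };
}
```

Returning the `existing` flag alongside the path lets a caller distinguish an edit from a create without probing the filesystem a second time; whether the real resolver exposes that distinction is not stated in the source.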

drewstone marked this pull request as ready for review May 20, 2026 09:40

drewstone merged commit 3de61f6 into main May 20, 2026
1 check failed

drewstone mentioned this pull request May 20, 2026

feat(0.11.1): streaming-first runtime + skill-dir path layout #17

Merged

drewstone added a commit that referenced this pull request May 20, 2026

style: apply biome formatter to agent substrate (post-#16 cleanup)

bac9541

drewstone merged 1 commit into main from feat/define-agent-substrate
