Skip to content

feat(0.11.0): defineAgent + surfaces-driven adapters + outcome measurement#16

Merged
drewstone merged 1 commit into
mainfrom
feat/define-agent-substrate
May 20, 2026
Merged

feat(0.11.0): defineAgent + surfaces-driven adapters + outcome measurement#16
drewstone merged 1 commit into
mainfrom
feat/define-agent-substrate

Conversation

@drewstone
Copy link
Copy Markdown
Contributor

Summary

The substrate redesign. Closes every universal failure flagged by the audit of the four per-agent draft PRs (now closed):

  1. Subject-routing dead code → typed `FindingSubject` (agent-eval 0.30.0) + `resolveSubjectPath(subject, surfaces, repoRoot)` resolves to a real file path. No `startsWith(...)` prose matching.

  2. ImprovementAdapter has no apply → `createSurfaceImprovementAdapter` ships first-class apply with three modes:

    • `write` — `git apply -p0` in-place; operator reviews via `git diff`
    • `open-pr` — branch + commit + push + `gh pr create`
    • `none` — report-only
      Race-checked via SHA-256 of the file content the patch was drafted against.
  3. No outcome measurement → `measureOutcome(result, opts)` re-runs the cohort after apply, computes composite delta, optionally rolls back applied paths on regression.

  4. Fabricated file paths → `validateSurfaces` runs at `defineAgent` time and throws `AgentManifestError` listing every missing surface. Broken manifests can't ship.

  5. Per-vertical ImprovementAdapter duplication → `createSurfaceImprovementAdapter(opts)` is the ONE adapter every vertical uses. Customization moves to the manifest's `surfaces` + `autoApply`, not new adapter code.

  6. Knowledge proposals lacking lint → `createSurfaceKnowledgeAdapter` wraps `applyKnowledgeWriteBlocks` with an optional `lintAfterApply` hook.

API surface

New sub-export `@tangle-network/agent-runtime/agent` ships:

  • `defineAgent<TPersona, TRunOutput>(manifest)` — typed, validated factory
  • `AgentSurfaces`, `validateSurfaces`, `resolveSubjectPath` — surface map + Subject→Path resolver
  • `createSurfaceImprovementAdapter(opts)` — LLM-drafted patches + `git apply` / `gh pr create` apply
  • `createSurfaceKnowledgeAdapter(opts, deps)` — agent-knowledge integration with post-apply lint
  • `measureOutcome(result, opts)` — before/after cohort delta + rollback
  • `AgentManifestError` — fail-loud manifest validation

Why the redesign

The audit of the four draft PRs found the same structural failures across every one: subject grammar unenforced, no apply on the improvement side, no outcome metric, fabricated paths, per-vertical glue that doesn't scale past 4 verticals — let alone the 1000s the substrate needs to serve. Fixing those in-place across 4 PRs would have shipped the same theater; fixing the substrate fixes them by construction for every future agent.

After this lands, per-vertical PRs collapse from ~700 lines of glue to a ~50-line `defineAgent({...})` call + the substrate's default adapters.

Test plan

  • `pnpm test` — 124/124 pass (27 new under `tests/agent.test.ts`)
  • `pnpm typecheck`
  • `pnpm build` — emits new `agent.{js,d.ts}` sub-export
  • CI install gated on agent-eval 0.30.0 publish — draft until then.

Gates

  • agent-eval#61 (`feat(0.30.0): FindingSubject typed grammar`) merges + 0.30.0 publishes
  • Then this PR's `@tangle-network/agent-eval ^0.30.0` dep resolves and CI unblocks

…ement

The substrate redesign. Closes every universal failure flagged by the audit
of the 4 per-agent draft PRs:

  1. Subject-routing dead code → `parseFindingSubject` (agent-eval 0.30.0) +
     `resolveSubjectPath(subject, surfaces, repoRoot)` resolve a typed
     FindingSubject to a real file path. No `startsWith(...)` prose matching.

  2. ImprovementAdapter has no apply → `createSurfaceImprovementAdapter`
     ships a first-class apply with two modes:
       `write` — `git apply -p0` in-place; operator reviews via git diff
       `open-pr` — branch + commit + push + `gh pr create`
     plus a `none` mode for report-only runs. Race-checked via SHA-256 of
     the file content the patch was drafted against.

  3. No outcome measurement → `measureOutcome(result, opts)` re-runs the
     cohort after apply, computes composite delta, optionally rolls back
     applied paths on regression. Process-counts-only reporting is gone.

  4. Fabricated file paths → `validateSurfaces` runs at `defineAgent` time
     and throws `AgentManifestError` listing every missing surface. A
     manifest that ships broken can't get past `pnpm typecheck`.

  5. Per-vertical ImprovementAdapter code (~150 lines × N agents) →
     `createSurfaceImprovementAdapter(opts)` is the ONE adapter every
     vertical uses. Per-agent customization happens at the manifest level
     (`surfaces` + `autoApply`), not by writing a new adapter.

  6. Knowledge proposals lacking lint → `createSurfaceKnowledgeAdapter`
     wraps agent-knowledge's `applyKnowledgeWriteBlocks` with an optional
     `lintAfterApply` hook. Wiki drift surfaces in `warnings` immediately.

API surface (new sub-export `@tangle-network/agent-runtime/agent`):
  - `defineAgent<TPersona, TRunOutput>(manifest)` — typed, validated factory
  - `AgentSurfaces`, `validateSurfaces`, `resolveSubjectPath` — surface map +
    Subject→Path resolver
  - `createSurfaceImprovementAdapter(opts)` — LLM-drafted patches +
    `git apply` / `gh pr create` apply
  - `createSurfaceKnowledgeAdapter(opts, deps)` — agent-knowledge integration
    with post-apply lint
  - `measureOutcome(result, opts)` — before/after cohort delta + rollback
  - `AgentManifestError` — fail-loud manifest validation

Bumps agent-eval dep to ^0.30.0 (FindingSubject lives there).

Tests: 124/124 pass (27 new under `tests/agent.test.ts`) covering manifest
validation, subject-resolution for every surface variant, propose/apply
error paths, race-detection via SHA mismatch, and outcome rollback on
regression.

Per-vertical PRs now collapse from ~700 lines of glue to a ~50-line
`defineAgent({...})` call + the substrate's default adapters. Tracking
that cascade in follow-up PRs per repo.
@drewstone drewstone marked this pull request as ready for review May 20, 2026 09:40
@drewstone drewstone merged commit 3de61f6 into main May 20, 2026
1 check failed
drewstone added a commit that referenced this pull request May 20, 2026
Two follow-ups missed by the #16 merge window — both required by the
per-agent manifest PRs (tax #69, legal #70, gtm #122, creative #98) to
edit existing system-prompt skill directories rather than fabricating
flat <section>.md siblings.

Multi-candidate path probing (surfaces.ts):
  resolveSubjectPath now probes candidates in order rather than picking
  a single path. system-prompt:<section> resolves to:
    <surfaces.systemPrompt>/<section>.md          (canonical create-new)
    <surfaces.systemPrompt>/<section>/SKILL.md    (tax/legal/gtm/creative)
    <surfaces.systemPrompt>/<section>/index.md    (flat-md repos)
  tool-doc:<tool> probes <tool>/README.md then <tool>.md. First hit wins
  for edit-existing; first candidate is the canonical create-new target.

Streaming-first AgentRuntime.act (define-agent.ts):
  Returns AgentRunInvocation { events, output } instead of Promise<T>.
  Preserves the chat-centric product's streaming surface through the
  substrate:
    - events: AsyncIterable<RuntimeStreamEvent> — chat UX consumes
      verbatim (SSE / WebSocket / inline render). runChatTurn plugs
      in directly.
    - output: Promise<TRunOutput> — resolves after stream drains; eval
      substrate awaits this; chat UX ignores (already rendered).

Helpers:
  unimplementedAgentRun(reason?) — stub manifests use until their eval
    path is fully wired; yields no events, rejects output loudly.
  collectAgentRun(invocation) — eval-path drain helper; chat UX MUST
    NOT call this (defeats streaming).

133/133 tests pass (6 new: streaming contract + multi-candidate probing).

Bumps to 0.11.1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant