feat(0.20.0): MCP delegation tools — delegate_code, delegate_research, delegate_feedback by tangletools · Pull Request #45 · tangle-network/agent-runtime

tangletools · 2026-05-24T17:42:02Z

Summary

Phase 1.5 of the driven-loop substrate. Ships a stdio MCP server in
@tangle-network/agent-runtime/mcp plus an agent-runtime-mcp bin so
sandbox coding-harness agents (claude-code, codex, opencode) can delegate
long-running coder / researcher loops to other sandboxes managed by us.

| Tool | Kind | Use |
| --- | --- | --- |
| `delegate_code` | async | Code-modification task. Returns a `taskId`; poll `delegation_status` for the patch |
| `delegate_research` | async | Source-grounded research task. Returns a `taskId`; poll for items + citations |
| `delegate_feedback` | sync | Append agent/user/judge rating against a delegation, artifact, or outcome |
| `delegation_status` | sync | Snapshot of state machine (`pending` → `running` → `completed` \| `failed` \| `cancelled`) |
| `delegation_history` | sync | Newest-first read of past delegations, filterable by namespace / profile / since |

Async semantics

agent → delegate_code(goal, repoRoot)        → { taskId, estimatedDurationMs }
agent → delegation_status(taskId)            → { status: 'running', progress }
... (minutes pass)
agent → delegation_status(taskId)            → { status: 'completed', result: { profile: 'coder', output } }
agent → delegate_feedback(refersTo, rating)  → { recorded: true, id }

Idempotent: duplicate identical input → same taskId (canonical-form hash).
Cancellable: queue.cancel(taskId) aborts the in-flight signal.
Queue state is in-memory only; Phase 2 moves it to sqlite. The README documents this limitation.
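
A minimal sketch of the idempotency and cancel behaviour, under stated assumptions. `SketchQueue`, `canonicalize`, and `fnv1a` are hypothetical names; the PR does not show the internals of the real queue or which hash it uses, and FNV-1a is a stand-in chosen only to keep the sketch dependency-free. The `dlg-<hash>` id format is illustrative as well.

```typescript
// Hypothetical sketch, not the PR's DelegationTaskQueue.
type Status = "pending" | "running" | "completed" | "failed" | "cancelled";

interface Task {
  taskId: string;
  status: Status;
  controller: AbortController;
}

// Serialize with sorted object keys so key order does not change the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// 32-bit FNV-1a; a stand-in for whatever hash the real queue uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

class SketchQueue {
  private tasks = new Map<string, Task>();

  // Duplicate identical input maps to the same task.
  enqueue(tool: string, args: unknown): Task {
    const taskId = `dlg-${fnv1a(canonicalize({ tool, args }))}`;
    const existing = this.tasks.get(taskId);
    if (existing) return existing;
    const task: Task = { taskId, status: "pending", controller: new AbortController() };
    this.tasks.set(taskId, task);
    return task;
  }

  // Abort the in-flight signal; returns false for unknown or finished tasks.
  cancel(taskId: string): boolean {
    const task = this.tasks.get(taskId);
    if (!task) return false;
    if (task.status !== "pending" && task.status !== "running") return false;
    task.controller.abort();
    task.status = "cancelled";
    return true;
  }
}
```

A delegate that receives `task.controller.signal` can stop its sandbox work when the signal aborts. Because state lives in a `Map`, everything is lost on process restart, which matches the in-memory limitation noted above.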

Tool descriptions (the agent-facing UX)

`delegate_code`

Delegate a coding task to specialist coder agents that produce a validated patch.

Use when: you need code written, fixed, refactored, or extended to satisfy a
user goal that touches a real repository. The coder runs in an isolated
sandbox, opens a fresh branch, keeps the diff minimal, runs the supplied
test + typecheck commands, and emits a unified-diff patch.

Returns immediately with a taskId. Poll delegation_status to retrieve the
patch + validator verdict (typically minutes-to-hours, longer for large
changes). Identical inputs return the same taskId — safe to retry.

When variants > 1, multiple coder harnesses (claude-code, codex, opencode)
attempt the task in parallel and the highest-scoring patch wins (smallest
passing diff). Use variants for high-stakes changes; single variant for
routine ones.
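
The "smallest passing diff wins" rule can be sketched as follows. The `VariantResult` shape and the `pickWinner` name are assumptions for illustration; the real scoring code is not shown in this PR description.

```typescript
// Hypothetical result shape for one coder harness attempt.
interface VariantResult {
  harness: string; // e.g. "claude-code", "codex", "opencode"
  passed: boolean; // validator verdict (assumed field)
  diffLines: number; // patch size in lines (assumed field)
}

// Among passing variants, prefer the smallest diff. Returns undefined
// when no variant passed the validator.
function pickWinner(results: VariantResult[]): VariantResult | undefined {
  const passing = results.filter((r) => r.passed);
  if (passing.length === 0) return undefined;
  return passing.reduce((best, r) => (r.diffLines < best.diffLines ? r : best));
}
```

Ties keep the earlier entry in this sketch; how the real implementation breaks ties is not stated.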

Capability scope: the coder cannot modify paths outside repoRoot and cannot
touch paths in config.forbiddenPaths. The validator hard-fails on a
forbidden-path violation, diff above config.maxDiffLines, test failure, or
typecheck failure — none of those make it past the gate.
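
A rough sketch of the four hard-fail checks, assuming changed paths are repo-relative POSIX strings and that `forbiddenPaths` entries are treated as path prefixes. `GateConfig`, `Candidate`, and `gate` are hypothetical names; only `forbiddenPaths` and `maxDiffLines` come from the description above.

```typescript
interface GateConfig {
  forbiddenPaths: string[];
  maxDiffLines: number;
}

// Hypothetical summary of one candidate patch.
interface Candidate {
  changedPaths: string[]; // repo-relative, "/"-separated (assumption)
  diffLines: number;
  testsPassed: boolean;
  typecheckPassed: boolean;
}

// Returns the list of hard failures; an empty list means the patch passes.
function gate(c: Candidate, cfg: GateConfig): string[] {
  const failures: string[] = [];
  for (const p of c.changedPaths) {
    // Absolute paths or ".." segments would escape repoRoot.
    if (p.startsWith("/") || p.split("/").includes("..")) {
      failures.push(`outside repoRoot: ${p}`);
    }
    const forbidden = cfg.forbiddenPaths.some(
      (f) => p === f || p.startsWith(f.endsWith("/") ? f : `${f}/`),
    );
    if (forbidden) failures.push(`forbidden path: ${p}`);
  }
  if (c.diffLines > cfg.maxDiffLines) {
    failures.push(`diff too large: ${c.diffLines} > ${cfg.maxDiffLines}`);
  }
  if (!c.testsPassed) failures.push("tests failed");
  if (!c.typecheckPassed) failures.push("typecheck failed");
  return failures;
}
```

Collecting every failure instead of returning on the first one makes the verdict more useful to the polling agent, though the real validator may report differently.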

`delegate_research`

Delegate a research question to specialist researcher agents that produce
source-grounded, evidence-bearing knowledge items.

Use when: you need to answer a factual question with external evidence —
audience research, competitive intelligence, recency-bound web searches,
corpus / docs lookups. The researcher emits items[] with provenance, a
citations[] index, and proposedWrites[] you decide whether to persist.

Returns immediately with a taskId. Poll delegation_status to retrieve the
items + verdict. Identical inputs return the same taskId — safe to retry.

When variants > 1, multiple researcher harnesses run in parallel and the
highest-scoring valid output wins (citation density × source diversity ×
recency match × gap coverage). Use variants when answers might disagree.
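
The multiplicative score can be sketched like this. The field names and the assumption that each factor lies in [0, 1] are illustrative; the description only names the four factors and says they are multiplied.

```typescript
// Hypothetical per-variant score inputs; each factor assumed to be in [0, 1].
interface ResearchVariant {
  harness: string;
  valid: boolean; // validator verdict
  citationDensity: number;
  sourceDiversity: number;
  recencyMatch: number;
  gapCoverage: number;
}

function researchScore(v: ResearchVariant): number {
  return v.citationDensity * v.sourceDiversity * v.recencyMatch * v.gapCoverage;
}

// Highest-scoring valid output wins; invalid outputs never compete.
function pickBestResearch(variants: ResearchVariant[]): ResearchVariant | undefined {
  const valid = variants.filter((v) => v.valid);
  if (valid.length === 0) return undefined;
  return valid.reduce((best, v) => (researchScore(v) > researchScore(best) ? v : best));
}
```

A product (rather than a sum) means a near-zero factor, such as stale sources on a recency-bound question, sinks the whole score.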

Multi-tenant isolation: every item carries namespace. The validator
hard-fails when any item is scoped outside namespace. Never pass another
tenant's namespace.
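
A minimal sketch of the namespace check, assuming each item exposes an `id` and a `namespace` field. `KnowledgeItem` and `findNamespaceViolations` are hypothetical names.

```typescript
// Hypothetical minimal item shape; real items also carry provenance.
interface KnowledgeItem {
  id: string;
  namespace: string;
}

// Returns ids of items scoped outside the requested namespace.
// A non-empty result corresponds to a hard validator failure.
function findNamespaceViolations(items: KnowledgeItem[], namespace: string): string[] {
  return items.filter((item) => item.namespace !== namespace).map((item) => item.id);
}
```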

`delegate_feedback`

Record feedback on a delegation, artifact, or outcome. Synchronous — the
event is durably stored when this call returns.

Use when: you (the agent), the user, or a downstream judge has formed an
opinion about a piece of work and wants it persisted for calibration,
pricing, or future routing. Every call is a new event — multiple ratings
on the same target are expected and never deduped.

refersTo.kind:

"delegation" — ref is a taskId returned by delegate_code/delegate_research
"artifact" — ref is a URI/path/git-sha — anything you can dereference
"outcome" — ref is a free-form description of a downstream result

by: "agent" | "user" | "downstream-judge"

When ref names a known taskId, the rating is also attached to the
delegation record so delegation_history surfaces it inline.
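
The append-only semantics can be sketched with a small in-memory store. The argument shape mirrors the smoke transcript (`refersTo`, `rating`, `by`, `namespace`); `SketchFeedbackStore`, `forDelegation`, and the sequential id format are assumptions, and an in-memory array is not durable the way the real store is described to be.

```typescript
type RefKind = "delegation" | "artifact" | "outcome";
type FeedbackBy = "agent" | "user" | "downstream-judge";

interface FeedbackInput {
  refersTo: { kind: RefKind; ref: string };
  rating: { score: number; label?: string; notes?: string };
  by: FeedbackBy;
  namespace?: string;
}

interface FeedbackEvent extends FeedbackInput {
  id: string;
  capturedAt: string; // ISO timestamp
}

class SketchFeedbackStore {
  private events: FeedbackEvent[] = [];
  private seq = 0;

  // Every call appends a new event; nothing is deduped or overwritten.
  record(input: FeedbackInput): { recorded: true; id: string } {
    const id = `fbk-${++this.seq}`;
    this.events.push({ ...input, id, capturedAt: new Date().toISOString() });
    return { recorded: true, id };
  }

  // Events attached to one delegation, in insertion order.
  forDelegation(taskId: string): FeedbackEvent[] {
    return this.events.filter(
      (e) => e.refersTo.kind === "delegation" && e.refersTo.ref === taskId,
    );
  }
}
```

A history reader could call `forDelegation(taskId)` to attach ratings inline to each delegation record.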

`delegation_status`

Poll the status of an async delegation. Returns the current state
(pending | running | completed | failed | cancelled), optional progress,
and the final result when status === "completed".

Use when: you previously called delegate_code or delegate_research and
need to know whether the work is done. Call it every minute or two while
waiting; do not busy-poll.

For a completed coder task, result.output is a CoderOutput with branch,
patch, test/typecheck results, and diff stats. For a completed research
task, result.output is the items + citations + proposedWrites bundle.

Throws NotFoundError when taskId is unknown — never silently returns
pending for a typo.
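
A polling helper along these lines would follow the rhythm described above. It is a sketch under assumptions: `getStatus` stands for whatever function issues the `delegation_status` tool call, and `waitForDelegation`, `intervalMs`, `maxPolls`, and `sleep` are hypothetical names.

```typescript
type Status = "pending" | "running" | "completed" | "failed" | "cancelled";

interface StatusSnapshot {
  status: Status;
  progress?: unknown;
  result?: unknown;
}

// Caller-supplied wrapper around the delegation_status tool call.
type GetStatus = (taskId: string) => Promise<StatusSnapshot>;

interface WaitOptions {
  intervalMs?: number; // default: one minute between polls
  maxPolls?: number; // default: give up after 120 polls
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function waitForDelegation(
  getStatus: GetStatus,
  taskId: string,
  opts: WaitOptions = {},
): Promise<StatusSnapshot> {
  const intervalMs = opts.intervalMs ?? 60_000;
  const maxPolls = opts.maxPolls ?? 120;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let i = 0; i < maxPolls; i++) {
    // An unknown taskId rejects here (NotFoundError) and is not retried.
    const snap = await getStatus(taskId);
    if (snap.status !== "pending" && snap.status !== "running") return snap;
    await sleep(intervalMs);
  }
  throw new Error(`gave up waiting for ${taskId} after ${maxPolls} polls`);
}
```

The helper returns on any terminal state, so the caller still has to branch on `completed` versus `failed` or `cancelled` before reading `result`.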

`delegation_history`

Read past delegations newest-first. Each entry carries the original
arguments, current status, cost, and any feedback attached via
delegate_feedback.

Use when: you want to introspect prior decisions — "have I asked this
question before?", "did the last patch land?", "what's the historical
success rate of coder delegations on this repo?". Feed the results back
into your own routing and calibration.

Filters: namespace (multi-tenant scope), profile ("coder" | "researcher"),
since (ISO date — only delegations started at-or-after). limit defaults
to 50, capped at 500.
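
The filter and limit rules can be sketched as a pure function over stored records. `DelegationRecord`, `HistoryFilter`, and `queryHistory` are hypothetical names, and clamping an oversized `limit` to 500 (rather than rejecting it) is one reading of "capped at 500".

```typescript
type Profile = "coder" | "researcher";

// Hypothetical minimal record; real entries also carry args, status, cost, feedback.
interface DelegationRecord {
  taskId: string;
  profile: Profile;
  namespace?: string;
  startedAt: string; // ISO date
}

interface HistoryFilter {
  namespace?: string;
  profile?: Profile;
  since?: string; // ISO date, inclusive
  limit?: number;
}

function queryHistory(records: DelegationRecord[], filter: HistoryFilter = {}): DelegationRecord[] {
  // Default 50, clamped to the range [0, 500].
  const limit = Math.max(0, Math.min(filter.limit ?? 50, 500));
  const sinceMs = filter.since === undefined ? undefined : Date.parse(filter.since);
  return records
    .filter((r) => filter.namespace === undefined || r.namespace === filter.namespace)
    .filter((r) => filter.profile === undefined || r.profile === filter.profile)
    .filter((r) => sinceMs === undefined || Date.parse(r.startedAt) >= sinceMs)
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt)) // newest first
    .slice(0, limit);
}
```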

Layering

agent-runtime/mcp                         ← NEW. server + 5 tools + queue + feedback store
  ↓ delegates wire to
agent-runtime/loops + agent-runtime/profiles  (coder)
agent-knowledge/profiles                       (researcher — injected; optional peer)

agent-runtime cannot depend on agent-knowledge (that would create a dependency cycle). The bin
lazy-imports the researcher delegate from agent-knowledge when present;
the surface is silently omitted otherwise. Custom integrations wire their
own researcherDelegate via createMcpServer({ researcherDelegate }).
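
The optional-peer pattern can be sketched with a dynamic `import()` wrapped in a try/catch. `loadOptionalPeer` is a hypothetical helper; which export the bin reads from agent-knowledge is not stated here, so the sketch returns the whole module namespace and leaves that lookup to the caller.

```typescript
// Try to load a module that may not be installed. Resolves to undefined
// when the import fails, so the caller can skip the feature quietly.
// Note: this sketch swallows every import error, including errors thrown
// while the module evaluates; a stricter version would inspect the error.
async function loadOptionalPeer(
  specifier: string,
): Promise<Record<string, unknown> | undefined> {
  try {
    return (await import(specifier)) as Record<string, unknown>;
  } catch {
    return undefined;
  }
}
```

A caller would pass `"@tangle-network/agent-knowledge"` and, when a module comes back, hand the researcher delegate it finds there to `createMcpServer({ researcherDelegate })`.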

Sandbox SDK fleet-API findings

SandboxFleetClient exists (client.fleets.create({...})) and exposes
dispatchPrompt, dispatchExec, etc. for coordinated multi-machine
work. The MCP layer does not call it directly — runLoop already
parallelizes agentRuns through bounded Promise.all against the
underlying LoopSandboxClient, and fleet semantics (shared workspace,
dispatch traces, intelligence reports) are orthogonal to the
fire-and-poll task model the MCP server presents. We retain
MCP_MAX_CONCURRENT_SANDBOXES (default 4) for the kernel cap. Phase 2
can plumb fleet dispatch into a fleet-backed delegate if the workload
demands it.
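
For reference, a concurrency cap of this kind can be sketched as a small worker pool. This is one common way to bound parallelism and not necessarily how runLoop implements its bounded Promise.all; `mapBounded` is a hypothetical name, and `limit` plays the role of MCP_MAX_CONCURRENT_SANDBOXES.

```typescript
// Run fn over items with at most `limit` calls in flight at once.
// Results keep the order of the input items.
async function mapBounded<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `limit` set to 4, ten agent runs would proceed four at a time; a rejection from any `fn` call rejects the whole batch, as with a plain `Promise.all`.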

Test results

Test Files  22 passed (22)
     Tests  215 passed (215)

154 existing tests unchanged
61 new tests in tests/mcp/* covering: queue lifecycle, idempotency,
cancel + abort propagation, validation errors (Type/RangeError),
namespace isolation, feedback append-only semantics + cross-reference
to history, status NotFoundError, history filters + ordering + limit,
full JSON-RPC roundtrip end-to-end (both server.handle() and stdio
transport), parse-error handling, tool-descriptor self-tests.

Typecheck clean. pnpm build clean. biome check src tests clean.

Smoke transcript (in-process transport)

Driven through dist/mcp/index.js with stub delegates. The wire shape
matches what the bin emits.

=== smoke: initialize ===
{ protocolVersion: "2024-11-05", capabilities: { tools: {} }, serverInfo: { name: "agent-runtime-mcp", version: "0.20.0" } }

=== smoke: tools/list ===
registered 5 tools: delegate_code, delegate_research, delegate_feedback, delegation_status, delegation_history

=== smoke: delegate_research(question: "what content engages cpg-founder ICP on Twitter?", namespace: "test", variants: 2) ===
taskId: dlg-mpk2afeu-5nvjjd6r

=== smoke: poll status ===
{ taskId, profile: "researcher", status: "completed",
  result: { profile: "researcher", output: { items: [...1...], citations: [...1...], proposedWrites: [] } },
  startedAt, completedAt }

=== smoke: delegate_feedback(refersTo: {kind:'delegation', ref:taskId}, rating: {score:0.85, label:'good', notes:'great source diversity'}, by:'agent', namespace:'test') ===
{ recorded: true, id: "fbk-mpk2afew-hvifjvff" }

=== smoke: delegation_history({ namespace: "test" }) ===
{
  delegations: [
    {
      taskId: "dlg-mpk2afeu-5nvjjd6r",
      profile: "researcher",
      args: { question: "...", namespace: "test", variants: 2 },
      status: "completed",
      namespace: "test",
      feedback: [{ id: "fbk-...", score: 0.85, by: "agent", notes: "great source diversity", label: "good", capturedAt }]
    }
  ]
}
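
For readers who want to reproduce the transcript, the request side can be sketched as plain JSON-RPC 2.0 frames. The method names (`initialize`, `tools/list`, `tools/call`) and the `{ name, arguments }` params come from the MCP specification; `rpcRequest`, `frame`, and the `smoke-client` client name are hypothetical, and newline-delimited framing is assumed from the MCP stdio transport rather than confirmed for this server.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function rpcRequest(id: number, method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return params === undefined ? { jsonrpc: "2.0", id, method } : { jsonrpc: "2.0", id, method, params };
}

// One JSON message per line, as in the MCP stdio transport (assumed here).
function frame(req: JsonRpcRequest): string {
  return `${JSON.stringify(req)}\n`;
}

const smokeRequests: JsonRpcRequest[] = [
  rpcRequest(1, "initialize", {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "smoke-client", version: "0.0.0" }, // hypothetical client
  }),
  rpcRequest(2, "tools/list"),
  rpcRequest(3, "tools/call", {
    name: "delegate_research",
    arguments: {
      question: "what content engages cpg-founder ICP on Twitter?",
      namespace: "test",
      variants: 2,
    },
  }),
];

const wire = smokeRequests.map(frame).join("");
```

Writing `wire` to the stdin of the `agent-runtime-mcp` bin and reading responses line by line from its stdout would be the stdio equivalent of the in-process run above.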

Real-credential smoke (against TCloud-routed sandboxes) is the next step
before tagging the release; this PR ships the substrate.

Out of scope (Phase 2 follow-ups)

Persistent task state (sqlite) — README documents the in-memory limitation
Webhook callbacks — MCP polling is the contract for v1
delegate_evaluation — separate future tool
Fleet-backed delegate that uses client.fleets.dispatchPrompt for
cross-machine coordinated runs

Files

src/mcp/{server,task-queue,feedback-store,delegates,types,index,bin}.ts
src/mcp/tools/{delegate-code,delegate-research,delegate-feedback,delegation-status,delegation-history}.ts
tests/mcp/*.test.ts (8 files)
package.json — version 0.20.0, new sub-export, new bin, optional agent-knowledge peer
tsup.config.ts — mcp/index + mcp/bin entries
README.md — Delegation tools (MCP) section

…, delegate_feedback

New sub-export `@tangle-network/agent-runtime/mcp` and `agent-runtime-mcp` bin.

Five tools exposed over stdio JSON-RPC (MCP 2024-11-05):
- delegate_code: async, idempotent; runs coderProfile / multi-harness fanout
- delegate_research: async, idempotent; runs an injected researcher delegate
- delegate_feedback: sync, append-only; every rating is its own event
- delegation_status: sync poll; state machine + progress + final result
- delegation_history: sync read; newest-first, filterable, feedback inline

State lives in an in-memory DelegationTaskQueue (Phase 2 → sqlite). The server is topology-free; consumers wire coder + researcher delegates at construction. The bin auto-wires the default coder against the real Sandbox client and lazy-imports a researcher delegate when @tangle-network/agent-knowledge is installed as an optional peer.

61 new tests cover validation, idempotency, lifecycle, cancellation, namespace isolation, feedback cross-reference, and a full JSON-RPC end-to-end through both in-process and stdio transports.

tangletools merged commit 9c82adb into main May 24, 2026
1 check failed

tangletools deleted the feat/mcp-delegation-tools branch May 24, 2026 17:49

tangletools mentioned this pull request May 24, 2026

fix(mcp): diagnostic-mode stub client honors AGENT_RUNTIME_MCP_ALLOW_NO_KEY #46

Merged

tangletools merged 1 commit into main from feat/mcp-delegation-tools
