feat(0.20.0): MCP delegation tools — delegate_code, delegate_research, delegate_feedback#45
Merged
…, delegate_feedback
New sub-export @tangle-network/agent-runtime/mcp and agent-runtime-mcp bin. Five tools exposed over stdio JSON-RPC (MCP 2024-11-05):
- delegate_code: async, idempotent; runs coderProfile / multi-harness fanout
- delegate_research: async, idempotent; runs an injected researcher delegate
- delegate_feedback: sync, append-only; every rating is its own event
- delegation_status: sync poll; state machine + progress + final result
- delegation_history: sync read; newest-first, filterable, feedback inline
State lives in an in-memory DelegationTaskQueue (Phase 2 → sqlite). The server is topology-free; consumers wire coder + researcher delegates at construction. The bin auto-wires the default coder against the real Sandbox client and lazy-imports a researcher delegate when @tangle-network/agent-knowledge is installed as an optional peer.
61 new tests cover validation, idempotency, lifecycle, cancellation, namespace isolation, feedback cross-reference, and a full JSON-RPC end-to-end through both in-process and stdio transports.
Summary
Phase 1.5 of the driven-loop substrate. Ships a stdio MCP server in
@tangle-network/agent-runtime/mcp plus an agent-runtime-mcp bin so
sandbox coding-harness agents (claude-code, codex, opencode) can delegate
long-running coder / researcher loops to other sandboxes managed by us.
- delegate_code: taskId; poll delegation_status for the patch
- delegate_research: taskId; poll for items + citations
- delegate_feedback
- delegation_status: state (pending → running → completed | failed | cancelled)
- delegation_history
Async semantics
- Identical inputs return the same taskId (canonical-form hash).
- queue.cancel(taskId) aborts the in-flight signal.
Tool descriptions (the agent-facing UX)
delegate_code
Delegate a coding task to specialist coder agents that produce a validated patch.
Use when: you need code written, fixed, refactored, or extended to satisfy a
user goal that touches a real repository. The coder runs in an isolated
sandbox, opens a fresh branch, keeps the diff minimal, runs the supplied
test + typecheck commands, and emits a unified-diff patch.
Returns immediately with a taskId. Poll delegation_status to retrieve the
patch + validator verdict (typically minutes-to-hours, longer for large
changes). Identical inputs return the same taskId — safe to retry.
When variants > 1, multiple coder harnesses (claude-code, codex, opencode)
attempt the task in parallel and the highest-scoring patch wins (smallest
passing diff). Use variants for high-stakes changes; single variant for
routine ones.
Capability scope: the coder cannot modify paths outside repoRoot and cannot
touch paths in config.forbiddenPaths. The validator hard-fails on a
forbidden-path violation, diff above config.maxDiffLines, test failure, or
typecheck failure — none of those make it past the gate.
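The gate and the "smallest passing diff" rule described above could be sketched roughly as follows. This is an illustration only, not the validator shipped in this PR: the CandidatePatch and GateConfig shapes, and the gateFailure and pickWinner helpers, are hypothetical names; only repoRoot, forbiddenPaths, and maxDiffLines come from the description. Path handling is deliberately naive (plain prefix checks, no normalization of "..").

```typescript
// Hypothetical shapes for illustration; only repoRoot, forbiddenPaths and
// maxDiffLines are named in the tool description.
interface CandidatePatch {
  harness: string;
  touchedPaths: string[]; // absolute paths touched by the diff
  diffLines: number;
  testsPassed: boolean;
  typecheckPassed: boolean;
}

interface GateConfig {
  repoRoot: string;
  forbiddenPaths: string[]; // relative to repoRoot
  maxDiffLines: number;
}

// Returns the reason a candidate is rejected, or null when it clears every gate.
function gateFailure(c: CandidatePatch, cfg: GateConfig): string | null {
  const root = cfg.repoRoot.endsWith("/") ? cfg.repoRoot : cfg.repoRoot + "/";
  for (const p of c.touchedPaths) {
    if (!p.startsWith(root)) return `outside repoRoot: ${p}`;
    const rel = p.slice(root.length);
    if (cfg.forbiddenPaths.some((f) => rel === f || rel.startsWith(f + "/"))) {
      return `forbidden path: ${rel}`;
    }
  }
  if (c.diffLines > cfg.maxDiffLines) return "diff above maxDiffLines";
  if (!c.testsPassed) return "test failure";
  if (!c.typecheckPassed) return "typecheck failure";
  return null;
}

// Among the candidates that pass the gate, the smallest diff wins.
function pickWinner(cands: CandidatePatch[], cfg: GateConfig): CandidatePatch | null {
  const passing = cands.filter((c) => gateFailure(c, cfg) === null);
  if (passing.length === 0) return null;
  return passing.reduce((best, c) => (c.diffLines < best.diffLines ? c : best));
}
```

A failing candidate never competes on size: a 5-line diff with a red test suite loses to a 25-line diff that passes every gate.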
delegate_research
Delegate a research question to specialist researcher agents that produce
source-grounded, evidence-bearing knowledge items.
Use when: you need to answer a factual question with external evidence —
audience research, competitive intelligence, recency-bound web searches,
corpus / docs lookups. The researcher emits items[] with provenance, a
citations[] index, and proposedWrites[] you decide whether to persist.
Returns immediately with a taskId. Poll delegation_status to retrieve the
items + verdict. Identical inputs return the same taskId — safe to retry.
When variants > 1, multiple researcher harnesses run in parallel and the
highest-scoring valid output wins (citation density × source diversity ×
recency match × gap coverage). Use variants when answers might disagree.
Multi-tenant isolation: every item carries namespace. The validator
hard-fails when any item is scoped outside namespace. Never pass another
tenant's namespace.
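One way to read the scoring and isolation rules above is as a product of four factors, applied only to outputs whose items all sit inside the requested namespace. The sketch below assumes each factor is already normalized to [0, 1]; the ResearchOutput shape and the isInNamespace, score, and pickBest helpers are hypothetical, and how the real validator computes each factor is not specified here.

```typescript
// Hypothetical shapes; the description only names the four scoring factors
// and the per-item namespace.
interface ResearchItem {
  namespace: string;
  claim: string;
}

interface ResearchOutput {
  harness: string;
  items: ResearchItem[];
  // Assumed to be pre-computed and normalized to [0, 1].
  citationDensity: number;
  sourceDiversity: number;
  recencyMatch: number;
  gapCoverage: number;
}

// Hard-fail rule: a single item outside the namespace invalidates the output.
function isInNamespace(out: ResearchOutput, namespace: string): boolean {
  return out.items.every((item) => item.namespace === namespace);
}

// Multiplicative score: a zero in any factor zeroes the whole score.
function score(out: ResearchOutput): number {
  return out.citationDensity * out.sourceDiversity * out.recencyMatch * out.gapCoverage;
}

// Highest-scoring valid output wins; null when every variant is invalid.
function pickBest(outputs: ResearchOutput[], namespace: string): ResearchOutput | null {
  const valid = outputs.filter((o) => isInNamespace(o, namespace));
  if (valid.length === 0) return null;
  return valid.reduce((best, o) => (score(o) > score(best) ? o : best));
}
```

Because the factors multiply, an output with strong citations but no recency match scores zero, which is one plausible reason to prefer a product over a sum.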
delegate_feedback
Record feedback on a delegation, artifact, or outcome. Synchronous — the
event is durably stored when this call returns.
Use when: you (the agent), the user, or a downstream judge has formed an
opinion about a piece of work and want it persisted for calibration,
pricing, or future routing. Every call is a new event — multiple ratings
on the same target are expected and never deduped.
refersTo.kind:
- "delegation": ref is a taskId returned by delegate_code / delegate_research
- "artifact": ref is a URI/path/git-sha, anything you can dereference
- "outcome": ref is a free-form description of a downstream result
by: "agent" | "user" | "downstream-judge"
When ref names a known taskId, the rating is also attached to the
delegation record so delegation_history surfaces it inline.
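The append-only and cross-reference behaviour described above can be illustrated with a small in-memory log. This is a sketch under assumptions, not the feedback-store in this PR: the FeedbackLog class, the numeric rating field, and the at timestamp are hypothetical, and an in-memory array is of course not durable storage.

```typescript
type RefKind = "delegation" | "artifact" | "outcome";
type Rater = "agent" | "user" | "downstream-judge";

// Hypothetical event shape; `rating` and `at` are illustrative fields.
interface FeedbackEvent {
  refersTo: { kind: RefKind; ref: string };
  by: Rater;
  rating: number;
  at: string;
}

class FeedbackLog {
  private readonly events: FeedbackEvent[] = [];
  private readonly byTask = new Map<string, FeedbackEvent[]>();
  private readonly knownTaskIds: ReadonlySet<string>;

  constructor(knownTaskIds: ReadonlySet<string>) {
    this.knownTaskIds = knownTaskIds;
  }

  // Every call appends a new event; nothing is deduped or overwritten.
  record(input: Omit<FeedbackEvent, "at">): FeedbackEvent {
    const event: FeedbackEvent = { ...input, at: new Date().toISOString() };
    this.events.push(event);
    // Cross-reference: a ref that names a known taskId is also indexed per task.
    if (this.knownTaskIds.has(input.refersTo.ref)) {
      const list = this.byTask.get(input.refersTo.ref) ?? [];
      list.push(event);
      this.byTask.set(input.refersTo.ref, list);
    }
    return event;
  }

  // Feedback that a history view could surface inline for one delegation.
  forTask(taskId: string): readonly FeedbackEvent[] {
    return this.byTask.get(taskId) ?? [];
  }

  get size(): number {
    return this.events.length;
  }
}
```

Two ratings on the same taskId produce two events and both appear under that task, while feedback on an artifact or outcome is stored without touching any delegation record.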
delegation_status
Poll the status of an async delegation. Returns the current state
(pending | running | completed | failed | cancelled), optional progress,
and the final result when status === "completed".
Use when: you previously called delegate_code or delegate_research and
need to know whether the work is done. The agent's right rhythm is to
call this every minute or two while waiting; do not busy-poll.
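On the caller side, that rhythm might look like the loop below. It is a sketch only: StatusFn stands in for whatever transport actually invokes delegation_status, and waitForDelegation, intervalMs, and maxPolls are hypothetical names with illustrative defaults.

```typescript
type DelegationState = "pending" | "running" | "completed" | "failed" | "cancelled";

interface StatusReply {
  status: DelegationState;
  result?: unknown; // present when status === "completed"
}

// Stand-in for the real MCP call to delegation_status.
type StatusFn = (taskId: string) => Promise<StatusReply>;

const TERMINAL_STATES: ReadonlySet<DelegationState> = new Set<DelegationState>([
  "completed",
  "failed",
  "cancelled",
]);

function isTerminal(state: DelegationState): boolean {
  return TERMINAL_STATES.has(state);
}

// Polls at a fixed interval until the delegation reaches a terminal state.
// Errors from getStatus (for example an unknown taskId) propagate to the caller.
async function waitForDelegation(
  taskId: string,
  getStatus: StatusFn,
  intervalMs: number = 60_000,
  maxPolls: number = 120,
): Promise<StatusReply> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const reply = await getStatus(taskId);
    if (isTerminal(reply.status)) return reply;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`delegation ${taskId} not terminal after ${maxPolls} polls`);
}
```
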
For a completed coder task, result.output is a CoderOutput with branch,
patch, test/typecheck results, and diff stats. For a completed research
task, result.output is the items + citations + proposedWrites bundle.
Throws NotFoundError when taskId is unknown; never silently returns
pending for a typo.
delegation_history
Read past delegations newest-first. Each entry carries the original
arguments, current status, cost, and any feedback attached via
delegate_feedback.
Use when: you want to introspect prior decisions — "have I asked this
question before?", "did the last patch land?", "what's the historical
success rate of coder delegations on this repo?". Feed the results back
into your own routing and calibration.
Filters: namespace (multi-tenant scope), profile ("coder" | "researcher"),
since (ISO date; only delegations started at-or-after). limit defaults
to 50, capped at 500.
Layering
agent-runtime cannot depend on agent-knowledge (cycle). The bin
lazy-imports the researcher delegate from agent-knowledge when present;
the surface is silently omitted otherwise. Custom integrations wire their
own researcherDelegate via createMcpServer({ researcherDelegate }).
Sandbox SDK fleet-API findings
SandboxFleetClient exists (client.fleets.create({...})) and exposes
dispatchPrompt, dispatchExec, etc. for coordinated multi-machine
work. The MCP layer does not call it directly: runLoop already
parallelizes agentRuns through bounded Promise.all against the
underlying LoopSandboxClient, and fleet semantics (shared workspace,
dispatch traces, intelligence reports) are orthogonal to the
fire-and-poll task model the MCP server presents. We retain
MCP_MAX_CONCURRENT_SANDBOXES (default 4) for the kernel cap. Phase 2
can plumb fleet dispatch into a fleet-backed delegate if the workload
demands it.
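For readers unfamiliar with the pattern, a "bounded Promise.all" is usually a fixed pool of workers draining a shared index. The sketch below shows that general technique; it is not the runLoop implementation, and mapBounded is a hypothetical helper. The constant mirrors the default of 4 mentioned above.

```typescript
// Mirrors the documented default for MCP_MAX_CONCURRENT_SANDBOXES.
const MAX_CONCURRENT_SANDBOXES = 4;

// Runs fn over items with at most `limit` calls in flight, preserving order.
async function mapBounded<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unclaimed index until none remain.
  // Claiming is safe without locks because no await sits between the
  // bounds check and the increment.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  };

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

With ten agent runs and the default cap, four start immediately and each remaining run starts as soon as a slot frees up, rather than waiting for a whole batch of four to finish.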
Test results
Tests in tests/mcp/* covering: queue lifecycle, idempotency,
cancel + abort propagation, validation errors (Type/RangeError),
namespace isolation, feedback append-only semantics + cross-reference
to history, status NotFoundError, history filters + ordering + limit,
full JSON-RPC roundtrip end-to-end (both server.handle() and stdio
transport), parse-error handling, tool-descriptor self-tests.
Typecheck clean.
pnpm build clean.
biome check src tests clean.
Smoke transcript (in-process transport)
Driven through dist/mcp/index.js with stub delegates. The wire shape
matches what the bin emits.
Real-credential smoke (against TCloud-routed sandboxes) is the next step
before tagging the release; this PR ships the substrate.
Out of scope (Phase 2 follow-ups)
- delegate_evaluation: separate future tool
- client.fleets.dispatchPrompt for cross-machine coordinated runs
Files
- src/mcp/{server,task-queue,feedback-store,delegates,types,index,bin}.ts
- src/mcp/tools/{delegate-code,delegate-research,delegate-feedback,delegation-status,delegation-history}.ts
- tests/mcp/*.test.ts (8 files)
- package.json: version 0.20.0, new sub-export, new bin, optional agent-knowledge peer
- tsup.config.ts: mcp/index + mcp/bin entries
- README.md: Delegation tools (MCP) section