Symptom
runLoop + coderProfile({ harness: 'opencode' }) against a Tangle sandbox returns an empty CoderOutput (patch "", testResult.passed=false) with costUsd: 0 and tokenUsage {0,0} on every iteration — the validator then grades the emptiness ("empty patch — no files changed"). Reproduced 9/9 across two harnesses and several models.
It is NOT the platform or auth — isolated by elimination (all evidence from the same sandbox)
| Path |
Result |
box.prompt('…') bare |
✓ works, bills (e.g. 213/5 tokens) |
box.streamPrompt('…') bare |
✓ 27 events incl. a terminal event |
box.prompt() + inline AgentProfile (model set / unset) |
✓ all work, bill |
runLoop + coderProfile (opencode, model zai/glm-4.7) |
✗ empty, $0, 0 tokens |
So streamPrompt, inline profiles, models, and credentials all work in isolation. The empty result appears only through the coder loop.
Likely cause
parseCoderEvents (src/profiles/coder.ts:220) recognizes only type === 'result' | 'final' | 'coder.result' (or a fenced-JSON text block) for the structured CoderOutput. The opencode harness's SDK event stream uses different types, so the adapter falls through to the empty default. The same shape gap likely makes extractLlmCallEvent miss the cost-bearing events → $0. Net: the agent may well be running and producing a diff, but the loop kernel can't see it, so the worker silently "fails validation."
This is the same error-collapse class noted earlier: a backend that works via prompt() reports empty through the loop, and the only surfaced signal is "no candidate passed validation."
Asks
- Make the coder output adapter +
extractLlmCallEvent cover the opencode/codex/kimi event vocabularies (or add a harness-specific adapter), not just the claude-code result/final shape.
- When
streamPrompt yields events but the adapter extracts nothing, fail loud (distinguish "agent produced no parseable result" from "empty patch") rather than coercing to a zero-value CoderOutput the validator then grades as a normal failure.
Repro
const { output, validator, agentRunSpec } = coderProfile({ harness: 'opencode', model: 'zai/glm-4.7', task })
const r = await runLoop({ task, agentRun: agentRunSpec, output, validator, driver: createRefineDriver({ maxIterations: 1 }), maxIterations: 1, ctx: { sandboxClient } })
// r.winner.verdict.valid === false, r.costUsd === 0, patch === '' — while box.prompt() on the same backend bills + responds
Symptom
runLoop+coderProfile({ harness: 'opencode' })against a Tangle sandbox returns an emptyCoderOutput(patch"",testResult.passed=false) withcostUsd: 0andtokenUsage {0,0}on every iteration — the validator then grades the emptiness ("empty patch — no files changed"). Reproduced 9/9 across two harnesses and several models.It is NOT the platform or auth — isolated by elimination (all evidence from the same sandbox)
box.prompt('…')barebox.streamPrompt('…')barebox.prompt()+ inline AgentProfile (model set / unset)runLoop+coderProfile(opencode, modelzai/glm-4.7)$0, 0 tokensSo
streamPrompt, inline profiles, models, and credentials all work in isolation. The empty result appears only through the coder loop.Likely cause
parseCoderEvents(src/profiles/coder.ts:220) recognizes onlytype === 'result' | 'final' | 'coder.result'(or a fenced-JSON text block) for the structuredCoderOutput. The opencode harness's SDK event stream uses differenttypes, so the adapter falls through to the empty default. The same shape gap likely makesextractLlmCallEventmiss the cost-bearing events →$0. Net: the agent may well be running and producing a diff, but the loop kernel can't see it, so the worker silently "fails validation."This is the same error-collapse class noted earlier: a backend that works via
prompt()reports empty through the loop, and the only surfaced signal is "no candidate passed validation."Asks
extractLlmCallEventcover the opencode/codex/kimi event vocabularies (or add a harness-specific adapter), not just the claude-coderesult/finalshape.streamPromptyields events but the adapter extracts nothing, fail loud (distinguish "agent produced no parseable result" from "empty patch") rather than coercing to a zero-valueCoderOutputthe validator then grades as a normal failure.Repro