One CLI (and one Python API, and one TypeScript API) to invoke every headless coding-CLI agent as a subprocess. claude-code, opencode, codex, gemini, aider, swe-agent, qwen, continue-cli — one RunSpec, one RunResult, zero per-CLI adapter code in your project.
Python — `pip install harness-cli` (imports as `harness`; `harness` was squatted on PyPI):

```python
from harness import RunSpec, run

r = run(RunSpec(
    harness="claude-code",
    model="sonnet",
    prompt="Write a one-line Python hello-world.",
    workdir="/tmp/scratch",
))
print(f"exit={r.exit_code} cost=${r.cost_usd:.4f} tokens={r.tokens_in}/{r.tokens_out}")
```

TypeScript — `npm install @twaldin/harness-ts`:

```typescript
import { run } from '@twaldin/harness-ts'

const r = await run({
  harness: 'claude-code',
  model: 'sonnet',
  prompt: 'Write a one-line TypeScript hello-world.',
  workdir: '/tmp/scratch',
})
console.log(`exit=${r.exitCode} cost=$${r.costUsd?.toFixed(4)} tokens=${r.tokensIn}/${r.tokensOut}`)
```

See `examples/hello-world.py` and `ts/examples/hello-world.ts` for runnable versions.
You're building any of these:
- An eval framework or benchmark harness that needs to invoke multiple CLI agents headlessly and capture cost + tokens uniformly. (See agentelo.)
- A prompt optimizer that needs to run the same task against claude-code, gemini, and opencode and compare results without writing six subprocess wrappers. (See hone.)
- A coding orchestrator that spawns agents as subprocesses, injects system prompts, and needs to swap the underlying model without touching call sites.
- An interactive CLI wrapper (like flt) that needs command construction (`buildCommand()`) without the subprocess execution.
- Anything that would otherwise make you write `if harness == 'claude': ... elif harness == 'gemini': ...` in multiple places.
If you're writing per-CLI subprocess plumbing from scratch, this library has already done it.
I wrote per-CLI spawn / env / output-parsing logic three separate times across three projects:
- flt — TS adapters in `src/adapters/{claude-code,opencode,codex,gemini,aider,swe-agent}.ts`. Each one knew how to launch its CLI in tmux, strip ANSI, detect a ready prompt, and send keys to approve dialogs.
- agentelo — `bin/agentelo` (1847 lines of Node) with ~800 lines of `if (harness === 'X')` blocks. Per-CLI argv, env setup (Vertex tokens, GCloud, OpenAI proxy), inactivity watchdogs, and six different token/cost parsers (claude's JSON envelope, codex's JSONL turn events, gemini's `stats.models`, opencode's session sqlite, aider's "Tokens: N sent" scrape, swe-agent's trajectory file).
- hone — `src/hone/mutators/claude_code.py`, then almost the same logic again for an `anthropic_api.py` mutator, then a `custom_script.py` shape, with the JSON parsing rewritten each time.
Three implementations, three sets of bugs, knowledge gained in one project never crossed to the others. When opencode changed its session DB schema, only agentelo learned. When `claude --output-format json` added a `cache_creation_input_tokens` field that mattered for accurate cost, only hone fixed it.
harness is the deduped version. Each CLI's quirks live in exactly one adapter file, all eight adapters share the same `RunSpec` → `RunResult` contract, and the next consumer (TS or Python) shells out to `harness run --json` instead of starting from scratch.
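A consumer that shells out rather than importing the library can be sketched as below. This is illustrative, not part of the package: `build_argv` and `run_via_cli` are hypothetical helpers that simply mirror the `harness run` flags documented in this README.

```python
import json
import subprocess

def build_argv(harness: str, model: str, prompt: str, workdir: str,
               timeout: int = 1800) -> list[str]:
    # Mirrors the documented `harness run` flags; --json requests a
    # structured RunResult on stdout.
    return ["harness", "run",
            "--harness", harness,
            "--model", model,
            "--workdir", workdir,
            "--timeout", str(timeout),
            "--json",
            prompt]

def run_via_cli(harness: str, model: str, prompt: str, workdir: str,
                timeout: int = 1800) -> dict:
    # Hypothetical helper: spawn the CLI and parse the JSON RunResult.
    proc = subprocess.run(build_argv(harness, model, prompt, workdir, timeout),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

The prompt goes last as a positional argument, so it never collides with flag parsing.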
```python
from pathlib import Path
from harness import RunSpec, run

result = run(RunSpec(
    harness="claude-code",
    model="sonnet",
    prompt="Fix the failing tests in this repo and report what you changed.",
    workdir=Path("/tmp/my-bug-fix-checkout"),
    timeout_seconds=1800,
))
print(f"exit={result.exit_code} cost=${result.cost_usd:.4f} "
      f"tokens={result.tokens_in}/{result.tokens_out} "
      f"wall={result.duration_seconds:.1f}s")
```

```python
for spec in [
    RunSpec(harness="claude-code", model="sonnet", prompt=task, workdir=wd),
    RunSpec(harness="opencode", model="openai/gpt-5.4", prompt=task, workdir=wd),
    RunSpec(harness="gemini", model="gemini-2.5-pro", prompt=task, workdir=wd),
]:
    r = run(spec)
    print(f"{spec.harness:12} {spec.model:25} ${r.cost_usd or 0:.4f}")
```

```python
result = run(RunSpec(
    harness="opencode",
    model="openai/gpt-5.4",
    prompt="Fix the failing test described in the issue.",
    workdir=Path("/tmp/repo"),
    instructions="""You are an autonomous bug-fixing agent. No human will respond.
Run the failing tests, identify the root cause, fix the source (not the tests),
verify, then stop. Make the smallest possible change.""",
    timeout_seconds=1800,
))
```

`instructions` is written to the per-harness config file in `workdir` (`CLAUDE.md` for claude-code, `AGENTS.md` for opencode/codex, `GEMINI.md` for gemini, `QWEN.md` for qwen, `CONTINUE.md` for continue-cli, `.aider.conf.yml` for aider). Filenames are baked into each adapter.
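If a consumer needs that mapping programmatically (say, to inspect or clean up the file after a run), it can be written down as a plain table. This dict is an illustrative copy of the filenames listed above, not a public harness API, and no instructions file is documented here for swe-agent.

```python
from pathlib import Path

# Illustrative copy of the per-harness instructions filenames described
# above; the authoritative table lives inside each adapter.
# (swe-agent has no instructions file documented here.)
INSTRUCTIONS_FILES = {
    "claude-code": "CLAUDE.md",
    "opencode": "AGENTS.md",
    "codex": "AGENTS.md",
    "gemini": "GEMINI.md",
    "qwen": "QWEN.md",
    "continue-cli": "CONTINUE.md",
    "aider": ".aider.conf.yml",
}

def instructions_path(harness: str, workdir: str) -> Path:
    """Where a given harness would have written `instructions`."""
    return Path(workdir) / INSTRUCTIONS_FILES[harness]
```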
```typescript
import { buildCommand } from '@twaldin/harness-ts'

const { cmd, args, cwd, env, instructionsFile } = buildCommand({
  harness: 'claude-code',
  model: 'sonnet',
  prompt: 'Fix the failing tests.',
  workdir: '/tmp/repo',
  instructions: 'You are a careful engineer.',
})
// hand off to tmux, a process manager, or spawnSync
```

```shell
hone run prompt.md \
  --grader ./grade.sh \
  --mutator harness:claude-code:sonnet \
  --budget 20
```

```shell
pip install harness-cli
```

The PyPI name is `harness-cli` (`harness` was squatted). The Python import is `from harness import ...`.
For dev work:

```shell
git clone https://github.com/twaldin/harness
cd harness
pip install -e ".[dev]"
```

```shell
npm install @twaldin/harness-ts
# or: bun add @twaldin/harness-ts
```

See `ts/README.md` for full TypeScript docs.
```shell
harness list

harness run --harness opencode --model openai/gpt-5.4 \
  --workdir /tmp/repo --instructions /tmp/agents.md \
  --timeout 1800 \
  "Fix the failing tests."
```

Add `--json` to emit a structured `RunResult` on stdout:

```json
{
  "harness": "opencode",
  "model": "openai/gpt-5.4",
  "exit_code": 0,
  "duration_seconds": 47.2,
  "cost_usd": 0.0821,
  "tokens_in": 4201,
  "tokens_out": 887,
  "timed_out": false,
  "stdout": "...",
  "stderr": ""
}
```

Each adapter:
- Writes `spec.instructions` to its known filename in `spec.workdir` (if provided).
- Builds the CLI invocation for `spec.prompt` + `spec.model`.
- Calls the shared subprocess runner (env merge, cwd, timeout, capture).
- Parses any structured output the CLI emits and fills `RunResult.cost_usd` / `tokens_in` / `tokens_out` / `raw`.
See ADAPTER-MATRIX.md for per-CLI flag details, cost-reporting quirks, and output shapes.
See SPEC.md for the full RunSpec / RunResult schema and compatibility guarantees.
harness does not create or manage git worktrees. `workdir` is opaque — pass any directory you've set up:

- a fresh `git clone` into a tmpdir
- a `git worktree add` path
- the user's existing checkout
- a Docker volume mount

The opt-in `--worktree` features in some CLIs (e.g. `claude --worktree`) are intentionally not wrapped — they pollute the project tree and reduce consumer flexibility.
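Preparing the workdir is therefore entirely the caller's job. A minimal sketch using a fresh shallow clone — `fresh_checkout` is a hypothetical helper, not part of harness; swap in `git worktree add` or a volume mount as your setup demands:

```python
import subprocess
import tempfile
from pathlib import Path

def fresh_checkout(repo_url: str) -> Path:
    """Clone repo_url into a throwaway tmpdir suitable for use as workdir."""
    wd = Path(tempfile.mkdtemp(prefix="harness-wd-"))
    # --depth 1 keeps the checkout cheap; git ignores it for local clones.
    subprocess.run(["git", "clone", "--depth", "1", repo_url, str(wd)],
                   check=True)
    return wd
```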
- hone — the `harness:` mutator prefix routes prompt mutations through `harness.run()`.
- agentelo — migrating from ~800 lines of per-harness TS blocks to `harness run --json`.
- flt — uses `@twaldin/harness-ts` for CLI command construction; flt adds tmux lifecycle on top.
See CONTRIBUTING.md for code conventions and the "add an adapter" guide (~20 minutes).
Looking for a pre-scoped first PR? See WANTED-ADAPTERS.md. Each entry lists the CLI, the adapter to copy from, an effort estimate, and the research already done.
v0.3 — all eight adapters shipped: claude-code, opencode, codex, gemini, aider, swe-agent, qwen, continue-cli.
Pending:
- Per-harness inactivity watchdogs (port from `agentelo/bin/agentelo`).
- Vertex AI / GCloud token plumbing (currently consumer-supplied via `env`).
- Wire harness in as the spawn backend for flt and agentelo (TS → Python subprocess boundary; design TBD).