A low-footprint harness CLI that runs three agents in a loop to implement a project from a PRD: an orchestrator picks the next story, a builder writes the code, and a verifier reviews it. Every handoff goes through a zod-validated JSON file you can read, diff, and replay.
Works on greenfield projects and existing codebases.
Requires Bun and Claude Code.
```sh
export ANTHROPIC_API_KEY=sk-ant-...
bunx marmite init             # interview + scaffold
bunx marmite to-prd ./PRD.md  # creates and validates .marmite/prd.json
bunx marmite cook             # start the loop
```

`marmite init` interviews you, detects whether the repo is greenfield or has existing code, and wires sensors to configs already in your repo. Nothing is overwritten without asking.
```
my-project/
├── marmite.json            # config: paths, sensors, models, budgets
├── .marmite/
│   ├── prd.json            # the PRD that drives the loop (git)
│   ├── progress.json       # rolling story + janitor timeline (git)
│   ├── current-task.json   # per-iteration agent handoff (git)
│   ├── prompts/            # agent prompts (git)
│   ├── events.jsonl        # per-session event log (ignored)
│   └── feedback.md         # async notes, dropped mid-run (ignored)
└── app/                    # your code (path is configurable)
```
```
┌───────────────────────────────────────────────────┐
│                 current-task.json                 │ ◀── shared handoff
└───────────────────────────────────────────────────┘     (fail loops here:
       ▲                 ▲              ▲                  Verifier writes
       │                 │              │                  verdict, Builder
       ▼                 ▼              ▼                  reads + retries)
┌──────────────┐    ┌─────────┐    ┌──────────┐   pass    ┌──────────┐
│ Orchestrator │ ──▶│ Builder │ ──▶│ Verifier │ ────────▶ │ prd.json │
└──────────────┘    └────┬────┘    └──────────┘           └────┬─────┘
       ▲                 │                                     │
       │                 │ commit                              │
       │                 ▼                                     │
       │          ┌──────────────┐                             │
       │          │ progress.json│                             │
       │          └──────────────┘                             │
       │                                                       │
       └───────────────────────────────────────────────────────┘
                              next story
```
| Phase | Agent | Output |
|---|---|---|
| `ORCHESTRATE` | Orchestrator | Picks story, runs sensors, writes the brief |
| `BUILD` | Builder | Implements, commits |
| `VERIFY` | Verifier | Approves or rejects |
| `FIX` | Builder | Resumes the same session to address feedback |
| `JANITOR` (optional) | Builder | Maintenance pass that pays down debt and reverses architecture drift when sensor counts cross the configured threshold |
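As a rough illustration of the JANITOR row, the trigger can be read as a comparison of per-type sensor finding counts against configured thresholds. The sketch below borrows the `debt` and `drift` names from the `janitor` config block later in this README; `FindingCounts`, `JanitorThresholds`, and `shouldRunJanitor` are hypothetical names, and treating "reaching" a threshold as crossing it (`>=`) is an assumption.

```typescript
// Hypothetical shapes: only the `debt` and `drift` names come from this README.
type FindingCounts = { debt: number; drift: number };
type JanitorThresholds = { debt: number; drift: number };

// Returns true when either count reaches its threshold ("either trips a run").
// Assumption: reaching the threshold counts as crossing it (>=).
function shouldRunJanitor(counts: FindingCounts, limits: JanitorThresholds): boolean {
  return counts.debt >= limits.debt || counts.drift >= limits.drift;
}

const janitorLimits: JanitorThresholds = { debt: 20, drift: 5 };
console.log(shouldRunJanitor({ debt: 3, drift: 1 }, janitorLimits)); // false
console.log(shouldRunJanitor({ debt: 3, drift: 5 }, janitorLimits)); // true
```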
current-task.json is the single handoff. If a run crashes, run marmite cook again: the orchestrator picks the next non-passing story, and any in-flight story without a verify: commit gets re-attempted.
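The recovery behaviour can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Story` shape, the `passes` flag, and the idea that a verified story leaves a commit whose subject starts with `verify:` and mentions the story id. The real `prd.json` schema and commit conventions may differ.

```typescript
// Hypothetical story shape; the real prd.json schema may differ.
interface Story {
  id: string;
  passes: boolean;
}

// Pick the first story that has not passed yet, or undefined when all pass.
function pickNextStory(stories: Story[]): Story | undefined {
  return stories.find((s) => !s.passes);
}

// Assumption: a verified story leaves a commit whose subject starts with
// "verify:" and mentions the story id. Without one, the story is re-attempted.
function needsReattempt(storyId: string, commitSubjects: string[]): boolean {
  return !commitSubjects.some((s) => s.startsWith("verify:") && s.includes(storyId));
}

const backlog: Story[] = [
  { id: "S1", passes: true },
  { id: "S2", passes: false },
  { id: "S3", passes: false },
];
console.log(pickNextStory(backlog)?.id); // S2
console.log(needsReattempt("S2", ["feat: S2 login form"])); // true
```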
You can steer a running loop without stopping it. Drop a note any time:
```sh
echo "login UI feels cramped, add vertical spacing on the next pass" > .marmite/feedback.md
```

The next iteration folds the note into story selection and guidance, then deletes the file. The PRD stays untouched, so feedback shapes the upcoming pass only. If the orchestrator forgets to delete the file, the harness clears it as a fallback.
`marmite.json` lives at the project root and uses JSONC syntax (line and block comments and trailing commas are allowed). Every field is optional: anything you omit falls back to the harness defaults below, and most fields can be overridden per run via `marmite cook` flags (`marmite cook --help`).
A representative config is reproduced at the end of this section; the table below lists each field.
| Field | Default | What it does |
|---|---|---|
| `app` | `./app` | Project subdir the agents `cd` into. Relative to `marmite.json`. |
| `prd` | `./.marmite/prd.json` | Path to the validated PRD that drives the loop. |
| `baseBranch` | (unset) | Base branch sensors diff against and PR-gated workflows target. |
| `workflow` | `one-shot` | Which prompt bundle to load. See Workflows. |
| `workflowConfig` | `{}` | Workflow-specific knobs (e.g. `pr-on-checkpoint` checkpoint kind). |
| `sensors` | `[]` | Deterministic checks run between stories. See Sensors. |
| `janitor` | (unset) | Sensor-finding thresholds that trigger a maintenance pass. See Sensors. |
| `mcpServers` | `{}` | Optional MCP servers exposed to every agent: stdio/http/sse entries following the Claude Agent SDK shape. The harness keeps `strictMcpConfig` on, so user/global MCP config is ignored. |
| `models.default` | `claude-sonnet-4-6` | Fallback model used by any role left unset. |
| `models.builder` / `verifier` / `orchestrator` | inherit `default` | Per-role override. |
| `timeouts.{build,verify,fix,orchestrate}` | `20m` / `10m` / `15m` / `10m` | Per-phase wall-clock cap. Accepts `20m`, `600s`, `1h`, or raw ms. |
| `budget.perStory` | `15` (USD) | Hard cap per story; `0` disables. |
| `budget.total` | `0` (disabled) | Whole-run cap that halts the loop between iterations. `marmite init` writes `100`. |
| `retries.fix` | `3` | Fix attempts per failing story before giving up. |
| `retries.transient` | `2` | Per-session retries on transient SDK errors. |
| `maxIterations` | `1000` | Loop cap: `marmite cook` exits once reached. |
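To make two of the rows above concrete, here is a minimal sketch of how duration strings and per-role model fallbacks could be resolved. It is not the harness's implementation: `parseDuration`, `modelFor`, `Role`, and `Models` are hypothetical names, and accepting raw milliseconds either as a number or as a digit-only string is an assumption.

```typescript
// Convert "20m", "600s", "1h", or raw milliseconds into milliseconds.
function parseDuration(value: string | number): number {
  if (typeof value === "number") return value; // already raw ms
  const match = /^(\d+)(ms|s|m|h)?$/.exec(value.trim());
  if (!match) throw new Error(`Unrecognized duration: ${value}`);
  const [, digits = "0", unit = "ms"] = match; // no unit means raw ms
  const amount = Number(digits);
  switch (unit) {
    case "h": return amount * 3_600_000;
    case "m": return amount * 60_000;
    case "s": return amount * 1_000;
    default: return amount;
  }
}

type Role = "builder" | "verifier" | "orchestrator";
type Models = { default?: string } & Partial<Record<Role, string>>;

// Default model taken from the table above.
const HARNESS_DEFAULT_MODEL = "claude-sonnet-4-6";

// A role uses its own override, else models.default, else the harness default.
function modelFor(role: Role, models: Models): string {
  return models[role] ?? models.default ?? HARNESS_DEFAULT_MODEL;
}

console.log(parseDuration("20m")); // 1200000
console.log(modelFor("verifier", { verifier: "claude-haiku-4-5" })); // claude-haiku-4-5
console.log(modelFor("builder", {})); // claude-sonnet-4-6
```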
A workflow is a bundle of three agent prompts (orchestrator, builder, verifier) that determines how the loop behaves. marmite init asks you to pick one and copies the matching prompts into .marmite/prompts/. The selection is recorded in marmite.json as "workflow": "<name>".
| Workflow | What it does |
|---|---|
| `one-shot` (default) | Implements every story end-to-end without external gates. |
| `pr-on-checkpoint` | Opens a GitHub PR and halts when a configured checkpoint fires. `workflowConfig.kind` selects the trigger: `every` (after N passing stories; N=1 is one PR per story) or `epic` (after the last story of a PRD epic passes). Requires `gh` (authenticated). |
| `tdd` | Builder writes failing tests for each acceptance criterion before the implementation commit. Verifier confirms `test:` predates `feat:`. |
The PR-gated workflow uses a small halt field in .marmite/current-task.json — when present, the harness emits a run_halt event and exits 0 cleanly. The next marmite cook invocation re-enters the orchestrator, which checks gh pr view and either resumes (on merge) or rewrites the same halt and exits again.
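The two `pr-on-checkpoint` triggers might be sketched as follows. Only `kind`, `every`, and `epic` come from this README; the `n` field, the `StoryState` shape, and `checkpointFires` are hypothetical, and how the shipped workflow actually evaluates the trigger is not specified here.

```typescript
// Hypothetical config shape: `kind` comes from the README; `n` is an assumed name.
type Checkpoint = { kind: "every"; n: number } | { kind: "epic" };

// Hypothetical story shape used only for this illustration.
interface StoryState {
  id: string;
  epic: string;
  passes: boolean;
}

// Decide whether a PR checkpoint fires right after `justPassed` passes.
// `passedSinceLastPr` counts passing stories since the last PR, including justPassed.
function checkpointFires(
  cfg: Checkpoint,
  justPassed: StoryState,
  all: StoryState[],
  passedSinceLastPr: number,
): boolean {
  if (cfg.kind === "every") return passedSinceLastPr >= cfg.n;
  // "epic": fires once every story in the same epic passes.
  return all.filter((s) => s.epic === justPassed.epic).every((s) => s.passes);
}

const prdStories: StoryState[] = [
  { id: "A1", epic: "auth", passes: true },
  { id: "A2", epic: "auth", passes: true },
  { id: "B1", epic: "billing", passes: false },
];
const lastAuthStory: StoryState = { id: "A2", epic: "auth", passes: true };
console.log(checkpointFires({ kind: "epic" }, lastAuthStory, prdStories, 2)); // true
console.log(checkpointFires({ kind: "every", n: 3 }, lastAuthStory, prdStories, 2)); // false
```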
To override a workflow's defaults, drop orchestrator-prompt.md, builder-prompt.md, or verifier-prompt.md into .marmite/prompts/ — they're checked in alongside the rest of the workflow.
Sensors are deterministic checks the orchestrator runs before handing off to the builder — they surface debt and drift in the diff so the brief can ask for the right fix. Each sensor entry in marmite.json declares a name, a type (debt or drift), and a guidance shell snippet the orchestrator executes. Sensors are scoped to files modified by the current run (git diff --name-only $baseBranch...HEAD), so they never report on untouched code in a brownfield repo.
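The scoping rule amounts to a filter over findings, keyed by the file list that `git diff --name-only` prints. The sketch below is illustrative only: the `Finding` shape and both function names are hypothetical, and it assumes each finding carries a repo-relative file path.

```typescript
// Hypothetical finding shape; field names are illustrative.
interface Finding {
  sensor: string;
  type: "debt" | "drift";
  file: string;
}

// Parse the output of `git diff --name-only <base>...HEAD` into a set of paths.
function parseChangedFiles(diffOutput: string): Set<string> {
  const paths = diffOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return new Set(paths);
}

// Keep only findings located in files the current run touched.
function scopeFindings(findings: Finding[], changed: Set<string>): Finding[] {
  return findings.filter((f) => changed.has(f.file));
}

const changedFiles = parseChangedFiles("app/login.ts\napp/session.ts\n");
const scoped = scopeFindings(
  [
    { sensor: "complexity", type: "debt", file: "app/login.ts" },
    { sensor: "layering", type: "drift", file: "app/legacy/billing.ts" },
  ],
  changedFiles,
);
console.log(scoped.map((f) => f.file)); // [ 'app/login.ts' ]
```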
The set of sensors that get installed — and the rules they encode — is determined by the workflow you pick at marmite init. Configs land under ./.marmite/sensors/ and are tracked in git; edit them to tune what counts as a violation, or remove a sensor's entry from marmite.json to disable it.
When sensor findings accumulate past the thresholds in the `janitor` block of `marmite.json`, the orchestrator schedules a JANITOR phase instead of the next story. This maintenance run focuses solely on paying down debt and reversing architecture drift before regular story work resumes:

```jsonc
"janitor": {
  "thresholds": { "debt": 20, "drift": 5 }, // either trips a run
  "maxFindingsPerRun": 5,                   // small batches per pass
  "budgetUsd": 3
}
```

| Command | Purpose |
|---|---|
| `marmite` / `marmite cook` | Run the agent loop in the current project |
| `marmite <n>` | Shorthand for `marmite cook -n <n>` (cap iterations) |
| `marmite init` | Interactive wizard: scaffolds `marmite.json`, `.marmite/`, prompts, sensors |
| `marmite to-prd <PRD.md>` | Convert a markdown PRD into `.marmite/prd.json` and validate it |
| `marmite doctor` | Preflight check: config, prompts, contract fences, sensors, gitignore |
| `marmite stats [path]` | Post-run summary of `.marmite/events.jsonl` (cost, durations, retries, cache hits) |
| `marmite dashboard [path]` | Serve a live HTML dashboard backed by `events.jsonl` + `prd.json` + `progress.json` |
`marmite cook` options:

```
-c, --config <path>         Config file path (default: ./marmite.json)
-n, --max-iterations <n>    Cap the loop
-p, --prd <path>            Override prd.json location
--model <id>                Default model (fallback for all roles)
--builder-model <id>        Override builder/fix model
--verifier-model <id>       Override verify model
--build-timeout <dur>       e.g. 20m, 600s, 1h
--verify-timeout <dur>
--fix-timeout <dur>
--cost-budget <usd>         Per-story budget (0 disables)
--cost-budget-total <usd>   Whole-run budget
--max-fix-attempts <n>      Fix attempts per story
--retries <n>               Transient retries per session
-v, --verbose               Raw SDK messages and stats
```
`marmite stats` options:

```
--run <id>    Restrict to a specific runId
--all         Fold all runs in the file together (default: latest)
--json        Machine-readable output
```
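For intuition about how the run-selection options interact, here is a small sketch of folding a JSONL event log by run. The event shape is an assumption: only `runId` appears in this README, while `costUsd`, `parseEvents`, and `totalCost` are hypothetical, as is treating the last line's `runId` as the latest run.

```typescript
// Hypothetical event shape: only `runId` is named in the README.
interface RunEvent {
  runId: string;
  costUsd?: number;
}

// Parse one JSON object per non-empty line.
function parseEvents(jsonl: string): RunEvent[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunEvent);
}

// Sum cost for a specific run, for all runs, or (by default) for the latest run.
// Assumption: the latest run is the one that wrote the last line of the file.
function totalCost(events: RunEvent[], opts: { run?: string; all?: boolean } = {}): number {
  const latest = events[events.length - 1]?.runId;
  const wanted = opts.all ? undefined : (opts.run ?? latest);
  return events
    .filter((e) => wanted === undefined || e.runId === wanted)
    .reduce((sum, e) => sum + (e.costUsd ?? 0), 0);
}

const sampleLog = [
  '{"runId":"r1","costUsd":0.5}',
  '{"runId":"r1","costUsd":0.25}',
  '{"runId":"r2","costUsd":1}',
  '{"runId":"r2"}',
].join("\n");
const sampleEvents = parseEvents(sampleLog);
console.log(totalCost(sampleEvents));                // 1 (latest run, r2)
console.log(totalCost(sampleEvents, { run: "r1" })); // 0.75
console.log(totalCost(sampleEvents, { all: true })); // 1.75
```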
`marmite dashboard` options:

```
--port <n>    Port to listen on (default: 4321)
--host <h>    Host to bind (default: 127.0.0.1)
--no-open     Don't open the browser
```
A representative `marmite.json`:

```jsonc
{
  "app": "./app",
  "prd": "./.marmite/prd.json",
  "workflow": "one-shot",
  "sensors": [ /* see Sensors */ ],
  "janitor": { /* see Sensors */ },
  "mcpServers": { /* see MCP servers */ },
  "models": {
    "default": "claude-sonnet-4-6",
    "builder": "claude-sonnet-4-6",
    "verifier": "claude-haiku-4-5",
    "orchestrator": "claude-sonnet-4-6"
  },
  "timeouts": { "build": "20m", "verify": "10m", "fix": "15m", "orchestrate": "10m" },
  "budget": { "perStory": 15, "total": 100 },
  "retries": { "fix": 3, "transient": 2 },
  "maxIterations": 1000
}
```