Orchestrate fleets of AI coding agents across providers
An MCP server that lets Claude Code dispatch work to Codex, Gemini, and Claude agents — running side-by-side in isolated git worktrees, with multi-stage workflows where agents collaborate across provider boundaries.
```
Claude Code ──> codefleet ──> Codex worker (implement)
                          ──> Claude worker (review)
                          ──> Codex worker (refine from review)
```
You have access to multiple AI coding agents. Each has different strengths. But there's no way to make them work together on the same codebase, passing results between stages, without manual copy-paste.
codefleet fixes this. Define a workflow, pick which agent runs each stage, and let the results flow automatically.
```bash
uv pip install codefleet
```

Or with pip:

```bash
pip install codefleet
```

At least one of these AI CLIs must be installed:
| Agent | Install | Verify |
|---|---|---|
| Codex | `npm i -g @openai/codex` | `codex --version` |
| Gemini | `npm i -g @google/gemini-cli` | `gemini --version` |
| Claude | `npm i -g @anthropic-ai/claude-code` | `claude --version` |
Plus Git and Python 3.11+.
```bash
claude mcp add -s user codefleet -- uvx -U codefleet
```

That's it. The `-s user` scope makes codefleet available in every project automatically. Restart Claude Code and the tools are ready.
With options:
```bash
claude mcp add -s user codefleet \
  -e FLEET_ALLOWED_REPOS=/path/to/repo-a,/path/to/repo-b \
  -e FLEET_MAX_SPAWN_DEPTH=2 \
  -- uvx -U codefleet
```

Project-only (omit `-s user` to register for the current project only):
```bash
claude mcp add codefleet -- uvx -U codefleet
```

Verify:
```bash
claude mcp list
```

Start Claude Code and ask it to use multi-agent workflows:
> "Use codefleet to add input validation to the registration endpoint. Have Codex implement it, Claude review it, then Codex refine based on the review."
Claude will call `create_workflow` with a 3-stage pipeline. Each stage runs the right agent automatically.

For simple one-off tasks, use `create_worker` directly:
- "Spin up a Codex worker to add rate limiting to the API gateway"
- "Launch a Claude worker to write tests for the auth module"
- "Use a Gemini worker to refactor the database models"
The workflow engine runs a DAG of stages. Each stage picks an executor, gets a prompt template with variables from previous stages, and runs in an isolated (or inherited) git worktree.
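Conceptually, running such a DAG amounts to a topological walk: a stage becomes ready once every stage listed in its `depends_on` has completed, and independent stages can run side by side. A minimal sketch of that scheduling idea (the `run_order` helper is illustrative, not codefleet's actual API):

```python
# Illustrative sketch of DAG stage scheduling, not codefleet's real
# implementation. Each stage lists the indices it depends on, mirroring
# the "depends_on" field in workflow definitions.

def run_order(stages):
    """Yield batches of stage indices; stages in a batch can run in parallel."""
    done = set()
    remaining = set(range(len(stages)))
    while remaining:
        # A stage is ready once all of its depends_on indices have completed.
        ready = {i for i in remaining
                 if all(d in done for d in stages[i]["depends_on"])}
        if not ready:
            raise ValueError("cycle or unsatisfiable depends_on")
        yield sorted(ready)
        done |= ready
        remaining -= ready

stages = [
    {"name": "module-a", "depends_on": []},
    {"name": "module-b", "depends_on": []},
    {"name": "review", "depends_on": [0, 1]},
]
print(list(run_order(stages)))  # [[0, 1], [2]]
```

The fan-out/fan-in examples below follow exactly this shape: parallel implementation stages first, then a dependent review stage.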
The core pattern: one agent implements, another reviews, the first refines.
```json
{
  "name": "write-review-refine",
  "task_prompt": "Add input validation to the registration endpoint",
  "stages": [
    {
      "name": "implement",
      "executor": "codex",
      "prompt_template": "{task_prompt}",
      "worktree_strategy": "new",
      "depends_on": []
    },
    {
      "name": "review",
      "executor": "claude",
      "prompt_template": "Review these changes:\n{stage_0_summary}\nFiles: {stage_0_files}",
      "worktree_strategy": "inherit",
      "depends_on": [0]
    },
    {
      "name": "refine",
      "executor": "codex",
      "prompt_template": "Address this review:\n{stage_1_summary}\n{stage_1_next_steps}",
      "worktree_strategy": "inherit",
      "depends_on": [1]
    }
  ]
}
```

Multiple agents work in parallel, then a reviewer checks all of them:
```json
{
  "stages": [
    {"name": "module-a", "executor": "codex", "prompt_template": "{task_prompt}: module A", "depends_on": []},
    {"name": "module-b", "executor": "gemini", "prompt_template": "{task_prompt}: module B", "depends_on": []},
    {
      "name": "review",
      "executor": "claude",
      "prompt_template": "Review both:\nA: {stage_0_summary}\nB: {stage_1_summary}",
      "worktree_strategy": "new",
      "depends_on": [0, 1]
    }
  ]
}
```

Two agents implement the same thing, then a judge picks the better one:
```json
{
  "stages": [
    {"name": "codex-impl", "executor": "codex", "prompt_template": "{task_prompt}", "worktree_strategy": "new", "depends_on": []},
    {"name": "claude-impl", "executor": "claude", "prompt_template": "{task_prompt}", "worktree_strategy": "new", "depends_on": []},
    {
      "name": "evaluate",
      "executor": "claude",
      "prompt_template": "Compare:\nA: {stage_0_summary}\nB: {stage_1_summary}\nWhich is better?",
      "worktree_strategy": "new",
      "depends_on": [0, 1]
    }
  ]
}
```

See `examples/` for complete, copy-paste-ready workflow files.
Available in stage `prompt_template` strings:

| Variable | Value |
|---|---|
| `{task_prompt}` | The workflow's top-level task description |
| `{stage_N_summary}` | Summary from stage N's result |
| `{stage_N_files}` | Comma-separated list of files changed in stage N |
| `{stage_N_next_steps}` | Suggested next steps from stage N |
| `{stage_N_status}` | `"completed"` or `"blocked"` |
| `{stage_N_result}` | Full result JSON from stage N |
Literal curly braces in prompts (JSON examples, code snippets) are safe — only the variables above are substituted.
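One way to get that "only known variables" behavior is to match the recognized variable names explicitly rather than formatting every brace pair. A sketch of the idea (illustrative only, not codefleet's actual substitution code):

```python
import re

# Illustrative sketch: substitute only the documented {task_prompt} and
# {stage_N_*} variables, leaving any other braces (JSON examples, code
# snippets) in the template untouched.
VAR = re.compile(
    r"\{(task_prompt|stage_\d+_(?:summary|files|next_steps|status|result))\}"
)

def render(template, values):
    # Unknown or unprovided variables are left as-is (m.group(0)).
    return VAR.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)

out = render(
    'Review:\n{stage_0_summary}\nKeep {"raw": "json"} as-is',
    {"stage_0_summary": "Added validation"},
)
print(out)
```

Here the literal `{"raw": "json"}` survives substitution because it never matches the variable pattern.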
| Tool | Description |
|---|---|
| `healthcheck` | Verify codefleet, agent CLIs, and Git are available |
| `create_worker` | Launch a single agent in an isolated git worktree |
| `get_worker_status` | Check worker status |
| `list_workers` | List workers, optionally filtered by status |
| `collect_worker_result` | Get parsed results and optional log tails |
| `cancel_worker` | Cancel a running worker |
| `cleanup_worker` | Remove worktree, branch, and artifacts |
| Tool | Description |
|---|---|
| `create_workflow` | Start a multi-stage DAG workflow |
| `get_workflow_status` | Check workflow and per-stage status |
| `list_workflows` | List workflows with optional status filter |
| `cancel_workflow` | Cancel all running stages |
| `collect_workflow_result` | Get final or all-stage results |
| `cleanup_workflow` | Clean up all worktrees and branches |
| Variable | Default | Description |
|---|---|---|
| `FLEET_DEFAULT_EXECUTOR` | `codex` | Default agent: `codex`, `gemini`, or `claude` |
| `FLEET_DEFAULT_MODEL` | `gpt-5.4` | Default Codex model |
| `FLEET_GEMINI_DEFAULT_MODEL` | `gemini-3.1-pro-preview` | Default Gemini model |
| `FLEET_CLAUDE_DEFAULT_MODEL` | `claude-opus-4-6` | Default Claude model |
| `FLEET_DEFAULT_TIMEOUT` | `600` | Per-worker safety timeout in seconds (stale detection is the primary mechanism) |
| `FLEET_MAX_CONCURRENT` | `50` | Max simultaneous workers |
| `FLEET_MAX_SPAWN_DEPTH` | `2` | How deep agents can recursively spawn sub-agents |
| `FLEET_ALLOWED_REPOS` | (all) | Comma-separated allowlist of repo paths |
| `FLEET_BASE_DIR` | `~/.codex-fleet` | Data directory for workers and DB |
| `FLEET_RATE_LIMIT_MAX_RETRIES` | `3` | Auto-retries on 429 rate-limit errors |
| `FLEET_RATE_LIMIT_BASE_DELAY` | `4.0` | Initial backoff delay in seconds (doubles each retry) |
| `FLEET_RATE_LIMIT_MAX_DELAY` | `60.0` | Maximum backoff delay cap in seconds |
| `FLEET_STALE_TIMEOUT` | `120` | Seconds of no output before a worker is considered stale and restarted |
| `FLEET_STALE_MAX_RESTARTS` | `2` | Max stale restarts before giving up |
- Isolation — each worker gets its own git worktree and branch (`{executor}/{task}/{id}`)
- Stale detection — monitors stdout/stderr activity; restarts workers that go silent for 2 minutes (preserving worktree state)
- Rate-limit retry — automatically retries on 429 errors with exponential backoff (4s, 8s, 16s)
- Structured output — every agent writes a `result.json` with summary, files changed, test results, and next steps
- Structured progress — status responses include elapsed times, progress bars, and per-stage summaries
- Durability — all state lives in SQLite (WAL mode), survives crashes and restarts
- Concurrency control — configurable limits on concurrent workers and spawn depth
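The documented 4s/8s/16s retry schedule follows from the defaults above. A sketch of that computation (assuming plain doubling from `FLEET_RATE_LIMIT_BASE_DELAY` capped at `FLEET_RATE_LIMIT_MAX_DELAY`; the helper name is illustrative):

```python
# Illustrative sketch of the backoff schedule implied by the defaults:
# FLEET_RATE_LIMIT_MAX_RETRIES=3, FLEET_RATE_LIMIT_BASE_DELAY=4.0,
# FLEET_RATE_LIMIT_MAX_DELAY=60.0.

def backoff_delays(retries=3, base=4.0, cap=60.0):
    """Delay before each retry attempt: base, 2*base, 4*base, ... capped."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

print(backoff_delays())  # [4.0, 8.0, 16.0]
```

With more retries allowed, the cap kicks in: the sixth delay would be 128s uncapped, so it is clamped to 60s.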
```bash
git clone https://github.com/techinfobel/codefleet
cd codefleet
uv pip install -e ".[dev]"
python -m pytest tests/ -v     # 192 tests
python -m pytest tests/ --cov
```

```bash
# Install asciinema + agg (for GIF conversion)
brew install asciinema
cargo install --git https://github.com/asciinema/agg

# Record
asciinema rec demo/demo.cast -c "python demo/demo.py" --cols 100 --rows 32

# Convert to GIF
agg demo/demo.cast demo/demo.gif --cols 100 --rows 32
```

MIT
