Active development. Changelog | Report issues
🌐 Prefer a visual overview? See the Flow-Code app page for diagrams and examples.
New: Codex Review Backend. Cross-model reviews now work on Linux/Windows via OpenAI Codex CLI. Same Carmack-level criteria as RepoPrompt. See Cross-Model Reviews for setup.
- What Is This?
- Why It Works
- Quick Start — Install, setup, use
- When to Use What — Interview vs Plan vs Work
- Agent Readiness Assessment - `/flow-code:prime`
- Troubleshooting
- Codebase Map — Architecture documentation via parallel subagents
- Auto-Improve — Autonomous code optimization
- Ralph (Autonomous Mode) — Run overnight
- Features — Re-anchoring, multi-user, reviews, dependencies
- Commands — All slash commands + flags
- Command Reference — Detailed input docs for each command
- The Workflow — Planning and work phases
- .flow/ Directory — File structure
- flowctl CLI — Direct CLI usage
Flow-Code is a Claude Code plugin for plan-first orchestration. Bundled task tracking, dependency graphs, re-anchoring, and cross-model reviews.
Everything lives in your repo. No external services. No global config. Uninstall: delete .flow/ (and scripts/ralph/ if enabled).
[Screenshots: Planning (dependency-ordered tasks) | Execution (fixes, evidence, review)]
Flow-Code does not support standalone tasks.
Every unit of work belongs to an epic fn-N (even if it's a single task).
Tasks are always fn-N.M and inherit context from the epic spec.
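As an illustration of the ID scheme, here is a minimal sketch of a parser for `fn-N` and `fn-N.M`. It assumes IDs may also carry an optional lowercase slug (as in `fn-1-add-oauth.3`, used later in this document). The name `parse_flow_id` and the regex are hypothetical, not flowctl's actual implementation.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: "fn-<epic>", optional "-slug", optional ".<task>".
_FLOW_ID = re.compile(r"^fn-(\d+)(?:-[a-z0-9]+(?:-[a-z0-9]+)*)?(?:\.(\d+))?$")


class FlowId(NamedTuple):
    epic: int            # N in fn-N
    task: Optional[int]  # M in fn-N.M, or None for an epic ID


def parse_flow_id(raw: str) -> FlowId:
    """Split an epic or task ID into its numeric parts."""
    match = _FLOW_ID.match(raw)
    if match is None:
        raise ValueError(f"not a Flow-Code ID: {raw!r}")
    epic, task = match.groups()
    return FlowId(int(epic), int(task) if task is not None else None)
```

A task ID always resolves to its parent epic, which is what lets a task inherit context from the epic spec.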
Flow-Code always creates an epic container (even for one-offs) so every task has a durable home for context, re-anchoring, and automation. You never have to think about it.
Rationale: keeps the system simple, improves re-anchoring, makes automation (Ralph) reliable.
"One-off request" -> epic with one task.
Work task-by-task with full review cycles for maximum control. Or throw the whole epic at it and let Flow-Code handle everything. Same guarantees either way.
# One task at a time (review after each)
/flow-code:work fn-1.1
# Entire epic (sequential, review after all tasks complete)
/flow-code:work fn-1
# Entire epic (parallel — independent tasks run simultaneously)
/flow-code:work fn-1 --parallel
All modes get: re-anchoring before each task, evidence recording, cross-model review (if rp-cli available).
Parallel mode (Wave-Checkpoint-Wave): Spawns workers for ALL ready tasks (no unresolved dependencies) simultaneously. After each batch completes, a structured Batch Checkpoint runs: aggregate results, verify integration (guards + invariants), output a wave summary, then plan the next wave. Newly unblocked tasks become ready for the next batch. Safe because flowctl ready only returns tasks with all dependencies resolved.
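The wave scheduling described above can be sketched as a loop over a dependency graph. This is an illustrative model only: `plan_waves` and its input shape (task ID mapped to the set of task IDs it depends on) are assumptions, and the real checkpoint also runs guards and invariants, which are omitted here.

```python
from typing import Dict, List, Set


def plan_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave has all dependencies done."""
    done: Set[str] = set()
    waves: List[List[str]] = []
    while len(done) < len(deps):
        # "Ready" means: not finished yet, and every dependency is finished.
        ready = sorted(t for t, d in deps.items() if t not in done and d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)  # checkpoint: the whole batch completes before the next wave
    return waves
```

For example, two independent tasks followed by one that depends on both produce two waves: the first runs both in parallel, the second runs the newly unblocked task.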
Workers also use file-level Wave parallelism within each task — when touching 3+ files, they issue parallel reads in one message, analyze dependencies at a checkpoint, then issue parallel edits. This achieves 3-4x speedup over sequential file I/O.
Review timing: The RepoPrompt review runs once at the end of the work package—after a single task if you specified fn-N.M, or after all tasks if you specified fn-N. For tighter review loops on large epics, work task-by-task.
- Tasks sized at planning: Every task is scoped to fit one work iteration
- Re-anchor every task: Fresh context from `.flow/` specs before each task
- Survives compaction: Re-anchors after conversation summarization too
- Fresh context in Ralph: Each iteration starts with a clean context window
Never worry about 200K token limits again.
If drift happens despite re-anchoring, a different model catches it before it compounds:
- Claude implements task
- GPT reviews via RepoPrompt (sees full files, not diffs)
- Reviews block until `SHIP` verdict
- Fix → re-review cycles continue until approved
Two models catch what one misses.
- Works in 30 seconds. Install the plugin, run a command. No setup.
- Non-invasive. No CLAUDE.md edits. No daemons. (Ralph uses plugin hooks for enforcement.)
- Clean uninstall. Delete `.flow/` (and `scripts/ralph/` if enabled).
- Multi-user safe. Teams work parallel branches without coordination servers.
# Add marketplace
/plugin marketplace add https://github.com/z23cc/flow-code
# Install flow-code
/plugin install flow-code
/flow-code:setup
This is technically optional but highly recommended. It:
- Configures review backend (RepoPrompt, Codex, or none) — required for cross-model reviews
- Copies `flowctl` to `.flow/bin/` for direct CLI access
- Adds flow-code instructions to CLAUDE.md/AGENTS.md (helps other AI tools understand your project)
- Creates `.flow/usage.md` with full CLI reference
Idempotent - safe to re-run. Detects plugin updates and refreshes scripts automatically.
After setup:
export PATH=".flow/bin:$PATH"
flowctl --help
flowctl epics # List all epics
flowctl tasks --epic fn-1 # List tasks for epic
flowctl ready --epic fn-1 # What's ready to work on
# Spec: "create a spec for X" - writes epic with structured requirements
# Then plan or interview to refine
# Plan: research, create epic with tasks
/flow-code:plan Add a contact form with validation
# Work: execute tasks in dependency order
/flow-code:work fn-1
# Or work directly from a spec file (creates epic automatically)
/flow-code:work docs/my-feature-spec.md
That's it. Flow-Code handles research, task ordering, reviews, and audit trails.
Flow-Code is flexible. There's no single "correct" order: the right sequence depends on how well-defined your spec already is.
The key question: How fleshed out is your idea?
Create spec → Interview or Plan → Work
- Create spec — ask Claude to "create a spec for X". This creates an epic with a structured spec (goal, architecture, API contracts, edge cases, acceptance criteria, boundaries, decision context) — no tasks yet
- Refine or plan:
  - `/flow-code:interview fn-1` - deep Q&A to pressure-test the spec, surface gaps
  - `/flow-code:plan fn-1` - research best practices + break into tasks
- Work: `/flow-code:work fn-1` executes with re-anchoring and reviews
Best for: features where you want to nail down the WHAT/WHY before committing to HOW. The spec captures everything an implementer needs.
Interview → Plan → Work
- Interview first: `/flow-code:interview "your rough idea"` asks 40+ deep questions to surface requirements, edge cases, and decisions you haven't thought about
- Plan: `/flow-code:plan fn-1` takes the refined spec and researches best practices, current docs, repo patterns, then splits into properly-sized tasks
- Work: `/flow-code:work fn-1` executes with re-anchoring and reviews
Plan → Interview → Work
- Plan first: `/flow-code:plan specs/my-feature.md` researches best practices and current patterns, then breaks your spec into epic + tasks
- Interview after: `/flow-code:interview fn-1` runs deep questions against the plan to catch edge cases, missing requirements, or assumptions
- Work: `/flow-code:work fn-1` executes
Plan → Work
Skip interview entirely for well-understood changes. Plan still researches best practices and splits into tasks.
Work directly
/flow-code:work specs/small-fix.md
For small, self-contained changes where you already have a complete spec. Creates an epic with one task and executes immediately. You get flow tracking, re-anchoring, and optional review, without full planning overhead.
Best for: bug fixes, small features, well-scoped changes that don't need task splitting.
Note: This does NOT split into multiple tasks. For detailed specs that need breakdown, use Plan first.
Summary:
| Starting point | Recommended sequence |
|---|---|
| New feature, want solid spec first | Spec → Interview/Plan → Work |
| Vague idea, rough notes | Interview → Plan → Work |
| Detailed spec/PRD | Plan → Interview → Work |
| Well-understood, needs task splitting | Plan → Work |
| Small single-task, spec complete | Work directly (creates 1 epic + 1 task) |
Spec vs Interview vs Plan:
- Spec (just ask "create a spec") creates an epic with structured requirements (goal, architecture, API contracts, edge cases, acceptance criteria, boundaries). No tasks, no codebase research.
- Interview refines an epic via deep Q&A (40+ questions). Writes back to the epic spec only — no tasks.
- Plan researches best practices, analyzes existing patterns, and creates sized tasks with dependencies.
You can always run interview again after planning to catch anything missed. Interview writes back to the epic spec only — it won't modify existing tasks.
Inspired by Factory.ai's Agent Readiness framework
/flow-code:prime assesses your codebase for agent-readiness and proposes improvements. Works for greenfield and brownfield projects.
Agents waste cycles when codebases lack:
- Pre-commit hooks → waits 10min for CI instead of 5sec local feedback
- Documented env vars → guesses, fails, guesses again
- CLAUDE.md → doesn't know project conventions
- Test commands → can't verify changes work
These are environment problems, not agent problems. Prime helps fix them.
/flow-code:prime # Full assessment + interactive fixes
/flow-code:prime --report-only # Just show the report
/flow-code:prime --fix-all # Apply all fixes without asking
Prime evaluates your codebase across eight pillars (48 criteria total):
| Pillar | What It Checks |
|---|---|
| 1. Style & Validation | Linters, formatters, type checking, pre-commit hooks |
| 2. Build System | Build tool, commands, lock files, monorepo tooling |
| 3. Testing | Test framework, commands, verification, coverage, E2E |
| 4. Documentation | README, CLAUDE.md, setup docs, architecture |
| 5. Dev Environment | .env.example, Docker, devcontainer, runtime version |
| Pillar | What It Checks |
|---|---|
| 6. Observability | Structured logging, tracing, metrics, error tracking, health endpoints |
| 7. Security | Branch protection, secret scanning, CODEOWNERS, Dependabot |
| 8. Workflow & Process | CI/CD, PR templates, issue templates, release automation |
Two-tier approach: Pillars 1-5 determine your agent maturity level and are eligible for fixes. Pillars 6-8 are reported for visibility but no fixes are offered — these are team/production decisions.
| Level | Name | Description | Overall Score |
|---|---|---|---|
| 1 | Minimal | Basic project structure only | <30% |
| 2 | Functional | Can build and run, limited docs | 30-49% |
| 3 | Standardized | Agent-ready for routine work | 50-69% |
| 4 | Optimized | Fast feedback loops, comprehensive docs | 70-84% |
| 5 | Autonomous | Full autonomous operation capable | 85%+ |
Level 3 is the target for most teams. It means agents can handle routine work: bug fixes, tests, docs, dependency updates.
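The level table reads naturally as a threshold lookup. A minimal sketch, assuming each band's lower bound is inclusive and the score is a percentage from 0 to 100; `maturity_level` is a hypothetical helper, not part of the plugin:

```python
from typing import Tuple

# (inclusive lower bound in percent, level, name), highest band first.
_LEVELS = [
    (85, 5, "Autonomous"),
    (70, 4, "Optimized"),
    (50, 3, "Standardized"),
    (30, 2, "Functional"),
    (0, 1, "Minimal"),
]


def maturity_level(score_pct: float) -> Tuple[int, str]:
    """Map a readiness score (0-100) to its maturity level and name."""
    if not 0 <= score_pct <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, level, name in _LEVELS:
        if score_pct >= lower_bound:
            return level, name
    raise AssertionError("unreachable: the last band starts at 0")
```

With these bands, a score of 73% lands in Level 4 (Optimized), matching the sample report.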
- Parallel Assessment - 9 haiku scouts run in parallel (~15-20 seconds):

  Agent Readiness scouts:
  - `tooling-scout` - linters, formatters, pre-commit, type checking
  - `claude-md-scout` - CLAUDE.md/AGENTS.md analysis
  - `env-scout` - environment setup
  - `testing-scout` - test infrastructure
  - `build-scout` - build system
  - `docs-gap-scout` - README, ADRs, architecture docs

  Production Readiness scouts:
  - `observability-scout` - logging, tracing, metrics, health endpoints
  - `security-scout` - GitHub API checks, CODEOWNERS, Dependabot
  - `workflow-scout` - CI/CD, templates, automation

- Verification - Verifies test commands actually work (e.g., `pytest --collect-only`)
- Synthesize Report - Calculates Agent Readiness score, Production Readiness score, and maturity level
- Interactive Remediation - Uses `AskUserQuestion` for agent readiness fixes only:

  Which tooling improvements should I add?
  ☐ Add pre-commit hooks (Recommended)
  ☐ Add linter config
  ☐ Add runtime version file

- Apply Fixes - Creates/modifies files based on your selections
- Re-assess - Optionally re-run to show improvement
# Agent Readiness Report
**Repository**: my-project
**Assessed**: 2026-01-23
## Scores Summary
| Category | Score | Level |
|----------|-------|-------|
| **Agent Readiness** (Pillars 1-5) | 73% | Level 4 - Optimized |
| Production Readiness (Pillars 6-8) | 17% | — |
| **Overall** | 52% | — |
## Agent Readiness (Pillars 1-5)
| Pillar | Score | Status |
|--------|-------|--------|
| Style & Validation | 67% (4/6) | ⚠️ |
| Build System | 100% (6/6) | ✅ |
| Testing | 67% (4/6) | ⚠️ |
| Documentation | 83% (5/6) | ✅ |
| Dev Environment | 83% (5/6) | ✅ |
## Production Readiness (Pillars 6-8) — Report Only
| Pillar | Score | Status |
|--------|-------|--------|
| Observability | 33% (2/6) | ❌ |
| Security | 17% (1/6) | ❌ |
| Workflow & Process | 0% (0/6) | ❌ |
## Top Recommendations (Agent Readiness)
1. **Tooling**: Add pre-commit hooks — 5 sec feedback vs 10 min CI wait
2. **Tooling**: Add Python type checking — catch errors locally
3. **Docs**: Update README - replace generic template
Prime offers fixes for agent readiness gaps (not team governance):
| Fix | What Gets Created |
|---|---|
| CLAUDE.md | Project overview, commands, structure, conventions |
| .env.example | Template with detected env vars |
| Pre-commit (JS) | Husky + lint-staged config |
| Pre-commit (Python) | .pre-commit-config.yaml |
| Linter config | ESLint, Biome, or Ruff config (if none exists) |
| Formatter config | Prettier or Biome config (if none exists) |
| .nvmrc/.python-version | Runtime version pinning |
| .gitignore entries | .env, build outputs, node_modules |
Templates adapt to your project's detected conventions and existing tools. Won't suggest ESLint if you have Biome, etc.
By default, prime asks before every change using interactive checkboxes. You choose what gets created.
- Asks first: uses the `AskUserQuestion` tool for interactive selection per category
- Never overwrites existing files without explicit consent
- Never commits changes (leaves for you to review)
- Never deletes files
- Merges with existing configs when possible
- Respects your existing tools (won't add ESLint if you have Biome)
Use --fix-all to skip questions and apply everything. Use --report-only to just see the assessment.
| Flag | Description |
|---|---|
| `--report-only` | Skip remediation, just show report |
| `--fix-all` | Apply all recommendations without asking |
| `<path>` | Assess a different directory |
After planning completes, you choose how to execute:
| Mode | Command | When to Use |
|---|---|---|
| Interactive | `/flow-code:work fn-1` | Complex tasks, learning a codebase, taste matters, want to intervene |
| Autonomous (Ralph) | `scripts/ralph/ralph.sh` | Clear specs, bulk implementation, overnight runs |
The heuristic: If you can write checkboxes, you can Ralph it. If you can't, you're not ready to loop—you're ready to think.
For full autonomous mode, prepare 5-10 plans before starting Ralph. See Ralph Mode for setup.
📖 Deep dive: Ralph Mode: Why AI Agents Should Forget
# Check task status
flowctl show fn-1.2 --json
# Reset to todo (from done/blocked)
flowctl task reset fn-1.2
# Reset + dependents in same epic
flowctl task reset fn-1.2 --cascade
Run manually in terminal (not via AI agent):
# Remove all flow state (keeps git history)
rm -rf .flow/
# Re-initialize
flowctl init
# Check run progress
cat scripts/ralph/runs/*/progress.txt
# View iteration logs
ls scripts/ralph/runs/*/iter-*.log
# Check for blocked tasks
ls scripts/ralph/runs/*/block-*.md
# Check receipt exists
ls scripts/ralph/runs/*/receipts/
# Verify receipt format
cat scripts/ralph/runs/*/receipts/impl-fn-1.1.json
# Must have: {"type":"impl_review","id":"fn-1.1",...}
Caution: If you have custom instructions for `rp-cli` in your `CLAUDE.md` or `AGENTS.md`, they may conflict with Flow-Code's RepoPrompt integration.
Flow-Code's plan-review and impl-review skills include specific instructions for rp-cli usage (window selection, builder workflow, chat commands). Custom rp-cli instructions can override these and cause unexpected behavior.
Symptoms:
- Reviews not using the correct RepoPrompt window
- Builder not selecting expected files
- Chat commands failing or behaving differently
Fix: Remove or comment out custom rp-cli instructions from your CLAUDE.md/AGENTS.md when using Flow-Code reviews. The plugin provides complete rp-cli guidance.
Generate comprehensive architecture documentation using parallel Sonnet subagents.
/flow-code:map
Creates docs/CODEBASE_MAP.md with:
- Architecture diagram (Mermaid)
- Module guide (purpose, exports, dependencies per file)
- Data flow diagrams
- Conventions and gotchas
- Navigation guide ("To add an API endpoint: touch these files")
How it works:
- Scans file tree with token counts (respects .gitignore)
- Splits work into ~150k token chunks
- Spawns Sonnet subagents in parallel to analyze each chunk
- Synthesizes reports into a single map document
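The chunking step can be pictured as greedy packing by token count. A minimal sketch under stated assumptions: the 150,000 budget comes from the "~150k" figure above, while `chunk_files`, the path-sorted order, and the handling of oversized files are illustrative choices, not the plugin's actual algorithm.

```python
from typing import Dict, List


def chunk_files(token_counts: Dict[str, int], budget: int = 150_000) -> List[List[str]]:
    """Greedily pack files (in path order) into chunks of at most `budget` tokens."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path in sorted(token_counts):
        size = token_counts[path]
        # Start a new chunk when adding this file would exceed the budget.
        if current and used + size > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(path)  # an oversized file still gets a chunk of its own
        used += size
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be handed to one subagent, so the number of chunks bounds the degree of parallelism.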
Update mode — re-run to update only changed modules:
/flow-code:map --update
Integrated with flow-code workflow:
- `repo-scout` reads the map first during planning (faster, more accurate)
- `auto-improve` reads the map before each experiment (better context)
- `context-scout` benefits from architecture overview
Based on Cartographer (MIT).
Inspired by Karpathy's autoresearch — 700 experiments in 2 days, 19% performance gain at Shopify.
One command to start autonomous code improvement. Auto-detects project type, guard commands, and runs immediately.
/flow-code:auto-improve "fix N+1 queries and add missing tests" --scope src/
That's it. Flow-Code detects your project (Django/React/Next.js), finds lint+test commands, creates an experiment branch, and starts improving. Each experiment: discover → implement → test → keep or discard.
More examples:
# Next.js bundle optimization
/flow-code:auto-improve "reduce bundle size" --scope src/components/ --max 20
# Security hardening
/flow-code:auto-improve "fix security vulnerabilities" --scope src/api/ src/auth/
# Test coverage
/flow-code:auto-improve "improve test coverage to 80%"
# Watch mode (see what agent is doing)
/flow-code:auto-improve "optimize API performance" --scope src/ --watch
How it works:
for each experiment (up to --max, default 50):
1. Agent reads code + previous experiments (learns from history)
2. Discovers ONE improvement opportunity
3. Writes test first (TDD style)
4. Implements minimal change (scope-restricted)
5. Runs guard (auto-detected lint + tests must pass)
6. Judges: keep (git commit) or discard (git reset)
7. Logs to experiments.jsonl → summary.md at end
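Steps 5 and 6 of the loop above can be sketched as a single function. This is a simplification that assumes the keep/discard decision depends only on the guard's exit code (the real judgment criteria live in `program.md`). `run_experiment` and its callback parameters are hypothetical; `keep` and `discard` stand in for the git commit and git reset steps.

```python
import subprocess
from typing import Callable


def run_experiment(
    apply_change: Callable[[], None],
    guard_cmd: str,
    keep: Callable[[], None],
    discard: Callable[[], None],
) -> bool:
    """Apply one change, run the guard, then keep or discard. Returns True if kept."""
    apply_change()
    # The guard is a shell command (e.g. lint + tests); exit code 0 means pass.
    passed = subprocess.run(guard_cmd, shell=True).returncode == 0
    if passed:
        keep()     # stands in for `git commit`
    else:
        discard()  # stands in for `git reset`
    return passed
```

Passing the side effects in as callbacks keeps the sketch testable without touching a real repository.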
What's auto-detected:
| Project | Guard command |
|---|---|
| Django + ruff | ruff check . && python -m pytest -x -q |
| Django + pytest | python -m pytest -x -q |
| Next.js/React | npm run lint && npm test |
| No tests found | Warning — set GUARD_CMD in config.env |
Customization:
- `scripts/auto-improve/program.md` - edit to change improvement focus and judgment criteria
- `scripts/auto-improve/config.env` - override goal, scope, guard, max experiments
Output:
- `experiments.jsonl` - every experiment logged (hypothesis, result, commit)
- `summary.md` - generated at end with kept/discarded/crashed counts
- Kept improvements committed on `auto-improve/<date>` branch
Using with Codex CLI:
# Set CLAUDE_BIN to use Codex instead of Claude
CLAUDE_BIN=codex scripts/auto-improve/auto-improve.sh
# Or set in config.env for persistent use
# CLAUDE_BIN=codex
# AUTO_IMPROVE_CODEX_MODEL=gpt-5.4
Auto-improve auto-detects the CLI type and uses the correct flags (Claude: `-p --output-format stream-json`, Codex: `-q --full-auto`).
Ralph vs Auto-Improve:
| | Ralph | Auto-Improve |
|---|---|---|
| Purpose | Execute planned tasks | Explore & optimize |
| Input | Epic with spec + tasks | Goal + scope |
| Approach | Follow plan exactly | Discover improvements |
| Output | Completed features | Incremental code improvements |
| When | You know WHAT to build | You want code to get BETTER |
Run manually in terminal (DCG blocks these from AI agents):
rm -rf .flow/ # Core flow state
rm -rf scripts/ralph/ # Ralph (if enabled)
Or use `/flow-code:uninstall`, which cleans up docs and prints commands to run.
⚠️ Safety first: Ralph defaults to `YOLO=1` (skips permission prompts).
- Start with `ralph_once.sh` to observe one iteration
- Consider Docker sandbox for isolation
- Consider DCG (Destructive Command Guard) to block destructive commands; see DCG setup
Community sandbox setups (alternative approaches):
- devcontainer-for-claude-yolo-and-flow-code — VS Code devcontainer with Playwright, firewall whitelisting, and RepoPrompt MCP bridge
- agent-sandbox — Docker Sandbox (Desktop 4.50+) with seccomp/user namespace isolation, .NET + Node.js
Ralph is the repo-local autonomous loop that plans and works through tasks end-to-end.
Setup (one-time, inside Claude):
/flow-code:ralph-init
Or from terminal without entering Claude:
claude -p "/flow-code:ralph-init"
Run (outside Claude):
scripts/ralph/ralph.sh
Ralph writes run artifacts under `scripts/ralph/runs/`, including review receipts used for gating.
🖥️ Ralph TUI — Terminal UI for monitoring runs in real-time (bun add -g flow-code-tui)
Autonomous coding agents are taking the industry by storm—loop until done, commit, repeat. Most solutions gate progress by tests and linting alone. Ralph goes further.
Multi-model review gates: Ralph uses RepoPrompt (macOS) or OpenAI Codex CLI (cross-platform) to send plan and implementation reviews to a different model. A second set of eyes catches blind spots that self-review misses. RepoPrompt's builder provides full file context; Codex uses context hints from changed files.
Review loops until Ship: Reviews don't just flag issues—they block progress until resolved. Ralph runs fix → re-review cycles until the reviewer returns <verdict>SHIP</verdict>. No "LGTM with nits" that get ignored.
Receipt-based gating: Reviews must produce a receipt JSON file proving they ran. No receipt = no progress. This prevents drift where Claude skips the review step and marks things done anyway.
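A gate of this kind can be sketched as a small check on the receipt file. The `type` and `id` fields come from the receipt format shown in the troubleshooting commands earlier; any other fields are unknown here, and `receipt_ok` is a hypothetical helper rather than Ralph's actual gate.

```python
import json
from pathlib import Path


def receipt_ok(path: Path, expected_type: str, expected_id: str) -> bool:
    """Return True only if the receipt exists, parses, and matches type and id."""
    try:
        receipt = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False  # missing or malformed receipt: no progress
    return (
        isinstance(receipt, dict)
        and receipt.get("type") == expected_type
        and receipt.get("id") == expected_id
    )
```

A caller would refuse to mark `fn-1.1` done unless `receipt_ok(path, "impl_review", "fn-1.1")` holds.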
Guard hooks: Plugin hooks enforce workflow rules deterministically—blocking --json flags, preventing new chats on re-reviews, requiring receipts before stop. Only active when FLOW_RALPH=1; zero impact for non-Ralph users. See Guard Hooks.
Atomic window selection: The setup-review command handles RepoPrompt window matching atomically. Claude can't skip steps or invent window IDs—the entire sequence runs as one unit or fails.
The result: code that's been reviewed by two models, tested, linted, and iteratively refined. Not perfect, but meaningfully more robust than single-model autonomous loops.
External agents (Clawdbot, GitHub Actions, etc.) can pause/resume/stop Ralph runs without killing processes.
CLI commands:
# Check status
flowctl status # Epic/task counts + active runs
flowctl status --json # JSON for automation
# Control active run
flowctl ralph pause # Pause run (auto-detects if single)
flowctl ralph resume # Resume paused run
flowctl ralph stop # Request graceful stop
flowctl ralph status # Show run state
# Specify run when multiple active
flowctl ralph pause --run <id>
Sentinel files (manual control):
# Pause: touch PAUSE file in run directory
touch scripts/ralph/runs/<run-id>/PAUSE
# Resume: remove PAUSE file
rm scripts/ralph/runs/<run-id>/PAUSE
# Stop: touch STOP file (kept for audit)
touch scripts/ralph/runs/<run-id>/STOP
Ralph checks sentinels at iteration boundaries (after Claude returns, before next iteration).
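The sentinel check at an iteration boundary can be sketched as follows. The `PAUSE` and `STOP` file names come from the commands above; the function name, the return values, and the choice that `STOP` takes precedence over `PAUSE` are assumptions for illustration.

```python
from pathlib import Path


def check_sentinels(run_dir: Path) -> str:
    """Return 'stop', 'pause', or 'continue' based on sentinel files in run_dir."""
    if (run_dir / "STOP").exists():
        return "stop"   # assumed to win over PAUSE; the file is left in place for audit
    if (run_dir / "PAUSE").exists():
        return "pause"  # the caller waits until the file is removed
    return "continue"
```

Because the check is a plain file lookup, any external agent that can write to the run directory can control the loop without signalling the process.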
By default Ralph reviews every task individually (per-task). For faster iteration, use per-epic mode — runs all tasks first, then one comprehensive epic-level review.
Configure in scripts/ralph/config.env:
# per-epic: skip per-task reviews, single epic-level review after all tasks complete
REVIEW_MODE=per-epic
# Review backend (rp = RepoPrompt, codex = Codex CLI, none = skip)
WORK_REVIEW=rp
# Completion review auto-inherits from WORK_REVIEW when per-epic mode is active
# Set explicitly to override: COMPLETION_REVIEW=codex
Execution flow comparison:
per-task (default): per-epic (recommended for speed):
plan → plan_review (optional) plan → plan_review (optional)
task 1 → impl_review ✓ task 1 → done (no review)
task 2 → impl_review ✓ task 2 → done (no review)
task 3 → impl_review ✓ task 3 → done (no review)
... ...
task N → impl_review ✓ task N → done (no review)
epic completion_review ✓ epic completion_review ✓ (covers all)
──────────────────────── ────────────────────────
N+1 reviews total 1 review total
Common configurations:
# Fast iteration with quality gate (recommended)
REVIEW_MODE=per-epic
WORK_REVIEW=rp
# Maximum speed, no reviews
REVIEW_MODE=per-epic
WORK_REVIEW=none
COMPLETION_REVIEW=none
# Strict mode, review everything
REVIEW_MODE=per-task
WORK_REVIEW=rp
COMPLETION_REVIEW=rp
Monitoring:
# Watch Ralph run in real-time
scripts/ralph/ralph.sh --watch
# View run logs
tail -f scripts/ralph/runs/latest/ralph.log
# Check progress
scripts/ralph/flowctl list
When running Ralph overnight, external changes to the backlog can cause unexpected behavior: new tasks picked up without review, removed tasks causing confusion, modified specs invalidating assumptions.
Configure in scripts/ralph/config.env:
# Capture task IDs + spec hashes at start, check each iteration
FREEZE_SCOPE=1
# What to do on scope change: stop | warn | ignore
SCOPE_CHANGE_ACTION=stop
What it detects:
| Change Type | Detection | Outcome |
|---|---|---|
| Task added externally | Task ID not in frozen list | SCOPE_CHANGED |
| Task removed externally | Frozen task ID missing | SCOPE_CHANGED |
| Spec content modified | MD5 hash mismatch | SCOPE_CHANGED |
| Status change (todo→done) | Not tracked | Allowed (normal) |
Actions:
| Action | Behavior |
|---|---|
| `stop` | Halt Ralph with exit code 1 and clear message |
| `warn` | Log changes, display warning, continue execution |
| `ignore` | Log changes silently, continue execution |
Files created in $RUN_DIR/scope/:
| File | Content |
|---|---|
| `scope.json` | Full snapshot (task IDs, statuses, spec hashes) |
| `task_ids.txt` | Sorted task IDs for easy diff |
| `hashes.txt` | id:md5hash pairs for specs and tasks |
| `changes-iter-NNN.txt` | Detected changes per iteration (if any) |
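The detection rules (ID added, ID removed, MD5 mismatch) can be sketched with a snapshot and a diff. This is an illustrative model that assumes spec content is available as strings keyed by ID; `snapshot`, `scope_changes`, and the change labels are hypothetical names, not Ralph's on-disk format.

```python
import hashlib
from typing import Dict, List


def snapshot(specs: Dict[str, str]) -> Dict[str, str]:
    """Map each task/spec ID to the MD5 hex digest of its content."""
    return {sid: hashlib.md5(text.encode("utf-8")).hexdigest() for sid, text in specs.items()}


def scope_changes(frozen: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """List differences between the frozen snapshot and the current one."""
    changes = [f"added: {sid}" for sid in sorted(current.keys() - frozen.keys())]
    changes += [f"removed: {sid}" for sid in sorted(frozen.keys() - current.keys())]
    changes += [
        f"modified: {sid}"
        for sid in sorted(frozen.keys() & current.keys())
        if frozen[sid] != current[sid]
    ]
    return changes
```

An empty list means the scope is unchanged. Status is deliberately not part of the hash input here, which mirrors the rule that status changes are allowed.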
Recommended for overnight runs:
FREEZE_SCOPE=1
SCOPE_CHANGE_ACTION=stop # Safe: halt on external changes
For monitored runs:
FREEZE_SCOPE=1
SCOPE_CHANGE_ACTION=warn # Continue but flag changes
Ralph writes structured JSON event logs to `$RUN_DIR/events.jsonl` for easy parsing and analysis. Each line is a JSON object:
{"ts":"2026-03-26T12:00:00.123Z","level":"info","event":"run_start","run_id":"20260326-120000-a1b2","max_iterations":25,"review_mode":"per-epic"}
{"ts":"2026-03-26T12:01:15.456Z","level":"info","event":"iteration","iter":1,"status":"work","task":"fn-1.1"}
{"ts":"2026-03-26T12:05:30.789Z","level":"info","event":"worker_done","iter":1,"exit_code":0,"timeout":false}
{"ts":"2026-03-26T12:30:00.000Z","level":"info","event":"run_end","reason":"NO_WORK","tasks_done":5,"elapsed":"29:00"}
Query examples:
# Count iterations per status
jq -r 'select(.event=="iteration") | .status' events.jsonl | sort | uniq -c
# Find failed workers
jq 'select(.event=="worker_done" and .exit_code!=0)' events.jsonl
# Total run time
jq -r 'select(.event=="run_end") | .elapsed' events.jsonl
The plain-text progress.txt log still exists for backwards compatibility. Use events.jsonl for automation and analysis.
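For automation that is already in Python, the first two jq queries have direct equivalents. A minimal sketch that relies only on the fields shown in the sample events (`event`, `status`, `exit_code`); the function names are illustrative.

```python
import json
from collections import Counter
from typing import Iterable, List


def parse_events(lines: Iterable[str]) -> List[dict]:
    """Parse JSON Lines input, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def iterations_per_status(events: List[dict]) -> Counter:
    """Count 'iteration' events by their status field."""
    return Counter(e["status"] for e in events if e.get("event") == "iteration")


def failed_workers(events: List[dict]) -> List[dict]:
    """Return 'worker_done' events whose exit code is non-zero."""
    return [e for e in events if e.get("event") == "worker_done" and e.get("exit_code") != 0]
```

To run these against a real log, pass an open file handle for `events.jsonl` to `parse_events`.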
Task retry/rollback:
# Reset completed/blocked task to todo
flowctl task reset fn-1-add-oauth.3
# Reset + cascade to dependent tasks (same epic)
flowctl task reset fn-1-add-oauth.2 --cascade
Default flow when you drive manually:
flowchart TD
A[Idea or short spec<br/>prompt or doc] --> B{Need deeper spec?}
B -- yes --> C[Optional: /flow-code:interview fn-N or spec.md<br/>40+ deep questions to refine spec]
C --> D[Refined spec]
B -- no --> D
D --> E[/flow-code:plan idea or fn-N/]
E --> F[Parallel subagents: repo patterns + online docs + best practices]
F --> G[flow-gap-analyst: edge cases + missing reqs]
G --> H[Writes .flow/ epic + tasks + deps]
H --> I{Plan review?}
I -- yes --> J[/flow-code:plan-review fn-N/]
J --> K{Plan passes review?}
K -- no --> L[Re-anchor + fix plan]
L --> J
K -- yes --> M[/flow-code:work fn-N/]
I -- no --> M
M --> N[Re-anchor before EVERY task]
N --> O[Implement]
O --> P[Test + verify acceptance]
P --> Q[flowctl done: write done summary + evidence]
Q --> R{Impl review?}
R -- yes --> S[/flow-code:impl-review/]
S --> T{Next ready task?}
R -- no --> T
T -- yes --> N
T -- no --> V{Epic review?}
V -- yes --> W[/flow-code:epic-review fn-N/]
W --> X{Epic passes review?}
X -- no --> Y[Fix gaps inline]
Y --> W
X -- yes --> U[Close epic]
V -- no --> U
classDef optional stroke-dasharray: 6 4,stroke:#999;
class C,J,S,W optional;
Notes:
- `/flow-code:interview` accepts Flow IDs or spec file paths and writes refinements back
- `/flow-code:plan` accepts new ideas or an existing Flow ID to update the plan
Tip: with RP 1.5.68+, use flowctl rp setup-review --create to auto-open RepoPrompt windows. Alternatively, open RP on your repo beforehand for faster context loading.
Plan review in rp mode requires flowctl rp chat-send; if rp-cli/windows unavailable, the review gate retries.
Built for reliability. These are the guardrails.
Re-anchoring prevents drift
Before EVERY task, Flow-Code re-reads the epic spec, task spec, and git state from .flow/. This forces Claude back to the source of truth - no hallucinated scope creep, no forgotten requirements. In Ralph mode, this happens automatically each iteration.
Unlike agents that carry accumulated context (where early mistakes compound), re-anchoring gives each task a fresh, accurate starting point.
Before EVERY task, Flow-Code re-reads:
- Epic spec and task spec from `.flow/`
- Current git status and recent commits
- Validation state
Per Anthropic's long-running agent guidance: agents must re-anchor from sources of truth to prevent drift. The reads are cheap; drift is expensive.
Teams can work in parallel branches without coordination servers:
- Merge-safe IDs: Scans existing files to allocate the next ID. No shared counters.
- Soft claims: Tasks track an `assignee` field. Prevents accidental duplicate work.
- Actor resolution: Auto-detects from git email, `FLOW_ACTOR` env, or `$USER`.
- Local validation: `flowctl validate --all` catches issues before commit.
# Actor A starts task
flowctl start fn-1.1 # Sets assignee automatically
# Actor B tries same task
flowctl start fn-1.1 # Fails: "claimed by actor-a@example.com"
flowctl start fn-1.1 --force # Override if needed
Multiple agents can work simultaneously in different git worktrees, sharing task state:
# Main repo
git worktree add ../feature-a fn-1-branch
git worktree add ../feature-b fn-2-branch
# Both worktrees share task state via .git/flow-state/
cd ../feature-a && flowctl start fn-1.1 # Agent A claims task
cd ../feature-b && flowctl start fn-2.1 # Agent B claims different task
How it works:
- Runtime state (status, assignee, evidence) lives in `.git/flow-state/`, shared across worktrees
- Definition files (title, description, deps) stay in `.flow/`, tracked in git
- Per-task `fcntl` locking prevents race conditions
State directory resolution:
- `FLOW_STATE_DIR` env (explicit override)
- `git rev-parse --git-common-dir` + `/flow-state` (worktree-aware)
- `.flow/state` fallback (non-git or old git)
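The resolution order can be sketched as a three-step fallback. This is a minimal illustration, not flowctl's code: `resolve_state_dir` and `git_common_dir` are hypothetical names, and the sketch assumes the git lookup is done with `git rev-parse --git-common-dir`, which fails outside a repository.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional


def git_common_dir(cwd: Path) -> Optional[Path]:
    """Return the shared .git directory for cwd, or None outside a git repo."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--git-common-dir"],
            cwd=cwd, capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, cwd missing, or not a repository
    return (cwd / out).resolve() if out else None


def resolve_state_dir(cwd: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Apply the three-step order: env override, git common dir, local fallback."""
    override = env.get("FLOW_STATE_DIR")
    if override:
        return Path(override)
    common = git_common_dir(cwd)
    if common is not None:
        return common / "flow-state"
    return cwd / ".flow" / "state"
```

Because every worktree of a repository reports the same common directory, all of them resolve to one shared `flow-state` location.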
Commands:
flowctl state-path # Show resolved state directory
flowctl migrate-state # Migrate existing repo (optional)
flowctl migrate-state --clean # Migrate + remove runtime from tracked files
Backward compatible: existing repos work without migration. The merged read path automatically falls back to definition files when no state file exists.
Everything is bundled:
- `flowctl.py` ships with the plugin
- No external tracker CLI to install
- No external services
- Just Python 3
Utility skills available during planning and implementation:
| Skill | Use Case |
|---|---|
| `browser` | Web automation via agent-browser CLI (verify UI, scrape docs, test flows) |
| `flow-code-rp-explorer` | Token-efficient codebase exploration via RepoPrompt |
| `flow-code-worktree-kit` | Git worktree management for parallel work |
| `flow-code-export-context` | Export context for external LLM review |
- No daemons
- No CLAUDE.md edits
- Delete `.flow/` to uninstall; if you enabled Ralph, also delete `scripts/ralph/`
- Ralph uses plugin hooks for workflow enforcement (only active when `FLOW_RALPH=1`)
flowctl validate --all
Exits 1 on errors. Drop into pre-commit hooks or GitHub Actions. See docs/ci-workflow-example.yml.
Each epic and task gets its own JSON + markdown file pair. Merge conflicts are rare and easy to resolve.
Two models catch what one misses. Reviews use a second model (via RepoPrompt or Codex) to verify plans and implementations before they ship.
Three review types:
- Plan reviews — Verify architecture before coding starts
- Impl reviews — Verify each task implementation
- Completion reviews — Verify epic delivers all spec requirements before closing
Review criteria (Carmack-level, identical for both backends):
| Review Type | Criteria |
|---|---|
| Plan | Completeness, Feasibility, Clarity, Architecture, Risks (incl. security), Scope, Testability |
| Impl | Correctness, Simplicity, DRY, Architecture, Edge Cases, Tests, Security |
| Completion | Spec compliance: all requirements delivered, docs updated, no gaps |
Reviews block progress until <verdict>SHIP</verdict>. Fix → re-review cycles continue until approved.
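A gate on the verdict tag could look roughly like this. The regex and function names are illustrative assumptions; the tag format and the `SHIP` and `NEEDS_WORK` values are taken from this document.

```python
import re

# Matches tags such as <verdict>SHIP</verdict> or <verdict>NEEDS_WORK</verdict>.
VERDICT_RE = re.compile(r"<verdict>\s*([A-Z_]+)\s*</verdict>")


def parse_verdict(review_text):
    """Return the last verdict tag found in the review output, or None."""
    matches = VERDICT_RE.findall(review_text)
    return matches[-1] if matches else None


def review_passes(review_text):
    """Only an explicit SHIP verdict unblocks progress."""
    return parse_verdict(review_text) == "SHIP"
```

Taking the last tag means a fix and re-review appended to the same transcript supersedes the earlier verdict.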
RepoPrompt provides the best review experience on macOS.
Why recommended:
- Best-in-class context builder for reviews (full file context, smart selection)
- Enables context-scout for deeper codebase discovery (alternative: repo-scout works without RP)
- Visual diff review UI + persistent chat threads
Setup:
- Install RepoPrompt:
  brew install --cask repoprompt
- Enable MCP Server (required for rp-cli):
  - Settings → MCP Server → Enable
  - Click "Install CLI to PATH" (creates `/usr/local/bin/rp-cli`)
  - Verify: `rp-cli --version`
- Configure models. RepoPrompt uses two models that must be set in the UI (not controllable via CLI):

  | Setting | Recommended | Purpose |
  |---|---|---|
  | Context Builder model | GPT-5.3 Codex Medium (via Codex CLI or OpenAI API) | Builds file selection for reviews. Needs large context window. |
  | Chat model | GPT-5.2 High (via Codex CLI or OpenAI API) | Runs the actual review. Needs strong reasoning. |

  Set these in Settings → Models. Any OpenAI API-compatible model works (Codex CLI, OpenAI API key, or other providers). These models are what make cross-model review valuable: a different model catches blind spots that self-review misses.

  Note: When `--create` auto-opens a new workspace, it inherits your default model settings. Configure models before first use.
Usage:
/flow-code:plan-review fn-1 --review=rp
/flow-code:impl-review --review=rp
OpenAI Codex CLI works on any platform (macOS, Linux, Windows).
Why use Codex:
- Cross-platform (no macOS requirement)
- Terminal-based (no GUI needed)
- Session continuity via thread IDs
- Same Carmack-level review criteria as RepoPrompt
- Uses GPT 5.2 High by default when used as a review backend from Claude Code (no config needed)
Trade-off: Uses heuristic context hints from changed files rather than RepoPrompt's intelligent file selection.
Note: When running Flow-Code inside Codex itself, commands use the `/prompts:` prefix (e.g., `/prompts:impl-review`). The `/flow-code:` prefix below applies to Claude Code.
Setup:
# Install and authenticate Codex CLI
npm install -g @openai/codex
codex auth
Usage:
/flow-code:plan-review fn-1 --review=codex
/flow-code:impl-review --review=codex
# Or via flowctl directly
flowctl codex plan-review fn-1 --base main
flowctl codex impl-review fn-1.3 --base main
Verify installation:
flowctl codex check
Set default review backend:
# Per-project (saved in .flow/config.json)
flowctl config set review.backend rp # or codex, or none
# Per-session (environment variable)
export FLOW_REVIEW_BACKEND=codex
Priority: `--review=...` argument > `FLOW_REVIEW_BACKEND` env > `.flow/config.json` > error.
No auto-detect. Run /flow-code:setup to configure your preferred review backend, or pass --review=X explicitly.
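The priority chain can be sketched as a single lookup function. This is an illustration under assumptions: the function name is hypothetical, and the nested `{"review": {"backend": ...}}` config layout is inferred from the `review.backend` key rather than confirmed.

```python
import json
import os


def resolve_review_backend(cli_arg=None, env=None, config_path=".flow/config.json"):
    """Resolve the review backend: CLI flag > env var > project config > error."""
    env = os.environ if env is None else env

    if cli_arg:  # value of --review=...
        return cli_arg

    if env.get("FLOW_REVIEW_BACKEND"):
        return env["FLOW_REVIEW_BACKEND"]

    try:
        with open(config_path) as f:
            # Assumed layout: {"review": {"backend": "rp"}}
            backend = json.load(f).get("review", {}).get("backend")
    except (OSError, ValueError):
        backend = None
    if backend:
        return backend

    # No auto-detect: fail loudly instead of guessing.
    raise RuntimeError(
        "No review backend configured; run /flow-code:setup or pass --review=X"
    )
```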
| Scenario | Recommendation |
|---|---|
| macOS with GUI available | RepoPrompt (better context) |
| Linux/Windows | Codex (only option) |
| CI/headless environments | Codex (no GUI needed) |
| Ralph overnight runs | Either works; RP auto-opens with --create (1.5.68+) |
Without a backend configured, reviews fail with a clear error. Run /flow-code:setup or pass --review=X.
Tasks declare their blockers. flowctl ready shows what can start. Nothing executes until dependencies resolve.
Epic-level dependencies: During planning, epic-scout runs in parallel with other research scouts to find relationships with existing open epics. If the new plan depends on APIs/patterns from another epic, dependencies are auto-set via flowctl epic add-dep. Findings reported at end of planning—no prompts needed.
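To make the "ready" rule concrete, here is a minimal sketch of dependency resolution over an in-memory task map. The dictionary shape and the `todo` status name are assumptions for the example; flowctl's own data model may differ.

```python
def ready_tasks(tasks):
    """Return IDs of tasks that are not started and whose dependencies are all done.

    tasks maps a task ID to {"status": str, "deps": [task IDs]}.
    """
    done = {tid for tid, task in tasks.items() if task["status"] == "done"}
    return sorted(
        tid
        for tid, task in tasks.items()
        if task["status"] == "todo" and all(dep in done for dep in task["deps"])
    )


example = {
    "fn-1.1": {"status": "done", "deps": []},
    "fn-1.2": {"status": "todo", "deps": ["fn-1.1"]},
    "fn-1.3": {"status": "todo", "deps": ["fn-1.2"]},
}
# Only fn-1.2 is ready; fn-1.3 stays blocked until fn-1.2 is done.
```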
After MAX_ATTEMPTS_PER_TASK failures (default 5), Ralph:
- Writes `block-<task>.md` with failure context
- Marks task blocked via `flowctl block`
- Moves to next task
Prevents infinite retry loops. Review block-*.md files in the morning to understand what went wrong.
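The attempt cap can be pictured as a bounded retry loop that produces a failure report instead of retrying forever. The function below is a hypothetical sketch, not Ralph's shell implementation; only the default of 5 attempts and the idea of a block report come from the text.

```python
def run_with_attempt_limit(task_id, attempt, max_attempts=5):
    """Call attempt() until it returns True or max_attempts failures accumulate.

    Returns ("done", None) on success, or ("blocked", report_markdown).
    """
    failures = []
    for n in range(1, max_attempts + 1):
        try:
            if attempt():
                return "done", None
            failures.append(f"attempt {n}: returned failure")
        except Exception as exc:  # record the error and keep going
            failures.append(f"attempt {n}: {exc}")

    # Content a caller might write to block-<task>.md before moving on.
    report = f"# Blocked: {task_id}\n\n" + "\n".join(f"- {line}" for line in failures)
    return "blocked", report
```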
Synchronizes downstream task specs when implementation drifts from the original plan.
Automatic (opt-in):
flowctl config set planSync.enabled true
When enabled, after each task completes, a plan-sync agent:
- Compares what was planned vs what was actually built
- Identifies downstream tasks that reference stale assumptions (names, APIs, data structures)
- Updates affected task specs with accurate info
Skip conditions: disabled (default), task failed, no downstream tasks.
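The skip conditions amount to a short guard evaluated before the plan-sync agent is launched. The helper below is a hypothetical sketch of that guard; its name and arguments are not part of flowctl.

```python
def plan_sync_skip_reason(enabled, task_succeeded, downstream_task_ids):
    """Return why automatic plan-sync is skipped, or None if it should run."""
    if not enabled:
        return "disabled"
    if not task_succeeded:
        return "task failed"
    if not downstream_task_ids:
        return "no downstream tasks"
    return None
```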
Cross-epic sync (opt-in, default false):
flowctl config set planSync.crossEpic true
When enabled, plan-sync also checks other open epics for stale references. Useful when multiple epics share APIs/patterns, but increases sync time. Disabled by default to avoid long Ralph loops.
Manual trigger:
/flow-code:sync fn-1.2 # Sync from specific task
/flow-code:sync fn-1 # Scan whole epic for drift
/flow-code:sync fn-1.2 --dry-run # Preview changes without writing
Manual sync ignores the `planSync.enabled` config: if you run it, you want it. Works with any source task status (not just done).
Persistent learnings that survive context compaction.
# Enable
flowctl config set memory.enabled true
flowctl memory init
# Manual entries
flowctl memory add --type pitfall "Always use flowctl rp wrappers"
flowctl memory add --type convention "Tests in __tests__ dirs"
flowctl memory add --type decision "SQLite over Postgres for simplicity"
# Query
flowctl memory list
flowctl memory search "flowctl"
flowctl memory read --type pitfalls
When enabled:
- Planning: `memory-scout` runs in parallel with other scouts
- Work: worker reads memory files directly during re-anchor
- Ralph: NEEDS_WORK reviews auto-capture to `pitfalls.md`
- Auto-capture: session end hook extracts decisions, discoveries, and pitfalls from transcript
Auto-memory (on by default, zero config):
Every session end, the plugin automatically extracts key learnings from the transcript:
- Default: Gemini AI summarization. `gemini -p` analyzes the transcript and extracts decisions, discoveries, and pitfalls. Understands semantics, not just keywords.
- Fallback: pattern matching. If the `gemini` CLI is not available, falls back to regex extraction.
No setup needed — .flow/memory/ is auto-created on first capture. Max 5 entries per session:
- `pitfalls.md`: bugs found, things to avoid
- `conventions.md`: project patterns, coding conventions
- `decisions.md`: architectural choices and rationale
To disable: flowctl config set memory.auto false
Memory retrieval works in all modes (manual, Ralph, auto-improve). Use flowctl memory add for manual entries.
Config lives in .flow/config.json, separate from Ralph's scripts/ralph/config.env.
Fifteen commands, complete workflow:
| Command | What It Does |
|---|---|
| `/flow-code:plan <idea>` | Research the codebase, create epic with dependency-ordered tasks |
| `/flow-code:work <id\|file>` | Execute epic, task, or spec file, re-anchoring before each |
| `/flow-code:interview <id>` | Deep interview to flesh out a spec before planning |
| `/flow-code:plan-review <id>` | Carmack-level plan review via RepoPrompt |
| `/flow-code:impl-review` | Carmack-level impl review of current branch |
| `/flow-code:epic-review <id>` | Epic-completion review: verify implementation matches spec |
| `/flow-code:debug` | Systematic debugging: root cause investigation → pattern analysis → hypothesis → fix |
| `/flow-code:prime` | Assess codebase agent-readiness, propose fixes (details) |
| `/flow-code:sync <id>` | Manual plan-sync: update downstream tasks after implementation drift |
| `/flow-code:ralph-init` | Scaffold repo-local Ralph harness (`scripts/ralph/`) |
| `/flow-code:retro` | Post-epic retrospective: what worked, what didn't, lessons → memory |
| `/flow-code:django` | Django patterns: architecture, DRF, security, testing, verification |
| `/flow-code:skill-create` | TDD-based skill creation: baseline test → write → bulletproof |
| `/flow-code:setup` | Optional: install flowctl locally + add docs (for power users) |
| `/flow-code:uninstall` | Remove flow-code from project (keeps tasks if desired) |
Work accepts an epic (fn-N), task (fn-N.M), or markdown spec file (.md). Spec files auto-create an epic with one task.
All commands accept flags to skip questions:
# Plan with flags
/flow-code:plan Add caching --research=grep --no-review
/flow-code:plan Add auth --research=rp --review=rp
# Work with flags
/flow-code:work fn-1 --branch=current --no-review
/flow-code:work fn-1 --branch=new --review=export
# Reviews with flags
/flow-code:plan-review fn-1 --review=rp
/flow-code:impl-review --review=export
Natural language also works:
/flow-code:plan Add webhooks, use context-scout, skip review
/flow-code:work fn-1 current branch, no review

| Command | Available Flags |
|---|---|
| `/flow-code:plan` | `--research=rp\|grep`, `--review=rp\|codex\|export\|none`, `--no-review` |
| `/flow-code:work` | `--branch=current\|new\|worktree`, `--review=rp\|codex\|export\|none`, `--no-review`, `--parallel` |
| `/flow-code:plan-review` | `--review=rp\|codex\|export` |
| `/flow-code:impl-review` | `--review=rp\|codex\|export` |
| `/flow-code:prime` | `--report-only`, `--fix-all` |
| `/flow-code:sync` | `--dry-run` |
Detailed input documentation for each command.
/flow-code:plan <idea or fn-N> [--research=rp|grep] [--review=rp|codex|export|none]
| Input | Description |
|---|---|
| `<idea>` | Free-form feature description ("Add user authentication with OAuth") |
| `fn-N` | Existing epic ID to update the plan |
| `--research=rp` | Use RepoPrompt context-scout for deeper codebase discovery |
| `--research=grep` | Use grep-based repo-scout (default, faster) |
| `--review=rp\|codex\|export\|none` | Review backend after planning |
| `--no-review` | Shorthand for `--review=none` |
/flow-code:work <id|file> [--branch=current|new|worktree] [--review=rp|codex|export|none]
| Input | Description |
|---|---|
| `fn-N` | Execute entire epic (all tasks in dependency order) |
| `fn-N.M` | Execute single task |
| `path/to/spec.md` | Create epic from spec file, execute immediately |
| `--branch=current` | Work on current branch |
| `--branch=new` | Create new branch `fn-N-slug` (default) |
| `--branch=worktree` | Create git worktree for isolated work |
| `--review=rp\|codex\|export\|none` | Review backend after work |
| `--no-review` | Shorthand for `--review=none` |
/flow-code:interview <id|file>
| Input | Description |
|---|---|
| `fn-N` | Interview about epic to refine requirements |
| `fn-N.M` | Interview about specific task |
| `path/to/spec.md` | Interview about spec file |
| `"rough idea"` | Interview about a new idea (creates epic) |
Deep questioning (40+ questions) to surface requirements, edge cases, and decisions.
/flow-code:plan-review <fn-N> [--review=rp|codex|export] [focus areas]
| Input | Description |
|---|---|
| `fn-N` | Epic ID to review |
| `--review=rp` | Use RepoPrompt (macOS, visual builder) |
| `--review=codex` | Use OpenAI Codex CLI (cross-platform) |
| `--review=export` | Export context for manual review |
| `[focus areas]` | Optional: "focus on security" or "check API design" |
Carmack-level criteria: Completeness, Feasibility, Clarity, Architecture, Risks, Scope, Testability.
/flow-code:impl-review [--review=rp|codex|export] [focus areas]
| Input | Description |
|---|---|
| `--review=rp` | Use RepoPrompt (macOS, visual builder) |
| `--review=codex` | Use OpenAI Codex CLI (cross-platform) |
| `--review=export` | Export context for manual review |
| `[focus areas]` | Optional: "focus on performance" or "check error handling" |
Reviews current branch changes. Carmack-level criteria: Correctness, Simplicity, DRY, Architecture, Edge Cases, Tests, Security.
/flow-code:epic-review <fn-N> [--review=rp|codex|none]
| Input | Description |
|---|---|
| `fn-N` | Epic ID to review |
| `--review=rp` | Use RepoPrompt (macOS, visual builder) |
| `--review=codex` | Use OpenAI Codex CLI (cross-platform) |
| `--review=none` | Skip review |
Reviews epic implementation against spec. Runs after all tasks complete. Catches requirement gaps, missing functionality, incomplete doc updates.
/flow-code:prime [--report-only] [--fix-all] [path]
| Input | Description |
|---|---|
| (no args) | Assess current directory, interactive fixes |
| `--report-only` | Show assessment report, skip remediation |
| `--fix-all` | Apply all recommendations without asking |
| `[path]` | Assess a different directory |
See Agent Readiness Assessment for details.
/flow-code:sync <id> [--dry-run]
| Input | Description |
|---|---|
| `fn-N` | Sync entire epic's downstream tasks |
| `fn-N.M` | Sync from specific task |
| `--dry-run` | Preview changes without writing |
Updates downstream task specs when implementation drifts from plan.
/flow-code:ralph-init
No arguments. Scaffolds scripts/ralph/ for autonomous operation.
/flow-code:setup
No arguments. Optional setup that:
- Configures review backend (rp, codex, or none)
- Copies flowctl to `.flow/bin/`
- Adds flow-code instructions to CLAUDE.md/AGENTS.md
/flow-code:uninstall
No arguments. Interactive removal with option to keep tasks.
Flow-Code uses the same defaults in manual and Ralph runs. Ralph bypasses prompts only.
- plan: `--research=grep`
- work: `--branch=new`
- review: from `.flow/config.json` (set via `/flow-code:setup`), or `none` if not configured
Override via flags or scripts/ralph/config.env.
- Research (parallel subagents): `repo-scout` (or `context-scout` if rp-cli) + `practice-scout` + `docs-scout` + `github-scout` + `epic-scout` + `docs-gap-scout`
- Gap analysis: `flow-gap-analyst` finds edge cases + missing requirements
- Epic creation: Writes spec to `.flow/specs/fn-N.md`, sets epic dependencies from `epic-scout` findings
- Task breakdown: Creates tasks + explicit dependencies in `.flow/tasks/`, adds doc update acceptance criteria from `docs-gap-scout`
- Validate: `flowctl validate --epic fn-N`
- Review (optional): `/flow-code:plan-review fn-N` with re-anchor + fix loop until "Ship"
- Re-anchor: Re-read epic + task specs + git state (EVERY task)
- Execute: Implement using existing patterns
- Test: Verify acceptance criteria
- Record: `flowctl done` adds summary + evidence to the task spec
- Review (optional): `/flow-code:impl-review` via RepoPrompt
- Loop: Next ready task → repeat until no ready tasks. Close epic manually (`flowctl epic close fn-N`) or let Ralph close at loop end.
Ralph is repo-local and opt-in. Files are created only by /flow-code:ralph-init. Remove manually with rm -rf scripts/ralph/.
/flow-code:ralph-init also writes scripts/ralph/.gitignore so run logs stay out of git.
What it automates (one unit per iteration, fresh context each time):
- Selector chooses plan vs work unit (`flowctl next`)
- Plan gate = plan review loop until Ship (if enabled)
- Work gate = one task until pass (tests + validate + optional impl review)
- Single run branch: all epics work on one `ralph-<run-id>` branch (cherry-pick/revert friendly)
Enable:
/flow-code:ralph-init
./scripts/ralph/ralph_once.sh # one iteration (observe)
./scripts/ralph/ralph.sh # full loop (AFK)
Watch mode - see what Claude is doing:
./scripts/ralph/ralph.sh --watch # Stream tool calls in real-time
./scripts/ralph/ralph.sh --watch verbose # Also stream model responses
Run scripts from terminal (not inside Claude Code). `ralph_once.sh` runs one iteration so you can observe before going fully autonomous.
REQUIRE_PLAN_REVIEW controls whether Ralph must pass the plan review gate before doing any implementation work.
Default (safe, won't stall):
REQUIRE_PLAN_REVIEW=0
Ralph can proceed to work tasks even if `rp-cli` is missing or unavailable overnight.
Recommended (best results, requires rp-cli):
REQUIRE_PLAN_REVIEW=1
PLAN_REVIEW=rp
This forces Ralph to run /flow-code:plan-review until the epic plan is approved before starting tasks.
Tip: If you don't have rp-cli installed, keep REQUIRE_PLAN_REVIEW=0 or Ralph may repeatedly select the plan gate and make no progress.
Ralph verifies RepoPrompt reviews via receipt JSON files in scripts/ralph/runs/<run>/receipts/ (plan + impl).
flowchart TD
A[ralph.sh iteration] --> B[flowctl next]
B -->|status=plan| C[/flow-code:plan-review fn-N/]
C -->|verdict=SHIP| D[flowctl epic set-plan-review-status=ship]
C -->|verdict!=SHIP| A
B -->|status=work| E[/flow-code:work fn-N.M/]
E --> F[tests + validate]
F -->|fail| A
F -->|WORK_REVIEW!=none| R[/flow-code:impl-review/]
R -->|verdict=SHIP| G[flowctl done + git commit]
R -->|verdict!=SHIP| A
F -->|WORK_REVIEW=none| G
G --> A
B -->|status=completion_review| CR[/flow-code:epic-review fn-N/]
CR -->|verdict=SHIP| CRD[flowctl epic set-completion-review-status=ship]
CR -->|verdict!=SHIP| A
CRD --> A
B -->|status=none| H[close done epics]
H --> I[<promise>COMPLETE</promise>]
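Read as code, the top of the flowchart is a dispatch on the status reported by `flowctl next`. The mapping below is a simplified sketch: the status names and slash commands are taken from the diagram, while the function and its return convention are assumptions for illustration.

```python
# Status (from the flowchart) -> command template run in that iteration.
ACTIONS = {
    "plan": "/flow-code:plan-review {id}",
    "work": "/flow-code:work {id}",
    "completion_review": "/flow-code:epic-review {id}",
}


def next_action(status, unit_id=None):
    """Return the command for this iteration, or None when nothing is left to do."""
    if status == "none":
        return None  # the loop closes done epics and finishes
    template = ACTIONS.get(status)
    if template is None:
        raise ValueError(f"unknown status: {status}")
    return template.format(id=unit_id)
```

Every branch that does not end in a SHIP verdict or passing tests returns to the start of the loop, so each iteration handles exactly one unit.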
YOLO safety: YOLO mode uses --dangerously-skip-permissions. Use a sandbox/container and no secrets in env for unattended runs.
.flow/
├── meta.json # Schema version
├── config.json # Project settings (memory enabled, etc.)
├── epics/
│ └── fn-1-add-oauth.json # Epic metadata (id, title, status, deps)
├── specs/
│ └── fn-1-add-oauth.md # Epic spec (plan, scope, acceptance)
├── tasks/
│ ├── fn-1-add-oauth.1.json # Task metadata (id, status, priority, deps, assignee)
│ ├── fn-1-add-oauth.1.md # Task spec (description, acceptance, done summary)
│ └── ...
└── memory/ # Persistent learnings (opt-in)
├── pitfalls.md # Lessons from NEEDS_WORK reviews
├── conventions.md # Project patterns
└── decisions.md # Architectural choices
Flowctl accepts schema v1 and v2; new fields are optional and defaulted.
New fields:
- Epic JSON: `plan_review_status`, `plan_reviewed_at`, `completion_review_status`, `completion_reviewed_at`, `depends_on_epics`, `branch_name`
- Task JSON: `priority`

- Epic: `fn-N-slug` where `slug` is derived from the epic title (e.g., `fn-1-add-oauth`, `fn-2-fix-login-bug`)
- Task: `fn-N-slug.M` (e.g., `fn-1-add-oauth.1`, `fn-2-fix-login-bug.2`)
The slug is automatically generated from the epic title (lowercase, hyphens for spaces, max 40 chars). This makes IDs human-readable and self-documenting.
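A slug generator following the stated rules (lowercase, hyphens for spaces, at most 40 characters) might look like the sketch below. Collapsing other punctuation into hyphens and trimming a trailing hyphen after truncation are assumptions of this example, and the helper names are hypothetical.

```python
import re


def slugify(title, max_len=40):
    """Lowercase the title, replace runs of non-alphanumerics with '-', cap the length."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def epic_id(number, title):
    """Build an epic ID of the form fn-N-slug."""
    return f"fn-{number}-{slugify(title)}"


def task_id(epic, index):
    """Build a task ID of the form fn-N-slug.M."""
    return f"{epic}.{index}"
```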
Backwards compatibility: Legacy formats fn-N (no suffix) and fn-N-xxx (random 3-char suffix) are still fully supported. Existing epics don't need migration.
There are no task IDs outside an epic. If you want a single task, create an epic with one task.
- JSON files: Metadata only (IDs, status, dependencies, assignee)
- Markdown files: Narrative content (specs, descriptions, summaries)
Bundled Python script for managing .flow/. Flow-Code's commands handle epic/task creation automatically—use flowctl for direct inspection, fixes, or advanced workflows:
# Setup
flowctl init # Create .flow/ structure
flowctl detect # Check if .flow/ exists
# Epics
flowctl epic create --title "..." # Create epic
flowctl epic create --title "..." --branch "fn-1-epic"
flowctl epic set-plan fn-1 --file spec.md # Set epic spec from file
flowctl epic set-plan-review-status fn-1 --status ship
flowctl epic close fn-1 # Close epic (requires all tasks done)
# Tasks
flowctl task create --epic fn-1 --title "..." --deps fn-1.2,fn-1.3 --priority 10
flowctl task set-description fn-1.1 --file desc.md
flowctl task set-acceptance fn-1.1 --file accept.md
# Dependencies
flowctl dep add fn-1.3 fn-1.2 # fn-1.3 depends on fn-1.2
# Workflow
flowctl ready --epic fn-1 # Show ready/in_progress/blocked
flowctl next # Select next plan/work unit
flowctl start fn-1.1 # Claim and start task
flowctl done fn-1.1 --summary-file s.md --evidence-json e.json
flowctl block fn-1.2 --reason-file r.md
# Queries
flowctl show fn-1 --json # Epic with all tasks
flowctl cat fn-1 # Print epic spec
# Validation
flowctl validate --epic fn-1 # Validate single epic
flowctl validate --all # Validate everything (for CI)
# Review helpers
flowctl rp chat-send --window W --tab T --message-file m.md
flowctl prep-chat --message-file m.md --selected-paths a.ts b.ts -o payload.json
📖 Full CLI reference
🤖 Ralph deep dive
When a task completes, flowctl done appends structured data to the task spec:
## Done summary
- Added ContactForm component with Zod validation
- Integrated with server action for submission
- All tests passing
Follow-ups:
- Consider rate limiting (out of scope)
## Evidence
- Commits: a3f21b9
- Tests: bun test
- PRs:
This creates a complete audit trail: what was planned, what was done, how it was verified.
| | Flow | Flow-Code |
|---|---|---|
| Task tracking | External tracker or standalone plan files | .flow/ directory (bundled flowctl) |
| Install | Plugin + optional external tracker | Plugin only |
| Artifacts | Standalone plan files | .flow/specs/ and .flow/tasks/ |
| Config edits | External config edits (if using tracker) | None |
| Multi-user | Via external tracker | Built-in (scan-based IDs, soft claims) |
| Uninstall | Remove plugin + external tracker config | Delete .flow/ (and scripts/ralph/ if enabled) |
Choose Flow-Code if you want:
- Zero external dependencies
- No config file edits
- Clean uninstall (delete `.flow/`, and `scripts/ralph/` if enabled)
- Built-in multi-user safety
Choose Flow if you:
- Already use an external tracker for issue tracking
- Want plan files as standalone artifacts
- Need full issue management features
- Python 3.8+
- git
- Optional: RepoPrompt for macOS GUI reviews + enables context-scout (deeper codebase discovery than repo-scout). Reviews work without it via Codex backend.
- Optional: OpenAI Codex CLI (`npm install -g @openai/codex`) for cross-platform terminal-based reviews
Without a review backend, reviews are skipped.
claude --plugin-dir ./plugins/flow-code
Flow-Code works natively in Factory Droid, with no modifications needed.
Install:
# In Droid CLI
/plugin marketplace add https://github.com/z23cc/flow-code
/plugin install flow-code
Cross-platform patterns used:
- Skills use the `${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}` bash fallback
- Hooks use the `Bash|Execute` regex matcher (Claude Code = Bash, Droid = Execute)
- Agents use a `disallowedTools` blacklist (not a `tools` whitelist, because tool names differ between platforms)
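For scripts written in Python rather than bash, the same plugin-root fallback can be expressed as below. This is only an equivalent sketch of the `${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}` expansion; the function name is hypothetical.

```python
import os


def plugin_root(env=None):
    """Prefer DROID_PLUGIN_ROOT; fall back to CLAUDE_PLUGIN_ROOT when it is unset or empty."""
    env = os.environ if env is None else env
    # Like bash's ":-", an empty value counts as unset.
    return env.get("DROID_PLUGIN_ROOT") or env.get("CLAUDE_PLUGIN_ROOT") or ""
```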
Caveats:
- Subagents may behave differently (Droid's Task tool implementation)
- Hook timing may vary slightly
Rollback: If you experience issues, downgrade to v0.20.9 (last pre-Droid version):
claude plugins install flow-code@0.20.9
Flow-Code works in OpenAI Codex with near-parity to Claude Code. The install script converts Claude Code's plugin system to Codex's multi-agent roles, prompts, and config.
Key difference: Commands use the `/prompts:` prefix in Codex instead of `/flow-code:`:
| Claude Code | Codex |
|---|---|
| `/flow-code:plan` | `/prompts:plan` |
| `/flow-code:work` | `/prompts:work` |
| `/flow-code:impl-review` | `/prompts:impl-review` |
| `/flow-code:plan-review` | `/prompts:plan-review` |
| `/flow-code:epic-review` | `/prompts:epic-review` |
| `/flow-code:interview` | `/prompts:interview` |
| `/flow-code:prime` | `/prompts:prime` |
| `/flow-code:ralph-init` | `/prompts:ralph-init` |
What works:
- Planning, work execution, interviews, reviews — full workflow
- Multi-agent roles: 20 agents run as parallel Codex threads (up to 12 concurrent)
- Cross-model reviews (Codex as review backend)
- flowctl CLI
Model mapping (3-tier):
| Tier | Codex Model | Agents | Reasoning |
|---|---|---|---|
| Intelligent | `gpt-5.4` | quality-auditor, flow-gap-analyst, context-scout | high |
| Smart scouts | `gpt-5.4` | epic-scout, agents-md-scout, docs-gap-scout | high |
| Fast scouts | `gpt-5.3-codex-spark` | build, env, testing, tooling, observability, security, workflow, memory scouts | skipped |
| Inherited | parent model | worker, plan-sync | parent |
Smart scouts (epic-scout, agents-md-scout, docs-gap-scout) need deeper reasoning for context building and analysis. The remaining 8 scanning scouts run on Spark for speed — they check for file presence and patterns without needing multi-step reasoning.
Override model defaults:
CODEX_MODEL_INTELLIGENT=gpt-5.4 \
CODEX_MODEL_FAST=gpt-5.3-codex-spark \
CODEX_REASONING_EFFORT=high \
CODEX_MAX_THREADS=12 \
./scripts/install-codex.sh flow-code
Caveats:
- `/prompts:setup` is not supported: use the manual project setup below
- Ralph autonomous mode is not supported: it requires plugin hooks (guard hooks, receipt gating), which Codex doesn't support
- `/prompts:ralph-init` scaffolds files, but the loop won't enforce workflow rules without hooks
- `claude-md-scout` is auto-renamed to `agents-md-scout` (CLAUDE.md → AGENTS.md patching)
Install:
# Clone the marketplace repo (one-time)
git clone https://github.com/z23cc/flow-code.git
cd flow-code
# Run the install script
./scripts/install-codex.sh flow-code
Codex doesn't have a plugin marketplace yet, so installation requires cloning this repo and running the install script. The script copies everything to `~/.codex/`; you can delete the clone after install (re-clone to update).
Per-project setup (run in each project):
# Initialize .flow/ directory
~/.codex/bin/flowctl init
# Optional: copy flowctl locally for project portability
mkdir -p .flow/bin
cp ~/.codex/bin/flowctl .flow/bin/
cp ~/.codex/bin/flowctl.py .flow/bin/
chmod +x .flow/bin/flowctl
# Optional: configure review backend (codex recommended for Codex CLI)
~/.codex/bin/flowctl config set review.backend codex
Optional AGENTS.md snippet (helps Codex understand flow-code):
<!-- BEGIN FLOW-CODE -->
## Flow-Code
This project uses Flow-Code for task tracking. Use `.flow/bin/flowctl` or `~/.codex/bin/flowctl`.
Quick commands:
- `flowctl list` — list epics + tasks
- `flowctl ready --epic fn-N` — what's ready
- `flowctl start fn-N.M` — claim task
- `flowctl done fn-N.M --summary-file s.md --evidence-json e.json`
Prompts (use `/prompts:<name>`):
- `/prompts:plan` — create a build plan
- `/prompts:work` — execute tasks
- `/prompts:impl-review` — implementation review
- `/prompts:interview` — refine specs interactively
<!-- END FLOW-CODE -->

| Project | Platform | Notes |
|---|---|---|
| flow-code-opencode | OpenCode | Flow-Code port |
| FlowFactory | Factory.ai Droid | Flow port (note: flow-code now has native Droid support) |

