The coding agent that runs 24/7, learns from its mistakes, and costs $0 when you want it to.
19 built-in tools • 5 LLM providers • 5-tier memory • 24/7 autonomous • $0 local mode
ForgeGod orchestrates multiple LLMs (OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter) into a single autonomous coding engine. It routes tasks to the right model, runs 24/7 from a PRD, learns from every outcome, and self-improves its own strategy. Run it locally for $0 with Ollama, or use cloud models when you need them.
```bash
pip install forgegod
```

Every other coding CLI uses one model at a time and resets to zero each session. ForgeGod doesn't.
| Capability | Claude Code | Codex CLI | Aider | Cursor | ForgeGod |
|---|---|---|---|---|---|
| Multi-model auto-routing | - | - | manual | - | yes |
| Local + cloud hybrid | - | basic | basic | - | native |
| 24/7 autonomous loops | - | - | - | - | yes |
| Cross-session memory | basic | - | - | removed | 5-tier |
| Self-improving strategy | - | - | - | - | yes (SICA) |
| Cost-aware budget modes | - | - | - | - | yes |
| Reflexion code generation | - | - | - | - | 3-attempt |
| Parallel git worktrees | subagents | - | - | - | yes |
| Stress tested + benchmarked | - | - | - | - | 355 + 84 stress |
Scaffolding adds ~11 points on SWE-bench — harness engineering matters as much as the model. ForgeGod is the harness:
- Ralph Loop — 24/7 coding from a PRD. Progress lives in git, not LLM context. Fresh agent per story. No context rot.
- 5-Tier Memory — Episodic (what happened) + Semantic (what I know) + Procedural (how I do things) + Graph (how things connect) + Error-Solutions (what fixes what). Memories decay, consolidate, and reinforce automatically.
- Reflexion Coder — 3-attempt code gen with escalating models: local (free) → cloud (cheap) → frontier (when it matters). AST validation at every step.
- SICA — Self-Improving Coding Agent. Modifies its own prompts, model routing, and strategy based on outcomes. 6 safety layers prevent drift.
- Budget Modes — `normal` → `throttle` → `local-only` → `halt`. Auto-triggered by spend. Run forever on Ollama for $0.
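The Reflexion Coder's retry-and-escalate ladder can be sketched as follows. This is a hedged illustration, not ForgeGod's real API: the `ESCALATION` tiers and the `generate` callback are stand-ins for what the config and router actually provide.

```python
import ast

# Illustrative ladder: local (free) → cloud (cheap) → frontier.
# Real tiers come from .forgegod/config.toml, not this constant.
ESCALATION = ["ollama:qwen3-coder-next", "openai:gpt-4o-mini", "openai:gpt-4o"]

def reflexion_generate(task, generate, max_attempts=3):
    """Retry code generation, escalating models and feeding errors back in."""
    feedback = ""
    for attempt in range(max_attempts):
        model = ESCALATION[min(attempt, len(ESCALATION) - 1)]
        code = generate(model, task, feedback)
        try:
            ast.parse(code)  # AST validation gate at every step
            return model, code
        except SyntaxError as exc:
            feedback = f"Previous attempt failed to parse: {exc}"
    raise RuntimeError("all attempts failed validation")
```

In the real agent the validation gate also runs tests and lint, not just a parse.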
You don't need to be a developer to use ForgeGod. If you can describe what you want in plain English, ForgeGod writes the code.
- Install Ollama: https://ollama.com/download
- Pull a model: `ollama pull qwen3.5:9b`
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod init` (an interactive wizard guides you)
- Try it: `forgegod run "Create a simple website with a contact form"`
- Get an OpenAI key: https://platform.openai.com/api-keys
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod init` → paste your key when prompted
- Try it: `forgegod run "Build a REST API with user authentication"`
Run `forgegod doctor` — it checks your setup and tells you exactly what to fix.
```bash
# Install
pip install forgegod

# Initialize a project
forgegod init

# Single task
forgegod run "Add a /health endpoint to server.py with uptime and version info"

# Plan a project → generates PRD
forgegod plan "Build a REST API for a todo app with auth, CRUD, and tests"

# 24/7 autonomous loop from PRD
forgegod loop --prd .forgegod/prd.json

# Caveman mode — 50-75% token savings with ultra-terse prompts
forgegod run --terse "Add a /health endpoint"

# Check what it learned
forgegod memory

# View cost breakdown
forgegod cost

# Benchmark your models
forgegod benchmark

# Health check
forgegod doctor
```

ForgeGod auto-detects your environment on first run:
- Finds API keys in env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
- Checks if Ollama is running locally
- Detects your project language, test framework, and linter
- Picks the best model for each role based on what's available
- Creates `.forgegod/config.toml` with sensible defaults
No manual setup required. Just run `forgegod init` and go.
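A minimal sketch of that first-run detection, assuming the standard env var names and Ollama's default `/api/tags` endpoint; the real probe logic in `onboarding.py` may differ.

```python
import os
import urllib.request

KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "OPENROUTER_API_KEY")

def detect_environment():
    """Report which API keys are set and whether a local Ollama server responds."""
    found = {var: bool(os.environ.get(var)) for var in KEY_VARS}
    try:
        # Ollama's tag-listing endpoint is a cheap liveness probe.
        with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=2):
            found["ollama"] = True
    except OSError:
        found["ollama"] = False
    return found
```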
```
┌─────────────────────────────────────────────────┐
│                   RALPH LOOP                    │
│                                                 │
│  ┌──────┐   ┌───────┐   ┌─────────┐   ┌─────┐   │
│  │ READ │──▶│ SPAWN │──▶│ EXECUTE │──▶│ VAL │   │
│  │ PRD  │   │ AGENT │   │ STORY   │   │IDATE│   │
│  └──────┘   └───────┘   └─────────┘   └──┬──┘   │
│      ▲                                   │      │
│      │      ┌────────┐    ┌────────┐     │      │
│      └──────│ROTATE  │◀───│COMMIT  │◀────┘      │
│             │CONTEXT │    │OR RETRY│  pass      │
│             └────────┘    └────────┘            │
│                                                 │
│  Progress is in GIT, not LLM context.           │
│  Fresh agent per story. No context rot.         │
│  Create .forgegod/KILLSWITCH to stop.           │
└─────────────────────────────────────────────────┘
```
- Read PRD — Pick highest-priority TODO story
- Spawn agent — Fresh context (progress is in git, not memory)
- Execute — Agent uses 19 tools to implement the story
- Validate — Tests, lint, syntax, frontier review
- Commit or retry — Pass: commit + mark done. Fail: retry up to 3x with model escalation
- Rotate — Next story. Context is always fresh.
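The steps above, in skeleton form. `run_story`, `validate`, and `commit` are hypothetical stand-ins for the real agent plumbing; only the control flow mirrors the loop described.

```python
from pathlib import Path

def ralph_loop(prd, run_story, validate, commit, max_iterations=100):
    """One fresh agent per story; progress lives in the PRD and git, not context."""
    for _ in range(max_iterations):
        if Path(".forgegod/KILLSWITCH").exists():  # manual stop
            return
        todo = [s for s in prd["stories"] if s["status"] == "todo"]
        if not todo:
            return  # everything done
        story = max(todo, key=lambda s: s["priority"])  # highest priority first
        for attempt in range(3):  # retry with model escalation
            result = run_story(story, attempt)  # fresh context each attempt
            if validate(result):
                commit(story)
                story["status"] = "done"
                break
        else:
            story["status"] = "blocked"  # give up after three failed attempts
```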
ForgeGod has the most advanced memory system of any open-source coding agent:
| Tier | What | How | Retention |
|---|---|---|---|
| Episodic | What happened per task | Full outcome records | 90 days |
| Semantic | Extracted principles | Confidence + decay + reinforcement | Indefinite |
| Procedural | Code patterns & fix recipes | Success rate tracking | Indefinite |
| Graph | Entity relationships + causal edges | Auto-extracted from outcomes | Indefinite |
| Error-Solution | Error pattern → fix mapping | Fuzzy match lookup | Indefinite |
Memories decay with category-specific half-life (14d debugging → 90d architecture), consolidate via O(n*k) category-bucketed comparison, and are recalled via FTS5 + Jaccard hybrid retrieval (Reciprocal Rank Fusion). SQLite WAL mode for concurrent access.
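The decay and fusion steps can be sketched as follows. The half-lives come from the text above; the 30-day default and the RRF constant `k = 60` are assumptions, not ForgeGod's actual values.

```python
# Half-lives in days per memory category; the 30-day fallback is assumed.
HALF_LIFE_DAYS = {"debugging": 14, "architecture": 90}

def decayed_confidence(confidence, category, age_days):
    """Exponential decay: confidence halves once per category half-life."""
    half_life = HALF_LIFE_DAYS.get(category, 30)
    return confidence * 0.5 ** (age_days / half_life)

def rrf_fuse(fts_ranked, jaccard_ranked, k=60):
    """Reciprocal Rank Fusion of FTS5 and Jaccard result lists (ids, best first)."""
    scores = {}
    for ranked in (fts_ranked, jaccard_ranked):
        for rank, mem_id in enumerate(ranked, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```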
```bash
# Check memory health
forgegod memory

# Memory is stored in .forgegod/memory.db (SQLite)
# Global learnings in ~/.forgegod/memory.db (cross-project)
```

| Mode | Behavior | Trigger |
|---|---|---|
| `normal` | Use all configured models | Default |
| `throttle` | Prefer local, cloud for review only | 80% of daily limit |
| `local-only` | Ollama only, $0 operation | Manual or 95% limit |
| `halt` | Stop all LLM calls | 100% of daily limit |
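The triggers reduce to a simple threshold check on the day's spend. A sketch, with thresholds taken from the table; the function name is hypothetical.

```python
def budget_mode(spent_usd, daily_limit_usd):
    """Pick the operating mode from today's spend against the daily limit."""
    ratio = spent_usd / daily_limit_usd
    if ratio >= 1.00:
        return "halt"        # stop all LLM calls
    if ratio >= 0.95:
        return "local-only"  # Ollama only, $0
    if ratio >= 0.80:
        return "throttle"    # prefer local, cloud for review only
    return "normal"
```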
```bash
# Check spend
forgegod cost

# Override mode
export FORGEGOD_BUDGET_MODE=local-only
```

Caveman mode uses ultra-terse prompts that cut token usage by 50-75% with no accuracy loss on coding tasks. Backed by 2026 research:
- Mini-SWE-Agent — 100 lines, >74% SWE-bench Verified
- Chain of Draft — 7.6% tokens, same accuracy
- CCoT — 48.7% shorter, negligible impact
```bash
# Add --terse to any command
forgegod run --terse "Build a REST API"
forgegod loop --terse --prd .forgegod/prd.json
forgegod plan --terse "Refactor auth module"

# Or enable globally in config
# .forgegod/config.toml
# [terse]
# enabled = true
```

Caveman mode compresses system prompts (~200 → ~80 tokens), tool descriptions (3-8 words each), and tool output (tracebacks → last frame only). JSON schemas for the planner and reviewer stay byte-identical.
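The "tracebacks → last frame only" compression might look like this sketch; ForgeGod's actual implementation in `terse.py` may keep different context.

```python
def compress_traceback(tb_text, max_frames=1):
    """Keep the traceback header, the last frame(s), and the final error line."""
    lines = tb_text.strip().splitlines()
    frame_starts = [i for i, line in enumerate(lines)
                    if line.lstrip().startswith("File ")]
    if len(frame_starts) <= max_frames:
        return tb_text.strip()  # already short enough
    keep_from = frame_starts[-max_frames]
    return "\n".join([lines[0]] + lines[keep_from:])
```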
ForgeGod uses TOML config with 3-level priority: env vars > project > global.
```toml
# .forgegod/config.toml
[models]
planner = "openai:gpt-4o-mini"     # Cheap planning
coder = "ollama:qwen3-coder-next"  # Free local coding
reviewer = "openai:o4-mini"        # Quality gate
sentinel = "openai:gpt-4o"         # Frontier sampling
escalation = "openai:gpt-4o"       # Fallback for hard problems

[budget]
daily_limit_usd = 5.00
mode = "normal"

[loop]
max_iterations = 100
parallel_workers = 2
gutter_detection = true

[ollama]
host = "http://localhost:11434"
model = "qwen3-coder-next"

[terse]
enabled = false  # --terse flag or set true here

[security]
sandbox_mode = "standard"  # permissive | standard | strict
redact_secrets = true
audit_commands = true
```

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."   # Optional
export OPENROUTER_API_KEY="sk-or-..."   # Optional
export GOOGLE_API_KEY="AIza..."         # Optional (Gemini)
export FORGEGOD_BUDGET_DAILY_LIMIT_USD=10
```

| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | qwen3-coder-next, devstral, any | $0 | ollama serve |
| OpenAI | gpt-4o, gpt-4o-mini, o3, o4-mini | $$ | OPENAI_API_KEY |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | $$$ | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro, gemini-3-flash | $$ | GOOGLE_API_KEY |
| OpenRouter | 200+ models | varies | OPENROUTER_API_KEY |
Run your own: `forgegod benchmark`
| Model | Composite | Correctness | Quality | Speed | Cost | Self-Repair |
|---|---|---|---|---|---|---|
| openai:gpt-4o-mini | 81.5 | 10/12 | 7.4 | 12s avg | $0.08 | 4/4 |
| ollama:qwen3.5:9b | 72.3 | 8/12 | 6.8 | 45s avg | $0.00 | 3/4 |
Run `forgegod benchmark --update-readme` to refresh with your own results.
```
forgegod/
├── cli.py        # Typer CLI (init, run, loop, plan, review, cost, memory, status, benchmark, doctor)
├── config.py     # TOML config + env vars + 3-level priority
├── router.py     # Multi-provider LLM router + persistent pool + cascade routing + half-open circuit breaker
├── agent.py      # Core agent loop (tools + context compression + sub-agents)
├── coder.py      # Reflexion code generation (3 attempts, model escalation, GOAP)
├── loop.py       # Ralph loop (24/7 autonomous coding, parallel workers, story timeout)
├── planner.py    # Task decomposition → PRD
├── reviewer.py   # Frontier model quality gate (sample-based)
├── sica.py       # Self-improving strategy modification (6 safety layers)
├── memory.py     # 5-tier cognitive memory (FTS5 + RRF hybrid retrieval, WAL mode)
├── budget.py     # SQLite cost + token tracking, forecasting, auto budget modes
├── worktree.py   # Parallel git worktree workers
├── tui.py        # Rich terminal dashboard
├── terse.py      # Caveman mode — terse prompts, tool compression, savings tracker
├── benchmark.py  # Model benchmarking engine (12 tasks, 4 tiers, composite scoring)
├── onboarding.py # Interactive setup wizard for new users
├── doctor.py     # Installation health check (6 diagnostic checks)
├── i18n.py       # Translation strings (English + Spanish es-419)
├── models.py     # Pydantic v2 data models
└── tools/
    ├── filesystem.py  # async read/write (aiofiles), atomic writes, fuzzy edit, glob, grep, repo_map
    ├── shell.py       # bash (command denylist + secret redaction)
    ├── git.py         # git status, diff, commit, worktrees
    ├── mcp.py         # MCP server client (5,800+ servers)
    └── skills.py      # On-demand skill loading
```
Defense-in-depth, not security theater:
- Command denylist — 13 dangerous patterns blocked (`rm -rf /`, `curl | sh`, `sudo`, fork bombs)
- Secret redaction — 11 patterns strip API keys from tool output before LLM context
- Prompt injection detection — 8 patterns scan for jailbreak/role-override attempts
- AST code validation — Detects obfuscated dangerous calls (`getattr(os, 'system')`) that regex misses
- Supply chain defense — Flags known-abandoned/typosquat packages (python-jose, jeIlyfish, etc.)
- Canary token system — Detects if the system prompt leaks into tool arguments, with per-session rotation
- Budget limits — Cost controls with token tracking + burn-rate forecasting
- Killswitch — Create `.forgegod/KILLSWITCH` to immediately halt autonomous loops
- Sensitive file protection — `.env` and credentials files get warnings + automatic redaction
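The AST check for obfuscated calls can be sketched with Python's `ast` module. The denylist here is an illustrative subset, not ForgeGod's real pattern set.

```python
import ast

DANGEROUS = {("os", "system"), ("subprocess", "Popen")}  # illustrative subset

def flags_obfuscated_call(source):
    """Catch getattr(os, 'system')-style indirection that a regex on 'os.system' misses."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "getattr"
                and len(node.args) >= 2
                and isinstance(node.args[0], ast.Name)
                and isinstance(node.args[1], ast.Constant)
                and (node.args[0].id, node.args[1].value) in DANGEROUS):
            return True
    return False
```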
Warning: ForgeGod executes shell commands and modifies files. Review changes before committing. Start autonomous mode with `--max 5` to verify behavior.
See SECURITY.md for the full policy and vulnerability reporting.
We welcome contributions. See CONTRIBUTING.md for guidelines.
- Bug reports and feature requests: GitHub Issues
- Questions and discussion: GitHub Discussions
Apache 2.0 — see LICENSE.
Built by WAITDEAD • Powered by techniques from OpenClaw, Hermes, and SOTA 2026 coding agent research.
