ForgeGod

The coding agent that runs 24/7, learns from its mistakes, and costs $0 when you want it to.

PyPI • License • Python 3.11+ • CI • Website • Tests

19 built-in tools • 5 LLM providers • 5-tier memory • 24/7 autonomous • $0 local mode


ForgeGod orchestrates multiple LLMs (OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter) into a single autonomous coding engine. It routes tasks to the right model, runs 24/7 from a PRD, learns from every outcome, and self-improves its own strategy. Run it locally for $0 with Ollama, or use cloud models when you need them.

pip install forgegod

What Makes ForgeGod Different

Most coding CLIs use one model at a time and reset to zero each session. ForgeGod doesn't.

| Capability | Claude Code | Codex CLI | Aider | Cursor | ForgeGod |
|---|---|---|---|---|---|
| Multi-model auto-routing | - | - | manual | - | yes |
| Local + cloud hybrid | - | basic | basic | - | native |
| 24/7 autonomous loops | - | - | - | - | yes |
| Cross-session memory | basic | - | - | removed | 5-tier |
| Self-improving strategy | - | - | - | - | yes (SICA) |
| Cost-aware budget modes | - | - | - | - | yes |
| Reflexion code generation | - | - | - | - | 3-attempt |
| Parallel git worktrees | subagents | - | - | - | yes |
| Stress tested + benchmarked | - | - | - | - | 355 + 84 stress |

The Moat: Harness > Model

Scaffolding adds ~11 points on SWE-bench — harness engineering matters as much as the model. ForgeGod is the harness:

  • Ralph Loop — 24/7 coding from a PRD. Progress lives in git, not LLM context. Fresh agent per story. No context rot.
  • 5-Tier Memory — Episodic (what happened) + Semantic (what I know) + Procedural (how I do things) + Graph (how things connect) + Error-Solutions (what fixes what). Memories decay, consolidate, and reinforce automatically.
  • Reflexion Coder — 3-attempt code gen with escalating models: local (free) → cloud (cheap) → frontier (when it matters). AST validation at every step.
  • SICA — Self-Improving Coding Agent. Modifies its own prompts, model routing, and strategy based on outcomes. 6 safety layers prevent drift.
  • Budget Modes — normal → throttle → local-only → halt. Auto-triggered by spend. Run forever on Ollama for $0.
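
The escalation ladder in the Reflexion Coder bullet can be sketched as below. This is an illustrative outline, not ForgeGod's internal API: the tier names, the `generate` callable, and the parse-only validation gate are all assumptions.

```python
# Sketch of the 3-attempt escalation ladder: local → cloud → frontier.
# Tier names and generate() are illustrative, not ForgeGod internals.
import ast

TIERS = ["ollama:local-model", "openai:cheap-model", "openai:frontier-model"]

def validates(source: str) -> bool:
    """AST validation gate: the candidate code must at least parse."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def reflexion_generate(task: str, generate):
    """Try each tier in order; return (model, code) on the first valid attempt."""
    for model in TIERS:
        code = generate(model, task)
        if validates(code):
            return model, code
    return None  # all three attempts failed validation
```

The key property is that the expensive frontier model is only consulted after the free and cheap tiers have both produced invalid code.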

Getting Started (No Coding Required)

You don't need to be a developer to use ForgeGod. If you can describe what you want in plain English, ForgeGod writes the code.

Option A: Free Local Mode ($0)

  1. Install Ollama: https://ollama.com/download
  2. Pull a model: ollama pull qwen3.5:9b
  3. Install ForgeGod: pip install forgegod
  4. Run: forgegod init (interactive wizard guides you)
  5. Try it: forgegod run "Create a simple website with a contact form"

Option B: Cloud Mode (faster, ~$0.01/task)

  1. Get an OpenAI key: https://platform.openai.com/api-keys
  2. Install ForgeGod: pip install forgegod
  3. Run: forgegod init → paste your key when prompted
  4. Try it: forgegod run "Build a REST API with user authentication"

Something not working?

Run forgegod doctor — it checks your setup and tells you exactly what to fix.

Quickstart

# Install
pip install forgegod

# Initialize a project
forgegod init

# Single task
forgegod run "Add a /health endpoint to server.py with uptime and version info"

# Plan a project → generates PRD
forgegod plan "Build a REST API for a todo app with auth, CRUD, and tests"

# 24/7 autonomous loop from PRD
forgegod loop --prd .forgegod/prd.json

# Caveman mode — 50-75% token savings with ultra-terse prompts
forgegod run --terse "Add a /health endpoint"

# Check what it learned
forgegod memory

# View cost breakdown
forgegod cost

# Benchmark your models
forgegod benchmark

# Health check
forgegod doctor

Zero-Config Start

ForgeGod auto-detects your environment on first run:

  1. Finds API keys in env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY)
  2. Checks if Ollama is running locally
  3. Detects your project language, test framework, and linter
  4. Picks the best model for each role based on what's available
  5. Creates .forgegod/config.toml with sensible defaults

No manual setup required. Just run forgegod init and go.
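
The role-assignment logic in step 4 might look roughly like this. The role names match the config section shown later, but `pick_models` itself is a hypothetical sketch, not ForgeGod's actual function.

```python
# Illustrative sketch of picking the best model per role based on availability.
def pick_models(env: dict, ollama_running: bool) -> dict:
    """Assign a model to each role from whatever providers are available."""
    has_openai = "OPENAI_API_KEY" in env
    if has_openai and ollama_running:
        # Hybrid: free local coding, cheap cloud planning and review
        return {"planner": "openai:gpt-4o-mini",
                "coder": "ollama:qwen3-coder-next",
                "reviewer": "openai:o4-mini"}
    if has_openai:
        return {"planner": "openai:gpt-4o-mini",
                "coder": "openai:gpt-4o-mini",
                "reviewer": "openai:o4-mini"}
    if ollama_running:
        # Local-only: everything runs on Ollama for $0
        return {role: "ollama:qwen3-coder-next"
                for role in ("planner", "coder", "reviewer")}
    raise RuntimeError("No provider found; set an API key or start Ollama")
```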

How the Ralph Loop Works

┌─────────────────────────────────────────────────┐
│                  RALPH LOOP                      │
│                                                  │
│  ┌──────┐   ┌───────┐   ┌─────────┐   ┌─────┐ │
│  │ READ │──▶│ SPAWN │──▶│ EXECUTE │──▶│ VAL │ │
│  │ PRD  │   │ AGENT │   │  STORY  │   │IDATE│ │
│  └──────┘   └───────┘   └─────────┘   └──┬──┘ │
│      ▲                                    │     │
│      │         ┌────────┐    ┌────────┐   │     │
│      └─────────│ROTATE  │◀───│COMMIT  │◀──┘     │
│                │CONTEXT │    │OR RETRY│   pass   │
│                └────────┘    └────────┘          │
│                                                  │
│  Progress is in GIT, not LLM context.           │
│  Fresh agent per story. No context rot.          │
│  Create .forgegod/KILLSWITCH to stop.           │
└─────────────────────────────────────────────────┘
  1. Read PRD — Pick highest-priority TODO story
  2. Spawn agent — Fresh context (progress is in git, not memory)
  3. Execute — Agent uses 19 tools to implement the story
  4. Validate — Tests, lint, syntax, frontier review
  5. Commit or retry — Pass: commit + mark done. Fail: retry up to 3x with model escalation
  6. Rotate — Next story. Context is always fresh.
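
The six steps above can be sketched as a single loop. The PRD shape and the helper callables (`run_story`, `validate`, `commit`) are illustrative stand-ins for ForgeGod's internals:

```python
# Minimal sketch of the read → spawn → execute → validate → commit cycle.
import os

def ralph_loop(prd, run_story, validate, commit, max_retries=3):
    """Each story gets a fresh agent; progress is recorded in the PRD/git."""
    for story in sorted(prd, key=lambda s: s["priority"]):
        if story["status"] != "todo":
            continue
        if os.path.exists(".forgegod/KILLSWITCH"):   # manual stop
            break
        for attempt in range(max_retries):
            result = run_story(story, attempt)       # fresh context each attempt
            if validate(result):
                commit(story)
                story["status"] = "done"
                break
        else:
            story["status"] = "failed"               # retries exhausted
    return prd
```

Because the loop re-reads the PRD rather than carrying chat history forward, a crash or killswitch stop loses at most one in-flight story.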

5-Tier Memory System

ForgeGod's memory system layers five complementary tiers, each with its own retention policy:

| Tier | What | How | Retention |
|---|---|---|---|
| Episodic | What happened per task | Full outcome records | 90 days |
| Semantic | Extracted principles | Confidence + decay + reinforcement | Indefinite |
| Procedural | Code patterns & fix recipes | Success rate tracking | Indefinite |
| Graph | Entity relationships + causal edges | Auto-extracted from outcomes | Indefinite |
| Error-Solution | Error pattern → fix mapping | Fuzzy match lookup | Indefinite |

Memories decay with category-specific half-life (14d debugging → 90d architecture), consolidate via O(n*k) category-bucketed comparison, and are recalled via FTS5 + Jaccard hybrid retrieval (Reciprocal Rank Fusion). SQLite WAL mode for concurrent access.
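
The half-life decay described above amounts to a simple exponential. The half-lives come from the README; treating recall reinforcement as resetting a memory's effective age is an assumption about the mechanism:

```python
# Sketch of category-specific exponential decay of memory confidence.
# Half-lives (14d debugging, 90d architecture) are from the README.
HALF_LIFE_DAYS = {"debugging": 14, "architecture": 90}

def memory_weight(age_days: float, category: str) -> float:
    """Confidence halves once per half-life; recall reinforcement resets age."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS[category])
```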

# Check memory health
forgegod memory

# Memory is stored in .forgegod/memory.db (SQLite)
# Global learnings in ~/.forgegod/memory.db (cross-project)
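
Reciprocal Rank Fusion, used above to merge the FTS5 and Jaccard result lists, can be sketched as follows. The constant k=60 is the conventional RRF default, an assumption here rather than ForgeGod's documented setting:

```python
# Sketch of Reciprocal Rank Fusion over two ranked result lists.
def rrf_merge(fts_ranked, jaccard_ranked, k=60):
    """Each list contributes 1/(k + rank) per item; sum and sort descending."""
    scores = {}
    for ranked in (fts_ranked, jaccard_ranked):
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF rewards items that rank decently in both lists over items that rank first in only one, which is why it works well for fusing lexical and set-similarity retrieval.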

Budget Modes

| Mode | Behavior | Trigger |
|---|---|---|
| normal | Use all configured models | Default |
| throttle | Prefer local, cloud for review only | 80% of daily limit |
| local-only | Ollama only, $0 operation | Manual or 95% limit |
| halt | Stop all LLM calls | 100% of daily limit |

# Check spend
forgegod cost

# Override mode
export FORGEGOD_BUDGET_MODE=local-only
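
The auto-trigger thresholds from the table reduce to one pure function. The thresholds are from the table above; the function itself is an illustrative sketch:

```python
# Sketch of spend-triggered budget mode selection (thresholds from the table).
def budget_mode(spent_usd: float, daily_limit_usd: float) -> str:
    frac = spent_usd / daily_limit_usd
    if frac >= 1.00:
        return "halt"        # stop all LLM calls
    if frac >= 0.95:
        return "local-only"  # Ollama only, $0
    if frac >= 0.80:
        return "throttle"    # prefer local, cloud for review only
    return "normal"
```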

Caveman Mode (--terse)

Ultra-terse prompts that cut token usage by 50-75% with no accuracy loss on coding tasks, backed by 2026 prompting research:

# Add --terse to any command
forgegod run --terse "Build a REST API"
forgegod loop --terse --prd .forgegod/prd.json
forgegod plan --terse "Refactor auth module"

# Or enable globally in config
# .forgegod/config.toml
# [terse]
# enabled = true

Caveman mode compresses system prompts (~200 → ~80 tokens), tool descriptions (3-8 words each), and tool output (tracebacks → last frame only). JSON schemas for planner/reviewer stay byte-identical.
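
The "tracebacks → last frame only" compression can be sketched like this; the exact heuristic ForgeGod applies may differ:

```python
# Sketch of traceback compression: keep the header, the last frame, and the
# final error line, dropping intermediate frames.
def compress_traceback(tb_text: str) -> str:
    lines = tb_text.rstrip().splitlines()
    frame_starts = [i for i, l in enumerate(lines)
                    if l.lstrip().startswith("File ")]
    if len(frame_starts) <= 1:
        return tb_text.rstrip()          # nothing to trim
    return "\n".join([lines[0]] + lines[frame_starts[-1]:])
```

The last frame plus the exception line is usually all an LLM needs to locate a fix; the dropped frames are the bulk of the token cost.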

Configuration

ForgeGod uses TOML config with 3-level priority: env vars > project > global.
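
A sketch of that 3-level lookup is below. The FORGEGOD_SECTION_KEY naming convention is inferred from the environment variables shown elsewhere in this README (FORGEGOD_BUDGET_MODE, FORGEGOD_BUDGET_DAILY_LIMIT_USD) and is an assumption, not documented behavior:

```python
# Sketch of 3-level config priority: env vars > project > global.
def resolve(key: str, env: dict, project: dict, global_cfg: dict):
    env_key = "FORGEGOD_" + key.replace(".", "_").upper()
    if env_key in env:
        return env[env_key]              # env vars win
    for cfg in (project, global_cfg):    # then project, then global
        if key in cfg:
            return cfg[key]
    return None
```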

# .forgegod/config.toml

[models]
planner = "openai:gpt-4o-mini"        # Cheap planning
coder = "ollama:qwen3-coder-next"     # Free local coding
reviewer = "openai:o4-mini"           # Quality gate
sentinel = "openai:gpt-4o"            # Frontier sampling
escalation = "openai:gpt-4o"          # Fallback for hard problems

[budget]
daily_limit_usd = 5.00
mode = "normal"

[loop]
max_iterations = 100
parallel_workers = 2
gutter_detection = true

[ollama]
host = "http://localhost:11434"
model = "qwen3-coder-next"

[terse]
enabled = false              # --terse flag or set true here

[security]
sandbox_mode = "standard"    # permissive | standard | strict
redact_secrets = true
audit_commands = true

Environment Variables

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."     # Optional
export OPENROUTER_API_KEY="sk-or-..."     # Optional
export GOOGLE_API_KEY="AIza..."           # Optional (Gemini)
export FORGEGOD_BUDGET_DAILY_LIMIT_USD=10

Supported Models

| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | qwen3-coder-next, devstral, any | $0 | ollama serve |
| OpenAI | gpt-4o, gpt-4o-mini, o3, o4-mini | $$ | OPENAI_API_KEY |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | $$$ | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro, gemini-3-flash | $$ | GOOGLE_API_KEY |
| OpenRouter | 200+ models | varies | OPENROUTER_API_KEY |

Model Leaderboard

Run your own: forgegod benchmark

| Model | Composite | Correctness | Quality | Speed | Cost | Self-Repair |
|---|---|---|---|---|---|---|
| openai:gpt-4o-mini | 81.5 | 10/12 | 7.4 | 12s avg | $0.08 | 4/4 |
| ollama:qwen3.5:9b | 72.3 | 8/12 | 6.8 | 45s avg | $0.00 | 3/4 |

Run forgegod benchmark --update-readme to refresh with your own results.

Architecture

forgegod/
├── cli.py          # Typer CLI (init, run, loop, plan, review, cost, memory, status, benchmark, doctor)
├── config.py       # TOML config + env vars + 3-level priority
├── router.py       # Multi-provider LLM router + persistent pool + cascade routing + half-open circuit breaker
├── agent.py        # Core agent loop (tools + context compression + sub-agents)
├── coder.py        # Reflexion code generation (3 attempts, model escalation, GOAP)
├── loop.py         # Ralph loop (24/7 autonomous coding, parallel workers, story timeout)
├── planner.py      # Task decomposition → PRD
├── reviewer.py     # Frontier model quality gate (sample-based)
├── sica.py         # Self-improving strategy modification (6 safety layers)
├── memory.py       # 5-tier cognitive memory (FTS5 + RRF hybrid retrieval, WAL mode)
├── budget.py       # SQLite cost + token tracking, forecasting, auto budget modes
├── worktree.py     # Parallel git worktree workers
├── tui.py          # Rich terminal dashboard
├── terse.py        # Caveman mode — terse prompts, tool compression, savings tracker
├── benchmark.py    # Model benchmarking engine (12 tasks, 4 tiers, composite scoring)
├── onboarding.py   # Interactive setup wizard for new users
├── doctor.py       # Installation health check (6 diagnostic checks)
├── i18n.py         # Translation strings (English + Spanish es-419)
├── models.py       # Pydantic v2 data models
└── tools/
    ├── filesystem.py  # async read/write (aiofiles), atomic writes, fuzzy edit, glob, grep, repo_map
    ├── shell.py       # bash (command denylist + secret redaction)
    ├── git.py         # git status, diff, commit, worktrees
    ├── mcp.py         # MCP server client (5,800+ servers)
    └── skills.py      # On-demand skill loading
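
The half-open circuit breaker noted for router.py can be sketched as follows. The thresholds, cooldown, and injectable clock are illustrative choices, not ForgeGod's actual values:

```python
# Sketch of a half-open circuit breaker for a flaky LLM provider.
import time

class CircuitBreaker:
    def __init__(self, fail_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.fail_threshold = fail_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        """Closed: allow. Open: block until cooldown, then allow a probe (half-open)."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None              # probe succeeded → close
        else:
            self.failures += 1
            if self.failures >= self.fail_threshold:
                self.opened_at = self.clock()  # trip open
```

While a provider's breaker is open, a cascade router would route around it instead of retrying a known-dead endpoint.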

Security

Defense-in-depth, not security theater:

  • Command denylist — 13 dangerous patterns blocked (rm -rf /, curl | sh, sudo, fork bombs)
  • Secret redaction — 11 patterns strip API keys from tool output before LLM context
  • Prompt injection detection — 8 patterns scan for jailbreak/role-override attempts
  • AST code validation — Detects obfuscated dangerous calls (getattr(os, 'system')) that regex misses
  • Supply chain defense — Flags known-abandoned/typosquat packages (python-jose, jeIlyfish, etc.)
  • Canary token system — Detects if system prompt leaks into tool arguments, with per-session rotation
  • Budget limits — Cost controls with token tracking + burn-rate forecasting
  • Killswitch — Create .forgegod/KILLSWITCH to immediately halt autonomous loops
  • Sensitive file protection — .env and credentials files get warnings + automatic redaction
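
The AST validation bullet above can be illustrated with the standard-library ast module. The two patterns shown are a small illustrative subset, not ForgeGod's full denylist:

```python
# Sketch of AST-based detection of dangerous calls, including the obfuscated
# getattr(os, 'system') form that a regex denylist misses.
import ast

DANGEROUS = {("os", "system"), ("subprocess", "Popen")}

def flag_dangerous(source: str) -> list:
    findings = []
    for node in ast.walk(ast.parse(source)):
        # direct form: os.system(...)
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if (node.value.id, node.attr) in DANGEROUS:
                findings.append(f"{node.value.id}.{node.attr}")
        # obfuscated form: getattr(os, 'system')
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "getattr" and len(node.args) >= 2
                and isinstance(node.args[0], ast.Name)
                and isinstance(node.args[1], ast.Constant)):
            if (node.args[0].id, node.args[1].value) in DANGEROUS:
                findings.append(f"getattr({node.args[0].id}, {node.args[1].value!r})")
    return findings
```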

Warning: ForgeGod executes shell commands and modifies files. Review changes before committing. Start autonomous mode with --max 5 to verify behavior.

See SECURITY.md for the full policy and vulnerability reporting.

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

License

Apache 2.0 — see LICENSE.


Built by WAITDEAD • Powered by techniques from OpenClaw, Hermes, and SOTA 2026 coding agent research.
