The coding agent that runs 24/7, learns from its mistakes, and costs $0 when you want it to.
19 built-in tools • 5 LLM providers • 5-tier memory • 24/7 autonomous • $0 local mode
ForgeGod orchestrates multiple LLMs (OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter) into a single autonomous coding engine. It routes tasks to the right model, runs 24/7 from a PRD, learns from every outcome, and self-improves its own strategy. Run it locally for $0 with Ollama, or use cloud models when you need them.
```bash
pip install forgegod
```

Every other coding CLI uses one model at a time and resets to zero each session. ForgeGod doesn't.
| Capability | Claude Code | Codex CLI | Aider | Cursor | ForgeGod |
|---|---|---|---|---|---|
| Multi-model auto-routing | - | - | manual | - | yes |
| Local + cloud hybrid | - | basic | basic | - | native |
| 24/7 autonomous loops | - | - | - | - | yes |
| Cross-session memory | basic | - | - | removed | 5-tier |
| Self-improving strategy | - | - | - | - | yes (SICA) |
| Cost-aware budget modes | - | - | - | - | yes |
| Reflexion code generation | - | - | - | - | 3-attempt |
| Parallel git worktrees | subagents | - | - | - | yes |
| Stress tested + benchmarked | - | - | - | - | 355 + 84 stress |
Scaffolding adds ~11 points on SWE-bench — harness engineering matters as much as the model. ForgeGod is the harness:
- Ralph Loop — 24/7 coding from a PRD. Progress lives in git, not LLM context. Fresh agent per story. No context rot.
- 5-Tier Memory — Episodic (what happened) + Semantic (what I know) + Procedural (how I do things) + Graph (how things connect) + Error-Solutions (what fixes what). Memories decay, consolidate, and reinforce automatically.
- Reflexion Coder — 3-attempt code gen with escalating models: local (free) → cloud (cheap) → frontier (when it matters). AST validation at every step.
- SICA — Self-Improving Coding Agent. Modifies its own prompts, model routing, and strategy based on outcomes. 6 safety layers prevent drift.
- Budget Modes — `normal` → `throttle` → `local-only` → `halt`. Auto-triggered by spend. Run forever on Ollama for $0.
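The Reflexion Coder's retry-and-escalate ladder can be sketched as follows. This is a hedged illustration, not ForgeGod's real API: the `ESCALATION` tiers and the `generate` callback are stand-ins for what the config and router actually provide.

```python
import ast

# Illustrative ladder: local (free) → cloud (cheap) → frontier.
# Real tiers come from .forgegod/config.toml, not this constant.
ESCALATION = ["ollama:qwen3-coder-next", "openai:gpt-4o-mini", "openai:gpt-4o"]

def reflexion_generate(task, generate, max_attempts=3):
    """Retry code generation, escalating models and feeding errors back in."""
    feedback = ""
    for attempt in range(max_attempts):
        model = ESCALATION[min(attempt, len(ESCALATION) - 1)]
        code = generate(model, task, feedback)
        try:
            ast.parse(code)  # AST validation gate at every step
            return model, code
        except SyntaxError as exc:
            feedback = f"Previous attempt failed to parse: {exc}"
    raise RuntimeError("all attempts failed validation")
```

In the real agent the validation gate also runs tests and lint, not just a parse.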
You don't need to be a developer to use ForgeGod. If you can describe what you want in plain English, ForgeGod writes the code.
- Install Ollama: https://ollama.com/download
- Pull a model: `ollama pull qwen3.5:9b`
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod init` (an interactive wizard guides you)
- Try it: `forgegod run "Create a simple website with a contact form"`
- Get an OpenAI key: https://platform.openai.com/api-keys
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod init` → paste your key when prompted
- Try it: `forgegod run "Build a REST API with user authentication"`
Run `forgegod doctor` — it checks your setup and tells you exactly what to fix.
```bash
# Install
pip install forgegod

# Initialize a project
forgegod init

# Single task
forgegod run "Add a /health endpoint to server.py with uptime and version info"

# Plan a project → generates PRD
forgegod plan "Build a REST API for a todo app with auth, CRUD, and tests"

# 24/7 autonomous loop from PRD
forgegod loop --prd .forgegod/prd.json

# Caveman mode — 50-75% token savings with ultra-terse prompts
forgegod run --terse "Add a /health endpoint"

# Check what it learned
forgegod memory

# View cost breakdown
forgegod cost

# Benchmark your models
forgegod benchmark

# Health check
forgegod doctor
```

ForgeGod auto-detects your environment on first run:
- Finds API keys in env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
- Checks if Ollama is running locally
- Detects your project language, test framework, and linter
- Picks the best model for each role based on what's available
- Creates `.forgegod/config.toml` with sensible defaults
No manual setup required. Just run `forgegod init` and go.
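A minimal sketch of that first-run detection, assuming the standard env var names and Ollama's default `/api/tags` endpoint; the real probe logic in `onboarding.py` may differ.

```python
import os
import urllib.request

KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "OPENROUTER_API_KEY")

def detect_environment():
    """Report which API keys are set and whether a local Ollama server responds."""
    found = {var: bool(os.environ.get(var)) for var in KEY_VARS}
    try:
        # Ollama's tag-listing endpoint is a cheap liveness probe.
        with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=2):
            found["ollama"] = True
    except OSError:
        found["ollama"] = False
    return found
```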
```
┌─────────────────────────────────────────────────┐
│                   RALPH LOOP                    │
│                                                 │
│  ┌──────┐   ┌───────┐   ┌─────────┐   ┌─────┐   │
│  │ READ │──▶│ SPAWN │──▶│ EXECUTE │──▶│ VAL │   │
│  │ PRD  │   │ AGENT │   │ STORY   │   │IDATE│   │
│  └──────┘   └───────┘   └─────────┘   └──┬──┘   │
│      ▲                                   │      │
│      │      ┌────────┐    ┌────────┐     │      │
│      └──────│ROTATE  │◀───│COMMIT  │◀────┘      │
│             │CONTEXT │    │OR RETRY│  pass      │
│             └────────┘    └────────┘            │
│                                                 │
│  Progress is in GIT, not LLM context.           │
│  Fresh agent per story. No context rot.         │
│  Create .forgegod/KILLSWITCH to stop.           │
└─────────────────────────────────────────────────┘
```
- Read PRD — Pick highest-priority TODO story
- Spawn agent — Fresh context (progress is in git, not memory)
- Execute — Agent uses 19 tools to implement the story
- Validate — Tests, lint, syntax, frontier review
- Commit or retry — Pass: commit + mark done. Fail: retry up to 3x with model escalation
- Rotate — Next story. Context is always fresh.
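The steps above, in skeleton form. `run_story`, `validate`, and `commit` are hypothetical stand-ins for the real agent plumbing; only the control flow mirrors the loop described.

```python
from pathlib import Path

def ralph_loop(prd, run_story, validate, commit, max_iterations=100):
    """One fresh agent per story; progress lives in the PRD and git, not context."""
    for _ in range(max_iterations):
        if Path(".forgegod/KILLSWITCH").exists():  # manual stop
            return
        todo = [s for s in prd["stories"] if s["status"] == "todo"]
        if not todo:
            return  # everything done
        story = max(todo, key=lambda s: s["priority"])  # highest priority first
        for attempt in range(3):  # retry with model escalation
            result = run_story(story, attempt)  # fresh context each attempt
            if validate(result):
                commit(story)
                story["status"] = "done"
                break
        else:
            story["status"] = "blocked"  # give up after three failed attempts
```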
ForgeGod has the most advanced memory system of any open-source coding agent:
| Tier | What | How | Retention |
|---|---|---|---|
| Episodic | What happened per task | Full outcome records | 90 days |
| Semantic | Extracted principles | Confidence + decay + reinforcement | Indefinite |
| Procedural | Code patterns & fix recipes | Success rate tracking | Indefinite |
| Graph | Entity relationships + causal edges | Auto-extracted from outcomes | Indefinite |
| Error-Solution | Error pattern → fix mapping | Fuzzy match lookup | Indefinite |
Memories decay with category-specific half-life (14d debugging → 90d architecture), consolidate via O(n*k) category-bucketed comparison, and are recalled via FTS5 + Jaccard hybrid retrieval (Reciprocal Rank Fusion). SQLite WAL mode for concurrent access.
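The decay and fusion steps can be sketched as follows. The half-lives come from the text above; the 30-day default and the RRF constant `k = 60` are assumptions, not ForgeGod's actual values.

```python
# Half-lives in days per memory category; the 30-day fallback is assumed.
HALF_LIFE_DAYS = {"debugging": 14, "architecture": 90}

def decayed_confidence(confidence, category, age_days):
    """Exponential decay: confidence halves once per category half-life."""
    half_life = HALF_LIFE_DAYS.get(category, 30)
    return confidence * 0.5 ** (age_days / half_life)

def rrf_fuse(fts_ranked, jaccard_ranked, k=60):
    """Reciprocal Rank Fusion of FTS5 and Jaccard result lists (ids, best first)."""
    scores = {}
    for ranked in (fts_ranked, jaccard_ranked):
        for rank, mem_id in enumerate(ranked, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```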
```bash
# Check memory health
forgegod memory

# Memory is stored in .forgegod/memory.db (SQLite)
# Global learnings in ~/.forgegod/memory.db (cross-project)
```

| Mode | Behavior | Trigger |
|---|---|---|
| `normal` | Use all configured models | Default |
| `throttle` | Prefer local, cloud for review only | 80% of daily limit |
| `local-only` | Ollama only, $0 operation | Manual or 95% limit |
| `halt` | Stop all LLM calls | 100% of daily limit |
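The triggers reduce to a simple threshold check on the day's spend. A sketch, with thresholds taken from the table; the function name is hypothetical.

```python
def budget_mode(spent_usd, daily_limit_usd):
    """Pick the operating mode from today's spend against the daily limit."""
    ratio = spent_usd / daily_limit_usd
    if ratio >= 1.00:
        return "halt"        # stop all LLM calls
    if ratio >= 0.95:
        return "local-only"  # Ollama only, $0
    if ratio >= 0.80:
        return "throttle"    # prefer local, cloud for review only
    return "normal"
```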
```bash
# Check spend
forgegod cost

# Override mode
export FORGEGOD_BUDGET_MODE=local-only
```

Caveman mode uses ultra-terse prompts that cut token usage by 50-75% with no accuracy loss on coding tasks. Backed by 2026 research:
- Mini-SWE-Agent — 100 lines, >74% SWE-bench Verified
- Chain of Draft — 7.6% tokens, same accuracy
- CCoT — 48.7% shorter, negligible impact
```bash
# Add --terse to any command
forgegod run --terse "Build a REST API"
forgegod loop --terse --prd .forgegod/prd.json
forgegod plan --terse "Refactor auth module"

# Or enable globally in config
# .forgegod/config.toml
# [terse]
# enabled = true
```

Caveman mode compresses system prompts (~200 → ~80 tokens), tool descriptions (3-8 words each), and tool output (tracebacks → last frame only). JSON schemas for the planner and reviewer stay byte-identical.
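The "tracebacks → last frame only" compression might look like this sketch; ForgeGod's actual implementation in `terse.py` may keep different context.

```python
def compress_traceback(tb_text, max_frames=1):
    """Keep the traceback header, the last frame(s), and the final error line."""
    lines = tb_text.strip().splitlines()
    frame_starts = [i for i, line in enumerate(lines)
                    if line.lstrip().startswith("File ")]
    if len(frame_starts) <= max_frames:
        return tb_text.strip()  # already short enough
    keep_from = frame_starts[-max_frames]
    return "\n".join([lines[0]] + lines[keep_from:])
```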
ForgeGod uses TOML config with 3-level priority: env vars > project > global.
```toml
# .forgegod/config.toml
[models]
planner = "openai:gpt-4o-mini"     # Cheap planning
coder = "ollama:qwen3-coder-next"  # Free local coding
reviewer = "openai:o4-mini"        # Quality gate
sentinel = "openai:gpt-4o"         # Frontier sampling
escalation = "openai:gpt-4o"       # Fallback for hard problems

[budget]
daily_limit_usd = 5.00
mode = "normal"

[loop]
max_iterations = 100
parallel_workers = 2
gutter_detection = true

[ollama]
host = "http://localhost:11434"
model = "qwen3-coder-next"

[terse]
enabled = false  # --terse flag or set true here

[security]
sandbox_mode = "standard"  # permissive | standard | strict
redact_secrets = true
audit_commands = true
```

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."   # Optional
export OPENROUTER_API_KEY="sk-or-..."   # Optional
export GOOGLE_API_KEY="AIza..."         # Optional (Gemini)
export FORGEGOD_BUDGET_DAILY_LIMIT_USD=10
```

| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | qwen3-coder-next, devstral, any | $0 | ollama serve |
| OpenAI | gpt-4o, gpt-4o-mini, o3, o4-mini | $$ | OPENAI_API_KEY |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | $$$ | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro, gemini-3-flash | $$ | GOOGLE_API_KEY |
| OpenRouter | 200+ models | varies | OPENROUTER_API_KEY |
Run your own: `forgegod benchmark`
| Model | Composite | Correctness | Quality | Speed | Cost | Self-Repair |
|---|---|---|---|---|---|---|
| openai:gpt-4o-mini | 81.5 | 10/12 | 7.4 | 12s avg | $0.08 | 4/4 |
| ollama:qwen3.5:9b | 72.3 | 8/12 | 6.8 | 45s avg | $0.00 | 3/4 |
Run `forgegod benchmark --update-readme` to refresh with your own results.
```
forgegod/
├── cli.py        # Typer CLI (init, run, loop, plan, review, cost, memory, status, benchmark, doctor)
├── config.py     # TOML config + env vars + 3-level priority
├── router.py     # Multi-provider LLM router + persistent pool + cascade routing + half-open circuit breaker
├── agent.py      # Core agent loop (tools + context compression + sub-agents)
├── coder.py      # Reflexion code generation (3 attempts, model escalation, GOAP)
├── loop.py       # Ralph loop (24/7 autonomous coding, parallel workers, story timeout)
├── planner.py    # Task decomposition → PRD
├── reviewer.py   # Frontier model quality gate (sample-based)
├── sica.py       # Self-improving strategy modification (6 safety layers)
├── memory.py     # 5-tier cognitive memory (FTS5 + RRF hybrid retrieval, WAL mode)
├── budget.py     # SQLite cost + token tracking, forecasting, auto budget modes
├── worktree.py   # Parallel git worktree workers
├── tui.py        # Rich terminal dashboard
├── terse.py      # Caveman mode — terse prompts, tool compression, savings tracker
├── benchmark.py  # Model benchmarking engine (12 tasks, 4 tiers, composite scoring)
├── onboarding.py # Interactive setup wizard for new users
├── doctor.py     # Installation health check (6 diagnostic checks)
├── i18n.py       # Translation strings (English + Spanish es-419)
├── models.py     # Pydantic v2 data models
└── tools/
    ├── filesystem.py  # async read/write (aiofiles), atomic writes, fuzzy edit, glob, grep, repo_map
    ├── shell.py       # bash (command denylist + secret redaction)
    ├── git.py         # git status, diff, commit, worktrees
    ├── mcp.py         # MCP server client (5,800+ servers)
    └── skills.py      # On-demand skill loading
```
Defense-in-depth, not security theater:
- Command denylist — 13 dangerous patterns blocked (`rm -rf /`, `curl | sh`, `sudo`, fork bombs)
- Secret redaction — 11 patterns strip API keys from tool output before LLM context
- Prompt injection detection — 8 patterns scan for jailbreak/role-override attempts
- AST code validation — Detects obfuscated dangerous calls (`getattr(os, 'system')`) that regex misses
- Supply chain defense — Flags known-abandoned/typosquat packages (python-jose, jeIlyfish, etc.)
- Canary token system — Detects if the system prompt leaks into tool arguments, with per-session rotation
- Budget limits — Cost controls with token tracking + burn-rate forecasting
- Killswitch — Create `.forgegod/KILLSWITCH` to immediately halt autonomous loops
- Sensitive file protection — `.env` and credentials files get warnings + automatic redaction
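The AST check for obfuscated calls can be sketched with Python's `ast` module. The denylist here is an illustrative subset, not ForgeGod's real pattern set.

```python
import ast

DANGEROUS = {("os", "system"), ("subprocess", "Popen")}  # illustrative subset

def flags_obfuscated_call(source):
    """Catch getattr(os, 'system')-style indirection that a regex on 'os.system' misses."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "getattr"
                and len(node.args) >= 2
                and isinstance(node.args[0], ast.Name)
                and isinstance(node.args[1], ast.Constant)
                and (node.args[0].id, node.args[1].value) in DANGEROUS):
            return True
    return False
```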
Warning: ForgeGod executes shell commands and modifies files. Review changes before committing. Start autonomous mode with `--max 5` to verify behavior.
See SECURITY.md for the full policy and vulnerability reporting.
We welcome contributions. See CONTRIBUTING.md for guidelines.
- Bug reports and feature requests: GitHub Issues
- Questions and discussion: GitHub Discussions
Apache 2.0 — see LICENSE.
Built by WAITDEAD • Powered by techniques from OpenClaw, Hermes, and SOTA 2026 coding agent research.
