Route local. Escalate smart. Never overspend.
Open-source intelligent model router for agentic pipelines. 74% of requests handled locally at $0. Escalate to cloud only when stuck.
```bash
pip install modelcascade
```

```python
from modelcascade import CascadeRouter

router = CascadeRouter.from_config("mc.yaml")
result = await router.complete(prompt)
# → routed LOCAL · $0.000 · 47ms
```

| Tier | Models | Cost / 1K tokens | Coverage |
|---|---|---|---|
| LOCAL | Ollama, llama.cpp, vLLM | $0 | 74% |
| FAST | claude-haiku-4-5, Groq | $0.001 | +18% |
| CAPABLE | claude-sonnet-4-6, GPT-4o | $0.005 | +8% |
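The cascade itself is easy to reason about: try the cheapest tier first, escalate only when a tier fails or refuses, and stop before the per-request cost ceiling is crossed. The sketch below is illustrative only, not the modelcascade internals; the `Tier` dataclass, the 4-chars-per-token estimate, and the callable signature are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tier:
    """One rung of the cascade. `call` returns text, or None on failure/refusal."""
    name: str                                 # e.g. "local", "fast", "capable"
    cost_per_1k: float                        # dollars per 1K tokens
    call: Callable[[str], Optional[str]]


def cascade(prompt: str, tiers: list[Tier], cost_ceiling: float) -> tuple[str, str, float]:
    """Try tiers cheapest-first; escalate on failure; stop before the cost ceiling."""
    spent = 0.0
    for tier in sorted(tiers, key=lambda t: t.cost_per_1k):
        # Rough token estimate (~4 chars per token) just for budgeting this attempt.
        est_cost = tier.cost_per_1k * len(prompt) / 4000
        if spent + est_cost > cost_ceiling:
            break                             # the next attempt would blow the budget
        answer = tier.call(prompt)
        spent += est_cost
        if answer is not None:                # tier succeeded: no escalation needed
            return answer, tier.name, spent
    raise RuntimeError(f"all tiers exhausted or cost ceiling hit (spent ${spent:.4f})")
```

The two knobs this sketch exposes, a per-request budget and escalate-on-failure behavior, correspond to `cost_ceiling` and `cascade_on_failure` in the routing block of the config below.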
```yaml
# mc.yaml
providers:
  local:
    type: ollama
    model: llama3.2:3b
    cost_per_1k: 0.0
  fast:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    model: claude-haiku-4-5-20251001
    cost_per_1k: 0.001
  capable:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    model: claude-sonnet-4-6
    cost_per_1k: 0.005
routing:
  cost_ceiling: 0.01
  cascade_on_failure: true
  calibration: preset_v1
```

- Classify first, spend second: every request gets a difficulty score before a provider is chosen (a sketch follows this list)
- Fail cheap, succeed capable: lower tiers fail fast, and escalation to the next tier is automatic
- Your keys, your data: BYOK, no telemetry, no vendor lock-in
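Here is what "classify first" can look like in practice: a cheap heuristic score computed before any provider is called, so easy prompts never leave the local tier. The features and thresholds below are illustrative assumptions for the example, not the calibration that ships as `preset_v1`.

```python
import re


def difficulty_score(prompt: str) -> float:
    """Cheap pre-routing heuristic: 0.0 (trivial) to 1.0 (hard). Illustrative only."""
    score = 0.0
    if len(prompt) > 2000:                              # long contexts tend to need bigger models
        score += 0.3
    if re.search(r"\bdef |\bclass |SELECT |#include", prompt):   # code or SQL present
        score += 0.2
    if re.search(r"\b(prove|derive|refactor|multi-step|plan)\b", prompt, re.I):
        score += 0.3
    if prompt.count("?") > 2:                           # several distinct questions in one request
        score += 0.2
    return min(score, 1.0)


def pick_tier(score: float) -> str:
    # Thresholds are made up for this sketch; in modelcascade they come from calibration presets.
    if score < 0.4:
        return "local"
    if score < 0.7:
        return "fast"
    return "capable"
```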
- $3/night operating cost across 10K+ daily dispatches
- 74% local coverage at $0
- 21/21 A/B calibration tests passed
- Works with LangChain, CrewAI, Claude Code, and custom pipelines
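For custom pipelines, the router drops in wherever you currently call a single model. The sketch below uses only the two calls shown in the quickstart (`CascadeRouter.from_config` and `router.complete`); the summarization stage and document inputs are hypothetical, and the shape of the returned result follows the quickstart rather than a documented type.

```python
import asyncio

from modelcascade import CascadeRouter

router = CascadeRouter.from_config("mc.yaml")


async def summarize_step(documents: list[str]) -> list:
    # Hypothetical pipeline stage: one routed completion per document.
    prompts = [f"Summarize in three bullet points:\n\n{doc}" for doc in documents]
    return await asyncio.gather(*(router.complete(p) for p in prompts))


if __name__ == "__main__":
    docs = ["first document text", "second document text"]
    print(asyncio.run(summarize_step(docs)))
```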
MIT license
The arbitrage was always going to close. Route responsibly.
Built by WayneColt | Research: catalytic-computing.ai | Enterprise: wayneia.com