AI interactions, not transactions.
Run AI locally. Route intelligently. Use the cloud only when it matters.
ToastStack Starter is an open-source reference implementation for building local-first, cloud-backed LLM workflows.
Instead of sending every prompt to expensive cloud models, ToastStack routes requests intelligently:
- Local models handle most tasks
- Cloud models handle critical moments
- Routing logic decides automatically
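The routing decision can be sketched as a small classifier. The heuristic below (prompt length plus a few escalation keywords) is purely illustrative — the thresholds, keyword list, and provider names are assumptions, not the gateway's actual logic:

```javascript
// Illustrative routing heuristic: cheap local model by default,
// escalate to a cloud model for long or high-stakes prompts.
// Keywords and the length threshold are assumptions for this sketch.
const ESCALATION_KEYWORDS = ["production", "security", "final review"];

function chooseProvider(prompt) {
  const longPrompt = prompt.length > 2000;
  const highStakes = ESCALATION_KEYWORDS.some((kw) =>
    prompt.toLowerCase().includes(kw)
  );
  return longPrompt || highStakes ? "anthropic" : "ollama";
}
```

In practice the gateway config (not application code) owns this decision; the sketch just shows the shape of a local-first default with cloud escalation.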
Result: 80–95% cost reduction without sacrificing workflow quality
Modern AI development is expensive, unpredictable, and inefficient.
- Every prompt = cost
- Iteration becomes constrained
- Sensitive data leaves your environment
- Teams lack visibility and control
Most setups look like this:
```mermaid
flowchart LR
  IDE["IDE / CLI"] --> CloudLLM["Cloud LLM"]
  CloudLLM --> Cost["High cost"]
```
ToastStack flips the model:
```mermaid
flowchart TD
  Dev["Developer / Agent"] --> GW["ToastStack Gateway"]
  GW --> Local["Local models (Ollama)"]
  GW --> Cloud["Cloud models (Claude / GPT)"]
```
- Local = default
- Cloud = escalation
This is a starter system, not a full platform.
You get:
- Pre-configured LiteLLM gateway
- Local model setup (Ollama)
- Example routing strategies
- Developer workflows (local-first + validation)
- Multi-agent patterns (planner, coder, reviewer)
- Benchmarks (cost, latency, quality)
Clone, run, and you have a working hybrid AI stack (once the setup scripts and gateway config are in place).
Prerequisites: The commands below match the intended layout for this repo. Some paths (Ollama setup scripts, Docker Compose for the gateway, and the sample app entrypoint) may still be stubs on your clone. Add or generate those assets from the docs when they land, or adjust paths to match your environment.
```shell
./local/setup-ollama.sh
./local/pull-models.sh
docker-compose up
node examples/sample-app/index.js
```

Basic routing strategy:
```yaml
routes:
  - match: "simple"
    provider: "ollama"
  - match: "complex"
    provider: "anthropic"
fallback:
  provider: "anthropic"
```

- Local-first by default
- Cloud when needed
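Once the gateway is running, clients talk to it through an OpenAI-compatible chat endpoint. The snippet below assumes LiteLLM's default proxy port (4000) and a model alias (`local-default`) configured in the gateway — both may differ in your setup:

```javascript
// Build an OpenAI-style chat request body for the gateway.
function buildChatRequest(model, prompt) {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
  };
}

// Send the request to the local gateway instead of a cloud API.
// Port and model alias are assumptions; match them to your config.
async function askGateway(prompt, base = "http://localhost:4000") {
  const res = await fetch(`${base}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest("local-default", prompt)),
  });
  if (!res.ok) throw new Error(`Gateway error: ${res.status}`);
  return (await res.json()).choices[0].message.content;
}
```

Because the endpoint is OpenAI-compatible, existing SDKs and tools can point at the gateway by changing only the base URL.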
Example scenario: 1000 prompts
| Setup | Cost |
|---|---|
| Cloud-only | $42.00 |
| ToastStack | $4.80 |
| Savings | ~88% |
See benchmarks/ for full breakdowns.
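The ~88% figure can be reproduced with a back-of-envelope model. The rates below (average $0.042/prompt cloud, $0.0007/prompt amortized local, 10% of prompts escalated) are illustrative assumptions, not measured ToastStack numbers:

```javascript
// Rough hybrid cost model; all rates are illustrative assumptions.
function estimateCosts(prompts, cloudRate, localRate, cloudShare) {
  const cloudOnly = prompts * cloudRate;
  const hybrid =
    prompts * (cloudShare * cloudRate + (1 - cloudShare) * localRate);
  return { cloudOnly, hybrid, savings: 1 - hybrid / cloudOnly };
}

// 1000 prompts, 10% escalated to cloud: ~$42 vs ~$4.83, ~88% savings.
const result = estimateCosts(1000, 0.042, 0.0007, 0.1);
```

The savings track the escalation rate almost linearly, which is why routing quality matters more than the exact local model chosen.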
Fast iteration using local models for:
- coding
- debugging
- drafting
Escalate only when needed for:
- final review
- complex reasoning
- production checks
Agent-based workflow:
- Planner: breaks tasks down
- Coder: implements changes
- Reviewer: validates output
These mimic real-world dev workflows.
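A minimal sketch of that planner → coder → reviewer loop, with the model call injected as a function so planning and coding can run locally while review escalates to the cloud. The role prompts and the `callModel` signature are hypothetical, not ToastStack's actual interfaces:

```javascript
// Minimal planner -> coder -> reviewer pipeline. `callModel` is an
// injected (tier, prompt) => string function, so the same pipeline
// can run early stages on a local model and review on a cloud model.
function runPipeline(task, callModel) {
  const plan = callModel("local", `Break this task into steps: ${task}`);
  const code = callModel("local", `Implement the following plan:\n${plan}`);
  const review = callModel("cloud", `Review this change:\n${code}`);
  return { plan, code, review };
}
```

Injecting the model call also makes the pipeline trivially testable with a stub in place of a real gateway request.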
ToastStack Starter is designed for developers and small teams.
As usage grows, teams typically need:
- centralized routing
- usage visibility
- cost tracking
- policy enforcement
This is where ToastStack evolves beyond this repo.
ToastStack is built on one core principle:
Run cheap and private by default.
Escalate to premium intelligence only when necessary.
This creates:
- faster iteration
- lower costs
- better control
- scalable workflows
This repository is:
- NOT a production-ready routing engine
- NOT a policy enforcement system
- NOT a cost optimization platform
- NOT a team-level control plane
It is a reference implementation.
In scope today:
- Local-first routing
- Cloud fallback
- Example workflows

Planned next:
- Smarter routing strategies
- Performance-aware selection
- Cost-aware execution

Further out:
- Team-level policies
- Cost dashboards
- Prompt analytics
- Shared workflows
- Governance layer
AI is becoming infrastructure.
But right now, it is:
- expensive
- fragmented
- hard to control
ToastStack is an attempt to define a better pattern:
Hybrid, local-first AI development
Star the repo.
Share it.
Build on it.
This is not just a starter kit.
It is the beginning of a new standard for how developers work with AI.
Run local. Route smart. Scale intentionally.