toaststackhq/toaststack-starter
ToastStack Starter

AI interactions, not transactions.

Run AI locally. Route intelligently. Use the cloud only when it matters.


What is ToastStack?

ToastStack Starter is an open-source reference implementation for building local-first, cloud-backed LLM workflows.

Instead of sending every prompt to expensive cloud models, ToastStack routes requests intelligently:

  • Local models handle most tasks
  • Cloud models handle critical moments
  • Routing logic decides automatically

Result: 80–95% cost reduction without sacrificing workflow quality.
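The routing decision can be sketched as a small classifier. The keyword list and length threshold below are illustrative assumptions, not the repo's actual logic:

```javascript
// Hypothetical routing sketch: pick a provider for a prompt.
// CLOUD_KEYWORDS and the 2000-char threshold are illustrative assumptions.
const CLOUD_KEYWORDS = ["architecture", "security review", "production"];

function routePrompt(prompt) {
  const needsCloud =
    prompt.length > 2000 ||
    CLOUD_KEYWORDS.some((kw) => prompt.toLowerCase().includes(kw));
  return needsCloud ? "anthropic" : "ollama"; // local by default
}
```

In this sketch, short everyday prompts stay on the local Ollama model; only long or flagged prompts escalate to the cloud.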


The Problem

Modern AI development is expensive, unpredictable, and inefficient.

  • Every prompt = cost
  • Iteration becomes constrained
  • Sensitive data leaves your environment
  • Teams lack visibility and control

Most setups look like this:

```mermaid
flowchart LR
    IDE["IDE / CLI"] --> CloudLLM["Cloud LLM"]
    CloudLLM --> Cost["High cost"]
```

The Shift

ToastStack flips the model:

```mermaid
flowchart TD
    Dev["Developer / Agent"] --> GW["ToastStack Gateway"]
    GW --> Local["Local models (Ollama)"]
    GW --> Cloud["Cloud models (Claude / GPT)"]
```
  • Local = default
  • Cloud = escalation

What This Repo Gives You

This is a starter system, not a full platform.

You get:

  • Pre-configured LiteLLM gateway
  • Local model setup (Ollama)
  • Example routing strategies
  • Developer workflows (local-first + validation)
  • Multi-agent patterns (planner, coder, reviewer)
  • Benchmarks (cost, latency, quality)

Clone the repo, run the steps below, and you have a working hybrid AI stack (once the setup scripts and gateway config are in place).


Quick Start

Prerequisites: The commands below match the intended layout for this repo. Some paths (Ollama setup scripts, Docker Compose for the gateway, and the sample app entrypoint) may still be stubs on your clone. Add or generate those assets from the docs when they land, or adjust paths to match your environment.

1. Set up local models

```sh
./local/setup-ollama.sh
```

2. Pull recommended models

```sh
./local/pull-models.sh
```

3. Start the gateway

```sh
docker-compose up
```

4. Run an example

```sh
node examples/sample-app/index.js
```
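Once the gateway is up, clients talk to it over LiteLLM's OpenAI-compatible API. The port and model name below are assumptions; adjust them to your gateway config:

```javascript
// Build a chat request for the gateway. The endpoint path follows LiteLLM's
// OpenAI-compatible API; port 4000 and the model name are assumptions.
function buildChatRequest(model, prompt) {
  return {
    url: "http://localhost:4000/v1/chat/completions",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (hypothetical model name):
// const { url, options } = buildChatRequest("ollama/llama3", "Explain routing");
// fetch(url, options).then((r) => r.json()).then(console.log);
```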

Routing Example

Basic routing strategy:

```yaml
routes:
  - match: "simple"
    provider: "ollama"

  - match: "complex"
    provider: "anthropic"

fallback:
  provider: "anthropic"
```

  • Local-first by default
  • Cloud when needed

Cost Impact

Example scenario: 1000 prompts

| Setup      | Cost   |
|------------|--------|
| Cloud-only | $42.00 |
| ToastStack | $4.80  |
| Savings    | ~88%   |

See benchmarks/ for full breakdowns.
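The savings figure is simple arithmetic; a quick check of the numbers above:

```javascript
// Percentage saved when replacing a cloud-only bill with the hybrid bill.
function savingsPercent(cloudOnly, hybrid) {
  return ((cloudOnly - hybrid) / cloudOnly) * 100;
}

// 1000 prompts: $42.00 cloud-only vs $4.80 hybrid
// savingsPercent(42.0, 4.8) ≈ 88.6, i.e. roughly the ~88% quoted above
```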


Workflows Included

Local-First Development

Fast iteration using local models for:

  • coding
  • debugging
  • drafting

Cloud Validation

Escalate only when needed for:

  • final review
  • complex reasoning
  • production checks

PR Review Flow

Agent-based workflow:

  • Planner: breaks tasks down
  • Coder: implements changes
  • Reviewer: validates output

Agents

Pre-defined agent roles:

  • Planner — breaks down tasks
  • Coder — implements changes
  • Reviewer — validates output

These mimic real-world dev workflows.
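The three roles can be wired as a simple sequential pipeline. The agent functions here are stubs standing in for model calls through the gateway:

```javascript
// Minimal planner → coder → reviewer pipeline. Each stage would normally
// call a model via the gateway; these stubs are for illustration only.
const agents = {
  planner: (task) => ({ task, steps: ["analyze", "implement", "verify"] }),
  coder: (plan) => ({ ...plan, patch: `// changes for: ${plan.task}` }),
  reviewer: (work) => ({ ...work, approved: work.patch.length > 0 }),
};

function runPipeline(task) {
  return agents.reviewer(agents.coder(agents.planner(task)));
}
```

A real version would route the planner and coder to local models and escalate only the reviewer to a cloud model, matching the local-first pattern above.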


For Teams

ToastStack Starter is designed for developers and small teams.

As usage grows, teams typically need:

  • centralized routing
  • usage visibility
  • cost tracking
  • policy enforcement

This is where ToastStack evolves beyond this repo.


Architecture Philosophy

ToastStack is built on one core principle:

Run cheap and private by default.
Escalate to premium intelligence only when necessary.

This creates:

  • faster iteration
  • lower costs
  • better control
  • scalable workflows

What This Is NOT

This repository is:

  • NOT a production-ready routing engine
  • NOT a policy enforcement system
  • NOT a cost optimization platform
  • NOT a team-level control plane

It is a reference implementation.


Roadmap

Current

  • Local-first routing
  • Cloud fallback
  • Example workflows

In Progress

  • Smarter routing strategies
  • Performance-aware selection
  • Cost-aware execution

Future (ToastStack Platform)

  • Team-level policies
  • Cost dashboards
  • Prompt analytics
  • Shared workflows
  • Governance layer

Learn More

https://toaststack.com


Why This Exists

AI is becoming infrastructure.

But right now, it is:

  • expensive
  • fragmented
  • hard to control

ToastStack is an attempt to define a better pattern:

Hybrid, local-first AI development


If This Helps You

Star the repo.
Share it.
Build on it.


Final Thought

This is not just a starter kit.

It is the beginning of a new standard for how developers work with AI.

Run local. Route smart. Scale intentionally.

About

Open-source hybrid LLM starter for local-first workflows with smart cloud fallback and cost optimization.
