freerouter

An OpenAI-compatible proxy that gathers only OpenRouter's free model endpoints and routes/falls back across them automatically.

Point any existing OpenAI SDK/client at freerouter, and it picks a model from OpenRouter's free pool (pricing.prompt == 0 && pricing.completion == 0), then falls back to the next free model automatically on a rate limit (429) or a transient error.
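
The free-pool test above can be sketched as a small predicate. This is an illustration, not freerouter's actual code: it assumes catalog entries carry a pricing object whose prompt and completion values are decimal strings, and the function name is_free and the sample model IDs are hypothetical.

```python
from typing import Any


def is_free(model: dict[str, Any]) -> bool:
    """Return True when both prompt and completion prices parse to zero."""
    pricing = model.get("pricing") or {}
    try:
        return float(pricing["prompt"]) == 0 and float(pricing["completion"]) == 0
    except (KeyError, TypeError, ValueError):
        # Missing or unparsable pricing is treated as "not free".
        return False


# Hypothetical catalog entries, shaped like an assumed /api/v1/models response.
catalog = [
    {"id": "example/free-model:free", "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "example/paid-model", "pricing": {"prompt": "0.000001", "completion": "0.000002"}},
    {"id": "example/no-pricing"},
]

free_ids = [m["id"] for m in catalog if is_free(m)]
print(free_ids)  # ['example/free-model:free']
```

Treating missing pricing as paid is the conservative choice here: a model is only routed to when it is known to cost nothing.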

How it works

1. On startup, fetch OpenRouter /api/v1/models, keep only free models, and cache them (TTL 600s by default).
2. On POST /v1/chat/completions, the router builds a candidate order.
   - If the request names a specific free model, that model goes first; otherwise (auto, empty, paid, or unknown model) the free-pool priority order is used (:free-suffixed first, then larger context).
   - A model that just returned 429 is pushed to the back of the order during its cooldown (60s by default).
3. Try candidates in order, falling back to the next one on 429/402/404/5xx. If all candidates fail, return 502.
4. The response header X-freerouter-Model carries the model actually used.
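
The ordering and fallback steps above can be sketched in a few lines. This is a minimal model of the described behavior under stated assumptions, not the code in src/freerouter: the names build_candidates, route, and send are hypothetical, the pool is reduced to (model ID, context length) pairs, and a requested model that is cooling down is assumed to be pushed back like any other.

```python
from typing import Callable, Optional


def is_retryable(status: int) -> bool:
    """Statuses that trigger a fallback to the next candidate."""
    return status in (429, 402, 404) or 500 <= status <= 599


def build_candidates(requested: str, pool: list[tuple[str, int]],
                     cooldown_until: dict[str, float], now: float,
                     max_attempts: int = 8) -> list[str]:
    # Priority: ":free"-suffixed IDs first, then larger context windows.
    ranked = sorted(pool, key=lambda m: (not m[0].endswith(":free"), -m[1]))
    ids = [model_id for model_id, _ in ranked]
    # A specifically requested free model goes first; anything else uses the ranking.
    if requested in ids:
        ids.remove(requested)
        ids.insert(0, requested)
    # Stable partition: models still cooling down move to the back.
    ready = [m for m in ids if cooldown_until.get(m, 0.0) <= now]
    cooling = [m for m in ids if cooldown_until.get(m, 0.0) > now]
    return (ready + cooling)[:max_attempts]


def route(candidates: list[str], send: Callable[[str], int],
          cooldown_until: dict[str, float], now: float,
          cooldown_seconds: float = 60.0) -> tuple[int, Optional[str]]:
    """Try each candidate; send(model_id) returns the upstream HTTP status."""
    for model_id in candidates:
        status = send(model_id)
        if status == 429:
            cooldown_until[model_id] = now + cooldown_seconds
        if not is_retryable(status):
            return status, model_id  # success or a non-retryable error
    return 502, None  # every candidate failed


pool = [("a/small:free", 32768), ("b/large:free", 262144), ("c/router", 1048576)]
cooldowns: dict[str, float] = {}
statuses = {"b/large:free": 429, "a/small:free": 200}

first = build_candidates("auto", pool, cooldowns, now=0.0)
result = route(first, lambda m: statuses[m], cooldowns, now=0.0)
later = build_candidates("auto", pool, cooldowns, now=10.0)
print(first)   # ['b/large:free', 'a/small:free', 'c/router']
print(result)  # (200, 'a/small:free')
print(later)   # ['a/small:free', 'c/router', 'b/large:free']
```

Passing now explicitly keeps the sketch deterministic; a real router would read a monotonic clock instead.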

Free models

The routable pool is OpenRouter's free, chat-capable models. The list below is generated from the live catalog. Run python scripts/update_models.py to refresh it locally; CI also refreshes it automatically once a week.

26 free, chat-capable models (prompt + completion priced at $0). Auto-generated by scripts/update_models.py.

Model ID	Context
`qwen/qwen3-coder:free`	1,048,576
`nvidia/nemotron-3-ultra-550b-a55b:free`	1,000,000
`nvidia/nemotron-3-super-120b-a12b:free`	1,000,000
`poolside/laguna-xs.2:free`	262,144
`poolside/laguna-m.1:free`	262,144
`google/gemma-4-26b-a4b-it:free`	262,144
`google/gemma-4-31b-it:free`	262,144
`qwen/qwen3-next-80b-a3b-instruct:free`	262,144
`cohere/north-mini-code:free`	256,000
`nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free`	256,000
`nvidia/nemotron-3-nano-30b-a3b:free`	256,000
`openai/gpt-oss-120b:free`	131,072
`openai/gpt-oss-20b:free`	131,072
`meta-llama/llama-3.3-70b-instruct:free`	131,072
`meta-llama/llama-3.2-3b-instruct:free`	131,072
`nousresearch/hermes-3-llama-3.1-405b:free`	131,072
`nvidia/nemotron-3.5-content-safety:free`	128,000
`nvidia/nemotron-nano-12b-v2-vl:free`	128,000
`nvidia/nemotron-nano-9b-v2:free`	128,000
`liquid/lfm-2.5-1.2b-thinking:free`	32,768
`liquid/lfm-2.5-1.2b-instruct:free`	32,768
`cognitivecomputations/dolphin-mistral-24b-venice-edition:free`	32,768
`openrouter/owl-alpha`	1,048,756
`google/lyria-3-pro-preview`	1,048,576
`google/lyria-3-clip-preview`	1,048,576
`openrouter/free`	200,000

Install

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env   # fill in OPENROUTER_API_KEY

Run

freerouter            # or: python -m freerouter
# defaults to http://127.0.0.1:8000

Usage (with the OpenAI SDK as-is)

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="auto",  # freerouter picks from the free pool automatically
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)

curl:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'
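
To see which model actually served a request, read the X-freerouter-Model response header. The sketch below does this with only the Python standard library; it assumes the proxy is running at the default address, and the helper names build_chat_request and chat_with_model are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, content: str, model: str = "auto") -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": content}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_with_model(content: str, base_url: str = "http://127.0.0.1:8000"):
    """Send the request; return (model actually used, parsed JSON body)."""
    req = build_chat_request(base_url, content)
    with urllib.request.urlopen(req, timeout=60) as resp:
        used = resp.headers.get("X-freerouter-Model")  # set by the proxy
        data = json.load(resp)
    return used, data
```

With the proxy running, used, data = chat_with_model("hi") returns the routed model ID alongside the usual chat completion payload.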

Use as a library (no server)

You can embed freerouter into another agent program as a submodule without running an HTTP server. FreeRouterClient fetches the free-model list over REST (TTL-cached) and routes with fallback in-process.

import asyncio
from freerouter import FreeRouterClient

async def main():
    async with FreeRouterClient() as fr:           # uses OPENROUTER_API_KEY from .env
        free = await fr.models()                    # routable free models
        data = await fr.chat(
            [{"role": "user", "content": "hello"}],
            model="auto",                           # auto = pick from the free pool + fallback
            max_tokens=64,
        )
        print(data["model"], data["choices"][0]["message"]["content"])

asyncio.run(main())

In synchronous code (outside an event loop), use the sync wrapper:

from freerouter import FreeRouterClient

fr = FreeRouterClient(api_key="sk-or-...")          # passing the key directly also works
data = fr.chat_sync([{"role": "user", "content": "hi"}], model="auto")
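
The "outside an event loop" restriction is typical of sync wrappers built on asyncio.run, which cannot be called from a thread that already has a running loop. The sketch below shows that general pattern with stand-in functions; it is an assumption about how such a wrapper usually behaves, not a description of how chat_sync is implemented.

```python
import asyncio


async def chat(messages):
    """Stand-in for an async call; echoes the last message."""
    await asyncio.sleep(0)
    return {"echo": messages[-1]["content"]}


def chat_sync(messages):
    """Blocking wrapper that refuses to run inside an active event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so it is safe to start one.
        return asyncio.run(chat(messages))
    raise RuntimeError("chat_sync() called inside a running event loop; await chat() instead")


print(chat_sync([{"role": "user", "content": "hi"}]))  # {'echo': 'hi'}
```

Inside async code, await the async method directly rather than calling the sync variant.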

- Share an httpx client: if the agent already uses an httpx.AsyncClient, inject it with FreeRouterClient(http_client=my_client). When a client is injected, the caller owns its lifecycle.
- Share state: inject registry/router so several clients share the free-list cache and cooldowns.
- Streaming: async for chunk in fr.stream_raw(payload) yields raw SSE bytes.
- Errors: a non-retryable 4xx (e.g. a bad request) raises httpx.HTTPStatusError; if every candidate fails, FreeRouterError is raised.
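
Because stream_raw yields raw SSE bytes, the caller has to reassemble events itself. The following is a minimal sketch of that step, assuming the upstream uses the OpenAI-style stream format (data: lines holding JSON, ending with data: [DONE]); the helper name iter_sse_json is hypothetical, and it takes a plain iterable so it can be shown without a network call.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield parsed JSON events from raw SSE byte chunks until [DONE]."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Only complete lines are parsed; a partial line waits for more bytes.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            if not text.startswith("data:"):
                continue  # blank separators and ":" comment lines
            payload = text[len("data:"):].strip()
            if payload == "[DONE]":
                return
            yield json.loads(payload)


# Chunk boundaries can fall anywhere, including in the middle of an event.
chunks = [
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"cho',
    b'ices":[{"delta":{"content":"lo"}}]}\n\n',
    b"data: [DONE]\n\n",
]
text = "".join(e["choices"][0]["delta"].get("content", "") for e in iter_sse_json(chunks))
print(text)  # Hello
```

In async code the same buffering logic applies; feed it from the async for loop over fr.stream_raw(payload) instead of a list.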

The same routing/fallback core is shared by the proxy server (proxy.py) and the library.

Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	OpenAI-compatible. Supports streaming (`"stream": true`).
GET	`/v1/models`	Exposes only routable free models.
GET	`/health`	Health check.

Configuration

Controlled via .env or environment variables. See src/freerouter/config.py for the full list.

Key	Default	Description
`OPENROUTER_API_KEY`	(required)	OpenRouter key
`MODEL_REFRESH_TTL`	600	free-list cache TTL (seconds)
`MAX_ATTEMPTS`	8	max number of models to try as fallbacks
`COOLDOWN_SECONDS`	60	how long to skip a model after a 429 (seconds)
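
The table above maps naturally onto a small settings object. This is an illustrative sketch only; the real definitions live in src/freerouter/config.py and may differ. The Settings class and load_settings function are hypothetical names, and only the four keys listed in the table are covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    model_refresh_ttl: int = 600   # seconds
    max_attempts: int = 8
    cooldown_seconds: int = 60     # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented keys from env, applying the documented defaults."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise ValueError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=key,
        model_refresh_ttl=int(env.get("MODEL_REFRESH_TTL", 600)),
        max_attempts=int(env.get("MAX_ATTEMPTS", 8)),
        cooldown_seconds=int(env.get("COOLDOWN_SECONDS", 60)),
    )


settings = load_settings({"OPENROUTER_API_KEY": "sk-or-example", "COOLDOWN_SECONDS": "30"})
print(settings.cooldown_seconds, settings.max_attempts)  # 30 8
```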

Tests

pytest

License

MIT — see LICENSE.
