An OpenAI-compatible proxy that gathers only OpenRouter's free model endpoints and routes/falls back across them automatically.
Point any existing OpenAI SDK/client at freerouter, and it picks a model from OpenRouter's
free pool (pricing.prompt == 0 && pricing.completion == 0), then falls back to the next
free model automatically on a rate limit (429) or a transient error.
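
The free-pool condition above can be sketched as a small predicate. This is only an illustration, not freerouter's actual code: `is_free` and the sample entries are hypothetical, and it assumes the catalog reports prices as numeric strings under a `pricing` key.

```python
from decimal import Decimal, InvalidOperation


def is_free(model: dict) -> bool:
    """Return True when both prompt and completion prices are exactly zero."""
    pricing = model.get("pricing") or {}
    try:
        prompt = Decimal(str(pricing["prompt"]))
        completion = Decimal(str(pricing["completion"]))
    except (KeyError, InvalidOperation):
        # Missing or unparsable prices are treated as "not free" to stay safe.
        return False
    return prompt == 0 and completion == 0


# Hypothetical catalog entries, assumed to resemble items returned by /api/v1/models.
catalog = [
    {"id": "example/free-model:free", "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "example/paid-model", "pricing": {"prompt": "0.000002", "completion": "0.000006"}},
    {"id": "example/no-pricing"},
]

free_ids = [m["id"] for m in catalog if is_free(m)]
print(free_ids)
```

Parsing with `Decimal` rather than `float` keeps the zero comparison exact for string prices such as `"0.0"`.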
- On startup, fetch OpenRouter `/api/v1/models`, keep only free models, and cache them (TTL 600s by default).
- On `POST /v1/chat/completions`, the router builds a candidate order.
  - A specific free model goes first; otherwise (`auto`, empty, paid, or unknown) the free-pool priority is used (`:free`-suffixed first, then larger context).
  - A model that just returned 429 is pushed back during its cooldown (60s by default).
- Try candidates in order, falling back on 429/402/404/5xx. If all fail, return 502.
- The response header `X-freerouter-Model` carries the model actually used.
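
The ordering and fallback rules in the list above might look roughly like the following. It is a minimal sketch under stated assumptions, not the project's router: `candidate_order`, `is_retryable`, and the `cooldown_until` mapping are hypothetical names, and it assumes a requested model that is itself cooling down is pushed back like any other.

```python
import time

RETRYABLE = {429, 402, 404}  # plus any 5xx, per the list above


def is_retryable(status: int) -> bool:
    """True when the router should fall back to the next candidate."""
    return status in RETRYABLE or 500 <= status <= 599


def candidate_order(requested, pool, cooldown_until, now=None, max_attempts=8):
    """Order candidate model IDs.

    pool: list of {"id", "context_length"} dicts already known to be free.
    cooldown_until: hypothetical mapping of model ID to the time its cooldown ends.
    """
    now = time.monotonic() if now is None else now
    # Free-pool priority: ":free"-suffixed IDs first, then larger context windows.
    ranked = sorted(
        pool,
        key=lambda m: (not m["id"].endswith(":free"), -m.get("context_length", 0)),
    )
    ids = [m["id"] for m in ranked]
    # A specifically requested free model goes first; anything else uses pool order.
    if requested in ids:
        ids.remove(requested)
        ids.insert(0, requested)
    # Models still cooling down after a 429 are pushed to the back, not dropped.
    ready = [i for i in ids if cooldown_until.get(i, 0) <= now]
    cooling = [i for i in ids if cooldown_until.get(i, 0) > now]
    return (ready + cooling)[:max_attempts]


pool = [
    {"id": "a/small:free", "context_length": 32768},
    {"id": "b/big:free", "context_length": 262144},
    {"id": "c/unsuffixed", "context_length": 1048576},
]
print(candidate_order("auto", pool, {}, now=0))
print(candidate_order("auto", pool, {"b/big:free": 60}, now=10))
```

A caller would walk the returned list in order, record a cooldown when a candidate answers 429, and stop at the first response for which `is_retryable` is false.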
The routable pool is OpenRouter's free, chat-capable models. The list below is generated from
the live catalog: run `python scripts/update_models.py` to refresh it locally. CI also refreshes
it automatically once a week.
26 free, chat-capable models (prompt + completion priced at $0). Auto-generated by scripts/update_models.py.
| Model ID | Context |
|---|---|
| `qwen/qwen3-coder:free` | 1,048,576 |
| `nvidia/nemotron-3-ultra-550b-a55b:free` | 1,000,000 |
| `nvidia/nemotron-3-super-120b-a12b:free` | 1,000,000 |
| `poolside/laguna-xs.2:free` | 262,144 |
| `poolside/laguna-m.1:free` | 262,144 |
| `google/gemma-4-26b-a4b-it:free` | 262,144 |
| `google/gemma-4-31b-it:free` | 262,144 |
| `qwen/qwen3-next-80b-a3b-instruct:free` | 262,144 |
| `cohere/north-mini-code:free` | 256,000 |
| `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free` | 256,000 |
| `nvidia/nemotron-3-nano-30b-a3b:free` | 256,000 |
| `openai/gpt-oss-120b:free` | 131,072 |
| `openai/gpt-oss-20b:free` | 131,072 |
| `meta-llama/llama-3.3-70b-instruct:free` | 131,072 |
| `meta-llama/llama-3.2-3b-instruct:free` | 131,072 |
| `nousresearch/hermes-3-llama-3.1-405b:free` | 131,072 |
| `nvidia/nemotron-3.5-content-safety:free` | 128,000 |
| `nvidia/nemotron-nano-12b-v2-vl:free` | 128,000 |
| `nvidia/nemotron-nano-9b-v2:free` | 128,000 |
| `liquid/lfm-2.5-1.2b-thinking:free` | 32,768 |
| `liquid/lfm-2.5-1.2b-instruct:free` | 32,768 |
| `cognitivecomputations/dolphin-mistral-24b-venice-edition:free` | 32,768 |
| `openrouter/owl-alpha` | 1,048,756 |
| `google/lyria-3-pro-preview` | 1,048,576 |
| `google/lyria-3-clip-preview` | 1,048,576 |
| `openrouter/free` | 200,000 |
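
For readers curious how a table like the one above could be produced from catalog data, here is a rough sketch. It is not the contents of `scripts/update_models.py`; `render_model_table` is a hypothetical helper, and the `context_length` field name is an assumption about the catalog entries.

```python
def render_model_table(models):
    """Render a two-column Markdown table: model ID and context length."""
    lines = ["| Model ID | Context |", "|---|---|"]
    for m in models:
        # The "," format spec inserts thousands separators (1048576 -> 1,048,576).
        lines.append(f"| `{m['id']}` | {m['context_length']:,} |")
    return "\n".join(lines)


# Two rows taken from the table above, used as sample input.
sample = [
    {"id": "qwen/qwen3-coder:free", "context_length": 1048576},
    {"id": "openrouter/free", "context_length": 200000},
]
table = render_model_table(sample)
print(table)
```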

    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev]"
    cp .env.example .env   # fill in OPENROUTER_API_KEY
    freerouter             # or: python -m freerouter
    # defaults to http://127.0.0.1:8000

Python, using the OpenAI SDK:

    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="auto",  # freerouter picks from the free pool automatically
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)

curl:


    curl http://127.0.0.1:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'

You can embed freerouter into another agent program as a submodule without running an
HTTP server. `FreeRouterClient` fetches the free-model list over REST (TTL-cached) and routes
with fallback in-process.


    import asyncio

    from freerouter import FreeRouterClient


    async def main():
        async with FreeRouterClient() as fr:  # uses OPENROUTER_API_KEY from .env
            free = await fr.models()  # routable free models
            data = await fr.chat(
                [{"role": "user", "content": "hello"}],
                model="auto",  # auto = pick from the free pool + fallback
                max_tokens=64,
            )
            print(data["model"], data["choices"][0]["message"]["content"])


    asyncio.run(main())

In synchronous code (outside an event loop), use the sync wrapper:


    from freerouter import FreeRouterClient

    fr = FreeRouterClient(api_key="sk-or-...")  # passing the key directly also works
    data = fr.chat_sync([{"role": "user", "content": "hi"}], model="auto")

- Share an httpx client: if the agent already uses an `httpx.AsyncClient`, inject it with `FreeRouterClient(http_client=my_client)`. When injected, the caller owns its lifecycle.
- Share state: inject `registry`/`router` so several clients share the free-list cache and cooldowns.
- Streaming: `async for chunk in fr.stream_raw(payload)` yields raw SSE bytes.
- A non-retryable 4xx (e.g. a bad request) raises `httpx.HTTPStatusError`; if all candidates fail, `FreeRouterError` is raised.

The same routing/fallback core is shared by the proxy server (`proxy.py`) and the library.
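
Because `stream_raw` yields raw SSE bytes, the caller has to split events and decode the JSON payloads. The parser below is one possible sketch, not part of freerouter: `SSEContentParser` is a hypothetical helper, and it assumes OpenAI-style chunks (`choices[0].delta.content`), events separated by a blank line (`\n\n`), and a final `data: [DONE]` marker.

```python
import json


class SSEContentParser:
    """Incrementally extract text deltas from raw SSE byte chunks."""

    def __init__(self):
        self._buffer = b""
        self.done = False

    def feed(self, chunk: bytes) -> list:
        """Add one chunk and return the text deltas completed by it."""
        out = []
        self._buffer += chunk
        # A chunk may end mid-event, so only fully terminated events are parsed.
        while not self.done and b"\n\n" in self._buffer:
            event, self._buffer = self._buffer.split(b"\n\n", 1)
            for line in event.decode("utf-8").splitlines():
                if not line.startswith("data:"):
                    continue  # skips SSE comments such as ": keep-alive"
                data = line[len("data:"):].strip()
                if data == "[DONE]":
                    self.done = True
                    break
                choices = json.loads(data).get("choices") or []
                if choices:
                    text = choices[0].get("delta", {}).get("content")
                    if text:
                        out.append(text)
        return out


parser = SSEContentParser()
pieces = []
pieces += parser.feed(b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\nda')
pieces += parser.feed(b'ta: {"choices":[{"delta":{"content":"lo"}}]}\n\n: keep-alive\n\ndata: [DONE]\n\n')
print("".join(pieces))
```

Inside an event loop, the same object could be fed from `async for chunk in fr.stream_raw(payload)`, calling `parser.feed(chunk)` on each iteration.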
| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | OpenAI-compatible. Supports streaming (`"stream": true`). |
| GET | `/v1/models` | Exposes only routable free models. |
| GET | `/health` | Health check. |
Controlled via .env or environment variables. See src/freerouter/config.py for the full list.
| Key | Default | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | (required) | OpenRouter key |
| `MODEL_REFRESH_TTL` | 600 | free-list cache TTL (seconds) |
| `MAX_ATTEMPTS` | 8 | max number of models to try as fallbacks |
| `COOLDOWN_SECONDS` | 60 | how long to skip a model after a 429 (seconds) |
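
As an illustration of how these keys and defaults fit together, the sketch below reads them from a mapping such as `os.environ`. It is not the contents of `src/freerouter/config.py`, which may be structured differently; `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    model_refresh_ttl: int = 600  # free-list cache TTL, seconds
    max_attempts: int = 8         # max models to try as fallbacks
    cooldown_seconds: int = 60    # skip time after a 429, seconds


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        # The key has no default, so fail early with a clear message.
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=key,
        model_refresh_ttl=int(env.get("MODEL_REFRESH_TTL", 600)),
        max_attempts=int(env.get("MAX_ATTEMPTS", 8)),
        cooldown_seconds=int(env.get("COOLDOWN_SECONDS", 60)),
    )


# Placeholder key for demonstration only.
settings = load_settings({"OPENROUTER_API_KEY": "sk-or-example", "COOLDOWN_SECONDS": "30"})
print(settings.cooldown_seconds, settings.max_attempts)
```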
Run the test suite with `pytest`.

MIT. See LICENSE.