unohee/freerouter

freerouter

An OpenAI-compatible proxy that collects OpenRouter's free model endpoints and automatically routes requests across them, falling back to another model when one fails.

Point any existing OpenAI SDK/client at freerouter, and it picks a model from OpenRouter's free pool (pricing.prompt == 0 && pricing.completion == 0), then falls back to the next free model automatically on a rate limit (429) or a transient error.
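
The free-pool test above can be sketched as a small predicate. This is only an illustration: it assumes each catalog entry carries a pricing dict whose prompt and completion values are numbers or numeric strings, and is_free is a hypothetical helper name rather than freerouter's own function.

```python
def is_free(model: dict) -> bool:
    """Return True when both prompt and completion are priced at zero."""
    pricing = model.get("pricing") or {}
    try:
        prompt = float(pricing["prompt"])
        completion = float(pricing["completion"])
    except (KeyError, TypeError, ValueError):
        # Missing or malformed pricing: treat the model as not free.
        return False
    return prompt == 0 and completion == 0


# Made-up catalog entries, for illustration only.
catalog = [
    {"id": "example/model-a:free", "pricing": {"prompt": "0", "completion": "0"}},
    {"id": "example/model-b", "pricing": {"prompt": "0.000001", "completion": "0"}},
    {"id": "example/model-c", "pricing": {}},
]
free_ids = [m["id"] for m in catalog if is_free(m)]
print(free_ids)
```

Treating missing pricing as "not free" is a conservative choice made for this sketch, so that a malformed catalog entry is never routed to by accident.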

How it works

  1. On startup, fetch OpenRouter's /api/v1/models, keep only the free models, and cache the result (TTL 600s by default).
  2. On POST /v1/chat/completions, the router builds a candidate order.
    • If the request names a specific free model, that model goes first. Otherwise (model is auto, empty, paid, or unknown), the free-pool priority applies: :free-suffixed models first, then larger context windows.
    • A model that has just returned 429 is moved to the back of the order for the duration of its cooldown (60s by default).
  3. Try the candidates in order, falling back on 429/402/404/5xx. If all of them fail, return 502.
  4. The response header X-freerouter-Model carries the model that was actually used.
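
The steps above can be sketched roughly as follows. This is not freerouter's actual code: candidate_order, route, and the send callback (which stands in for the upstream HTTP call and returns a status code) are hypothetical names. The sketch also assumes that an explicitly requested model is pushed back like any other while it is cooling down.

```python
import time

RETRYABLE = {402, 404, 429}  # plus any 5xx, per the steps above


def is_retryable(status: int) -> bool:
    return status in RETRYABLE or 500 <= status <= 599


def candidate_order(requested, pool, cooldown_until, now):
    """pool: list of {"id": str, "context_length": int} free models."""
    # Free-pool priority: ':free'-suffixed ids first, then larger context.
    ranked = sorted(
        pool,
        key=lambda m: (not m["id"].endswith(":free"), -m["context_length"]),
    )
    ids = [m["id"] for m in ranked]
    # An explicitly requested free model goes to the front.
    if requested in ids:
        ids.remove(requested)
        ids.insert(0, requested)
    # Models still cooling down after a 429 are pushed to the back.
    ready = [i for i in ids if cooldown_until.get(i, 0) <= now]
    cooling = [i for i in ids if cooldown_until.get(i, 0) > now]
    return ready + cooling


def route(requested, pool, send, cooldown_until,
          max_attempts=8, cooldown=60, clock=time.monotonic):
    """Try candidates in order; return (status, model_used)."""
    order = candidate_order(requested, pool, cooldown_until, clock())
    for model_id in order[:max_attempts]:
        status = send(model_id)
        if status == 429:
            cooldown_until[model_id] = clock() + cooldown
        if not is_retryable(status):
            return status, model_id
    return 502, None  # every candidate failed


# Made-up pool and canned upstream responses, for illustration only.
pool = [
    {"id": "a/big", "context_length": 1_000_000},
    {"id": "b/small:free", "context_length": 32_768},
    {"id": "c/large:free", "context_length": 262_144},
]
responses = {"c/large:free": 429, "b/small:free": 200}
cooldowns = {}
result = route("auto", pool, lambda m: responses.get(m, 500), cooldowns)
print(result)
```

In this run, c/large:free is tried first and returns 429, so it enters cooldown and the request falls back to b/small:free. A non-retryable status such as 400 is returned to the caller immediately instead of triggering a fallback.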

Free models

The routable pool is OpenRouter's free, chat-capable models. The list below is generated from the live catalog. Run python scripts/update_models.py to refresh it locally; CI also refreshes it automatically once a week.

26 free, chat-capable models (prompt + completion priced at $0). Auto-generated by scripts/update_models.py.

Model ID Context
qwen/qwen3-coder:free 1,048,576
nvidia/nemotron-3-ultra-550b-a55b:free 1,000,000
nvidia/nemotron-3-super-120b-a12b:free 1,000,000
poolside/laguna-xs.2:free 262,144
poolside/laguna-m.1:free 262,144
google/gemma-4-26b-a4b-it:free 262,144
google/gemma-4-31b-it:free 262,144
qwen/qwen3-next-80b-a3b-instruct:free 262,144
cohere/north-mini-code:free 256,000
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free 256,000
nvidia/nemotron-3-nano-30b-a3b:free 256,000
openai/gpt-oss-120b:free 131,072
openai/gpt-oss-20b:free 131,072
meta-llama/llama-3.3-70b-instruct:free 131,072
meta-llama/llama-3.2-3b-instruct:free 131,072
nousresearch/hermes-3-llama-3.1-405b:free 131,072
nvidia/nemotron-3.5-content-safety:free 128,000
nvidia/nemotron-nano-12b-v2-vl:free 128,000
nvidia/nemotron-nano-9b-v2:free 128,000
liquid/lfm-2.5-1.2b-thinking:free 32,768
liquid/lfm-2.5-1.2b-instruct:free 32,768
cognitivecomputations/dolphin-mistral-24b-venice-edition:free 32,768
openrouter/owl-alpha 1,048,756
google/lyria-3-pro-preview 1,048,576
google/lyria-3-clip-preview 1,048,576
openrouter/free 200,000

Install

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env   # fill in OPENROUTER_API_KEY

Run

freerouter            # or: python -m freerouter
# defaults to http://127.0.0.1:8000

Usage (with the OpenAI SDK as-is)

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="auto",  # freerouter picks from the free pool automatically
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)

curl:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'
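
To see which model actually served a request, read the X-freerouter-Model response header described under How it works. Below is a minimal standard-library sketch. build_chat_request and chat are hypothetical helper names, and chat assumes a freerouter instance is listening at the given base URL.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_chat_request(base_url: str, prompt: str,
                       model: str = "auto") -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str) -> Tuple[Optional[str], str]:
    """Return (model actually used, reply text). Needs a running server."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=60) as resp:
        used = resp.headers.get("X-freerouter-Model")  # case-insensitive lookup
        data = json.load(resp)
    return used, data["choices"][0]["message"]["content"]


req = build_chat_request("http://127.0.0.1:8000", "hi")
print(req.get_method(), req.full_url)
```

With the server running, chat("http://127.0.0.1:8000", "hi") would return the header value alongside the reply text.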

Use as a library (no server)

You can embed freerouter into another agent program as a submodule without running an HTTP server. FreeRouterClient fetches the free-model list over REST (TTL-cached) and routes with fallback in-process.

import asyncio
from freerouter import FreeRouterClient

async def main():
    async with FreeRouterClient() as fr:           # uses OPENROUTER_API_KEY from .env
        free = await fr.models()                    # routable free models
        data = await fr.chat(
            [{"role": "user", "content": "hello"}],
            model="auto",                           # auto = pick from the free pool + fallback
            max_tokens=64,
        )
        print(data["model"], data["choices"][0]["message"]["content"])

asyncio.run(main())

In synchronous code (outside an event loop), use the sync wrapper:

from freerouter import FreeRouterClient

fr = FreeRouterClient(api_key="sk-or-...")          # passing the key directly also works
data = fr.chat_sync([{"role": "user", "content": "hi"}], model="auto")

  • Share an httpx client: if the agent already uses an httpx.AsyncClient, inject it with FreeRouterClient(http_client=my_client). The caller then owns the client's lifecycle.
  • Share state: inject registry/router so that several clients share the free-list cache and cooldowns.
  • Streaming: async for chunk in fr.stream_raw(payload) yields raw SSE bytes.
  • Errors: a non-retryable 4xx (e.g. a bad request) raises httpx.HTTPStatusError; if every candidate fails, FreeRouterError is raised.
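
Because stream_raw yields raw SSE bytes, the caller has to split events and decode them. The sketch below shows one way to do that on a plain iterable of byte chunks. It assumes OpenAI-style events (one JSON object per data: line, separated by a blank line with "\n" line endings, ending with data: [DONE]); iter_sse_json is a hypothetical helper, not part of freerouter.

```python
import json


def iter_sse_json(chunks):
    """Yield decoded JSON events from an iterable of raw SSE byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Events are separated by a blank line; a chunk may end mid-event.
        while b"\n\n" in buffer:
            raw_event, buffer = buffer.split(b"\n\n", 1)
            for line in raw_event.decode("utf-8").splitlines():
                if not line.startswith("data:"):
                    continue  # skip comments (": ...") and other fields
                data = line[len("data:"):].strip()
                if data == "[DONE]":
                    return
                yield json.loads(data)


# Hand-written chunks that split one event across a chunk boundary.
chunks = [
    b'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\nda',
    b'ta: {"choices":[{"delta":{"content":"lo"}}]}\n\n',
    b": keep-alive\n\n",
    b"data: [DONE]\n\n",
]
text = "".join(
    event["choices"][0]["delta"].get("content", "")
    for event in iter_sse_json(chunks)
)
print(text)
```

In async code, the same buffering logic would sit inside the async for chunk in fr.stream_raw(payload) loop.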

The same routing/fallback core is shared by the proxy server (proxy.py) and the library.

Endpoints

Method  Path                  Description
POST    /v1/chat/completions  OpenAI-compatible. Supports streaming ("stream": true).
GET     /v1/models            Exposes only routable free models.
GET     /health               Health check.

Configuration

Controlled via .env or environment variables. See src/freerouter/config.py for the full list.

Key                 Default     Description
OPENROUTER_API_KEY  (required)  OpenRouter key
MODEL_REFRESH_TTL   600         free-list cache TTL (seconds)
MAX_ATTEMPTS        8           max number of models to try as fallbacks
COOLDOWN_SECONDS    60          how long to skip a model after a 429 (seconds)
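
As an illustration of how these keys and defaults fit together, here is a minimal sketch that reads them from a mapping such as os.environ. Settings and load_settings are hypothetical names; the real src/freerouter/config.py may be structured differently. The sketch does not parse .env files itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    model_refresh_ttl: int = 600   # seconds
    max_attempts: int = 8
    cooldown_seconds: int = 60     # seconds


def load_settings(env=None) -> Settings:
    """Read the keys from the table above, applying the documented defaults."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=key,
        model_refresh_ttl=int(env.get("MODEL_REFRESH_TTL", 600)),
        max_attempts=int(env.get("MAX_ATTEMPTS", 8)),
        cooldown_seconds=int(env.get("COOLDOWN_SECONDS", 60)),
    )


# Placeholder values, for illustration only.
settings = load_settings(
    {"OPENROUTER_API_KEY": "sk-or-example", "COOLDOWN_SECONDS": "30"}
)
print(settings.cooldown_seconds, settings.max_attempts)
```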

Tests

pytest

License

MIT — see LICENSE.
