KV-cache-aware intelligent routing for self-hosted and hybrid LLM fleets. Route requests using model quality, latency, cost, policy, and live GPU state.
multi-armed-bandit routing-controllers mlops fleets kv-cache openai-api pii-redaction llm llmops vllm ai-gateway semantic-routing llm-gateway llm-routing self-hosted-llm
Updated Mar 20, 2026 - Python
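The description above mentions routing on model quality, latency, cost, and live KV-cache state. A minimal sketch of that idea, assuming a simple weighted-score policy (all names, fields, and weights here are illustrative, not this project's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Backend:
    # Hypothetical backend record; every field is illustrative.
    name: str
    quality: float          # offline model-quality score in [0, 1]
    latency_ms: float       # recent p50 latency in milliseconds
    cost_per_1k: float      # dollars per 1k tokens
    cached_prefixes: set = field(default_factory=set)

def kv_cache_hit(backend: Backend, prompt_prefix: str) -> float:
    # Crude proxy for KV-cache reuse: 1.0 if this backend already
    # holds the prompt's prefix in its KV cache, else 0.0.
    return 1.0 if prompt_prefix in backend.cached_prefixes else 0.0

def score(backend: Backend, prompt_prefix: str,
          w_quality=1.0, w_latency=0.002, w_cost=0.5, w_cache=0.3) -> float:
    # Higher is better: reward quality and cache reuse,
    # penalize latency and per-token cost.
    return (w_quality * backend.quality
            + w_cache * kv_cache_hit(backend, prompt_prefix)
            - w_latency * backend.latency_ms
            - w_cost * backend.cost_per_1k)

def route(backends, prompt_prefix: str) -> Backend:
    # Pick the backend with the best combined score.
    return max(backends, key=lambda b: score(b, prompt_prefix))

fleet = [
    Backend("gpu-a", quality=0.90, latency_ms=120, cost_per_1k=0.60),
    Backend("gpu-b", quality=0.85, latency_ms=80, cost_per_1k=0.40,
            cached_prefixes={"sys-v1"}),
]
# gpu-b wins here: its cache hit and lower latency/cost outweigh
# gpu-a's slightly higher quality score.
print(route(fleet, "sys-v1").name)  # → gpu-b
```

A real KV-cache-aware router would replace the exact-prefix check with prefix-tree matching against live engine state and fold in policy constraints (e.g. PII-sensitive requests pinned to self-hosted backends), but the scoring shape stays the same.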