Semantic model router with parallel LLM classification, prompt caching, and vision short-circuiting. Optimizes request routing with sub-100ms overhead for Open WebUI.
Topics: semantic-routing, llm, open-webui, caching, machine-learning, performance, ai, async-python, request-routing, model-optimization
Updated Oct 20, 2025 - Python
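
A minimal sketch of the routing idea described above: vision requests short-circuit straight to a vision-capable model, text prompts are scored by several classifiers run in parallel, and classification results are cached by prompt hash. The model names, cache layout, and `_classify` heuristic here are illustrative assumptions, not the repository's actual implementation.

```python
import asyncio
import hashlib

# Hypothetical model pool; real model IDs would come from Open WebUI's config.
VISION_MODEL = "gpt-4o"  # assumed vision-capable target
TEXT_MODELS = {
    "code": "qwen2.5-coder",
    "chat": "llama-3.1-8b",
    "reasoning": "deepseek-r1",
}

_route_cache: dict[str, str] = {}  # prompt-hash -> chosen model (illustrative cache)


def _cache_key(prompt: str) -> str:
    """Stable key for the classification cache."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


async def _classify(prompt: str, label: str) -> tuple[str, float]:
    """Placeholder for one LLM classification call; returns (label, score)."""
    await asyncio.sleep(0)  # stand-in for an async LLM API request
    score = float(label in prompt.lower())  # trivial heuristic for the sketch
    return label, score


async def route(prompt: str, has_image: bool = False) -> str:
    """Pick a target model for a request.

    Vision requests bypass classification entirely; text prompts are
    classified concurrently, one classifier per candidate category.
    """
    if has_image:
        return VISION_MODEL  # short-circuit: no classification needed

    key = _cache_key(prompt)
    if key in _route_cache:
        return _route_cache[key]  # cached routing decision

    # Run all classifiers in parallel and keep the highest-scoring label.
    results = await asyncio.gather(*(_classify(prompt, lbl) for lbl in TEXT_MODELS))
    best_label = max(results, key=lambda r: r[1])[0]
    choice = TEXT_MODELS[best_label]

    _route_cache[key] = choice
    return choice


if __name__ == "__main__":
    print(asyncio.run(route("please review this code diff")))
```

Because the classifiers run concurrently and repeated prompts hit the cache, the added latency per request stays close to a single short classification call, which is how a sub-100ms routing overhead becomes plausible.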