A stateless LLM runtime that routes, loads, executes, and unloads models per request, with bounded VRAM caching and intelligent model selection.
Topics: systems-programming, llm, generative-ai, ai-infrastructure, latency-optimization, model-routing, vram-optimization, model-scheduling
Updated Apr 12, 2026 - Rust
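The "bounded VRAM caching" in the description implies evicting resident models when loading another would exceed a fixed memory budget. A minimal sketch of one plausible policy (LRU eviction against a byte budget) is below; the `ModelCache` type, model names, and sizes are illustrative assumptions, not the project's actual API.

```rust
use std::collections::VecDeque;

/// Hypothetical sketch: a cache of loaded models bounded by a VRAM byte
/// budget. Loading a model that would exceed the budget evicts the
/// least-recently-used models first.
struct ModelCache {
    budget_bytes: u64,
    used_bytes: u64,
    // Front = least recently used; back = most recently used.
    loaded: VecDeque<(String, u64)>, // (model name, size in bytes)
}

impl ModelCache {
    fn new(budget_bytes: u64) -> Self {
        Self { budget_bytes, used_bytes: 0, loaded: VecDeque::new() }
    }

    /// Ensure `name` is resident, evicting LRU entries as needed.
    /// Returns the names of any models evicted to make room.
    fn load(&mut self, name: &str, size: u64) -> Vec<String> {
        // Cache hit: move the model to the most-recently-used position.
        if let Some(pos) = self.loaded.iter().position(|(n, _)| n == name) {
            let entry = self.loaded.remove(pos).unwrap();
            self.loaded.push_back(entry);
            return Vec::new();
        }
        // Evict from the LRU end until the new model fits in the budget.
        let mut evicted = Vec::new();
        while self.used_bytes + size > self.budget_bytes {
            match self.loaded.pop_front() {
                Some((n, s)) => {
                    self.used_bytes -= s;
                    evicted.push(n);
                }
                None => break, // model is larger than the entire budget
            }
        }
        self.used_bytes += size;
        self.loaded.push_back((name.to_string(), size));
        evicted
    }
}

fn main() {
    // 8 GiB budget; model names and sizes are made up for illustration.
    let gib = 1u64 << 30;
    let mut cache = ModelCache::new(8 * gib);
    cache.load("model-a", 3 * gib);
    cache.load("model-b", 2 * gib);
    // Loading a 7 GiB model forces both resident models out.
    let evicted = cache.load("model-c", 7 * gib);
    println!("evicted: {:?}", evicted);
}
```

A real runtime would also have to account for per-request execution state and concurrent loads, which this single-threaded sketch deliberately ignores.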