SageSched: Intelligent LLM Request Scheduler with Workload Prediction — QoS-aware dual-queue scheduling for black-box LLM APIs (OpenAI/Azure/Doubao/Gemini)
api-gateway scheduler load-balancer openai qos faiss fastapi workload-prediction llm llm-inference llm-proxy gittins-index
Updated May 18, 2026 - Python