Multi-instance llama.cpp orchestration — GPU pinning, heterogeneous pools, round-robin routing — one dashboard & API.
Topics: nodejs, self-hosted, nvidia, multi-model, inference-server, homelab, multi-gpu, rocm, load-balancing, model-serving, llm, llama-cpp, llm-inference, local-ai, gguf, llama-server, llm-orchestration, openai-compatible, gpu-pinning, llamafleet
Updated Apr 28, 2026 · JavaScript