Run multiple llama.cpp instances in parallel, each pinned to its own GPUs, from one browser dashboard.
Topics: nodejs, self-hosted, nvidia, multi-model, inference-server, homelab, multi-gpu, rocm, load-balancing, model-serving, llm, llama-cpp, llm-inference, local-ai, gguf, llama-server, llm-orchestration, openai-compatible, gpu-pinning, llamafleet
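
llamafleet's own launcher isn't shown here, but the core idea can be sketched in a few lines of Node.js: spawn one `llama-server` process per GPU set, pinning each via `CUDA_VISIBLE_DEVICES` (on ROCm, `HIP_VISIBLE_DEVICES` plays the same role). The model paths, ports, and GPU assignments below are purely illustrative, not the project's actual defaults.

```js
// Minimal sketch (not llamafleet's actual code): one llama-server per GPU
// set, each pinned to its devices and serving on its own port.
const { spawn } = require("node:child_process");

// Hypothetical fleet config: which GPUs, port, and GGUF model per instance.
const instances = [
  { gpus: "0",   port: 8081, model: "models/small.Q5_K_M.gguf" },
  { gpus: "1,2", port: 8082, model: "models/large.Q4_K_M.gguf" },
];

for (const inst of instances) {
  const proc = spawn(
    "llama-server",
    ["-m", inst.model, "--port", String(inst.port)],
    {
      // GPU pinning: the process only sees the devices listed here.
      env: { ...process.env, CUDA_VISIBLE_DEVICES: inst.gpus },
      stdio: "inherit",
    }
  );
  proc.on("exit", (code) =>
    console.log(`instance on port ${inst.port} exited with code ${code}`)
  );
}
```

Each instance then exposes llama-server's OpenAI-compatible HTTP API on its own port, so a dashboard or load balancer in front can route requests per model.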