Problem
On Linux x64 machines without a working CUDA toolkit (e.g. AMD GPU boxes), QMD attempts to use CUDA first, fails with a CMake build error, then falls back to CPU — even when Vulkan is perfectly functional.
Root cause
src/llm.ts:
const gpuTypes = await getLlamaGpuTypes();
const preferred = (["cuda", "metal", "vulkan"] as const).find(g => gpuTypes.includes(g));
getLlamaGpuTypes() without an argument returns a static platform-capability list, not actual runtime availability. On Linux x64 it always includes "cuda" regardless of whether NVIDIA hardware or a CUDA toolkit is present. The prioritized .find() therefore picks "cuda" on every Linux x64 system — including AMD-only boxes — and initialization fails before the Vulkan fallback is ever tried.
Evidence
On my machine (AMD Radeon RX 7900 XTX, no NVIDIA, no CUDA toolkit):
getLlamaGpuTypes() = ["cuda", "vulkan", false] ← false positive
getLlamaGpuTypes("supported") = ["vulkan", false] ← correct
With the current code, every qmd invocation runs a failed cmake attempt before CPU fallback. With gpu: "vulkan" forced directly on the same machine, the embedding model loads cleanly to VRAM (all 25 layers offloaded, +323 MB VRAM usage).
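To make the difference concrete, the two lists reported above can be fed through the same prioritized lookup. This is only a minimal sketch: the `GpuType` alias and the `pickPreferredGpu` helper are hypothetical names introduced here for illustration, and the lists are hard-coded from the output above rather than read from node-llama-cpp.

```typescript
// Hypothetical stand-in for the GPU type values seen in the output above;
// `false` means "no GPU".
type GpuType = "cuda" | "metal" | "vulkan" | false;

// Same priority order as the snippet from src/llm.ts.
const PRIORITY = ["cuda", "metal", "vulkan"] as const;

// Hypothetical helper: returns the first preferred backend present in the list,
// or undefined when none of the preferred backends is listed.
function pickPreferredGpu(gpuTypes: readonly GpuType[]): GpuType | undefined {
  return PRIORITY.find(g => gpuTypes.includes(g));
}

// Lists copied from the Evidence output on the AMD-only machine.
const noArgList: GpuType[] = ["cuda", "vulkan", false];
const supportedList: GpuType[] = ["vulkan", false];

console.log(pickPreferredGpu(noArgList));     // prints cuda
console.log(pickPreferredGpu(supportedList)); // prints vulkan
```

With a list that contains only `false`, the helper returns `undefined`, so under this sketch no GPU backend would be attempted at all.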
Suggested fix
Pass the "supported" argument — node-llama-cpp already exposes it for exactly this purpose:
- const gpuTypes = await getLlamaGpuTypes();
+ const gpuTypes = await getLlamaGpuTypes("supported");
This also avoids the cmake spam that #426, #412, and others report on CPU-only systems, since CUDA is never attempted when unsupported.
Environment
- QMD 1.0.6 (1254b3c)
- node-llama-cpp 3.15.1
- Bun 1.3.9 runtime (wrapper uses bun run src/qmd.ts)
- Arch Linux x86_64, AMD Radeon RX 7900 XTX, Vulkan 1.4.341
- No NVIDIA, no CUDA toolkit installed
Related
NODE_LLAMA_CPP_GPU=false ignored (same root cause: capability list treated as availability)