v3.19.0

Latest

github-actions released this 30 Jun 01:30

f53ea51

✨ Gemma 4 is here! ✨

Read about the release in the blog post

3.19.0 (2026-06-30)

Features

Gemma 4 support (#591) (5fe6e27) (documentation: Gemma 4)
riscv64 prebuilt binaries (#615) (e8336a4)
automatically enable flash attention when optimal
improve inference performance when a grammar is active
more precise resource usage estimation
resource usage capping (documentation: Resource Capping)
automatically enable or disable mmap depending on the environment
support Q1_0 quant
improve stability on unified memory systems
disable residency sets on macOS by default for better OS responsiveness
default progressLogs to "stderr" to avoid polluting stdout with logs
optimized prebuilt binaries for arm architectures

Bug Fixes

MXFP4_MOE quant name
Vulkan backend successful load detection even when no devices are available
CLI: avoid redownloading existing models that consist of multiple parts from a URI
optimize checkpoints management when using grammar
improve stability when loading huge models
reranking result range for Qwen 3 reranker
adapt to breaking llama.cpp changes

Shipped with llama.cpp release b9842

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)
