v3.19.0

github-actions released this 30 Jun 01:30
f53ea51

Gemma 4 is here!

Read about the release in the blog post


3.19.0 (2026-06-30)

Features

  • Gemma 4 support (#591) (5fe6e27) (documentation: Gemma 4)
  • riscv64 prebuilt binaries (#615) (e8336a4)
  • automatically enable flash attention when optimal
  • improve inference performance when a grammar is active
  • more precise resource usage estimation
  • resource usage capping (documentation: Resource Capping)
  • automatically enable or disable mmap depending on the environment
  • support Q1_0 quant
  • improve stability on unified memory systems
  • disable residency sets on macOS by default for better OS responsiveness
  • default progressLogs to "stderr" to avoid polluting stdout with logs
  • optimized prebuilt binaries for arm architectures

Bug Fixes

  • MXFP4_MOE quant name
  • Vulkan backend successful load detection even when no devices are available
  • CLI: avoid redownloading existing models that consist of multiple parts from a URI
  • optimize checkpoints management when using grammar
  • improve stability when loading huge models
  • reranking result range for Qwen 3 reranker
  • adapt to breaking llama.cpp changes

Shipped with llama.cpp release b9842

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)