Insights: ggml-org/llama.cpp
Overview
14 Releases published by 1 person
- b4978 published Mar 27, 2025
- b4980 published Mar 27, 2025
- b4981 published Mar 28, 2025
- b4982 published Mar 28, 2025
- b4984 published Mar 28, 2025
- b4985 published Mar 28, 2025
- b4986 published Mar 28, 2025
- b4987 published Mar 28, 2025
- b4988 published Mar 28, 2025
- b4990 published Mar 29, 2025
- b4991 published Mar 29, 2025
- b4992 published Mar 29, 2025
- b4997 published Mar 30, 2025
- b4998 published Mar 30, 2025
17 Pull requests merged by 14 people
- musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc #12611 merged Mar 30, 2025
- sync : ggml #12645 merged Mar 30, 2025
- llama : fix non-causal mask for gemma 3 #12615 merged Mar 29, 2025
- change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU #12632 merged Mar 29, 2025
- cmake: fix ccache conflict #12522 merged Mar 29, 2025
- [CANN]: remove clang-format in ggml-cann #12607 merged Mar 29, 2025
- llama : fix incorrect Qwen2Moe ffn_moe_out graph callback #12631 merged Mar 28, 2025
- metal : improve FA + improve MoE #12612 merged Mar 28, 2025
- vulkan: fix coopmat shader generation when cross-compiling #12272 merged Mar 28, 2025
- llama: fix error on bad grammar #12628 merged Mar 28, 2025
- Include speculative decoding stats when timings_per_token is enabled #12603 merged Mar 28, 2025
- rpc : update README for cache usage #12620 merged Mar 28, 2025
- llamafile : ppc64le GEMV forwarding for FP32 #12594 merged Mar 28, 2025
- rpc : send hash when tensor data is above some fixed threshold #12496 merged Mar 28, 2025
- server : Support listening on a unix socket #12613 merged Mar 27, 2025
- media : add SVG logo [no ci] #12616 merged Mar 27, 2025
- opencl: add multi and vision rope, gelu_quick and im2col #12600 merged Mar 27, 2025
12 Pull requests opened by 9 people
- Add Yandex instruct model template support #12621 opened Mar 28, 2025
- opencl: Add support for multiple devices #12622 opened Mar 28, 2025
- sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution #12625 opened Mar 28, 2025
- opencl: remove a self-referential macro #12626 opened Mar 28, 2025
- vulkan: Implement split_k for coopmat2 flash attention #12627 opened Mar 28, 2025
- vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency #12630 opened Mar 28, 2025
- llama : support BailingMoE (Ling) #12634 opened Mar 28, 2025
- llama-server : implement universal assisted decoding #12635 opened Mar 28, 2025
- llama-tts : refactor console output #12640 opened Mar 29, 2025
- tts : implement sesame CSM + Mimi decoder #12648 opened Mar 29, 2025
- opencl : fix memory allocation size #12649 opened Mar 30, 2025
- llama : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B #12652 opened Mar 30, 2025
22 Issues closed by 10 people
- Eval bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' #12644 closed Mar 30, 2025
- Misc. bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' #12650 closed Mar 30, 2025
- Misc. bug: examples/gguf/gguf.cpp always fails with data check #12647 closed Mar 30, 2025
- Compile bug: parameter packs not expanded with '...' #11112 closed Mar 30, 2025
- Misc. bug: llama-server `--ctx-size` is divided by `--parallel` and cannot be increased? #11681 closed Mar 30, 2025
- KV cache question when running deepseek-r1-dynamic-1.58-bit #11757 closed Mar 30, 2025
- Misc. bug: CUDA error: CUDA-capable device(s) is/are busy or unavailable from `cudaSetDevice(device)` #11841 closed Mar 30, 2025
- Eval bug: Gemma3 <unused32> spam #12433 closed Mar 29, 2025
- Misc. bug: llama-server does not print model loading errors by default (log level misconfigured?) #11819 closed Mar 29, 2025
- Misc. bug: llama-cli crash on ubuntu with GGML_VULKAN=ON #11823 closed Mar 29, 2025
- Misc. bug: Quantization process 100 times slower on Windows (dockerized) #11825 closed Mar 29, 2025
- [BENCHMARKS] DeepScaleR-1.5B-Preview F16 ollama GGUF vs llama.cpp #11828 closed Mar 29, 2025
- Feature Request: Direct way to check the status of the abort mechanism #12525 closed Mar 28, 2025
- Feature Request: RPC offloading using a local model copy #10095 closed Mar 28, 2025
- Why assert(!isnan(wp[i])) in the softmax_forward function? #12542 closed Mar 28, 2025
- llama.cpp didn't use the GPU to accelerate inference for a gguf file #12614 closed Mar 28, 2025
- Misc. bug: Virus detected #10768 closed Mar 28, 2025
- Eval bug: [CANN] inference does not use the NPU #11799 closed Mar 28, 2025
- Urgent help needed: problems encountered in hybrid inference function verification based on llama.cpp #11805 closed Mar 28, 2025
- Eval bug: Incorrect n_gpu_layers settings for MoE models #12596 closed Mar 27, 2025
12 Issues opened by 10 people
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment #12655 opened Mar 30, 2025
- Feature Request: Splitting layers according to VRAM usage on multi-GPU setups #12654 opened Mar 30, 2025
- Feature Request: support DeepSeek-V3's "Scaled ReLU or SwiGLU activation functions" #12653 opened Mar 30, 2025
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration #12642 opened Mar 29, 2025
- Feature Request: Add support in convert.py for the model Qwen2.5-Omni-7B #12641 opened Mar 29, 2025
- Compile bug: there is a build bug in examples/llama.android and it will bring build failures in CI #12638 opened Mar 29, 2025
- Feature Request: Interleaved sliding window attention support for gemma 2 and 3 #12637 opened Mar 29, 2025
- Misc. bug: HIP: when using llama-bench with kv cache quantization, the CPU is doing the work instead of the GPU #12624 opened Mar 28, 2025
- Misc. bug: #12623 opened Mar 28, 2025
- Compile bug: There was an error while compiling support for the Vulkan backend #12619 opened Mar 28, 2025
- Misc. bug: Data check in examples/gguf #12617 opened Mar 27, 2025
44 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp #12326 commented on Mar 30, 2025 • 10 new comments
- Vulkan: Add DP4A MMQ and Q8_1 quantization shader #12135 commented on Mar 29, 2025 • 7 new comments
- perplexity: Add option to ignore context window overflow errors and continue score calculation #12512 commented on Mar 30, 2025 • 2 new comments
- SYCL: Remove misleading ggml_sycl_op_flatten function #12387 commented on Mar 28, 2025 • 2 new comments
- Feature Request: resize an existing context #11577 commented on Mar 30, 2025 • 0 new comments
- Feature Request: NUMA-aware MoE Expert Allocation for Improved Performance #11333 commented on Mar 30, 2025 • 0 new comments
- "CPU_AARCH64 model buffer" appears when not using AARCH64 #11204 commented on Mar 30, 2025 • 0 new comments
- Compile bug: iOS version able to build but not able to run #10922 commented on Mar 30, 2025 • 0 new comments
- csm : implement Sesame-based conversation example #12392 commented on Mar 30, 2025 • 0 new comments
- Feature Request: Qwen 2.5 VL #11483 commented on Mar 30, 2025 • 0 new comments
- Simplify and improve CUDA graphs through use of indirect copy pointers #9017 commented on Mar 29, 2025 • 0 new comments
- add FP8 support to gguf/llama #10055 commented on Mar 29, 2025 • 0 new comments
- llama : add option to override model tensor buffers #11397 commented on Mar 27, 2025 • 0 new comments
- llama : add llama_batch_ext #11875 commented on Mar 27, 2025 • 0 new comments
- SYCL: Rename oneMKL to oneMath #12192 commented on Mar 28, 2025 • 0 new comments
- `server`: streaming of tool calls and thoughts when `--jinja` is on #12379 commented on Mar 28, 2025 • 0 new comments
- [WIP] MUSA: enable fastfp16, correct warp reduce impl and perf tuning #12383 commented on Mar 30, 2025 • 0 new comments
- ci: add Linux cross-compile build #12428 commented on Mar 28, 2025 • 0 new comments
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture #12466 commented on Mar 28, 2025 • 0 new comments
- (draft) tts: Orpheus support #12487 commented on Mar 28, 2025 • 0 new comments
- quantize: Handle user-defined quantization levels for additional tensors #12511 commented on Mar 30, 2025 • 0 new comments
- ggml-quants : weighted rounding algorithms with cumulative search #12557 commented on Mar 30, 2025 • 0 new comments
- ggml : add ANE backend #10453 commented on Mar 27, 2025 • 0 new comments
- Misc. bug: Crashing, forcing BMI2 on non-BMI2 CPUs #12500 commented on Mar 27, 2025 • 0 new comments
- Misc. bug: ggml-backend.cpp:746: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY) #12045 commented on Mar 28, 2025 • 0 new comments
- Compile bug: Build failure on VirtualBox: ggml-cpu-aarch64.cpp invalid conversion error #11783 commented on Mar 28, 2025 • 0 new comments
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} #12591 commented on Mar 28, 2025 • 0 new comments
- Compile bug: SYCL backend build fails on debug config #12602 commented on Mar 28, 2025 • 0 new comments
- Eval bug: Phi-4 mini in iOS with xcframework #12232 commented on Mar 28, 2025 • 0 new comments
- Compile bug: How to compile llama.cpp with Vulkan for an Android device #11695 commented on Mar 29, 2025 • 0 new comments
- Misc. bug: Loop range computation question for Vulkan matmul shaders #12082 commented on Mar 29, 2025 • 0 new comments
- Eval bug: MUSA error: operation not supported #12077 commented on Mar 29, 2025 • 0 new comments
- Misc. bug: llama-cli llama_backend_free may not free all the GPU memory #12057 commented on Mar 29, 2025 • 0 new comments
- Eval bug: TikTokenTokenizer has no attribute vocab #12044 commented on Mar 29, 2025 • 0 new comments
- Enhancement: Improve ROCm performance on various quants (benchmarks included) #11931 commented on Mar 29, 2025 • 0 new comments
- Eval bug: does the rpc backend support cpu? #11807 commented on Mar 29, 2025 • 0 new comments
- Eval bug: llama.cpp CPU-bound while inferencing against DeepSeek-R1 GGUF #11635 commented on Mar 29, 2025 • 0 new comments
- Feature Request: Support Codestral Mamba #8519 commented on Mar 29, 2025 • 0 new comments
- Move gguf fuzzers to the llama.cpp repository #11514 commented on Mar 29, 2025 • 0 new comments
- kubernetes example #6546 commented on Mar 29, 2025 • 0 new comments
- Feature request: Graphical GGUF viewer #6715 commented on Mar 29, 2025 • 0 new comments
- tts : add support for Orpheus #12476 commented on Mar 29, 2025 • 0 new comments
- Compile bug: Failed to compile on a CentOS 8 system #12092 commented on Mar 30, 2025 • 0 new comments
- Eval bug: granite-vision-3.1-2b-preview ERROR:hf-to-gguf:Model LlavaNextForConditionalGeneration is not supported #12053 commented on Mar 30, 2025 • 0 new comments
Mar 30, 2025 • 0 new comments