Insights: ggml-org/llama.cpp
Overview
14 Releases published by 1 person
- b4978 published Mar 27, 2025
- b4980 published Mar 27, 2025
- b4981 published Mar 28, 2025
- b4982 published Mar 28, 2025
- b4984 published Mar 28, 2025
- b4985 published Mar 28, 2025
- b4986 published Mar 28, 2025
- b4987 published Mar 28, 2025
- b4988 published Mar 28, 2025
- b4990 published Mar 29, 2025
- b4991 published Mar 29, 2025
- b4992 published Mar 29, 2025
- b4997 published Mar 30, 2025
- b4998 published Mar 30, 2025
17 Pull requests merged by 14 people
- musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc #12611 merged Mar 30, 2025
- sync : ggml #12645 merged Mar 30, 2025
- llama : fix non-causal mask for gemma 3 #12615 merged Mar 29, 2025
- change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU #12632 merged Mar 29, 2025
- cmake: fix ccache conflict #12522 merged Mar 29, 2025
- [CANN]: remove clang-format in ggml-cann #12607 merged Mar 29, 2025
- llama : fix incorrect Qwen2Moe ffn_moe_out graph callback #12631 merged Mar 28, 2025
- metal : improve FA + improve MoE #12612 merged Mar 28, 2025
- vulkan: fix coopmat shader generation when cross-compiling #12272 merged Mar 28, 2025
- llama: fix error on bad grammar #12628 merged Mar 28, 2025
- Include speculative decoding stats when timings_per_token is enabled #12603 merged Mar 28, 2025
- rpc : update README for cache usage #12620 merged Mar 28, 2025
- llamafile : ppc64le GEMV forwarding for FP32 #12594 merged Mar 28, 2025
- rpc : send hash when tensor data is above some fixed threshold #12496 merged Mar 28, 2025
- server : Support listening on a unix socket #12613 merged Mar 27, 2025
- media : add SVG logo [no ci] #12616 merged Mar 27, 2025
- opencl: add multi and vision rope, gelu_quick and im2col #12600 merged Mar 27, 2025
12 Pull requests opened by 9 people
- Add Yandex instruct model template support #12621 opened Mar 28, 2025
- opencl: Add support for multiple devices #12622 opened Mar 28, 2025
- sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution #12625 opened Mar 28, 2025
- opencl: remove a self-referential macro #12626 opened Mar 28, 2025
- vulkan: Implement split_k for coopmat2 flash attention #12627 opened Mar 28, 2025
- vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency #12630 opened Mar 28, 2025
- llama : support BailingMoE (Ling) #12634 opened Mar 28, 2025
- llama-server : implement universal assisted decoding #12635 opened Mar 28, 2025
- llama-tts : refactor console output #12640 opened Mar 29, 2025
- tts : implement sesame CSM + Mimi decoder #12648 opened Mar 29, 2025
- opencl : fix memory allocation size #12649 opened Mar 30, 2025
- llama : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B #12652 opened Mar 30, 2025
22 Issues closed by 10 people
- Eval bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' #12644 closed Mar 30, 2025
- Misc. bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' #12650 closed Mar 30, 2025
- Misc. bug: examples/gguf/gguf.cpp always fails with data check #12647 closed Mar 30, 2025
- Compile bug: parameter packs not expanded with '...' #11112 closed Mar 30, 2025
- Misc. bug: llama-server `--ctx-size` is divided by `--parallel` and cannot be increased? #11681 closed Mar 30, 2025
- KV cache question when running deepseek-r1-dynamic-1.58-bit #11757 closed Mar 30, 2025
- Misc. bug: CUDA error: CUDA-capable device(s) is/are busy or unavailable from `cudaSetDevice(device)` #11841 closed Mar 30, 2025
- Eval bug: Gemma3 <unused32> spam #12433 closed Mar 29, 2025
- Misc. bug: llama-server does not print model loading errors by default (log level misconfigured?) #11819 closed Mar 29, 2025
- Misc. bug: llama-cli crash on ubuntu with GGML_VULKAN=ON #11823 closed Mar 29, 2025
- Misc. bug: Quantization process 100 times slower on Windows (dockerized) #11825 closed Mar 29, 2025
- [BENCHMARKS] DeepScaleR-1.5B-Preview F16 ollama GGUF vs llama.cpp #11828 closed Mar 29, 2025
- Feature Request: Direct way to check the status of the abort mechanism #12525 closed Mar 28, 2025
- Feature Request: RPC offloading using a local model copy #10095 closed Mar 28, 2025
- Why assert(!isnan(wp[i])) in the softmax_forward function? #12542 closed Mar 28, 2025
- llama.cpp didn't use the GPU to accelerate inference for a gguf file #12614 closed Mar 28, 2025
- Misc. bug: Virus detected #10768 closed Mar 28, 2025
- Eval bug: [CANN] inference does not use the NPU #11799 closed Mar 28, 2025
- Urgent help needed: problems encountered in hybrid inference function verification based on llama.cpp #11805 closed Mar 28, 2025
- Eval bug: Incorrect n_gpu_layers settings for MoE models #12596 closed Mar 27, 2025
12 Issues opened by 10 people
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment #12655 opened Mar 30, 2025
- Feature Request: Splitting layers according to VRAM usage on multi-GPU setups #12654 opened Mar 30, 2025
- Feature Request: support DeepSeek-V3's "Scaled ReLU or SwiGLU activation functions" #12653 opened Mar 30, 2025
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration #12642 opened Mar 29, 2025
- Feature Request: Add support in convert.py for the model Qwen2.5-Omni-7B #12641 opened Mar 29, 2025
- Compile bug: there is a build bug in examples/llama.android and it will bring build failures in CI #12638 opened Mar 29, 2025
- Feature Request: Interleaved sliding window attention support for gemma 2 and 3 #12637 opened Mar 29, 2025
- Misc. bug: HIP: when using llama-bench with kv cache quantization, the CPU is doing the work instead of the GPU #12624 opened Mar 28, 2025
- Misc. bug: #12623 opened Mar 28, 2025
- Compile bug: There was an error while compiling support for the Vulkan backend #12619 opened Mar 28, 2025
- Misc. bug: Data check in examples/gguf #12617 opened Mar 27, 2025
44 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp #12326 commented on Mar 30, 2025 • 10 new comments
- Vulkan: Add DP4A MMQ and Q8_1 quantization shader #12135 commented on Mar 29, 2025 • 7 new comments
- perplexity: Add option to ignore context window overflow errors and continue score calculation #12512 commented on Mar 30, 2025 • 2 new comments
- SYCL: Remove misleading ggml_sycl_op_flatten function #12387 commented on Mar 28, 2025 • 2 new comments
- Feature Request: resize an existing context #11577 commented on Mar 30, 2025 • 0 new comments
- Feature Request: NUMA-aware MoE Expert Allocation for Improved Performance #11333 commented on Mar 30, 2025 • 0 new comments
- "CPU_AARCH64 model buffer" appears when not using AARCH64 #11204 commented on Mar 30, 2025 • 0 new comments
- Compile bug: iOS version able to build but not able to run #10922 commented on Mar 30, 2025 • 0 new comments
- csm : implement Sesame-based conversation example #12392 commented on Mar 30, 2025 • 0 new comments
- Feature Request: Qwen 2.5 VL #11483 commented on Mar 30, 2025 • 0 new comments
- Simplify and improve CUDA graphs through use of indirect copy pointers #9017 commented on Mar 29, 2025 • 0 new comments
- add FP8 support to gguf/llama #10055 commented on Mar 29, 2025 • 0 new comments
- llama : add option to override model tensor buffers #11397 commented on Mar 27, 2025 • 0 new comments
- llama : add llama_batch_ext #11875 commented on Mar 27, 2025 • 0 new comments
- SYCL: Rename oneMKL to oneMath #12192 commented on Mar 28, 2025 • 0 new comments
- `server`: streaming of tool calls and thoughts when `--jinja` is on #12379 commented on Mar 28, 2025 • 0 new comments
- [WIP] MUSA: enable fastfp16, correct warp reduce impl and perf tuning #12383 commented on Mar 30, 2025 • 0 new comments
- ci: add Linux cross-compile build #12428 commented on Mar 28, 2025 • 0 new comments
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture #12466 commented on Mar 28, 2025 • 0 new comments
- (draft) tts: Orpheus support #12487 commented on Mar 28, 2025 • 0 new comments
- quantize: Handle user-defined quantization levels for additional tensors #12511 commented on Mar 30, 2025 • 0 new comments
- ggml-quants : weighted rounding algorithms with cumulative search #12557 commented on Mar 30, 2025 • 0 new comments
- ggml : add ANE backend #10453 commented on Mar 27, 2025 • 0 new comments
- Misc. bug: Crashing, forcing BMI2 on non-BMI2 CPUs #12500 commented on Mar 27, 2025 • 0 new comments
- Misc. bug: ggml-backend.cpp:746: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY) #12045 commented on Mar 28, 2025 • 0 new comments
- Compile bug: Build failure on VirtualBox: ggml-cpu-aarch64.cpp invalid conversion error #11783 commented on Mar 28, 2025 • 0 new comments
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} #12591 commented on Mar 28, 2025 • 0 new comments
- Compile bug: SYCL backend build fails on debug config #12602 commented on Mar 28, 2025 • 0 new comments
- Eval bug: Phi-4 mini in iOS with xcframework #12232 commented on Mar 28, 2025 • 0 new comments
- Compile bug: How to compile llama.cpp with Vulkan for an Android device #11695 commented on Mar 29, 2025 • 0 new comments
- Misc. bug: Loop range computation question for Vulkan matmul shaders #12082 commented on Mar 29, 2025 • 0 new comments
- Eval bug: MUSA error: operation not supported #12077 commented on Mar 29, 2025 • 0 new comments
- Misc. bug: llama-cli llama_backend_free may not free all the GPU memory #12057 commented on Mar 29, 2025 • 0 new comments
- Eval bug: TikTokenTokenizer has no attribute vocab #12044 commented on Mar 29, 2025 • 0 new comments
- Enhancement: Improve ROCm performance on various quants (benchmarks included) #11931 commented on Mar 29, 2025 • 0 new comments
- Eval bug: does the rpc backend support cpu? #11807 commented on Mar 29, 2025 • 0 new comments
- Eval bug: llama.cpp CPU-bound while inferencing against DeepSeek-R1 GGUF #11635 commented on Mar 29, 2025 • 0 new comments
- Feature Request: Support Codestral Mamba #8519 commented on Mar 29, 2025 • 0 new comments
- Move gguf fuzzers to the llama.cpp repository #11514 commented on Mar 29, 2025 • 0 new comments
- kubernetes example #6546 commented on Mar 29, 2025 • 0 new comments
- Feature request: Graphical GGUF viewer #6715 commented on Mar 29, 2025 • 0 new comments
- tts : add support for Orpheus #12476 commented on Mar 29, 2025 • 0 new comments
- Compile bug: Failed to compile on a CentOS 8 system #12092 commented on Mar 30, 2025 • 0 new comments
- Eval bug: granite-vision-3.1-2b-preview ERROR:hf-to-gguf:Model LlavaNextForConditionalGeneration is not supported #12053 commented on Mar 30, 2025 • 0 new comments
Mar 30, 2025 • 0 new comments