Insights: ggml-org/llama.cpp
Overview
7 Releases published by 1 person
9 Pull requests merged by 9 people
- common : refactor downloading system, handle mmproj with -hf option (#12694, merged Apr 1, 2025)
- opencl : fix memory allocation size (#12649, merged Apr 1, 2025)
- use LLM_KV instead of gguf_find_key (#12672, merged Apr 1, 2025)
- convert : BailingMoE : fix qkv split when head_dim is 0 (#12687, merged Apr 1, 2025)
- metal : use F32 prec in FA kernels (#12688, merged Apr 1, 2025)
- Fix clang warning in gguf_check_reserved_keys (#12686, merged Apr 1, 2025)
- vulkan: fix build when glslc doesn't support coopmat (#12683, merged Apr 1, 2025)
- SYCL: Rename oneMKL to oneMath (#12192, merged Apr 1, 2025)
- SYCL: switch to SYCL namespace (#12674, merged Apr 1, 2025)
6 Pull requests opened by 6 people
- llama : refactor kv cache guard (#12695, opened Apr 1, 2025)
- common : remove json.hpp from common.cpp (#12697, opened Apr 1, 2025)
- [RFC][WIP] Common: Add an Initial Chat Memory Interface/Implementation (#12698, opened Apr 1, 2025)
- Sync minja patch to support inclusionAI/Ling-lite template and update tests (#12699, opened Apr 2, 2025)
- opencl: update doc for OpenCL (#12702, opened Apr 2, 2025)
- fix MUSA compiler warning (#12704, opened Apr 2, 2025)
5 Issues closed by 3 people
- Misc. bug: runtime repack of Q4_0 quantized models not working on llama.cpp server with ARM processor (#12701, closed Apr 2, 2025)
- Eval bug: very slow inference on DeepSeek-R1-Distill-Qwen-32B (#11361, closed Apr 2, 2025)
- Misc. bug: Crashed on Intel Mac with AMD GPU (#11903, closed Apr 2, 2025)
- Feature Request: Use direct_io for model load and inference (#11912, closed Apr 2, 2025)
- Eval bug: Command A only outputs 88888888 with -fa (#12441, closed Apr 1, 2025)
5 Issues opened by 4 people
- Cannot compile SYCL backend SYCL_LIBRARY=SYCL_LIBRARY - NOTFOUND as per documentation (#12696, opened Apr 1, 2025)
- Compile bug: RISCV cross-compile warnings cause build failure (#12693, opened Apr 1, 2025)
- Eval bug: Qwerky 72B (rwkv6qwen2) failed to load with `--split-mode row` option (#12692, opened Apr 1, 2025)
- When will llama.cpp's vulkan provide support for Intel Arc's matrix core? (#12690, opened Apr 1, 2025)
- Feature Request: Method that counts the number of image tokens in LLAVA_API (#12689, opened Apr 1, 2025)
29 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- ggml-quants : weighted rounding algorithms with cumulative search (#12557, commented on Apr 1, 2025 • 3 new comments)
- contrib: support modelscope community (#12664, commented on Apr 1, 2025 • 2 new comments)
- sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (#12625, commented on Apr 1, 2025 • 1 new comment)
- (draft) tts: Orpheus support (#12487, commented on Apr 2, 2025 • 1 new comment)
- gguf-split now respects dry-run option (#12681, commented on Apr 2, 2025 • 0 new comments)
- vocab : BailingMoE : change possessive quantifiers to greedy (#12677, commented on Apr 1, 2025 • 0 new comments)
- [CANN] get_rows and dup optimization (#12671, commented on Apr 2, 2025 • 0 new comments)
- tts : implement sesame CSM + Mimi decoder (#12648, commented on Apr 1, 2025 • 0 new comments)
- Draft: vulkan: Add bfloat16 support (#12554, commented on Apr 2, 2025 • 0 new comments)
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466, commented on Apr 1, 2025 • 0 new comments)
- Add Qwen2.5VL support (#12402, commented on Apr 1, 2025 • 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Apr 1, 2025 • 0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Apr 2, 2025 • 0 new comments)
- llama : add llama_batch_ext (#11875, commented on Apr 1, 2025 • 0 new comments)
- tool-call: add support for tool-calls using Model Context Protocol (#11556, commented on Apr 1, 2025 • 0 new comments)
- llama : add option to override model tensor buffers (#11397, commented on Apr 2, 2025 • 0 new comments)
- Simplify and improve CUDA graphs through use of indirect copy pointers (#9017, commented on Apr 1, 2025 • 0 new comments)
- Misc. bug: CUDA errors with multi-threaded use (#11804, commented on Apr 2, 2025 • 0 new comments)
- Misc. bug: vulkan on Adreno GPU (#12139, commented on Apr 2, 2025 • 0 new comments)
- Misc. bug: gguf-dump 'newbyteorder' was removed (#12146, commented on Apr 2, 2025 • 0 new comments)
- Replacement for deprecated codecvt string conversion (#12151, commented on Apr 2, 2025 • 0 new comments)
- Compile bug: compilation warnings (clang) introduced in #10558 (#12685, commented on Apr 2, 2025 • 0 new comments)
- Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B (#12641, commented on Apr 1, 2025 • 0 new comments)
- ggml : add DirectML backend (#7772, commented on Apr 1, 2025 • 0 new comments)
- Misc. bug: vulkan on 6900xt (#12147, commented on Apr 1, 2025 • 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Apr 1, 2025 • 0 new comments)
- Feature Request: Splitting layers according to VRAM usage on multi-GPU setups (#12654, commented on Apr 1, 2025 • 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Apr 1, 2025 • 0 new comments)
- llama-gemma3-cli: output degeneration after repeated uses (#12499, commented on Apr 1, 2025 • 0 new comments)