Insights: ggml-org/llama.cpp
Overview
44 Releases published by 1 person
- b5769 (published Jun 28, 2025)
- b5770 (published Jun 28, 2025)
- b5771 (published Jun 28, 2025)
- b5772 (published Jun 28, 2025)
- b5773 (published Jun 28, 2025)
- b5774 (published Jun 28, 2025)
- b5775 (published Jun 29, 2025)
- b5777 (published Jun 29, 2025)
- b5778 (published Jun 29, 2025)
- b5780 (published Jun 29, 2025)
- b5782 (published Jun 30, 2025)
- b5783 (published Jun 30, 2025)
- b5784 (published Jun 30, 2025)
- b5785 (published Jun 30, 2025)
- b5787 (published Jun 30, 2025)
- b5788 (published Jul 1, 2025)
- b5792 (published Jul 1, 2025)
- b5793 (published Jul 1, 2025)
- b5794 (published Jul 1, 2025)
- b5795 (published Jul 1, 2025)
- b5797 (published Jul 1, 2025)
- b5798 (published Jul 2, 2025)
- b5801 (published Jul 2, 2025)
- b5802 (published Jul 2, 2025)
- b5803 (published Jul 2, 2025)
- b5804 (published Jul 2, 2025)
- b5808 (published Jul 2, 2025)
- b5809 (published Jul 2, 2025)
- b5811 (published Jul 2, 2025)
- b5812 (published Jul 2, 2025)
- b5814 (published Jul 3, 2025)
- b5815 (published Jul 3, 2025)
- b5816 (published Jul 3, 2025)
- b5817 (published Jul 3, 2025)
- b5819 (published Jul 3, 2025)
- b5820 (published Jul 3, 2025)
- b5821 (published Jul 3, 2025)
- b5822 (published Jul 3, 2025)
- b5823 (published Jul 3, 2025)
- b5824 (published Jul 4, 2025)
- b5825 (published Jul 4, 2025)
- b5826 (published Jul 4, 2025)
- b5827 (published Jul 4, 2025)
- b5828 (published Jul 4, 2025)
58 Pull requests merged by 25 people
- metal : disable fast math in all quantize kernels (#14528, merged Jul 4, 2025)
- batch : add optional for sequential equal split (#14511, merged Jul 4, 2025)
- graph : prepare for 4D mask (#14515, merged Jul 4, 2025)
- batch : add n_used count (#14512, merged Jul 4, 2025)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, merged Jul 4, 2025)
- ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445, merged Jul 3, 2025)
- opencl: broadcast for soft_max (#14510, merged Jul 3, 2025)
- vulkan: support mixed/deepseekR1 FA head sizes (#14509, merged Jul 3, 2025)
- ggml: backward pass for split swiglu (#14483, merged Jul 3, 2025)
- sycl: Fix conditional enabling following arch checks for ggml-sycl (#14504, merged Jul 3, 2025)
- convert : correct gemma 3n conversion (#14450, merged Jul 3, 2025)
- kv-cache : use ggml_set_rows (#14285, merged Jul 3, 2025)
- ggml : fix FA mask dim 2 and 3 (#14505, merged Jul 3, 2025)
- ggml : remove kompute backend (#14501, merged Jul 3, 2025)
- CUDA: add dynamic shared mem to softmax, refactor general usage (#14497, merged Jul 2, 2025)
- gguf-py : add support for chat template jinja files (#14508, merged Jul 2, 2025)
- llama : initial Mamba-2 support (#9126, merged Jul 2, 2025)
- sync : ggml (#14507, merged Jul 2, 2025)
- ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext (#14435, merged Jul 2, 2025)
- opencl: preventing buffer overflows in debugging utils (#14490, merged Jul 2, 2025)
- CUDA: add softmax broadcast (#14475, merged Jul 2, 2025)
- CUDA: broadcasting for FlashAttention mask (#14500, merged Jul 2, 2025)
- simple-chat : fix context-exceeded condition (#14494, merged Jul 2, 2025)
- opencl : skip empty nodes on cgraph compute (#14491, merged Jul 2, 2025)
- opencl: update upscale to support align corners (#14488, merged Jul 2, 2025)
- ci : add OpenCL to labeler workflow (#14496, merged Jul 2, 2025)
- github : add OpenCL backend to issue templates (#14492, merged Jul 2, 2025)
- Callback before abort (#14481, merged Jul 2, 2025)
- ci : disable fast-math for Metal GHA CI (#14478, merged Jul 1, 2025)
- Add Vulkan images to docker.md (#14472, merged Jul 1, 2025)
- [CANN] update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411, merged Jul 1, 2025)
- vulkan: Split large mul_mat_id to fit in shared memory (#14451, merged Jul 1, 2025)
- vulkan: support softmax/FA batch and broadcast (#14449, merged Jul 1, 2025)
- vulkan : add GELU_ERF (#14455, merged Jul 1, 2025)
- sync : ggml (#14473, merged Jul 1, 2025)
- opencl: add GEGLU, REGLU, SWIGLU (#14456, merged Jul 1, 2025)
- Add Conv2d for CPU (#14388, merged Jun 30, 2025)
- memory : correctly handle failure in apply() (#14438, merged Jun 30, 2025)
- metal : disable fast-math for some cpy kernels (#14460, merged Jun 30, 2025)
- ggml-cpu: sycl: Re-enable exp f16 (#14462, merged Jun 30, 2025)
- test-backend-ops : disable llama test (#14461, merged Jun 30, 2025)
- Remove redundant include path in CMakeLists.txt (#14452, merged Jun 30, 2025)
- Make the shell scripts cross-platform (#14341, merged Jun 30, 2025)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, merged Jun 29, 2025)
- Fix appearance of the chats list context menu for the browser Safari (#14322, merged Jun 29, 2025)
- SYCL: disable faulty fp16 exp kernel (#14395, merged Jun 29, 2025)
- ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443, merged Jun 29, 2025)
- ggml : implement REGLU/GEGLU/SWIGLU ops (#14158, merged Jun 29, 2025)
- vulkan: Add fusion support for RMS_NORM+MUL (#14366, merged Jun 29, 2025)
- CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361, merged Jun 28, 2025)
- vulkan: Increase workgroup size for GLU, for performance (#14345, merged Jun 28, 2025)
- vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (#14378, merged Jun 28, 2025)
- vulkan: lock accesses of pinned_memory vector (#14333, merged Jun 28, 2025)
- model : add support for ERNIE 4.5 0.3B model (#14408, merged Jun 28, 2025)
- [CANN] Fix a bug related to enabling async_mode (#14432, merged Jun 28, 2025)
- ci : fix windows build and release (#14431, merged Jun 28, 2025)
- vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (#14427, merged Jun 28, 2025)
21 Pull requests opened by 19 people
- Added CI with RISC-V RVV1.0 Hardware (#14439, opened Jun 29, 2025)
- Pr/7191 (#14447, opened Jun 29, 2025)
- Chore: batch prompts, extract tensors specific layer (#14463, opened Jun 30, 2025)
- server : (webui) let server send locally-defined default webui settings (#14468, opened Jun 30, 2025)
- opencl : add GELU_ERF (#14476, opened Jul 1, 2025)
- llama : reuse compute graphs (#14482, opened Jul 1, 2025)
- Compute buffer and KV-cache aware layer distribution for multi-GPU inference (#14484, opened Jul 1, 2025)
- vulkan: unpack more values at a time for iquants mat mul (#14485, opened Jul 1, 2025)
- Allow truncation when embedding (#14493, opened Jul 2, 2025)
- MUSA: upgrade musa sdk to <<TBD>> (#14498, opened Jul 2, 2025)
- mtmd : Fix 32-bit narrowing issue in export-lora and mtmd clip (#14503, opened Jul 2, 2025)
- kv-cache : prepare K/V buffers for separation (#14517, opened Jul 3, 2025)
- vulkan: Handle updated FA dim2/3 definition (#14518, opened Jul 3, 2025)
- ggml: Add initial WebGPU backend (#14521, opened Jul 3, 2025)
- train: add simple loading already tokenized data from parquet dataset (#14522, opened Jul 3, 2025)
- webui : add a preset feature to the settings (#14523, opened Jul 3, 2025)
- CUDA: add bf16 and i32 to getrows (#14529, opened Jul 4, 2025)
- ggml: fix typo in ggml.c (#14531, opened Jul 4, 2025)
- common: detect and prefer big cores on AArch64 hybrid CPU on linux (#14532, opened Jul 4, 2025)
- llama: add initial support for Falcon-H1 model family (#14534, opened Jul 4, 2025)
- OpenCL: add tiled mul_mat_f16_f32 (#14535, opened Jul 4, 2025)
36 Issues closed by 12 people
- Misc. bug: Inconsistency between llama cpp server values and transformers library for reranking (#14533, closed Jul 4, 2025)
- Eval bug: repeated output for llama-server (#12782, closed Jul 4, 2025)
- How to start gemma3 multimodal model service using llama_server (#13465, closed Jul 4, 2025)
- Eval bug: Not splitting model across rows correctly (#13661, closed Jul 4, 2025)
- Feature Request: Procedure for reproducing test models (#13662, closed Jul 4, 2025)
- Feature Request: Llama-bench improvement (#13671, closed Jul 4, 2025)
- Eval bug: example/finetune.cpp crashing (#14424, closed Jul 3, 2025)
- Eval bug: Assertion `status == LLAMA_MEMORY_STATUS_SUCCESS' failed (#14506, closed Jul 3, 2025)
- Feature Request: dynamic number of experts (hyperparam per request) (#13572, closed Jul 3, 2025)
- Misc. bug: logit-bias doesn't seem to work (#13605, closed Jul 3, 2025)
- can't quant llama3 with expanded tokenizer (#13628, closed Jul 3, 2025)
- Feature Request: Support for Qwen with Parallel Scaling (#13632, closed Jul 3, 2025)
- Compile bug: GPU Detection Fails during cmake --build (#13636, closed Jul 3, 2025)
- Feature Request: Support Codestral Mamba (#8519, closed Jul 2, 2025)
- llama : support Mamba-2 (#7727, closed Jul 2, 2025)
- Eval bug: llama-simple-chat crashes with "failed to decode" after some requests (#14487, closed Jul 2, 2025)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, closed Jul 2, 2025)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, closed Jul 2, 2025)
- Compile bug: tools build failing (#13614, closed Jul 2, 2025)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, closed Jul 2, 2025)
- Eval bug: GGML_ASSERT(nei0 * nei1 <= 4096) failed when setting ubatch to 2048 on Qwen 3-30B (#14426, closed Jul 1, 2025)
- Feature Request: add jina embeddings model availible convert to gguf (#12327, closed Jun 30, 2025)
- Eval bug: [CANN] When use aclnnMatmul with cube_math_type=2 (#14441, closed Jun 30, 2025)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, closed Jun 30, 2025)
- Eval bug: GGUF Conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work (#13593, closed Jun 30, 2025)
- Is PLE offloading to GPU supported? (#14430, closed Jun 29, 2025)
- Eval bug: Weight repacking for AVX2 block interleaving is very slow and NUMA unfriendly (#12759, closed Jun 29, 2025)
- Feature Proposal: Server Model Switching at Runtime (#13027, closed Jun 29, 2025)
- Feature Request: Add new model support Hunyuan-A13B (#14433, closed Jun 28, 2025)
- Misc. bug: Fix CI for windows (#14412, closed Jun 28, 2025)
- (Discussion) Improve usability of llama-server (#13367, closed Jun 28, 2025)
- Research: How to integrate VITA 1.5 for multi-modal GGUF deployment? (#13520, closed Jun 28, 2025)
24 Issues opened by 24 people
- Compile bug: Built target undefined reference std::filesystem (#14536, opened Jul 4, 2025)
- Feature Request: to enable real batch for multiple images input of VLM (#14530, opened Jul 4, 2025)
- Feature Request: Speed up image encode with Metal (#14527, opened Jul 4, 2025)
- Eval bug: Gemma 3n on Vulkan on Ryzen APUs produces garbled output (#14525, opened Jul 3, 2025)
- Misc. bug: There's no `\n` token in Llama 3.2 vocab! (#14524, opened Jul 3, 2025)
- Refactor: mtmd_get_output_embd() does not return embedding vector length (#14516, opened Jul 3, 2025)
- Feature Request: Support GLM-4.1V-9B-Thinking (#14495, opened Jul 2, 2025)
- Feature Request: Support (Huawei) Pangu Pro 72B MoE Model (#14486, opened Jul 1, 2025)
- Feature Request: Support EXAONE 4.0 (#14474, opened Jul 1, 2025)
- Feature Request: per-chat prompt caching (#14470, opened Jul 1, 2025)
- Eval bug: Gemma vision head (possibly Siglip) yields garbage on vulkan / sycl on Intel N150 (#14469, opened Jun 30, 2025)
- Feature Request: Add Ernie4.5MoE support (#14465, opened Jun 30, 2025)
- Compile bug: zero-size array ‘gemm_gemv_kernels’ / invalid feature modifier ‘sme’ (#14464, opened Jun 30, 2025)
- Misc. bug: convert_hf_to_gguf.py not working on qwen3-embedding and qwen3-embedding lora tuned models (#14459, opened Jun 30, 2025)
- Misc. bug: oom, the process does not exit (#14458, opened Jun 30, 2025)
- Eval bug: gemma-3n crash when using HIP (#14448, opened Jun 29, 2025)
- Memory isn't freed with a particular set of options (#14446, opened Jun 29, 2025)
- Eval bug: Loading multimodal ultravox model locally fails at loading clip model without any errors. (#14444, opened Jun 29, 2025)
- Feature Request: Adding Parquet support for tokenized datasets (#14442, opened Jun 29, 2025)
- Compile bug: SYCL with OneAPI Toolkit 2025.2 & NixOS (#14440, opened Jun 29, 2025)
- Eval bug: Extreme perplexity for gemma 3n (#14437, opened Jun 29, 2025)
- Misc. bug: Saving and restoring an empty slot does a crasherino (#14434, opened Jun 28, 2025)
- Feature Request: Gemma3n multimodal support (#14429, opened Jun 28, 2025)
83 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- test-backend-ops: add support for specifying output format (#14368, commented on Jul 4, 2025 • 25 new comments)
- Granite Four (#13550, commented on Jul 3, 2025 • 24 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Jul 4, 2025 • 15 new comments)
- model : add hunyuan moe (#14425, commented on Jul 4, 2025 • 14 new comments)
- llama : add high-throughput mode (#14363, commented on Jul 4, 2025 • 8 new comments)
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316, commented on Jun 30, 2025 • 6 new comments)
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117, commented on Jul 4, 2025 • 2 new comments)
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232, commented on Jul 3, 2025 • 1 new comment)
- webui: preserve partial content when streaming errors occur (#14374, commented on Jul 3, 2025 • 1 new comment)
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552, commented on Jul 4, 2025 • 0 new comments)
- CUDA: update build CTK version to 12.8 (#13360, commented on Jul 2, 2025 • 0 new comments)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jul 2, 2025 • 0 new comments)
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications (#12727, commented on Jun 30, 2025 • 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jul 1, 2025 • 0 new comments)
- llama-server : implement universal assisted decoding (#12635, commented on Jul 2, 2025 • 0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Jul 4, 2025 • 0 new comments)
- Eval bug: SIGILL (#13161, commented on Jul 4, 2025 • 0 new comments)
- Cache based tokenization for the server input prompts (#12067, commented on Jul 3, 2025 • 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jun 30, 2025 • 0 new comments)
- Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens per seconds performance. (#11867, commented on Jun 29, 2025 • 0 new comments)
- llama_eval removed, no deprecation info, still referenced in comments (#14271, commented on Jul 4, 2025 • 0 new comments)
- [Draft] Tensor Parallel support to llama.cpp (#9648, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jun 30, 2025 • 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Jul 4, 2025 • 0 new comments)
- Misc. bug: sentencepiece not included in requirements.txt (#13982, commented on Jul 4, 2025 • 0 new comments)
- ggml : add ggml_scale_bias (#14417, commented on Jun 29, 2025 • 0 new comments)
- [CANN] weight format to nz for Ascend310P3 (#14407, commented on Jul 1, 2025 • 0 new comments)
- OpenCL: add conv2d kernel (#14403, commented on Jul 4, 2025 • 0 new comments)
- ggml-cpu: Build variant targeting Neoverse-V2 (#14380, commented on Jun 30, 2025 • 0 new comments)
- Q2k interleaving implementation - x86/x64 SIMD (#14373, commented on Jul 4, 2025 • 0 new comments)
- server : fix assistant prefilling when content is an array (#14360, commented on Jul 4, 2025 • 0 new comments)
- llama : expose C API to get layer device type (#14358, commented on Jul 4, 2025 • 0 new comments)
- make "server-core" library (#14331, commented on Jun 30, 2025 • 0 new comments)
- Add SmolLM3 (#14240, commented on Jul 4, 2025 • 0 new comments)
- logit_bias: apply configurable escalating EOG bias at low n_remain (#14229, commented on Jul 2, 2025 • 0 new comments)
- tests : enhance llama-bench with separate timings (pp/gen t/s), added n_threads_batch (#14219, commented on Jul 2, 2025 • 0 new comments)
- Add plamo2 (#13930, commented on Jul 3, 2025 • 0 new comments)
- finetune.cpp command-line arg (#13873, commented on Jul 4, 2025 • 0 new comments)
- Move page cache via mbind to prevent cross-NUMA access (#13731, commented on Jun 30, 2025 • 0 new comments)
- remove templates from soft_max_f32_submitter to allow SYCL graph updates (#13724, commented on Jul 1, 2025 • 0 new comments)
- model : jina-embeddings-v3 support (#13693, commented on Jun 28, 2025 • 0 new comments)
- scripts: update pyproject.toml - deprecated poetry config + support uv (#13615, commented on Jul 3, 2025 • 0 new comments)
- Feature Request: Hunyuan-A13B model support (#14415, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Qwen2.5-Omni (#12673, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' (#13893, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Generate Image Embeddings with llama.cpp (#13913, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: allocator.h:165:24 Call to implicitly-deleted copy constructor of 'std::unique_ptr<llama_adapter_lora, llama_adapter_lora_deleter>' (#13925, commented on Jun 30, 2025 • 0 new comments)
- Eval bug: llama-cli, Qwen3 jinja template will break CLI multiturn conversation (#13404, commented on Jun 29, 2025 • 0 new comments)
- Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception (#14227, commented on Jun 29, 2025 • 0 new comments)
- main: failed to quantize model from 'gemma-3n-E2B-it.f16.gguf' (#14405, commented on Jun 29, 2025 • 0 new comments)
- LoRA training example (#13485, commented on Jun 29, 2025 • 0 new comments)
- Automatic optimization of runtime parameters such as -ngl given memory constraints (#13860, commented on Jun 29, 2025 • 0 new comments)
- Feature Request: Make the `/completion` endpoint in `llama-server` work with multimodal models (#13872, commented on Jun 29, 2025 • 0 new comments)
- Feature Request: Multimodal: llama-server support for Qwen2.5-VL chat template type: list of image paths (type: "video") (#13905, commented on Jun 29, 2025 • 0 new comments)
- Misc. bug: Server/Chat parallel tool calling not working (#14101, commented on Jun 28, 2025 • 0 new comments)
- changelog : `llama-server` REST API (#9291, commented on Jun 28, 2025 • 0 new comments)
- Request for Official Support of AMD Ryzen AI Platform NPU (#14377, commented on Jun 28, 2025 • 0 new comments)
- Eval bug: llama-mtmd-cli doesn't support system prompts (#13454, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: video support in mtmd-cli / server (#13754, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: Set default of --numa to distribute (#13850, commented on Jun 28, 2025 • 0 new comments)
- Eval bug: Embeddings Always returned as non (#13854, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture (#13856, commented on Jun 28, 2025 • 0 new comments)
- Compile bug: (#13992, commented on Jul 4, 2025 • 0 new comments)
- Feature Request: allow spacebar to confirm web UI prompts [like the deleting a chat confirmation] (#13999, commented on Jul 4, 2025 • 0 new comments)
- Suport for Jamba JambaForCausalLM (#6372, commented on Jul 3, 2025 • 0 new comments)
- Feature Request: Support Jina V3 arch (#9585, commented on Jul 3, 2025 • 0 new comments)
- Misc. bug: linux/arm64 does not exist for the server docker image (#13891, commented on Jul 3, 2025 • 0 new comments)
- Intel® Core™ Ultra processors NPU Support (#5079, commented on Jul 3, 2025 • 0 new comments)
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf (#12997, commented on Jul 3, 2025 • 0 new comments)
- Feature Request: WINA (#13964, commented on Jul 3, 2025 • 0 new comments)
- make using shifting context easier. (#13969, commented on Jul 3, 2025 • 0 new comments)
- Eval bug: Unable to load the model on GPU (#13967, commented on Jul 3, 2025 • 0 new comments)
- context shifting should be default option? (#13971, commented on Jul 3, 2025 • 0 new comments)
- Misc. bug: llama-bench improper tensor split (#13972, commented on Jul 3, 2025 • 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jul 2, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: llama-tts abort (#13955, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: llama-mtmd-cli : option --image failed to load image (#13959, commented on Jul 2, 2025 • 0 new comments)
- Compile bug: HIP compile fails during linking stage, undefined reference error repeats (#14155, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747, commented on Jul 1, 2025 • 0 new comments)
- Misc. bug: Decreased success rate for tool calling (#13769, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) (#13947, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: Can the embeddings endpoint with llama.cpp server generate sparse vectors using models like bge-me that support dense/sparse embeddings (#14404, commented on Jun 30, 2025 • 0 new comments)