Insights: ggml-org/llama.cpp
Overview
44 Releases published by 1 person
- b5769 (published Jun 28, 2025)
- b5770 (published Jun 28, 2025)
- b5771 (published Jun 28, 2025)
- b5772 (published Jun 28, 2025)
- b5773 (published Jun 28, 2025)
- b5774 (published Jun 28, 2025)
- b5775 (published Jun 29, 2025)
- b5777 (published Jun 29, 2025)
- b5778 (published Jun 29, 2025)
- b5780 (published Jun 29, 2025)
- b5782 (published Jun 30, 2025)
- b5783 (published Jun 30, 2025)
- b5784 (published Jun 30, 2025)
- b5785 (published Jun 30, 2025)
- b5787 (published Jun 30, 2025)
- b5788 (published Jul 1, 2025)
- b5792 (published Jul 1, 2025)
- b5793 (published Jul 1, 2025)
- b5794 (published Jul 1, 2025)
- b5795 (published Jul 1, 2025)
- b5797 (published Jul 1, 2025)
- b5798 (published Jul 2, 2025)
- b5801 (published Jul 2, 2025)
- b5802 (published Jul 2, 2025)
- b5803 (published Jul 2, 2025)
- b5804 (published Jul 2, 2025)
- b5808 (published Jul 2, 2025)
- b5809 (published Jul 2, 2025)
- b5811 (published Jul 2, 2025)
- b5812 (published Jul 2, 2025)
- b5814 (published Jul 3, 2025)
- b5815 (published Jul 3, 2025)
- b5816 (published Jul 3, 2025)
- b5817 (published Jul 3, 2025)
- b5819 (published Jul 3, 2025)
- b5820 (published Jul 3, 2025)
- b5821 (published Jul 3, 2025)
- b5822 (published Jul 3, 2025)
- b5823 (published Jul 3, 2025)
- b5824 (published Jul 4, 2025)
- b5825 (published Jul 4, 2025)
- b5826 (published Jul 4, 2025)
- b5827 (published Jul 4, 2025)
- b5828 (published Jul 4, 2025)
58 Pull requests merged by 25 people
- metal : disable fast math in all quantize kernels (#14528, merged Jul 4, 2025)
- batch : add optional for sequential equal split (#14511, merged Jul 4, 2025)
- graph : prepare for 4D mask (#14515, merged Jul 4, 2025)
- batch : add n_used count (#14512, merged Jul 4, 2025)
- [CANN]: Replace aclrtMemsetSync with InplaceZero operator for zero tensor creation (#14002, merged Jul 4, 2025)
- ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445, merged Jul 3, 2025)
- opencl: broadcast for soft_max (#14510, merged Jul 3, 2025)
- vulkan: support mixed/deepseekR1 FA head sizes (#14509, merged Jul 3, 2025)
- ggml: backward pass for split swiglu (#14483, merged Jul 3, 2025)
- sycl: Fix conditional enabling following arch checks for ggml-sycl (#14504, merged Jul 3, 2025)
- convert : correct gemma 3n conversion (#14450, merged Jul 3, 2025)
- kv-cache : use ggml_set_rows (#14285, merged Jul 3, 2025)
- ggml : fix FA mask dim 2 and 3 (#14505, merged Jul 3, 2025)
- ggml : remove kompute backend (#14501, merged Jul 3, 2025)
- CUDA: add dynamic shared mem to softmax, refactor general usage (#14497, merged Jul 2, 2025)
- gguf-py : add support for chat template jinja files (#14508, merged Jul 2, 2025)
- llama : initial Mamba-2 support (#9126, merged Jul 2, 2025)
- sync : ggml (#14507, merged Jul 2, 2025)
- ggml : support broadcast for ggml_soft_max_ext and ggml_flash_attn_ext (#14435, merged Jul 2, 2025)
- opencl: preventing buffer overflows in debugging utils (#14490, merged Jul 2, 2025)
- CUDA: add softmax broadcast (#14475, merged Jul 2, 2025)
- CUDA: broadcasting for FlashAttention mask (#14500, merged Jul 2, 2025)
- simple-chat : fix context-exceeded condition (#14494, merged Jul 2, 2025)
- opencl : skip empty nodes on cgraph compute (#14491, merged Jul 2, 2025)
- opencl: update upscale to support align corners (#14488, merged Jul 2, 2025)
- ci : add OpenCL to labeler workflow (#14496, merged Jul 2, 2025)
- github : add OpenCL backend to issue templates (#14492, merged Jul 2, 2025)
- Callback before abort (#14481, merged Jul 2, 2025)
- ci : disable fast-math for Metal GHA CI (#14478, merged Jul 1, 2025)
- Add Vulkan images to docker.md (#14472, merged Jul 1, 2025)
- [CANN] update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411, merged Jul 1, 2025)
- vulkan: Split large mul_mat_id to fit in shared memory (#14451, merged Jul 1, 2025)
- vulkan: support softmax/FA batch and broadcast (#14449, merged Jul 1, 2025)
- vulkan : add GELU_ERF (#14455, merged Jul 1, 2025)
- sync : ggml (#14473, merged Jul 1, 2025)
- opencl: add GEGLU, REGLU, SWIGLU (#14456, merged Jul 1, 2025)
- Add Conv2d for CPU (#14388, merged Jun 30, 2025)
- memory : correctly handle failure in apply() (#14438, merged Jun 30, 2025)
- metal : disable fast-math for some cpy kernels (#14460, merged Jun 30, 2025)
- ggml-cpu: sycl: Re-enable exp f16 (#14462, merged Jun 30, 2025)
- test-backend-ops : disable llama test (#14461, merged Jun 30, 2025)
- Remove redundant include path in CMakeLists.txt (#14452, merged Jun 30, 2025)
- Make the shell scripts cross-platform (#14341, merged Jun 30, 2025)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, merged Jun 29, 2025)
- Fix appearance of the chats list context menu for the browser Safari (#14322, merged Jun 29, 2025)
- SYCL: disable faulty fp16 exp kernel (#14395, merged Jun 29, 2025)
- ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443, merged Jun 29, 2025)
- ggml : implement REGLU/GEGLU/SWIGLU ops (#14158, merged Jun 29, 2025)
- vulkan: Add fusion support for RMS_NORM+MUL (#14366, merged Jun 29, 2025)
- CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361, merged Jun 28, 2025)
- vulkan: Increase workgroup size for GLU, for performance (#14345, merged Jun 28, 2025)
- vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (#14378, merged Jun 28, 2025)
- vulkan: lock accesses of pinned_memory vector (#14333, merged Jun 28, 2025)
- model : add support for ERNIE 4.5 0.3B model (#14408, merged Jun 28, 2025)
- [CANN] Fix a bug related to enabling async_mode (#14432, merged Jun 28, 2025)
- ci : fix windows build and release (#14431, merged Jun 28, 2025)
- vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (#14427, merged Jun 28, 2025)
21 Pull requests opened by 19 people
- Added CI with RISC-V RVV1.0 Hardware (#14439, opened Jun 29, 2025)
- Pr/7191 (#14447, opened Jun 29, 2025)
- Chore: batch prompts, extract tensors specific layer (#14463, opened Jun 30, 2025)
- server : (webui) let server send locally-defined default webui settings (#14468, opened Jun 30, 2025)
- opencl : add GELU_ERF (#14476, opened Jul 1, 2025)
- llama : reuse compute graphs (#14482, opened Jul 1, 2025)
- Compute buffer and KV-cache aware layer distribution for multi-GPU inference (#14484, opened Jul 1, 2025)
- vulkan: unpack more values at a time for iquants mat mul (#14485, opened Jul 1, 2025)
- Allow truncation when embedding (#14493, opened Jul 2, 2025)
- MUSA: upgrade musa sdk to <<TBD>> (#14498, opened Jul 2, 2025)
- mtmd : Fix 32-bit narrowing issue in export-lora and mtmd clip (#14503, opened Jul 2, 2025)
- kv-cache : prepare K/V buffers for separation (#14517, opened Jul 3, 2025)
- vulkan: Handle updated FA dim2/3 definition (#14518, opened Jul 3, 2025)
- ggml: Add initial WebGPU backend (#14521, opened Jul 3, 2025)
- train: add simple loading already tokenized data from parquet dataset (#14522, opened Jul 3, 2025)
- webui : add a preset feature to the settings (#14523, opened Jul 3, 2025)
- CUDA: add bf16 and i32 to getrows (#14529, opened Jul 4, 2025)
- ggml: fix typo in ggml.c (#14531, opened Jul 4, 2025)
- common: detect and prefer big cores on AArch64 hybrid CPU on linux (#14532, opened Jul 4, 2025)
- llama: add initial support for Falcon-H1 model family (#14534, opened Jul 4, 2025)
- OpenCL: add tiled mul_mat_f16_f32 (#14535, opened Jul 4, 2025)
36 Issues closed by 12 people
- Misc. bug: Inconsistency between llama cpp server values and transformers library for reranking (#14533, closed Jul 4, 2025)
- Eval bug: repeated output for llama-server (#12782, closed Jul 4, 2025)
- How to start gemma3 multimodal model service using llama_server (#13465, closed Jul 4, 2025)
- Eval bug: Not splitting model across rows correctly (#13661, closed Jul 4, 2025)
- Feature Request: Procedure for reproducing test models (#13662, closed Jul 4, 2025)
- Feature Request: Llama-bench improvement (#13671, closed Jul 4, 2025)
- Eval bug: example/finetune.cpp crashing (#14424, closed Jul 3, 2025)
- Eval bug: Assertion `status == LLAMA_MEMORY_STATUS_SUCCESS' failed (#14506, closed Jul 3, 2025)
- Feature Request: dynamic number of experts (hyperparam per request) (#13572, closed Jul 3, 2025)
- Misc. bug: logit-bias doesn't seem to work (#13605, closed Jul 3, 2025)
- can't quant llama3 with expanded tokenizer (#13628, closed Jul 3, 2025)
- Feature Request: Support for Qwen with Parallel Scaling (#13632, closed Jul 3, 2025)
- Compile bug: GPU Detection Fails during cmake --build (#13636, closed Jul 3, 2025)
- Feature Request: Support Codestral Mamba (#8519, closed Jul 2, 2025)
- llama : support Mamba-2 (#7727, closed Jul 2, 2025)
- Eval bug: llama-simple-chat crashes with "failed to decode" after some requests (#14487, closed Jul 2, 2025)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, closed Jul 2, 2025)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, closed Jul 2, 2025)
- Compile bug: tools build failing (#13614, closed Jul 2, 2025)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, closed Jul 2, 2025)
- Eval bug: GGML_ASSERT(nei0 * nei1 <= 4096) failed when setting ubatch to 2048 on Qwen 3-30B (#14426, closed Jul 1, 2025)
- Feature Request: add jina embeddings model availible convert to gguf (#12327, closed Jun 30, 2025)
- Eval bug: [CANN] When use aclnnMatmul with cube_math_type=2 (#14441, closed Jun 30, 2025)
- Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan (#13046, closed Jun 30, 2025)
- Eval bug: GGUF Conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work (#13593, closed Jun 30, 2025)
- Is PLE offloading to GPU supported? (#14430, closed Jun 29, 2025)
- Eval bug: Weight repacking for AVX2 block interleaving is very slow and NUMA unfriendly (#12759, closed Jun 29, 2025)
- Feature Proposal: Server Model Switching at Runtime (#13027, closed Jun 29, 2025)
- Feature Request: Add new model support Hunyuan-A13B (#14433, closed Jun 28, 2025)
- Misc. bug: Fix CI for windows (#14412, closed Jun 28, 2025)
- (Discussion) Improve usability of llama-server (#13367, closed Jun 28, 2025)
- Research: How to integrate VITA 1.5 for multi-modal GGUF deployment? (#13520, closed Jun 28, 2025)
24 Issues opened by 24 people
- Compile bug: Built target undefined reference std::filesystem (#14536, opened Jul 4, 2025)
- Feature Request: to enable real batch for multiple images input of VLM (#14530, opened Jul 4, 2025)
- Feature Request: Speed up image encode with Metal (#14527, opened Jul 4, 2025)
- Eval bug: Gemma 3n on Vulkan on Ryzen APUs produces garbled output (#14525, opened Jul 3, 2025)
- Misc. bug: There's no `\n` token in Llama 3.2 vocab! (#14524, opened Jul 3, 2025)
- Refactor: mtmd_get_output_embd() does not return embedding vector length (#14516, opened Jul 3, 2025)
- Feature Request: Support GLM-4.1V-9B-Thinking (#14495, opened Jul 2, 2025)
- Feature Request: Support (Huawei) Pangu Pro 72B MoE Model (#14486, opened Jul 1, 2025)
- Feature Request: Support EXAONE 4.0 (#14474, opened Jul 1, 2025)
- Feature Request: per-chat prompt caching (#14470, opened Jul 1, 2025)
- Eval bug: Gemma vision head (possibly Siglip) yields garbage on vulkan / sycl on Intel N150 (#14469, opened Jun 30, 2025)
- Feature Request: Add Ernie4.5MoE support (#14465, opened Jun 30, 2025)
- Compile bug: zero-size array ‘gemm_gemv_kernels’ / invalid feature modifier ‘sme’ (#14464, opened Jun 30, 2025)
- Misc. bug: convert_hf_to_gguf.py not working on qwen3-embedding and qwen3-embedding lora tuned models (#14459, opened Jun 30, 2025)
- Misc. bug: oom, the process does not exit (#14458, opened Jun 30, 2025)
- Eval bug: gemma-3n crash when using HIP (#14448, opened Jun 29, 2025)
- Memory isn't freed with a particular set of options (#14446, opened Jun 29, 2025)
- Eval bug: Loading multimodal ultravox model locally fails at loading clip model without any errors. (#14444, opened Jun 29, 2025)
- Feature Request: Adding Parquet support for tokenized datasets (#14442, opened Jun 29, 2025)
- Compile bug: SYCL with OneAPI Toolkit 2025.2 & NixOS (#14440, opened Jun 29, 2025)
- Eval bug: Extreme perplexity for gemma 3n (#14437, opened Jun 29, 2025)
- Misc. bug: Saving and restoring an empty slot does a crasherino (#14434, opened Jun 28, 2025)
- Feature Request: Gemma3n multimodal support (#14429, opened Jun 28, 2025)
83 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- test-backend-ops: add support for specifying output format (#14368, commented on Jul 4, 2025 • 25 new comments)
- Granite Four (#13550, commented on Jul 3, 2025 • 24 new comments)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, commented on Jul 4, 2025 • 15 new comments)
- model : add hunyuan moe (#14425, commented on Jul 4, 2025 • 14 new comments)
- llama : add high-throughput mode (#14363, commented on Jul 4, 2025 • 8 new comments)
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316, commented on Jun 30, 2025 • 6 new comments)
- ggml: aarch64: Implement SVE Kernels for Int 8 Quantization (#14117, commented on Jul 4, 2025 • 2 new comments)
- ggml: introduce GGML_NUMA_MIGRATE to optimize cross NUMA op computation (#14232, commented on Jul 3, 2025 • 1 new comment)
- webui: preserve partial content when streaming errors occur (#14374, commented on Jul 3, 2025 • 1 new comment)
- Misc. bug: missing messages in JSON export via llama-server web UI (#13552, commented on Jul 4, 2025 • 0 new comments)
- CUDA: update build CTK version to 12.8 (#13360, commented on Jul 2, 2025 • 0 new comments)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jul 2, 2025 • 0 new comments)
- Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications (#12727, commented on Jun 30, 2025 • 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jul 1, 2025 • 0 new comments)
- llama-server : implement universal assisted decoding (#12635, commented on Jul 2, 2025 • 0 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Jul 4, 2025 • 0 new comments)
- Eval bug: SIGILL (#13161, commented on Jul 4, 2025 • 0 new comments)
- Cache based tokenization for the server input prompts (#12067, commented on Jul 3, 2025 • 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jun 30, 2025 • 0 new comments)
- Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens per seconds performance. (#11867, commented on Jun 29, 2025 • 0 new comments)
- llama_eval removed, no deprecation info, still referenced in comments (#14271, commented on Jul 4, 2025 • 0 new comments)
- [Draft] Tensor Parallel support to llama.cpp (#9648, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jun 30, 2025 • 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Jul 4, 2025 • 0 new comments)
- Misc. bug: sentencepiece not included in requirements.txt (#13982, commented on Jul 4, 2025 • 0 new comments)
- ggml : add ggml_scale_bias (#14417, commented on Jun 29, 2025 • 0 new comments)
- [CANN] weight format to nz for Ascend310P3 (#14407, commented on Jul 1, 2025 • 0 new comments)
- OpenCL: add conv2d kernel (#14403, commented on Jul 4, 2025 • 0 new comments)
- ggml-cpu: Build variant targeting Neoverse-V2 (#14380, commented on Jun 30, 2025 • 0 new comments)
- Q2k interleaving implementation - x86/x64 SIMD (#14373, commented on Jul 4, 2025 • 0 new comments)
- server : fix assistant prefilling when content is an array (#14360, commented on Jul 4, 2025 • 0 new comments)
- llama : expose C API to get layer device type (#14358, commented on Jul 4, 2025 • 0 new comments)
- make "server-core" library (#14331, commented on Jun 30, 2025 • 0 new comments)
- Add SmolLM3 (#14240, commented on Jul 4, 2025 • 0 new comments)
- logit_bias: apply configurable escalating EOG bias at low n_remain (#14229, commented on Jul 2, 2025 • 0 new comments)
- tests : enhance llama-bench with separate timings (pp/gen t/s), added n_threads_batch (#14219, commented on Jul 2, 2025 • 0 new comments)
- Add plamo2 (#13930, commented on Jul 3, 2025 • 0 new comments)
- finetune.cpp command-line arg (#13873, commented on Jul 4, 2025 • 0 new comments)
- Move page cache via mbind to prevent cross-NUMA access (#13731, commented on Jun 30, 2025 • 0 new comments)
- remove templates from soft_max_f32_submitter to allow SYCL graph updates (#13724, commented on Jul 1, 2025 • 0 new comments)
- model : jina-embeddings-v3 support (#13693, commented on Jun 28, 2025 • 0 new comments)
- scripts: update pyproject.toml - deprecated poetry config + support uv (#13615, commented on Jul 3, 2025 • 0 new comments)
- Feature Request: Hunyuan-A13B model support (#14415, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Qwen2.5-Omni (#12673, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_' (#13893, commented on Jun 30, 2025 • 0 new comments)
- Feature Request: Generate Image Embeddings with llama.cpp (#13913, commented on Jun 30, 2025 • 0 new comments)
- Compile bug: allocator.h:165:24 Call to implicitly-deleted copy constructor of 'std::unique_ptr<llama_adapter_lora, llama_adapter_lora_deleter>' (#13925, commented on Jun 30, 2025 • 0 new comments)
- Eval bug: llama-cli, Qwen3 jinja template will break CLI multiturn conversation (#13404, commented on Jun 29, 2025 • 0 new comments)
- Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception (#14227, commented on Jun 29, 2025 • 0 new comments)
- main: failed to quantize model from 'gemma-3n-E2B-it.f16.gguf' (#14405, commented on Jun 29, 2025 • 0 new comments)
- LoRA training example (#13485, commented on Jun 29, 2025 • 0 new comments)
- Automatic optimization of runtime parameters such as -ngl given memory constraints (#13860, commented on Jun 29, 2025 • 0 new comments)
- Feature Request: Make the `/completion` endpoint in `llama-server` work with multimodal models (#13872, commented on Jun 29, 2025 • 0 new comments)
- Feature Request: Multimodal: llama-server support for Qwen2.5-VL chat template type: list of image paths (type: "video") (#13905, commented on Jun 29, 2025 • 0 new comments)
- Misc. bug: Server/Chat parallel tool calling not working (#14101, commented on Jun 28, 2025 • 0 new comments)
- changelog : `llama-server` REST API (#9291, commented on Jun 28, 2025 • 0 new comments)
- Request for Official Support of AMD Ryzen AI Platform NPU (#14377, commented on Jun 28, 2025 • 0 new comments)
- Eval bug: llama-mtmd-cli doesn't support system prompts (#13454, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: video support in mtmd-cli / server (#13754, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: Set default of --numa to distribute (#13850, commented on Jun 28, 2025 • 0 new comments)
- Eval bug: Embeddings Always returned as non (#13854, commented on Jun 28, 2025 • 0 new comments)
- Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture (#13856, commented on Jun 28, 2025 • 0 new comments)
- Compile bug: (#13992, commented on Jul 4, 2025 • 0 new comments)
- Feature Request: allow spacebar to confirm web UI prompts [like the deleting a chat confirmation] (#13999, commented on Jul 4, 2025 • 0 new comments)
- Suport for Jamba JambaForCausalLM (#6372, commented on Jul 3, 2025 • 0 new comments)
- Feature Request: Support Jina V3 arch (#9585, commented on Jul 3, 2025 • 0 new comments)
- Misc. bug: linux/arm64 does not exist for the server docker image (#13891, commented on Jul 3, 2025 • 0 new comments)
- Intel® Core™ Ultra processors NPU Support (#5079, commented on Jul 3, 2025 • 0 new comments)
- Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf (#12997, commented on Jul 3, 2025 • 0 new comments)
- Feature Request: WINA (#13964, commented on Jul 3, 2025 • 0 new comments)
- make using shifting context easier. (#13969, commented on Jul 3, 2025 • 0 new comments)
- Eval bug: Unable to load the model on GPU (#13967, commented on Jul 3, 2025 • 0 new comments)
- context shifting should be default option? (#13971, commented on Jul 3, 2025 • 0 new comments)
- Misc. bug: llama-bench improper tensor split (#13972, commented on Jul 3, 2025 • 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jul 2, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: llama-tts abort (#13955, commented on Jul 2, 2025 • 0 new comments)
- Eval bug: llama-mtmd-cli : option --image failed to load image (#13959, commented on Jul 2, 2025 • 0 new comments)
- Compile bug: HIP compile fails during linking stage, undefined reference error repeats (#14155, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747, commented on Jul 1, 2025 • 0 new comments)
- Misc. bug: Decreased success rate for tool calling (#13769, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME) (#13947, commented on Jul 1, 2025 • 0 new comments)
- Feature Request: Can the embeddings endpoint with llama.cpp server generate sparse vectors using models like bge-me that support dense/sparse embeddings (#14404, commented on Jun 30, 2025 • 0 new comments)