Insights: ggml-org/llama.cpp
Overview
27 Releases published by 1 person
- b4929 published Mar 20, 2025
- b4930 published Mar 20, 2025
- b4932 published Mar 21, 2025
- b4933 published Mar 21, 2025
- b4934 published Mar 21, 2025
- b4935 published Mar 21, 2025
- b4936 published Mar 21, 2025
- b4937 published Mar 21, 2025
- b4938 published Mar 21, 2025
- b4939 published Mar 22, 2025
- b4940 published Mar 22, 2025
- b4942 published Mar 22, 2025
- b4944 published Mar 23, 2025
- b4945 published Mar 24, 2025
- b4946 published Mar 24, 2025
- b4947 published Mar 24, 2025
- b4948 published Mar 24, 2025
- b4951 published Mar 24, 2025
- b4953 published Mar 25, 2025
- b4956 published Mar 25, 2025
- b4957 published Mar 25, 2025
- b4958 published Mar 25, 2025
- b4961 published Mar 26, 2025
- b4963 published Mar 26, 2025
- b4964 published Mar 26, 2025
- b4966 published Mar 26, 2025
- b4967 published Mar 27, 2025
40 Pull requests merged by 25 people
- SYCL: implement memset ggml backend buffer interface (#12580, merged Mar 27, 2025)
- Add support for new gfx1200 and gfx1201 targets (#12372, merged Mar 26, 2025)
- metal : refactor mat-vec code (#12569, merged Mar 26, 2025)
- grammars: upgrade to llguidance 0.7.10 (#12576, merged Mar 26, 2025)
- clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566, merged Mar 26, 2025)
- convert : fix squeeze for ssm_conv tensors (#12573, merged Mar 26, 2025)
- ggml : fix MUL_MAT_ID repack with Q8_K (#12544, merged Mar 26, 2025)
- doc: [MUSA] minor changes (#12583, merged Mar 26, 2025)
- convert: fix Mistral3/Gemma3 model hparams init (#12571, merged Mar 25, 2025)
- De-duplicate fmt and format functions and optimize (#11596, merged Mar 25, 2025)
- ggml-cpu : bug fix related to KleidiAI multithreaded LHS packing (#12568, merged Mar 25, 2025)
- SYCL: disable Q4_0 reorder optimization by default (#12560, merged Mar 25, 2025)
- docs : add build instructions for KleidiAI (#12563, merged Mar 25, 2025)
- ci: [MUSA] add CI and update doc (#12562, merged Mar 25, 2025)
- context : fix worst-case reserve outputs (#12545, merged Mar 25, 2025)
- ci: [SYCL] Use main GPU and enable sysman (#12547, merged Mar 24, 2025)
- opencl: simplify kernel embedding logic in CMakeLists.txt (#12503, merged Mar 24, 2025)
- ci: fix SYCL build (#12546, merged Mar 24, 2025)
- docs: update: improve the Fedora CUDA guide (#12536, merged Mar 24, 2025)
- llama-vocab : add SuperBPE pre-tokenizer (#12532, merged Mar 24, 2025)
- Fix clang warnings (#12540, merged Mar 24, 2025)
- Issues while building on AIX OS (#12541, merged Mar 24, 2025)
- vulkan: fix mul_mat_vec failure in backend tests (#12529, merged Mar 24, 2025)
- server : Add verbose output to OAI compatible chat endpoint. (#12246, merged Mar 23, 2025)
- Update install.md to include MacPorts section (#12518, merged Mar 23, 2025)
- llama : gemma3 : use output tensor if it exists in model weight (#12506, merged Mar 22, 2025)
- ggml : fix quantized cpy op (#12310, merged Mar 22, 2025)
- musa: refine compute capability (#12493, merged Mar 22, 2025)
- vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505, merged Mar 22, 2025)
- Vulkan: RTE rounding for cpy to quant (#12480, merged Mar 21, 2025)
- vulkan: workaround for #10710 and #12147 16 bit unpack8 bug (#12472, merged Mar 21, 2025)
- model : do not repack if a GPU device is present (#12498, merged Mar 21, 2025)
- chore : cleanup llama_model_loader::TENSOR_ usage (#12492, merged Mar 21, 2025)
- llama-tts : avoid crashes related to bad model file paths (#12482, merged Mar 21, 2025)
- [SYCL] Fix build on Windows when ccache enabled (#9954) (#9976, merged Mar 21, 2025)
- sycl: cleanup oneDNN related code (#12097, merged Mar 21, 2025)
- webui: Stop rerender on textarea input and end the devastating lag (#12299, merged Mar 20, 2025)
- llama : make Qwen2MoE QKV bias optional (#12477, merged Mar 20, 2025)
- Block interleaving support for Q4_K quantization for x86 AVX2 architecture (#12332, merged Mar 20, 2025)
- Avoid calls to tokenizer.added_tokens_decoder (#12473, merged Mar 20, 2025)
25 Pull requests opened by 22 people
- Metal TQ2_0 (#12485, opened Mar 20, 2025)
- (draft) tts: Orpheus support (#12487, opened Mar 21, 2025)
- Evenly and stably pinning thread pool (#12488, opened Mar 21, 2025)
- llamafile : ppc64le MMA implementation for Q4_0. (#12489, opened Mar 21, 2025)
- rpc : send hash when tensor data is above some fixed threshold (#12496, opened Mar 21, 2025)
- llama: support Qwen3 (#12501, opened Mar 21, 2025)
- cmake: Allow to configure GGML_BUILD_NUMBER with file (#12509, opened Mar 22, 2025)
- quantize: Handle user-defined quantization levels for additional tensors (#12511, opened Mar 22, 2025)
- perplexity: Add option to ignore context window overflow errors and continue score calculation (#12512, opened Mar 22, 2025)
- llama-tts : precompute irFFT theta (#12514, opened Mar 22, 2025)
- Vulkan: Remove dedicated aligned matrix matrix multiplication shaders (#12515, opened Mar 22, 2025)
- cmake: fix ccache conflict (#12522, opened Mar 23, 2025)
- ggml : riscv: add 128-bit RVV support (#12530, opened Mar 23, 2025)
- (draft) tts: Sesame support (#12549, opened Mar 24, 2025)
- llama-map to support hugepage feature of pagesize 2M or 1G which can … (#12552, opened Mar 24, 2025)
- Draft: vulkan: Add bfloat16 support (#12554, opened Mar 24, 2025)
- Add Trillion 7B model support (#12556, opened Mar 25, 2025)
- ggml-quants : weighted rounding algorithms with cumulative search (#12557, opened Mar 25, 2025)
- vulkan: Implement grouped query attention in the coopmat2 FA shader (#12559, opened Mar 25, 2025)
- Enable MMA for BF16 data types on Powerpc (#12565, opened Mar 25, 2025)
- Fix T5Encoder model handling. (#12590, opened Mar 26, 2025)
- llama : make loras compatible with repacking (#12593, opened Mar 26, 2025)
- llamafile : ppc64le GEMV forwarding for FP32. (#12594, opened Mar 26, 2025)
- Support Qwen2_5_VLForConditionalGeneration (#12595, opened Mar 26, 2025)
- opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600, opened Mar 27, 2025)
53 Issues closed by 21 people
- Compile bug: Fails to compile with undefined references in libggml.so (#11562, closed Mar 27, 2025)
- Eval bug: Abnormal memory usage on Metal backend (#12574, closed Mar 26, 2025)
- Eval bug: GPU Hang Error on Metal backend (#12277, closed Mar 26, 2025)
- Misc. bug: Falcon3-Mamba-7B fails on ggml_ssm_conv (#12572, closed Mar 26, 2025)
- Eval bug: Program not working properly due to new features of "repack Q4_K tensor" (#12528, closed Mar 26, 2025)
- Misc. bug: All llama executables exit immediately without console output (#10929, closed Mar 26, 2025)
- Eval bug: error: Double type is not supported on this platform. (#11266, closed Mar 26, 2025)
- Feature Request: llama-server support continue_final_message (#11755, closed Mar 26, 2025)
- Misc. bug: embedding example coredump since (#12561, closed Mar 26, 2025)
- Misc. bug: Gemma3 adapter gguf conversion fails (#12551, closed Mar 25, 2025)
- GPT2: llama_model_load: error loading model: missing tensor 'output.weight' (#12567, closed Mar 25, 2025)
- Regression: e0dbec0 (aka #12181) breaks pooled embeddings: mean (#12517, closed Mar 25, 2025)
- Feature Request: Implement Qwen2Model (#12142, closed Mar 25, 2025)
- Misc. bug: LLGuidance sampler appears to need special treatment compared to other samplers (#12474, closed Mar 25, 2025)
- Eval bug: How to load clip_model_load to CUDA (#11250, closed Mar 25, 2025)
- Misc. bug: [json.exception.type_error.316] invalid UTF-8 byte at index 145: 0x27 (#11738, closed Mar 25, 2025)
- Misc. bug: Could not find backend when using -O0/-Og (#11748, closed Mar 25, 2025)
- Feature Request: Can I choose which layer to offload when using -ngl option? (#11752, closed Mar 25, 2025)
- Misc. bug: Quantizing Olmo models with imatrix failing on some sizes (#11764, closed Mar 25, 2025)
- Misc. bug: server metrics sometimes return "-nan" values (#11868, closed Mar 24, 2025)
- Misc. bug: Failed to convert Mistral-Small-3.1-24B-Instruct-2503 (#12524, closed Mar 24, 2025)
- Feature Request: Add support for SmolVLM (#10877, closed Mar 24, 2025)
- Eval bug: ggml_sycl_cpy: unsupported type combination (q8_0 to f32) (#11078, closed Mar 24, 2025)
- Feature Request: Add support for SmolVLM-250M (#11682, closed Mar 24, 2025)
- Eval bug: <|Assistant|> vs <|Assistant|> (#11704, closed Mar 24, 2025)
- Feature Request: Console Compatibility for Llama.cpp (PS5 & Xbox) (#11732, closed Mar 24, 2025)
- Compile bug: libggml-cpu.so does not build reproducibly (#11735, closed Mar 24, 2025)
- Eval bug: A Silu operand overflow occurred, causing the program to malfunction. (#12523, closed Mar 23, 2025)
- Feature Request: allow to run on CPU despite backend initialization failure. (#11584, closed Mar 23, 2025)
- Misc. bug: The test-chat fails with std::runtime_error (#11705, closed Mar 23, 2025)
- convert_hf_to_gguf.py: Can not map tensor 'lm_head.weight' on Gemma-3-12b-it (#12483, closed Mar 22, 2025)
- Misc. bug: ggml files conflict between llama.cpp and whisper.cpp (#11303, closed Mar 22, 2025)
- Compile bug: Vulkan can not work on Android (cross-compilation from linux) - Aborted without explanation (#11327, closed Mar 22, 2025)
- Eval bug: using rpc, report error [Inferior 1 (process 290070) detached] (#11431, closed Mar 22, 2025)
- Compile bug: Nix + cross compilation + Vulkan doesn't work (#11654, closed Mar 22, 2025)
- Eval bug: Segmentation fault on image encoder quantization (#11683, closed Mar 22, 2025)
- Feature Request: Support Rocm for Hipblaslt (#12464, closed Mar 21, 2025)
- Misc. bug: -sm row produces gibberish (#12340, closed Mar 21, 2025)
- Eval bug: Slow prompt processing with Q4_K_S (#12481, closed Mar 21, 2025)
- Misc. bug: AMD ROCm command error only with cli tools (#11509, closed Mar 21, 2025)
- Compile bug: ARMv7 NEON FP16 Intrinsic Errors When Cross-Compiling with Android NDK r26b (#11636, closed Mar 21, 2025)
- Eval bug: qwen2-vl failed to process while using the HIP in windows 11 (#11638, closed Mar 21, 2025)
- Misc. bug: non-CPU compilation and forcing GPU (#12346, closed Mar 20, 2025)
- Misc. bug: Llama-Server is missing --Prompt-Cache from Llama-CLI (#12437, closed Mar 20, 2025)
- Feature Request: add support for nvidia/Llama-3.3-Nemotron-70B-Select (#12461, closed Mar 20, 2025)
- Misc. bug: webui: extreme sluggish performance typing into textarea with long-context conversations (#11813, closed Mar 20, 2025)
- Why are the data copied by kv cache and the data after rope operation not equal (#12475, closed Mar 20, 2025)
- Bug: SwiftUI example does not work on simulator. (#10089, closed Mar 20, 2025)
- Misc. bug: MUSA error with ggml_cuda_op_mul_mat on some MUSA gpus (#12419, closed Mar 20, 2025)
33 Issues opened by 32 people
- [New Bitnet Model Support Request] Deepgrove model Bonsai 0.5B - Add Channel Scales (#12598, opened Mar 27, 2025)
- Misc. bug: "Unexpected empty grammar stack after accepting piece" tool crash (#12597, opened Mar 26, 2025)
- Eval bug: Incorrect n_gpu_layer settings for MoE models (#12596, opened Mar 26, 2025)
- Eval bug: run failed when run lora adapter(no merged) on android (#12592, opened Mar 26, 2025)
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} (#12591, opened Mar 26, 2025)
- Eval bug: T5Encoder support broken (#12588, opened Mar 26, 2025)
- Misc. bug: Server crash with use of lora on CPU (#12587, opened Mar 26, 2025)
- Eval bug: allocating 114296.55 MiB on device 0: cudaMalloc failed: out of memory (#12586, opened Mar 26, 2025)
- Qwen2.5-vl support and conversion? (#12584, opened Mar 26, 2025)
- Compile bug: vulkan-shaders-gen hangs when built with address sanitizers (#12581, opened Mar 26, 2025)
- -ngl to load ·last n layers· to gpu (#12577, opened Mar 26, 2025)
- Misc. bug: performance drop with 2x SYCL GPUs (#12575, opened Mar 25, 2025)
- Eval bug: Using llama-llava-clip-quantize-cli under CUDA backend conditions will encounter a crash. (#12564, opened Mar 25, 2025)
- Eval bug: the swiftui keeps saying the same thing (#12558, opened Mar 25, 2025)
- Misc. bug: vulkan: performance regression after fd123cfead49eb32e386e26b8ef7a6d41554dda5 (#12553, opened Mar 24, 2025)
- Eval bug: crash when pooling_type == LLAMA_POOLING_TYPE_MEAN (#12543, opened Mar 24, 2025)
- why assert(!isnan(wp[i])) in softmax_forward function (#12542, opened Mar 24, 2025)
- Eval bug: Accuracy is dropped when I convert model to gguf. Qwen2_VL_7B_Instruct (#12538, opened Mar 24, 2025)
- Eval bug: seemed it cannot convert the Qwen2.5-VL-7B-Instruct, please help advise, Thank you. (#12534, opened Mar 24, 2025)
- Potential memory allocation leak (#12531, opened Mar 23, 2025)
- Misc. bug: Flash attention on Vulkan (#12526, opened Mar 23, 2025)
- Feature Request: Direct way to check the status of the abort mechanism. (#12525, opened Mar 23, 2025)
- Misc. bug: test-backend-ops grad crash by GGML_ASSERT error (#12520, opened Mar 22, 2025)
- Eval bug: llama.swiftui Unexpectedly found nil while unwrapping an Optional value (#12510, opened Mar 22, 2025)
- Misc. bug: Crashing, forcing BMI2 on non BMI2 CPUs (#12500, opened Mar 21, 2025)
- llama-gemma3-cli: output degeneration after repeated uses (#12499, opened Mar 21, 2025)
- tts : add support for SparkTTS (#12495, opened Mar 21, 2025)
- Error while converting peft finetuned merged model to gguf (#12494, opened Mar 21, 2025)
- Compile bug: Error build llama cpp on CUDA (#12491, opened Mar 21, 2025)
- Feature Request: deep/ recurrent processing like "thinking", but script based. (#12486, opened Mar 21, 2025)
- Feature Request: New sampling method that boosts reasoning performance - looks too good? (#12479, opened Mar 20, 2025)
- Compile bug: Build failure for Intel oneMKL on Windows (#12478, opened Mar 20, 2025)
- tts : add support for Orpheus (#12476, opened Mar 20, 2025)
81 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- llama : add llama_batch_ext (#11875, commented on Mar 25, 2025; 23 new comments)
- PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp (#12326, commented on Mar 27, 2025; 20 new comments)
- Add PLM GGUF Conversion & Inference Support (#12457, commented on Mar 24, 2025; 4 new comments)
- `tool-call`: Phi-4 support (#12288, commented on Mar 24, 2025; 3 new comments)
- [WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Mar 22, 2025; 2 new comments)
- Eval bug: context shift is disabled (#11974, commented on Mar 27, 2025; 0 new comments)
- Eval bug: Error when converting moonlight from bf16 to q4km (#12040, commented on Mar 27, 2025; 0 new comments)
- Compile bug: llama.cpp-b4749/ggml/src/ggml-cpu/ggml-cpu-quants.c:5141:26: error: initialization of ‘uint32_t *’ {aka ‘unsigned int *’} from incompatible pointer type ‘const uint8_t (*)[12]’ {aka ‘const unsigned char (*)[12]’} [-Wincompatible-pointer-types] (#12050, commented on Mar 27, 2025; 0 new comments)
- Misc. bug: cannot scroll to right side when input too long (#12054, commented on Mar 27, 2025; 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Mar 26, 2025; 0 new comments)
- Possible solution for poor token generation performance in llama.cpp on dual Epyc Genoa/Turin systems (#11744, commented on Mar 26, 2025; 0 new comments)
- Misc. bug: auto scroll doesn't work in WebUI (#12362, commented on Mar 25, 2025; 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Mar 25, 2025; 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Mar 25, 2025; 0 new comments)
- Study how LM Evaluation Harness works and try to implement it (#231, commented on Mar 25, 2025; 0 new comments)
- Feature Request: RPC offloading using a local model copy (#10095, commented on Mar 25, 2025; 0 new comments)
- Eval bug: inference of 32B eats too much memory on ROCM HIP (5x AMD Radeon Instinct Mi50 (gfx906)) (#12369, commented on Mar 25, 2025; 0 new comments)
- Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights in Atlas 800T a2 and NPU device. (#11966, commented on Mar 25, 2025; 0 new comments)
- Misc. bug: llama-cli: error while loading shared libraries: libllama.so: cannot open shared object file: No such file or directory (#11267, commented on Mar 25, 2025; 0 new comments)
- Feature Request: Prefix assistant answer (#11536, commented on Mar 25, 2025; 0 new comments)
- Feature Request: when llama.cpp can support convert qwen2.5 VL 7B/72B model to gguf? (#11541, commented on Mar 25, 2025; 0 new comments)
- Feature Request: allow mmap to take advantage of hugepage feature which has 10x speedup (#12444, commented on Mar 24, 2025; 0 new comments)
- Feature Request: Add Support for ModernBert (#11282, commented on Mar 24, 2025; 0 new comments)
- ci: add Linux cross-compile build (#12428, commented on Mar 21, 2025; 0 new comments)
- SYCL: Remove misleading ggml_sycl_op_flatten function (#12387, commented on Mar 22, 2025; 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Mar 23, 2025; 0 new comments)
- vulkan: fix coopmat shader generation when cross-compiling (#12272, commented on Mar 27, 2025; 0 new comments)
- SYCL: Rename oneMKL to oneMath (#12192, commented on Mar 20, 2025; 0 new comments)
- Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135, commented on Mar 26, 2025; 0 new comments)
- Supporting Velvet model (#11716, commented on Mar 26, 2025; 0 new comments)
- Add support for Deepseek-R1 flash attention (#11557, commented on Mar 26, 2025; 0 new comments)
- tool-call: add support for tool-calls using Model Context Protocol (#11556, commented on Mar 21, 2025; 0 new comments)
- Introduce Graph Profiler (#9659, commented on Mar 20, 2025; 0 new comments)
- llama : initial Mamba-2 support (#9126, commented on Mar 24, 2025; 0 new comments)
- Simplify and improve CUDA graphs through use of indirect copy pointers (#9017, commented on Mar 26, 2025; 0 new comments)
- server: Windows 7 compatibility (#8208, commented on Mar 20, 2025; 0 new comments)
- Compile bug: Emulated Linux ARM64 CPU build fails (#10933, commented on Mar 27, 2025; 0 new comments)
- Bug: Cannot run larger than VRAM models with `GGML_CUDA_ENABLE_UNIFIED_MEMORY` (#10091, commented on Mar 27, 2025; 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Mar 27, 2025; 0 new comments)
- Compile bug: (#11930, commented on Mar 27, 2025; 0 new comments)
- Feature Request: encoding_image_with_clip takes a very long time when running inference on minicpmv (#11941, commented on Mar 27, 2025; 0 new comments)
- how many rpc-host should I start on remote server (#11859, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: RPC attempt fails with a specific error, but I cannot find any info on troubleshooting it (#11929, commented on Mar 22, 2025; 0 new comments)
- Eval bug: llama.cpp Incorrectly Parses and Reports sprintf Calls in C++ Code (#11951, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: Segmentation fault when importing model to opencl buffer (#11953, commented on Mar 22, 2025; 0 new comments)
- bamba (#11955, commented on Mar 22, 2025; 0 new comments)
- Eval bug: Gemma-3 vision don't work multilingual (#12351, commented on Mar 21, 2025; 0 new comments)
- Misc. bug: --no-context-shift OR --context-shift ? (#12038, commented on Mar 21, 2025; 0 new comments)
- Compile bug: ios swift xcode build error when upgrade to llama : use cmake for swift build (#10747, commented on Mar 21, 2025; 0 new comments)
- Eval bug: Gemma3 <unused32> spam (#12433, commented on Mar 21, 2025; 0 new comments)
- server : improvements and maintenance (#4216, commented on Mar 21, 2025; 0 new comments)
- Feature Request: (webui) Implement a experimental features on webui (#11662, commented on Mar 21, 2025; 0 new comments)
- Compile bug: C++ One Definition Rule [-Wodr] violations in common/json.hpp (#11876, commented on Mar 21, 2025; 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Mar 21, 2025; 0 new comments)
- Eval bug: Segmentation fault with Docker ROCm image "full-rocm" (#11947, commented on Mar 21, 2025; 0 new comments)
- Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon (#12367, commented on Mar 20, 2025; 0 new comments)
- Misc. bug: llama-cli '--log-disable' parameter omits response (#11983, commented on Mar 20, 2025; 0 new comments)
- Eval bug: does llama.cpp support Intel AMX instruction? how to enable it (#12003, commented on Mar 20, 2025; 0 new comments)
- llama cpp android gpu (#12462, commented on Mar 20, 2025; 0 new comments)
- Eval bug: getting assertion error when trying to use a gguf quantized model at inference "GGML_ASSERT(n_outputs_enc > 0 && "call llama_encode() first") failed" (#12080, commented on Mar 20, 2025; 0 new comments)
- Bug tracker: (webui/experimental) Python interpreter via pyodide (#11762, commented on Mar 20, 2025; 0 new comments)
- Feature Request: MoE only load activated expert(s) to GPU while rest non-used experts are not loaded (to CPU/GPU) for DeekSeek-R1 Inference on consumer GPU (#11532, commented on Mar 24, 2025; 0 new comments)
- Feature Request: allow setting jinja chat template from server webui (#11689, commented on Mar 24, 2025; 0 new comments)
- Feature Request: (webui) add import / export function for ALL conversations (#11718, commented on Mar 24, 2025; 0 new comments)
- Feature Request: (webui) read data from /props endpoint and use it on the webui (#11717, commented on Mar 24, 2025; 0 new comments)
- GGML to GGUF FAIL Quantized tensor bytes per row (5120) is not a multiple of Q2_K type size (84) (#11976, commented on Mar 24, 2025; 0 new comments)
- [Tracker] Docker build fails on CI for arm64 (#11888, commented on Mar 23, 2025; 0 new comments)
- [Feature request] Any plans for AMD XDNA AI Engine support on Ryzen 7x40 processors? (#1499, commented on Mar 23, 2025; 0 new comments)
- Compile bug: Compilation fails due to -D_XOPEN_SOURCE=600: error: use of undeclared identifier 'strnlen' (#11095, commented on Mar 23, 2025; 0 new comments)
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale' (#11122, commented on Mar 23, 2025; 0 new comments)
- Eval bug: GGML_SCHED_MAX_BACKENDS assert error (#11433, commented on Mar 23, 2025; 0 new comments)
- Misc. bug: llama-server web interface doesn't work in Firefox (#11563, commented on Mar 23, 2025; 0 new comments)
- Feature Request: Implement CodeGenForCausalLM (#11789, commented on Mar 23, 2025; 0 new comments)
- Eval bug: Unexpected empty grammar stack after accepting piece: <|tool_calls_begin|> on DeepSeek-R1-Distill-Qwen-32B (#11938, commented on Mar 23, 2025; 0 new comments)
- Eval bug: Ram boom after using llama-bench with cuda12.8 and deepseekr1q6 (#11965, commented on Mar 23, 2025; 0 new comments)
- tensor 'blk.25.ffn_down.weight' has invalid ggml type 42 (NONE) (#11975, commented on Mar 23, 2025; 0 new comments)
- Feature Request: add Kernel level verbose option (#11985, commented on Mar 23, 2025; 0 new comments)
- Eval bug: input is too large to process. increase the physical batch size (#12295, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: Buffer offset is not aligned on macOS / Intel / Vulkan (#10984, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: Failed to convert `MiniCPM-o-2_6` (#11347, commented on Mar 22, 2025; 0 new comments)
- Feature Request: YuE (music gen) (#11467, commented on Mar 22, 2025; 0 new comments)