Insights: ggml-org/llama.cpp
Overview
27 Releases published by 1 person
- b4929 published Mar 20, 2025
- b4930 published Mar 20, 2025
- b4932 published Mar 21, 2025
- b4933 published Mar 21, 2025
- b4934 published Mar 21, 2025
- b4935 published Mar 21, 2025
- b4936 published Mar 21, 2025
- b4937 published Mar 21, 2025
- b4938 published Mar 21, 2025
- b4939 published Mar 22, 2025
- b4940 published Mar 22, 2025
- b4942 published Mar 22, 2025
- b4944 published Mar 23, 2025
- b4945 published Mar 24, 2025
- b4946 published Mar 24, 2025
- b4947 published Mar 24, 2025
- b4948 published Mar 24, 2025
- b4951 published Mar 24, 2025
- b4953 published Mar 25, 2025
- b4956 published Mar 25, 2025
- b4957 published Mar 25, 2025
- b4958 published Mar 25, 2025
- b4961 published Mar 26, 2025
- b4963 published Mar 26, 2025
- b4964 published Mar 26, 2025
- b4966 published Mar 26, 2025
- b4967 published Mar 27, 2025
40 Pull requests merged by 25 people
- SYCL: implement memset ggml backend buffer interface (#12580, merged Mar 27, 2025)
- Add support for new gfx1200 and gfx1201 targets (#12372, merged Mar 26, 2025)
- metal : refactor mat-vec code (#12569, merged Mar 26, 2025)
- grammars: upgrade to llguidance 0.7.10 (#12576, merged Mar 26, 2025)
- clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566, merged Mar 26, 2025)
- convert : fix squeeze for ssm_conv tensors (#12573, merged Mar 26, 2025)
- ggml : fix MUL_MAT_ID repack with Q8_K (#12544, merged Mar 26, 2025)
- doc: [MUSA] minor changes (#12583, merged Mar 26, 2025)
- convert: fix Mistral3/Gemma3 model hparams init (#12571, merged Mar 25, 2025)
- De-duplicate fmt and format functions and optimize (#11596, merged Mar 25, 2025)
- ggml-cpu : bug fix related to KleidiAI multithreaded LHS packing (#12568, merged Mar 25, 2025)
- SYCL: disable Q4_0 reorder optimization by default (#12560, merged Mar 25, 2025)
- docs : add build instructions for KleidiAI (#12563, merged Mar 25, 2025)
- ci: [MUSA] add CI and update doc (#12562, merged Mar 25, 2025)
- context : fix worst-case reserve outputs (#12545, merged Mar 25, 2025)
- ci: [SYCL] Use main GPU and enable sysman (#12547, merged Mar 24, 2025)
- opencl: simplify kernel embedding logic in CMakeLists.txt (#12503, merged Mar 24, 2025)
- ci: fix SYCL build (#12546, merged Mar 24, 2025)
- docs: update: improve the Fedora CUDA guide (#12536, merged Mar 24, 2025)
- llama-vocab : add SuperBPE pre-tokenizer (#12532, merged Mar 24, 2025)
- Fix clang warnings (#12540, merged Mar 24, 2025)
- Issues while building on AIX OS (#12541, merged Mar 24, 2025)
- vulkan: fix mul_mat_vec failure in backend tests (#12529, merged Mar 24, 2025)
- server : Add verbose output to OAI compatible chat endpoint. (#12246, merged Mar 23, 2025)
- Update install.md to include MacPorts section (#12518, merged Mar 23, 2025)
- llama : gemma3 : use output tensor if it exists in model weight (#12506, merged Mar 22, 2025)
- ggml : fix quantized cpy op (#12310, merged Mar 22, 2025)
- musa: refine compute capability (#12493, merged Mar 22, 2025)
- vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505, merged Mar 22, 2025)
- Vulkan: RTE rounding for cpy to quant (#12480, merged Mar 21, 2025)
- vulkan: workaround for #10710 and #12147 16 bit unpack8 bug (#12472, merged Mar 21, 2025)
- model : do not repack if a GPU device is present (#12498, merged Mar 21, 2025)
- chore : cleanup llama_model_loader::TENSOR_ usage (#12492, merged Mar 21, 2025)
- llama-tts : avoid crashes related to bad model file paths (#12482, merged Mar 21, 2025)
- [SYCL] Fix build on Windows when ccache enabled (#9954) (#9976, merged Mar 21, 2025)
- sycl: cleanup oneDNN related code (#12097, merged Mar 21, 2025)
- webui: Stop rerender on textarea input and end the devastating lag (#12299, merged Mar 20, 2025)
- llama : make Qwen2MoE QKV bias optional (#12477, merged Mar 20, 2025)
- Block interleaving support for Q4_K quantization for x86 AVX2 architecture (#12332, merged Mar 20, 2025)
- Avoid calls to tokenizer.added_tokens_decoder (#12473, merged Mar 20, 2025)
25 Pull requests opened by 22 people
- Metal TQ2_0 (#12485, opened Mar 20, 2025)
- (draft) tts: Orpheus support (#12487, opened Mar 21, 2025)
- Evenly and stably pinning thread pool (#12488, opened Mar 21, 2025)
- llamafile : ppc64le MMA implementation for Q4_0. (#12489, opened Mar 21, 2025)
- rpc : send hash when tensor data is above some fixed threshold (#12496, opened Mar 21, 2025)
- llama: support Qwen3 (#12501, opened Mar 21, 2025)
- cmake: Allow to configure GGML_BUILD_NUMBER with file (#12509, opened Mar 22, 2025)
- quantize: Handle user-defined quantization levels for additional tensors (#12511, opened Mar 22, 2025)
- perplexity: Add option to ignore context window overflow errors and continue score calculation (#12512, opened Mar 22, 2025)
- llama-tts : precompute irFFT theta (#12514, opened Mar 22, 2025)
- Vulkan: Remove dedicated aligned matrix matrix multiplication shaders (#12515, opened Mar 22, 2025)
- cmake: fix ccache conflict (#12522, opened Mar 23, 2025)
- ggml : riscv: add 128-bit RVV support (#12530, opened Mar 23, 2025)
- (draft) tts: Sesame support (#12549, opened Mar 24, 2025)
- llama-map to support hugepage feature of pagesize 2M or 1G which can … (#12552, opened Mar 24, 2025)
- Draft: vulkan: Add bfloat16 support (#12554, opened Mar 24, 2025)
- Add Trillion 7B model support (#12556, opened Mar 25, 2025)
- ggml-quants : weighted rounding algorithms with cumulative search (#12557, opened Mar 25, 2025)
- vulkan: Implement grouped query attention in the coopmat2 FA shader (#12559, opened Mar 25, 2025)
- Enable MMA for BF16 data types on Powerpc (#12565, opened Mar 25, 2025)
- Fix T5Encoder model handling. (#12590, opened Mar 26, 2025)
- llama : make loras compatible with repacking (#12593, opened Mar 26, 2025)
- llamafile : ppc64le GEMV forwarding for FP32. (#12594, opened Mar 26, 2025)
- Support Qwen2_5_VLForConditionalGeneration (#12595, opened Mar 26, 2025)
- opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600, opened Mar 27, 2025)
53 Issues closed by 21 people
- Compile bug: Fails to compile with undefined references in libggml.so (#11562, closed Mar 27, 2025)
- Eval bug: Abnormal memory usage on Metal backend (#12574, closed Mar 26, 2025)
- Eval bug: GPU Hang Error on Metal backend (#12277, closed Mar 26, 2025)
- Misc. bug: Falcon3-Mamba-7B fails on ggml_ssm_conv (#12572, closed Mar 26, 2025)
- Eval bug: Program not working properly due to new features of "repack Q4_K tensor" (#12528, closed Mar 26, 2025)
- Misc. bug: All llama executables exit immediately without console output (#10929, closed Mar 26, 2025)
- Eval bug: error: Double type is not supported on this platform. (#11266, closed Mar 26, 2025)
- Feature Request: llama-server support continue_final_message (#11755, closed Mar 26, 2025)
- Misc. bug: embedding example coredump since (#12561, closed Mar 26, 2025)
- Misc. bug: Gemma3 adapter gguf conversion fails (#12551, closed Mar 25, 2025)
- GPT2: llama_model_load: error loading model: missing tensor 'output.weight' (#12567, closed Mar 25, 2025)
- Regression: e0dbec0 (aka #12181) breaks pooled embeddings: mean (#12517, closed Mar 25, 2025)
- Feature Request: Implement Qwen2Model (#12142, closed Mar 25, 2025)
- Misc. bug: LLGuidance sampler appears to need special treatment compared to other samplers (#12474, closed Mar 25, 2025)
- Eval bug: How to load clip_model_load to CUDA (#11250, closed Mar 25, 2025)
- Misc. bug: [json.exception.type_error.316] invalid UTF-8 byte at index 145: 0x27 (#11738, closed Mar 25, 2025)
- Misc. bug: Could not find backend when using -O0/-Og (#11748, closed Mar 25, 2025)
- Feature Request: Can I choose which layer to offload when using -ngl option? (#11752, closed Mar 25, 2025)
- Misc. bug: Quantizing Olmo models with imatrix failing on some sizes (#11764, closed Mar 25, 2025)
- Misc. bug: server metrics sometimes return "-nan" values (#11868, closed Mar 24, 2025)
- Misc. bug: Failed to convert Mistral-Small-3.1-24B-Instruct-2503 (#12524, closed Mar 24, 2025)
- Feature Request: Add support for SmolVLM (#10877, closed Mar 24, 2025)
- Eval bug: ggml_sycl_cpy: unsupported type combination (q8_0 to f32) (#11078, closed Mar 24, 2025)
- Feature Request: Add support for SmolVLM-250M (#11682, closed Mar 24, 2025)
- Eval bug: <|Assistant|> vs <|Assistant|> (#11704, closed Mar 24, 2025)
- Feature Request: Console Compatibility for Llama.cpp (PS5 & Xbox) (#11732, closed Mar 24, 2025)
- Compile bug: libggml-cpu.so does not build reproducibly (#11735, closed Mar 24, 2025)
- Eval bug: A Silu operand overflow occurred, causing the program to malfunction. (#12523, closed Mar 23, 2025)
- Feature Request: allow to run on CPU despite backend initialization failure. (#11584, closed Mar 23, 2025)
- Misc. bug: The test-chat fails with std::runtime_error (#11705, closed Mar 23, 2025)
- convert_hf_to_gguf.py: Can not map tensor 'lm_head.weight' on Gemma-3-12b-it (#12483, closed Mar 22, 2025)
- Misc. bug: ggml files conflict between llama.cpp and whisper.cpp (#11303, closed Mar 22, 2025)
- Compile bug: Vulkan can not work on Android (cross-compilation from linux) - Aborted without explanation (#11327, closed Mar 22, 2025)
- Eval bug: using rpc, report error [Inferior 1 (process 290070) detached] (#11431, closed Mar 22, 2025)
- Compile bug: Nix + cross compilation + Vulkan doesn't work (#11654, closed Mar 22, 2025)
- Eval bug: Segmentation fault on image encoder quantization (#11683, closed Mar 22, 2025)
- Feature Request: Support Rocm for Hipblaslt (#12464, closed Mar 21, 2025)
- Misc. bug: -sm row produces gibberish (#12340, closed Mar 21, 2025)
- Eval bug: Slow prompt processing with Q4_K_S (#12481, closed Mar 21, 2025)
- Misc. bug: AMD ROCm command error only with cli tools (#11509, closed Mar 21, 2025)
- Compile bug: ARMv7 NEON FP16 Intrinsic Errors When Cross-Compiling with Android NDK r26b (#11636, closed Mar 21, 2025)
- Eval bug: qwen2-vl failed to process while using the HIP in windows 11 (#11638, closed Mar 21, 2025)
- Misc. bug: non-CPU compilation and forcing GPU (#12346, closed Mar 20, 2025)
- Misc. bug: Llama-Server is missing --Prompt-Cache from Llama-CLI (#12437, closed Mar 20, 2025)
- Feature Request: add support for nvidia/Llama-3.3-Nemotron-70B-Select (#12461, closed Mar 20, 2025)
- Misc. bug: webui: extreme sluggish performance typing into textarea with long-context conversations (#11813, closed Mar 20, 2025)
- Why are the data copied by kv cache and the data after rope operation not equal (#12475, closed Mar 20, 2025)
- Bug: SwiftUI example does not work on simulator. (#10089, closed Mar 20, 2025)
- Misc. bug: MUSA error with ggml_cuda_op_mul_mat on some MUSA gpus (#12419, closed Mar 20, 2025)
33 Issues opened by 32 people
- [New Bitnet Model Support Request] Deepgrove model Bonsai 0.5B - Add Channel Scales (#12598, opened Mar 27, 2025)
- Misc. bug: "Unexpected empty grammar stack after accepting piece" tool crash (#12597, opened Mar 26, 2025)
- Eval bug: Incorrect n_gpu_layer settings for MoE models (#12596, opened Mar 26, 2025)
- Eval bug: run failed when run lora adapter(no merged) on android (#12592, opened Mar 26, 2025)
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} (#12591, opened Mar 26, 2025)
- Eval bug: T5Encoder support broken (#12588, opened Mar 26, 2025)
- Misc. bug: Server crash with use of lora on CPU (#12587, opened Mar 26, 2025)
- Eval bug: allocating 114296.55 MiB on device 0: cudaMalloc failed: out of memory (#12586, opened Mar 26, 2025)
- Qwen2.5-vl support and conversion? (#12584, opened Mar 26, 2025)
- Compile bug: vulkan-shaders-gen hangs when built with address sanitizers (#12581, opened Mar 26, 2025)
- -ngl to load ·last n layers· to gpu (#12577, opened Mar 26, 2025)
- Misc. bug: performance drop with 2x SYCL GPUs (#12575, opened Mar 25, 2025)
- Eval bug: Using llama-llava-clip-quantize-cli under CUDA backend conditions will encounter a crash. (#12564, opened Mar 25, 2025)
- Eval bug: the swiftui keeps saying the same thing (#12558, opened Mar 25, 2025)
- Misc. bug: vulkan: performance regression after fd123cfead49eb32e386e26b8ef7a6d41554dda5 (#12553, opened Mar 24, 2025)
- Eval bug: crash when pooling_type == LLAMA_POOLING_TYPE_MEAN (#12543, opened Mar 24, 2025)
- why assert(!isnan(wp[i])) in softmax_forward function (#12542, opened Mar 24, 2025)
- Eval bug: Accuracy is dropped when I convert model to gguf. Qwen2_VL_7B_Instruct (#12538, opened Mar 24, 2025)
- Eval bug: seemed it cannot convert the Qwen2.5-VL-7B-Instruct, please help advise, Thank you. (#12534, opened Mar 24, 2025)
- Potential memory allocation leak (#12531, opened Mar 23, 2025)
- Misc. bug: Flash attention on Vulkan (#12526, opened Mar 23, 2025)
- Feature Request: Direct way to check the status of the abort mechanism. (#12525, opened Mar 23, 2025)
- Misc. bug: test-backend-ops grad crash by GGML_ASSERT error (#12520, opened Mar 22, 2025)
- Eval bug: llama.swiftui Unexpectedly found nil while unwrapping an Optional value (#12510, opened Mar 22, 2025)
- Misc. bug: Crashing, forcing BMI2 on non BMI2 CPUs (#12500, opened Mar 21, 2025)
- llama-gemma3-cli: output degeneration after repeated uses (#12499, opened Mar 21, 2025)
- tts : add support for SparkTTS (#12495, opened Mar 21, 2025)
- Error while converting peft finetuned merged model to gguf (#12494, opened Mar 21, 2025)
- Compile bug: Error build llama cpp on CUDA (#12491, opened Mar 21, 2025)
- Feature Request: deep/ recurrent processing like "thinking", but script based. (#12486, opened Mar 21, 2025)
- Feature Request: New sampling method that boosts reasoning performance - looks too good? (#12479, opened Mar 20, 2025)
- Compile bug: Build failure for Intel oneMKL on Windows (#12478, opened Mar 20, 2025)
- tts : add support for Orpheus (#12476, opened Mar 20, 2025)
81 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- llama : add llama_batch_ext (#11875, commented on Mar 25, 2025; 23 new comments)
- PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp (#12326, commented on Mar 27, 2025; 20 new comments)
- Add PLM GGUF Conversion & Inference Support (#12457, commented on Mar 24, 2025; 4 new comments)
- `tool-call`: Phi-4 support (#12288, commented on Mar 24, 2025; 3 new comments)
- [WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Mar 22, 2025; 2 new comments)
- Eval bug: context shift is disabled (#11974, commented on Mar 27, 2025; 0 new comments)
- Eval bug: Error when converting moonlight from bf16 to q4km (#12040, commented on Mar 27, 2025; 0 new comments)
- Compile bug: llama.cpp-b4749/ggml/src/ggml-cpu/ggml-cpu-quants.c:5141:26: error: initialization of ‘uint32_t *’ {aka ‘unsigned int *’} from incompatible pointer type ‘const uint8_t (*)[12]’ {aka ‘const unsigned char (*)[12]’} [-Wincompatible-pointer-types] (#12050, commented on Mar 27, 2025; 0 new comments)
- Misc. bug: cannot scroll to right side when input too long (#12054, commented on Mar 27, 2025; 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Mar 26, 2025; 0 new comments)
- Possible solution for poor token generation performance in llama.cpp on dual Epyc Genoa/Turin systems (#11744, commented on Mar 26, 2025; 0 new comments)
- Misc. bug: auto scroll doesn't work in WebUI (#12362, commented on Mar 25, 2025; 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Mar 25, 2025; 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Mar 25, 2025; 0 new comments)
- Study how LM Evaluation Harness works and try to implement it (#231, commented on Mar 25, 2025; 0 new comments)
- Feature Request: RPC offloading using a local model copy (#10095, commented on Mar 25, 2025; 0 new comments)
- Eval bug: inference of 32B eats too much memory on ROCM HIP (5x AMD Radeon Instinct Mi50 (gfx906)) (#12369, commented on Mar 25, 2025; 0 new comments)
- Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights in Atlas 800T a2 and NPU device. (#11966, commented on Mar 25, 2025; 0 new comments)
- Misc. bug: llama-cli: error while loading shared libraries: libllama.so: cannot open shared object file: No such file or directory (#11267, commented on Mar 25, 2025; 0 new comments)
- Feature Request: Prefix assistant answer (#11536, commented on Mar 25, 2025; 0 new comments)
- Feature Request: when llama.cpp can support convert qwen2.5 VL 7B/72B model to gguf? (#11541, commented on Mar 25, 2025; 0 new comments)
- Feature Request: allow mmap to take advantage of hugepage feature which has 10x speedup (#12444, commented on Mar 24, 2025; 0 new comments)
- Feature Request: Add Support for ModernBert (#11282, commented on Mar 24, 2025; 0 new comments)
- ci: add Linux cross-compile build (#12428, commented on Mar 21, 2025; 0 new comments)
- SYCL: Remove misleading ggml_sycl_op_flatten function (#12387, commented on Mar 22, 2025; 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Mar 23, 2025; 0 new comments)
- vulkan: fix coopmat shader generation when cross-compiling (#12272, commented on Mar 27, 2025; 0 new comments)
- SYCL: Rename oneMKL to oneMath (#12192, commented on Mar 20, 2025; 0 new comments)
- Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135, commented on Mar 26, 2025; 0 new comments)
- Supporting Velvet model (#11716, commented on Mar 26, 2025; 0 new comments)
- Add support for Deepseek-R1 flash attention (#11557, commented on Mar 26, 2025; 0 new comments)
- tool-call: add support for tool-calls using Model Context Protocol (#11556, commented on Mar 21, 2025; 0 new comments)
- Introduce Graph Profiler (#9659, commented on Mar 20, 2025; 0 new comments)
- llama : initial Mamba-2 support (#9126, commented on Mar 24, 2025; 0 new comments)
- Simplify and improve CUDA graphs through use of indirect copy pointers (#9017, commented on Mar 26, 2025; 0 new comments)
- server: Windows 7 compatibility (#8208, commented on Mar 20, 2025; 0 new comments)
- Compile bug: Emulated Linux ARM64 CPU build fails (#10933, commented on Mar 27, 2025; 0 new comments)
- Bug: Cannot run larger than VRAM models with `GGML_CUDA_ENABLE_UNIFIED_MEMORY` (#10091, commented on Mar 27, 2025; 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Mar 27, 2025; 0 new comments)
- Compile bug: (#11930, commented on Mar 27, 2025; 0 new comments)
- Feature Request: encoding_image_with_clip takes a very long time when running inference on minicpmv (#11941, commented on Mar 27, 2025; 0 new comments)
- how many rpc-host should I start on remote server (#11859, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: RPC attempt fails with a specific error, but I cannot find any info on troubleshooting it (#11929, commented on Mar 22, 2025; 0 new comments)
- Eval bug: llama.cpp Incorrectly Parses and Reports sprintf Calls in C++ Code (#11951, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: Segmentation fault when importing model to opencl buffer (#11953, commented on Mar 22, 2025; 0 new comments)
- bamba (#11955, commented on Mar 22, 2025; 0 new comments)
- Eval bug: Gemma-3 vision don't work multilingual (#12351, commented on Mar 21, 2025; 0 new comments)
- Misc. bug: --no-context-shift OR --context-shift ? (#12038, commented on Mar 21, 2025; 0 new comments)
- Compile bug: ios swift xcode build error when upgrade to llama : use cmake for swift build (#10747, commented on Mar 21, 2025; 0 new comments)
- Eval bug: Gemma3 <unused32> spam (#12433, commented on Mar 21, 2025; 0 new comments)
- server : improvements and maintenance (#4216, commented on Mar 21, 2025; 0 new comments)
- Feature Request: (webui) Implement a experimental features on webui (#11662, commented on Mar 21, 2025; 0 new comments)
- Compile bug: C++ One Definition Rule [-Wodr] violations in common/json.hpp (#11876, commented on Mar 21, 2025; 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Mar 21, 2025; 0 new comments)
- Eval bug: Segmentation fault with Docker ROCm image "full-rocm" (#11947, commented on Mar 21, 2025; 0 new comments)
- Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon (#12367, commented on Mar 20, 2025; 0 new comments)
- Misc. bug: llama-cli '--log-disable' parameter omits response (#11983, commented on Mar 20, 2025; 0 new comments)
- Eval bug: does llama.cpp support Intel AMX instruction? how to enable it (#12003, commented on Mar 20, 2025; 0 new comments)
- llama cpp android gpu (#12462, commented on Mar 20, 2025; 0 new comments)
- Eval bug: getting assertion error when trying to use a gguf quantized model at inference "GGML_ASSERT(n_outputs_enc > 0 && "call llama_encode() first") failed" (#12080, commented on Mar 20, 2025; 0 new comments)
- Bug tracker: (webui/experimental) Python interpreter via pyodide (#11762, commented on Mar 20, 2025; 0 new comments)
- Feature Request: MoE only load activated expert(s) to GPU while rest non-used experts are not loaded (to CPU/GPU) for DeekSeek-R1 Inference on consumer GPU (#11532, commented on Mar 24, 2025; 0 new comments)
- Feature Request: allow setting jinja chat template from server webui (#11689, commented on Mar 24, 2025; 0 new comments)
- Feature Request: (webui) add import / export function for ALL conversations (#11718, commented on Mar 24, 2025; 0 new comments)
- Feature Request: (webui) read data from /props endpoint and use it on the webui (#11717, commented on Mar 24, 2025; 0 new comments)
- GGML to GGUF FAIL Quantized tensor bytes per row (5120) is not a multiple of Q2_K type size (84) (#11976, commented on Mar 24, 2025; 0 new comments)
- [Tracker] Docker build fails on CI for arm64 (#11888, commented on Mar 23, 2025; 0 new comments)
- [Feature request] Any plans for AMD XDNA AI Engine support on Ryzen 7x40 processors? (#1499, commented on Mar 23, 2025; 0 new comments)
- Compile bug: Compilation fails due to -D_XOPEN_SOURCE=600: error: use of undeclared identifier 'strnlen' (#11095, commented on Mar 23, 2025; 0 new comments)
- Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale' (#11122, commented on Mar 23, 2025; 0 new comments)
- Eval bug: GGML_SCHED_MAX_BACKENDS assert error (#11433, commented on Mar 23, 2025; 0 new comments)
- Misc. bug: llama-server web interface doesn't work in Firefox (#11563, commented on Mar 23, 2025; 0 new comments)
- Feature Request: Implement CodeGenForCausalLM (#11789, commented on Mar 23, 2025; 0 new comments)
- Eval bug: Unexpected empty grammar stack after accepting piece: <|tool_calls_begin|> on DeepSeek-R1-Distill-Qwen-32B (#11938, commented on Mar 23, 2025; 0 new comments)
- Eval bug: Ram boom after using llama-bench with cuda12.8 and deepseekr1q6 (#11965, commented on Mar 23, 2025; 0 new comments)
- tensor 'blk.25.ffn_down.weight' has invalid ggml type 42 (NONE) (#11975, commented on Mar 23, 2025; 0 new comments)
- Feature Request: add Kernel level verbose option (#11985, commented on Mar 23, 2025; 0 new comments)
- Eval bug: input is too large to process. increase the physical batch size (#12295, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: Buffer offset is not aligned on macOS / Intel / Vulkan (#10984, commented on Mar 22, 2025; 0 new comments)
- Misc. bug: Failed to convert `MiniCPM-o-2_6` (#11347, commented on Mar 22, 2025; 0 new comments)
- Feature Request: YuE (music gen) (#11467, commented on Mar 22, 2025; 0 new comments)