Tags: ggml-org/llama.cpp

b6919

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
scripts : add script to bench models (#16894)

b6916

vendor : update cpp-httplib to 0.27.0 (#16846)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

b6915

mtmd: refactor preprocessing + support max/min pixels (#16878)

* mtmd: refactor preprocessing + support max/min pixels

* fix mlp type

* implement min/max pixels

* improve hparams

* better image preproc for qwen

* fix

* fix out of bound composite

* fix (2)

* fix token calculation

* get_merge_kernel_size()

* fix llama4 and lfm2

* gonna fix them all

* use simple resize for qwen

* qwen: increase min tokens

* no resize if dst size == src size

* restore to initial min/max tokens value for qwen

b6912

common : allow --system-prompt-file for diffusion-cli (#16903)

b6910

vulkan: Fix multi_add invalid descriptor usage (#16899)

b6909

vulkan: fuse mul_mat+add and mul_mat_id+add_id (#16868)

* vulkan: fuse mul_mat+add and mul_mat_id+add_id

The fusion is only applied for the mat-vec mul paths.

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix 32b build

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

b6908

CUDA: Remove unneeded bias/gate dims in fused mmvq (#16858)

* CUDA: Remove unneeded bias/gate dims in fused mmvq

Pointed out in a comment on #16847 that only a single value is needed per target col per thread

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Fix "Error 991-D: extra braces are nonstandard" during compilation

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

b6907

refactor : llama-model.cpp (#16252)

* Squashed: llama-model.cpp refactoring

* Fix formatting of attn / ffn / ffn_moe calls

* Fix import regression / unify spacing in models.h

* totally DID NOT miss those!

* Add missing qwen3vl(moe) models

* Add missing new .cpp files to build

* Remove extra semicolons

* Editor checker

* Update src/models/models.h

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

b6906

model : Minimax M2 (#16831)

* Model: Minimax M2

* Cleanup

* Cleanup pt. 2

* Cleanup pt. 3

* Update convert_hf_to_gguf_update.py - merge catch blocks

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Remove vocab models and test

* Remove all redundant hparam settings covered by TextModel

* Move super to start, don't set block_count

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update gguf-py/gguf/constants.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

b6905

model : add Granite Hybrid nano types (#16896)

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>