Releases · ggml-org/llama.cpp

20 Jun 16:34

c959f46

b5723

CUDA: add conv_2d_transpose (#14287)

* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts
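For readers unfamiliar with the operation, the plain-C sketch below shows what a 2D transposed convolution computes. It is not the CUDA kernel from #14287. It assumes a single channel, row-major layout, a stride `s` and zero padding, and the function names are hypothetical. Each input pixel scatters a scaled copy of the kernel into the output, so the output extent is `(in - 1) * s + k` per axis.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Output extent of a stride-s transposed convolution with no padding. */
static size_t conv_t2d_out_dim(size_t in, size_t k, size_t s) {
    return (in - 1) * s + k;
}

/*
 * Reference 2D transposed convolution (hypothetical helper, single channel).
 * in:  ih x iw, row-major
 * ker: kh x kw, row-major
 * out: oh x ow, row-major, oh = (ih-1)*s + kh, ow = (iw-1)*s + kw
 */
static void conv_t2d_ref(const float *in, size_t ih, size_t iw,
                         const float *ker, size_t kh, size_t kw,
                         size_t s, float *out) {
    size_t oh = conv_t2d_out_dim(ih, kh, s);
    size_t ow = conv_t2d_out_dim(iw, kw, s);
    memset(out, 0, oh * ow * sizeof(float));
    for (size_t i = 0; i < ih; ++i)
        for (size_t j = 0; j < iw; ++j)
            /* scatter in[i][j] * kernel into the window starting at (i*s, j*s) */
            for (size_t ki = 0; ki < kh; ++ki)
                for (size_t kj = 0; kj < kw; ++kj)
                    out[(i * s + ki) * ow + (j * s + kj)] +=
                        in[i * iw + j] * ker[ki * kw + kj];
}
```

With a 2x2 input `{1, 2, 3, 4}`, an all-ones 2x2 kernel and stride 1, the 3x3 output is `{1, 3, 2, 4, 10, 6, 3, 7, 4}`: overlapping windows accumulate, which is why a GPU version must handle write conflicts or compute each output element as a gather instead.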

Assets 15

20 Jun 16:35

github-actions

b5722

22015b2

b5722 Latest


lint : remove trailing whitespace (#14304)

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-06-20T16:35:36Z

llama-b5722-bin-macos-arm64.zip
sha256:d8d430c312a6fe1fd8128ff9c9cb90f5bb8754188283d38d287a5f5acc7a392b
10.4 MB 2025-06-20T16:35:45Z

llama-b5722-bin-macos-x64.zip
sha256:e7b860b51157f15599641c09617ea45e1909793b4d2974ff655117ee50bfaea3
26.1 MB 2025-06-20T16:35:47Z

llama-b5722-bin-ubuntu-vulkan-x64.zip
sha256:bcd5e05a5c3ebe9281fdcceaab14488fc11bcd5c4fb9c9aeb7f31d3230f652f2
19.9 MB 2025-06-20T16:35:48Z

llama-b5722-bin-ubuntu-x64.zip
sha256:84534e9168bbbdddffecc8d329828c1cec898e695e805452aedc271979443a0d
12.2 MB 2025-06-20T16:35:49Z

llama-b5722-bin-win-cpu-arm64.zip
sha256:f670649be5d3c1d932cdad6216d49d5f339dbb90ba103d7ed8e4a8675306ab9f
10.7 MB 2025-06-20T16:35:50Z

llama-b5722-bin-win-cpu-x64.zip
sha256:ebe2d4fb8049ae2435d9ca1ccaa19207b07da3b4a58cb39cef350d00c44fe234
13.6 MB 2025-06-20T16:35:51Z

llama-b5722-bin-win-cuda-12.4-x64.zip
sha256:0f810384180eee2d9f379bdaf0cb8bd81dc2990cc37f04df6bd4ebabe3eb0447
126 MB 2025-06-20T16:35:52Z

llama-b5722-bin-win-hip-radeon-x64.zip
sha256:68067b2e4877bc6431c23b3951ce9cea4c976017b2cf9737ac51ef6a830caa3f
298 MB 2025-06-20T16:35:56Z

llama-b5722-bin-win-opencl-adreno-arm64.zip
sha256:be035d23a99483fe648267a888f4d2dbb8c69abd1f62b688870299743893d575
11.1 MB 2025-06-20T16:36:04Z

Source code (zip)
2025-06-20T14:37:44Z

Source code (tar.gz)
2025-06-20T14:37:44Z

20 Jun 16:07

github-actions

b5721

dd6e6d0

b5721

vocab : prevent tokenizer overflow (#14301)

* vocab : prevent stack overflow in tokenize

* vocab : return error instead of aborting on oversized token count

* vocab : return INT32_MIN from llama_tokenize on overflow
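`llama_tokenize` returns the number of tokens written, or the negated number of tokens that would have been needed when the buffer is too small; per the notes above, it now returns `INT32_MIN` when the count overflows. The sketch below shows how a caller might handle all three cases with a grow-and-retry loop. `fake_tokenize` is a hypothetical stand-in (one token per byte) that only mirrors that return convention; the real function also takes a vocabulary and special-token flags, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int32_t token_t;

/* HYPOTHETICAL stand-in for llama_tokenize: one token per byte.
 *   >= 0       number of tokens written
 *   < 0        buffer too small; -(result) tokens are needed
 *   INT32_MIN  count does not fit in int32_t, so no size can be reported */
static int32_t fake_tokenize(const char *text, int64_t text_len,
                             token_t *tokens, int32_t n_tokens_max) {
    if (text_len > INT32_MAX) return INT32_MIN;
    int32_t needed = (int32_t) text_len;
    if (needed > n_tokens_max) return -needed;
    for (int32_t i = 0; i < needed; ++i) tokens[i] = (unsigned char) text[i];
    return needed;
}

/* Caller pattern: try a small buffer, grow to the exact size on a negative
 * result, and treat INT32_MIN as a hard error rather than as a size. */
static token_t *tokenize_alloc(const char *text, int32_t *n_out) {
    int64_t len = (int64_t) strlen(text);
    int32_t cap = 8;
    token_t *buf = malloc((size_t) cap * sizeof *buf);
    if (!buf) return NULL;
    int32_t n = fake_tokenize(text, len, buf, cap);
    if (n == INT32_MIN) { free(buf); return NULL; }   /* overflow */
    if (n < 0) {                                       /* need -n slots */
        cap = -n;
        token_t *grown = realloc(buf, (size_t) cap * sizeof *buf);
        if (!grown) { free(buf); return NULL; }
        buf = grown;
        n = fake_tokenize(text, len, buf, cap);
        if (n < 0) { free(buf); return NULL; }
    }
    *n_out = n;
    return buf;
}
```

Checking for `INT32_MIN` before negating matters: `-INT32_MIN` is not representable in `int32_t`, which is exactly why a dedicated sentinel is useful for the overflow case.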

Assets 15

20 Jun 14:25

github-actions

b5720

8308f98

b5720

sycl: add usage of enqueue_functions extension (#14244)

* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions
extension

* Wrapping extension in general function

Create a general function that uses the enqueue_functions extension if
it is enabled in the compiler, and otherwise calls the general SYCL
function to launch kernels.

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
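The "general function" described above is a compile-time dispatch wrapper: call sites use one launcher, and only that launcher checks whether the faster path is available. The C sketch below illustrates the pattern only; it is not SYCL code. `HAVE_FAST_LAUNCH`, `launch`, `launch_fast` and `launch_generic` are hypothetical names standing in for the extension check and the two SYCL launch paths.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL feature macro; the real change checks whether the compiler
 * provides the SYCL enqueue_functions extension. */
#ifndef HAVE_FAST_LAUNCH
#define HAVE_FAST_LAUNCH 1
#endif

typedef void (*kernel_fn)(size_t idx, void *ctx);

/* Records which path ran, for illustration only. */
static const char *last_path = NULL;

void launch_fast(size_t n, kernel_fn k, void *ctx) {
    last_path = "fast";
    for (size_t i = 0; i < n; ++i) k(i, ctx);
}

void launch_generic(size_t n, kernel_fn k, void *ctx) {
    last_path = "generic";
    for (size_t i = 0; i < n; ++i) k(i, ctx);
}

/* Single entry point for every call site: callers never test the feature
 * themselves, so changing compilers only affects this function. */
void launch(size_t n, kernel_fn k, void *ctx) {
#if HAVE_FAST_LAUNCH
    launch_fast(n, k, ctx);
#else
    launch_generic(n, k, ctx);
#endif
}

/* Example kernel: writes 2*i into element i of an int array. */
static void double_index(size_t i, void *ctx) {
    ((int *) ctx)[i] = (int) i * 2;
}
```

Keeping the `#if` in one place is the design point: the individual conversions listed above then only need to call the wrapper.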

Assets 15

20 Jun 14:27

github-actions

b5719

6369be0

b5719

Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)

* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
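With all CPU variants built, the loader picks the variant whose required features the running CPU supports and that ranks highest; anything that might execute unsupported instructions must wait until the chosen variant is actually called. The sketch below illustrates both ideas generically. The feature bits, priorities, `pick_variant` and the lookup table are hypothetical and do not reflect ggml's actual scoring code or PowerPC detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL feature bits; real detection would query the OS or CPU. */
enum { FEAT_VSX = 1u << 0, FEAT_POWER9 = 1u << 1, FEAT_POWER10 = 1u << 2 };

typedef struct {
    const char *name;
    uint32_t required;  /* features the variant was compiled to use */
    int priority;       /* higher = preferred when supported */
} cpu_variant;

/* Score 0 means "unusable on this CPU"; otherwise prefer specialised builds. */
static int variant_score(const cpu_variant *v, uint32_t cpu_features) {
    if ((v->required & cpu_features) != v->required) return 0;
    return v->priority;
}

static const cpu_variant *pick_variant(const cpu_variant *vs, size_t n,
                                       uint32_t cpu_features) {
    const cpu_variant *best = NULL;
    int best_score = 0;
    for (size_t i = 0; i < n; ++i) {
        int s = variant_score(&vs[i], cpu_features);
        if (s > best_score) { best_score = s; best = &vs[i]; }
    }
    return best;
}

/* Deferred initialisation: nothing runs at load time; the table is built on
 * the first call, i.e. only after this variant was selected for the CPU. */
static int init_count = 0;
static bool table_ready = false;
static float table[4];

static float lookup(size_t i) {
    if (!table_ready) {  /* not thread-safe; illustration only */
        for (size_t k = 0; k < 4; ++k) table[k] = (float) k * 0.5f;
        table_ready = true;
        ++init_count;
    }
    return table[i];
}
```

In the real backend the deferred work may use vector instructions; the point of the sketch is only that it cannot run while a variant the CPU does not support is being loaded and scored.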

Assets 15

20 Jun 14:18

github-actions

b5718

88fc854

b5718

llama : improve sep token handling (#14272)

Assets 15

20 Jun 13:34

github-actions

b5717

e28c1b9

b5717

cuda : synchronize graph capture and cublas handle destruction (#14288)

Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread.
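The release note does not describe the mechanism, so the following is only one generic way to keep handle destruction from overlapping a capture: hold a shared mutex for the whole capture, and take the same mutex before destroying a handle. All names are hypothetical, and the CUDA and cuBLAS calls appear only as comments.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* HYPOTHETICAL shared lock guarding "a graph capture is in progress". */
static pthread_mutex_t capture_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool capturing = false;

void capture_begin(void) {
    pthread_mutex_lock(&capture_mutex);  /* held until capture_end */
    capturing = true;
    /* cudaStreamBeginCapture(...) would go here */
}

void capture_end(void) {
    /* cudaStreamEndCapture(...) would go here */
    capturing = false;
    pthread_mutex_unlock(&capture_mutex);
}

/* Destroys a handle under the same lock; returns true if a capture was in
 * progress at that moment, which the locking is meant to make impossible. */
bool destroy_handle(int *handle) {
    pthread_mutex_lock(&capture_mutex);
    bool overlapped = capturing;
    *handle = 0;  /* cublasDestroy(...) would go here */
    pthread_mutex_unlock(&capture_mutex);
    return overlapped;
}
```

A thread that destroys a handle while another thread is capturing simply blocks until the capture ends, trading a short wait for a capture that no longer fails.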

Assets 15

20 Jun 10:09

github-actions

b5716

d27b3ca

b5716

ggml : fix repack work size for mul_mat_id (#14292)

ggml-ci

Assets 15

20 Jun 09:43

github-actions

b5715

9230dbe

b5715

ggml: Update KleidiAI to v1.9.0 (#14277)

Assets 15

20 Jun 09:32

github-actions

b5714

812939a

b5714

model : more uniform output id handling (#14275)

* model : more uniform output id handling

ggml-ci

* cont : revert n_outputs < n_tokens optimization

ggml-ci

* cont : fix out_ids initialization

ggml-ci
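Conceptually, output ids list which tokens in a batch produce an output row (for example logits), and the final hidden states are gathered by those indices. The C sketch below shows that idea in isolation. `build_out_ids` and `gather_rows` are hypothetical helpers with a row-major `[n_tokens x n_embd]` layout assumed; they are not the code changed in #14275.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL: record the indices of tokens flagged for output.
 * Returns n_outputs (always <= n_tokens). */
static size_t build_out_ids(const int8_t *wants_output, size_t n_tokens,
                            int32_t *out_ids) {
    size_t n_outputs = 0;
    for (size_t i = 0; i < n_tokens; ++i)
        if (wants_output[i]) out_ids[n_outputs++] = (int32_t) i;
    return n_outputs;
}

/* Copy the selected rows of src into a dense [n_outputs x n_embd] dst. */
static void gather_rows(const float *src, size_t n_embd,
                        const int32_t *out_ids, size_t n_outputs, float *dst) {
    for (size_t r = 0; r < n_outputs; ++r)
        for (size_t c = 0; c < n_embd; ++c)
            dst[r * n_embd + c] = src[(size_t) out_ids[r] * n_embd + c];
}
```

For flags `{0, 1, 0, 1}` the ids are `{1, 3}`, so only rows 1 and 3 of the hidden states reach the output projection.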

Assets 15
