Releases: ggml-org/llama.cpp

b5723

20 Jun 16:34
c959f46
CUDA: add conv_2d_transpose (#14287)

* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts
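To make the operation above concrete, here is a minimal single-channel reference implementation of 2D transposed convolution in plain C++ — a sketch of the math only, not the CUDA kernel added in the PR. Function and parameter names are illustrative.

```cpp
#include <vector>

// Naive single-channel 2D transposed convolution (reference sketch).
// Each input element scatters a kernel-sized patch into the output;
// the output side length is (in - 1) * stride + k.
std::vector<float> conv2d_transpose(const std::vector<float>& input, int in_h, int in_w,
                                    const std::vector<float>& kernel, int k, int stride) {
    const int out_h = (in_h - 1) * stride + k;
    const int out_w = (in_w - 1) * stride + k;
    std::vector<float> out(out_h * out_w, 0.0f);
    for (int y = 0; y < in_h; ++y)
        for (int x = 0; x < in_w; ++x)
            for (int ky = 0; ky < k; ++ky)
                for (int kx = 0; kx < k; ++kx)
                    out[(y * stride + ky) * out_w + (x * stride + kx)]
                        += input[y * in_w + x] * kernel[ky * k + kx];
    return out;
}
```

A 2x2 input of ones with a 1x1 kernel and stride 2 scatters the kernel value to the four corners of a 3x3 output, which makes the scatter semantics easy to check by hand.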

b5722

20 Jun 16:35
22015b2
lint : remove trailing whitespace (#14304)

b5721

20 Jun 16:07
dd6e6d0
vocab : prevent tokenizer overflow (#14301)

* vocab : prevent stack overflow in tokenize

* vocab : return error instead of aborting on oversized token count

* vocab : INT32_MIN from llama_tokenize on overflow

b5720

20 Jun 14:25
8308f98
sycl: add usage of enqueue_functions extension (#14244)

* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions
extension

* Wrapping extension in general function

Create a general function that uses the enqueue_functions extension when
the compiler supports it, and otherwise calls the general SYCL function
to launch kernels.

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
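The "wrap the extension in a general function" idea above is a compile-time dispatch pattern, sketched here in plain C++. The macro name and the loop bodies are placeholders — the real code selects between the SYCL enqueue_functions nd_launch path and the classic submit/parallel_for path.

```cpp
// Illustrative sketch of compile-time dispatch between an extension
// launch path and a fallback path. HAVE_ENQUEUE_FUNCTIONS_EXT is a
// placeholder macro, not the real SYCL/oneAPI feature-test macro.
#if defined(HAVE_ENQUEUE_FUNCTIONS_EXT)
template <typename Kernel>
void launch_kernel(int n, Kernel k) {
    // would call the enqueue_functions nd_launch here
    for (int i = 0; i < n; ++i) k(i);
}
#else
template <typename Kernel>
void launch_kernel(int n, Kernel k) {
    // fallback: the plain submit/parallel_for path
    for (int i = 0; i < n; ++i) k(i);
}
#endif
```

Callers see a single `launch_kernel` entry point either way, which is what lets the conversion touch every file without scattering `#ifdef`s through the kernels themselves.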

b5719

20 Jun 14:27
6369be0
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)

* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
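The variant-scoring idea behind GGML_CPU_ALL_VARIANTS can be sketched like this: each build variant declares the CPU features it requires, and at load time the most specific usable variant wins. The feature strings, struct names, and scoring rule below are illustrative, not ggml's actual internals.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hedged sketch of backend-variant selection by CPU feature scoring.
struct Variant {
    std::string name;
    std::vector<std::string> required;  // e.g. "vsx", "power9", "power10"
};

static bool has_feature(const std::vector<std::string>& cpu, const std::string& f) {
    return std::find(cpu.begin(), cpu.end(), f) != cpu.end();
}

// A variant missing any required feature is unusable (score -1);
// otherwise more specific variants (more requirements) score higher.
static int score_variant(const Variant& v, const std::vector<std::string>& cpu) {
    for (const auto& f : v.required)
        if (!has_feature(cpu, f)) return -1;
    return static_cast<int>(v.required.size());
}

std::string pick_variant(const std::vector<Variant>& vs,
                         const std::vector<std::string>& cpu) {
    std::string best = "generic";
    int best_score = 0;  // generic baseline always works
    for (const auto& v : vs) {
        const int s = score_variant(v, cpu);
        if (s > best_score) { best_score = s; best = v.name; }
    }
    return best;
}
```

The "delay some initializations" commit addresses the flip side of this scheme: with GGML_BACKEND_DL=ON every variant is compiled, so static initializers must not execute instructions the running CPU lacks before selection happens.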

b5718

20 Jun 14:18
88fc854
llama : improve sep token handling (#14272)

b5717

20 Jun 13:34
e28c1b9
cuda : synchronize graph capture and cublas handle destruction (#14288)

Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread
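The race described above has the shape of a classic lifetime-vs-use conflict, and the fix is to make capture and destruction take the same lock. Below is a minimal sketch of that pattern; the class and member names are illustrative, not llama.cpp's actual internals.

```cpp
#include <mutex>

// Sketch: graph capture and handle destruction synchronize on one mutex,
// so a handle can no longer be torn down by another thread mid-capture.
class HandleRegistry {
    std::mutex mtx;
    bool handle_alive = false;
public:
    void create()  { std::lock_guard<std::mutex> l(mtx); handle_alive = true; }
    void destroy() { std::lock_guard<std::mutex> l(mtx); handle_alive = false; }
    // Returns false if the handle was destroyed before capture could begin;
    // holding the lock here excludes a concurrent destroy().
    bool begin_capture() {
        std::lock_guard<std::mutex> l(mtx);
        return handle_alive;
    }
};
```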

b5716

20 Jun 10:09
d27b3ca
ggml : fix repack work size for mul_mat_id (#14292)

ggml-ci

b5715

20 Jun 09:43
9230dbe
ggml: Update KleidiAI to v1.9.0 (#14277)

b5714

20 Jun 09:32
812939a
model : more uniform output id handling (#14275)

* model : more uniform output id handling

ggml-ci

* cont : revert n_outputs < n_tokens optimization

ggml-ci

* cont : fix out_ids initialization

ggml-ci