Vulkan: RTE rounding for cpy to quant #12480

Merged: 5 commits, Mar 21, 2025

stduhpf
Contributor

@stduhpf stduhpf commented Mar 20, 2025

Fixes some failing tests
Discussed here: #11166
@jeffbolznv
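
For readers unfamiliar with the term in the title: RTE is round-to-nearest, ties-to-even. The sketch below is not the shader code from this PR. It is a pure-Python illustration, assuming the rounding at issue is the conversion of f32 values to f16 during a copy to a quantized type. The helper names `f16_rte` and `f16_rtz` are hypothetical, and `f16_rtz` (round toward zero) is included only as a contrast.

```python
import struct

def f16_rte(x: float) -> float:
    """Round x to the nearest float16, ties to even (struct's 'e' format does this)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def f16_rtz(x: float) -> float:
    """Round x toward zero to a float16 (truncation), for comparison only."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    nearest = struct.unpack("<e", struct.pack("<H", bits))[0]
    if abs(nearest) > abs(x):
        # The low 15 bits hold the magnitude, so subtracting 1 steps one
        # float16 toward zero without touching the sign bit.
        bits -= 1
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# Hypothetical sample inputs: two ordinary values and two exact ties.
values = [0.3, -0.3, 1 + 2**-11, 1 + 3 * 2**-11]
for v in values:
    print(f"{v!r}: rte={f16_rte(v)!r} rtz={f16_rtz(v)!r}")
```

For 0.3, which lies between 1228/4096 and 1229/4096, nearest-even picks 1229/4096 while truncation picks 1228/4096, so truncation always errs toward zero. The two tie inputs show the "even" part: each sits exactly halfway between two float16 values and is rounded to the neighbour whose last mantissa bit is 0.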

@github-actions github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) Mar 20, 2025
@jeffbolznv
Collaborator

The changes look good to me, but I haven't tested locally.

@stduhpf stduhpf marked this pull request as ready for review March 20, 2025 21:31
@stduhpf
Contributor Author

stduhpf commented Mar 20, 2025

I just tested these changes in sdcpp. LoRAs still load properly on quantized models, and image quality (with LoRA) seemed slightly better with this PR than with the current implementation, though that might just be luck or a placebo effect.

Collaborator

@0cc4m 0cc4m left a comment

Thank you!

@0cc4m 0cc4m merged commit 4375415 into ggml-org:master Mar 21, 2025
48 checks passed
RealTimeChris pushed a commit to RealTimeChris/llama.cpp that referenced this pull request Mar 22, 2025
* Vulkan: RTE rounding for cpy to quant

Co-Authored-By: Jeff Bolz <jbolz@nvidia.com>

* remove trailing whitespace

* avoid duplicating pipeline_cpy_f32_quant

* fix copypasting issue

* remove duplicated code

---------

Co-Authored-By: Jeff Bolz <jbolz@nvidia.com>
Ivy233 pushed a commit to Ivy233/llama.cpp that referenced this pull request Mar 23, 2025
Labels
ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)
3 participants