
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend #12566

Merged
2 commits merged into ggml-org:master on Mar 26, 2025

Conversation

@Ivy233 (Contributor) commented Mar 25, 2025

Linked issue: #12564. This PR provides a possible solution: move the entire quantization process to the CPU backend. Since quantization is not a real inference run, performance should not be particularly sensitive to this change.


…t will cause ggml_fp16_to_fp32 to report an error when trying to access video memory. You need to switch to the CPU backend to run quantize.

After the fix, quantization automatically runs on the CPU backend and is no longer bound to CUDA.
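
For context, the error the commit message refers to can be sketched as follows. This is a hypothetical illustration of the failure mode, not the actual clip.cpp quantization code: the quantizer dereferences tensor data on the host when converting f16 weights to f32, which only works when the model was loaded into CPU (host) memory.

```cpp
// Hypothetical sketch of the failure mode (not the actual clip.cpp code):
// converting f16 weights to f32 by dereferencing tensor->data on the host.
// If the model was loaded with the CUDA backend, tensor->data refers to video
// memory and the host-side read inside the loop fails.
#include <vector>
#include "ggml.h"

static std::vector<float> tensor_f16_to_f32(const ggml_tensor * tensor) {
    const int64_t n = ggml_nelements(tensor);
    const ggml_fp16_t * src = (const ggml_fp16_t *) tensor->data; // host pointer only for CPU buffers
    std::vector<float> dst(n);
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = ggml_fp16_to_fp32(src[i]); // invalid access when the data lives on the GPU
    }
    return dst;
}
```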
@@ -45,7 +45,7 @@ struct clip_context_params {
 };

 // deprecated, use clip_init
-CLIP_API struct clip_ctx * clip_model_load(const char * fname, int verbosity);
+CLIP_API struct clip_ctx * clip_model_load(const char * fname, const int verbosity=1, const bool use_gpu=true);
@ngxson (Collaborator) commented Mar 25, 2025

Do not change the signature of this function; many downstream bindings will break.

Instead, use clip_init and set use_gpu via clip_context_params.
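
A minimal sketch of the suggested direction, assuming clip_init takes the model path plus a clip_context_params passed by value, with use_gpu and verbosity members as named in the diff and discussion; the exact member types and the helper name load_for_quantize are assumptions, not the merged code:

```cpp
// Sketch only: keep clip_model_load's signature untouched and have the quantizer
// obtain its context through clip_init with the GPU disabled.
#include "clip.h"

static clip_ctx * load_for_quantize(const char * fname_inp) {
    clip_context_params ctx_params;
    ctx_params.use_gpu   = false; // quantization only rewrites weights; no GPU inference is needed
    ctx_params.verbosity = 1;     // assumed plain verbosity level; the real field type may differ
    return clip_init(fname_inp, ctx_params);
}
```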


Also, the const int verbosity=1 will fail to compile in C; some users use this header file in C projects.
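
For illustration, default arguments are a C++-only feature, so a declaration like the one added in the diff cannot appear in a header that C translation units include. The declarations below are illustrative, not the project's exact header:

```cpp
// Valid C++ but not valid C: default arguments do not exist in the C language,
// so any C translation unit including the header would fail to compile this.
struct clip_ctx * clip_model_load(const char * fname, const int verbosity = 1);

// A C-compatible header keeps the original signature with no default argument.
struct clip_ctx * clip_model_load(const char * fname, int verbosity);
```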

…nd change the call in clip_model_quantize to clip_init.
@Ivy233 Ivy233 requested a review from ngxson March 26, 2025 11:12
@ngxson ngxson merged commit 02082f1 into ggml-org:master Mar 26, 2025
48 checks passed
@Ivy233 Ivy233 deleted the fix-clip-quantize-use_gpu branch March 27, 2025 07:51