ggml : fix MUL_MAT_ID repack with Q8_K #12544

ggerganov · 2025-03-24T11:31:28Z

fix #12528

mul_mat_id assumed the Q8_0 type for src1
Fix IQ4_NL param type to be Q8_0 instead of IQ4_NL
Code indentations

ggml-ci

ggerganov · 2025-03-24T11:32:04Z

@Djip007 Could you take a look at these fixes?

Djip007

a very quick review...

Djip007 · 2025-03-24T21:41:49Z

ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

            for (int64_t i11 = ith * 4; i11 < ne11 - ne11 % 4; i11 += nth * 4) {
                quantize_mat_q8_K((float *) ((char *) src1->data + i11 * nb11), (void *) (wdata + i11 * nbw1), 4, ne10,
                              INTER_SIZE);
            }
        } else {
+            GGML_ASSERT(PARAM_TYPE == GGML_TYPE_Q8_0);


In C++, I would use this here:

    if constexpr (PARAM_TYPE == GGML_TYPE_Q8_0) { ... }
    if constexpr (PARAM_TYPE == GGML_TYPE_Q8_K) { ... }

and, if these are the only two possible values, maybe add a

    static_assert((PARAM_TYPE == GGML_TYPE_Q8_K) || (PARAM_TYPE == GGML_TYPE_Q8_0), "comment");

though the mismatch may also be trapped by adding PARAM_TYPE as a template parameter of gem[v/m].

Maybe remove the if altogether and turn quantize_mat_q8_K/quantize_mat_q8_0 into a template quantize_mat<PARAM_TYPE>:

    template <ggml_type PARAM_TYPE> void quantize_mat(...);
    template <> void quantize_mat<GGML_TYPE_Q8_0>(...) {
        quantize_mat_q8_0(...); // or "inline" it.
    }
    template <> void quantize_mat<GGML_TYPE_Q8_K>(...) {
        quantize_mat_q8_K(...);
    }
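The quantize_mat<PARAM_TYPE> idea can be sketched in isolation as follows. This is only an illustration under stated assumptions: `param_type`, the `*_stub` functions, and the simplified argument list are hypothetical stand-ins, not the actual ggml declarations. The primary template is deleted, so requesting an unsupported type fails to compile rather than silently taking the Q8_0 path.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for ggml_type, limited to the values discussed here.
enum param_type { TYPE_Q8_0, TYPE_Q8_K, TYPE_IQ4_NL };

// Stubs standing in for quantize_mat_q8_0 / quantize_mat_q8_K.
// They only report which path was taken; the argument list is simplified.
inline std::string quantize_mat_q8_0_stub(const float *, void *, std::int64_t, std::int64_t) {
    return "q8_0";
}
inline std::string quantize_mat_q8_K_stub(const float *, void *, std::int64_t, std::int64_t) {
    return "q8_K";
}

// Primary template is deleted: any PARAM_TYPE without a specialization
// below is rejected at compile time.
template <param_type PARAM_TYPE>
std::string quantize_mat(const float * x, void * vy, std::int64_t nrow, std::int64_t n_per_row) = delete;

template <>
inline std::string quantize_mat<TYPE_Q8_0>(const float * x, void * vy, std::int64_t nrow, std::int64_t n_per_row) {
    return quantize_mat_q8_0_stub(x, vy, nrow, n_per_row);
}

template <>
inline std::string quantize_mat<TYPE_Q8_K>(const float * x, void * vy, std::int64_t nrow, std::int64_t n_per_row) {
    return quantize_mat_q8_K_stub(x, vy, nrow, n_per_row);
}

// quantize_mat<TYPE_IQ4_NL>(nullptr, nullptr, 4, 32);  // would not compile
```

With this shape the caller no longer needs a runtime branch on the type: it calls quantize_mat<PARAM_TYPE>(...) and the compiler picks the matching specialization.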

Djip007 · 2025-03-24T21:42:50Z

ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

@@ -5578,7 +5579,7 @@ static const tensor_traits<block_q4_0, 8, 8, GGML_TYPE_Q8_0> q4_0_8x8_q8_0;
 static const tensor_traits<block_q4_K, 8, 8, GGML_TYPE_Q8_K> q4_K_8x8_q8_K;

 // instance for IQ4
-static const tensor_traits<block_iq4_nl, 4, 4, GGML_TYPE_IQ4_NL> iq4_nl_4x4_q8_0;
+static const tensor_traits<block_iq4_nl, 4, 4, GGML_TYPE_Q8_0> iq4_nl_4x4_q8_0;


With the static_assert, this would be caught at build time.
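A minimal sketch of how such a static_assert would reject the old IQ4_NL instantiation. The names `param_type`, `block_iq4_nl_stub`, `tensor_traits_sketch`, and `src1_quantizer` are hypothetical and only mirror the shape of the real tensor_traits template; the real class has more members. It assumes C++17 for `if constexpr`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for ggml_type, limited to the values discussed here.
enum param_type { TYPE_Q8_0, TYPE_Q8_K, TYPE_IQ4_NL };

struct block_iq4_nl_stub {};  // placeholder for block_iq4_nl

template <typename BLOC_TYPE, std::int64_t INTER_SIZE, std::int64_t NB_COLS, param_type PARAM_TYPE>
struct tensor_traits_sketch {
    // Only the two src1 quantization types are accepted.
    static_assert(PARAM_TYPE == TYPE_Q8_0 || PARAM_TYPE == TYPE_Q8_K,
                  "src1 must be quantized to Q8_0 or Q8_K");

    // Picks the quantizer at compile time; returns its name for illustration.
    std::string src1_quantizer() const {
        if constexpr (PARAM_TYPE == TYPE_Q8_K) {
            return "quantize_mat_q8_K";
        } else {
            return "quantize_mat_q8_0";
        }
    }
};

// The corrected instance compiles.
static const tensor_traits_sketch<block_iq4_nl_stub, 4, 4, TYPE_Q8_0> iq4_nl_4x4_q8_0{};

// The pre-fix instance would trip the static_assert at build time:
// static const tensor_traits_sketch<block_iq4_nl_stub, 4, 4, TYPE_IQ4_NL> bad{};
```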

Djip007 · 2025-03-24T22:12:40Z

ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

@@ -5295,8 +5295,7 @@ template <> void gemv<block_q4_K, 8, 8>(int n, float * s, size_t bs, const void
    ggml_gemv_q4_K_8x8_q8_K(n, s, bs, vx, vy, nr, nc);
 }

-template <>
-void gemv<block_iq4_nl, 4, 4>(int n, float * s, size_t bs, const void * vx, const void * vy, int nr, int nc) {
+template <> void gemv<block_iq4_nl, 4, 4>(int n, float * s, size_t bs, const void * vx, const void * vy, int nr, int nc) {


Maybe add PARAM_TYPE to this template too:

template <typename BLOC_TYPE, int64_t INTER_SIZE, int64_t NB_COLS, ggml_type PARAM_TYPE> void gem[m,v]....
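To illustrate why the extra template parameter helps, here is a small sketch. The names `gemv_sketch`, `traits_sketch`, `param_type`, and the `*_stub` block types are hypothetical, and the kernels are reduced to functions that return a label. Once PARAM_TYPE is part of the gemv signature, a traits instance with a mismatched param type has no specialization to call and fails to compile.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for ggml_type, limited to the values discussed here.
enum param_type { TYPE_Q8_0, TYPE_Q8_K, TYPE_IQ4_NL };

struct block_q4_K_stub {};    // placeholder for block_q4_K
struct block_iq4_nl_stub {};  // placeholder for block_iq4_nl

// Primary template is deleted: only explicitly specialized combinations exist.
template <typename BLOC_TYPE, std::int64_t INTER_SIZE, std::int64_t NB_COLS, param_type PARAM_TYPE>
std::string gemv_sketch() = delete;

// Each specialization names the src1 type it expects.
template <>
inline std::string gemv_sketch<block_q4_K_stub, 8, 8, TYPE_Q8_K>() {
    return "gemv_q4_K_8x8_q8_K";
}

template <>
inline std::string gemv_sketch<block_iq4_nl_stub, 4, 4, TYPE_Q8_0>() {
    return "gemv_iq4_nl_4x4_q8_0";
}

// The traits class forwards all four parameters, so block type and
// param type must agree with an existing kernel.
template <typename BLOC_TYPE, std::int64_t INTER_SIZE, std::int64_t NB_COLS, param_type PARAM_TYPE>
struct traits_sketch {
    std::string run_gemv() const {
        return gemv_sketch<BLOC_TYPE, INTER_SIZE, NB_COLS, PARAM_TYPE>();
    }
};

// traits_sketch<block_iq4_nl_stub, 4, 4, TYPE_IQ4_NL>{}.run_gemv();  // would not compile
```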

ggml-ci

ggerganov · 2025-03-25T07:43:48Z

@Djip007 Thanks. Let me know if you have any other suggestions.

ggerganov · 2025-03-26T11:02:18Z

Merging this for now and happy to hear if you have any additional suggestions.

ggml : fix MUL_MAT_ID repack with Q8_K

87cd537

ggml-ci

github-actions bot added the ggml label Mar 24, 2025

Djip007 reviewed Mar 24, 2025

ggml : improve repack templates

e94c2bd

ggml-ci

ggerganov mentioned this pull request Mar 26, 2025

Eval bug: Program not working properly due to new features of "repack Q4_K tensor" #12528

Closed

ggerganov merged commit 5ed38b6 into master Mar 26, 2025
55 of 56 checks passed
