vulkan: unpack more values at a time for iquants mat mul #14485

netrunnereve · 2025-07-01T16:48:14Z

This change was taken from @remyoudompheng's #12260 and rebased, as the original PR has been inactive for a while. @remyoudompheng, if you'd rather submit this yourself, please let me know.

On my RX 470 it's around 10-15% faster.

PR:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    44 runs - 23378.84 us/run -  60.13 GFLOP/run -   2.57 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23188.05 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27454.82 us/run -  60.13 GFLOP/run -   2.19 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    42 runs - 24248.86 us/run -  60.13 GFLOP/run -   2.48 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23257.50 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23611.98 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23516.11 us/run -  60.13 GFLOP/run -   2.56 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24849.00 us/run -  60.13 GFLOP/run -   2.42 TFLOPS
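For readers checking the numbers, the GFLOP/run and TFLOPS columns can be reproduced from the shape and the time per run. This is a minimal Python sketch, assuming the conventional 2·m·n·k operation count for a matrix multiply (the exact counting used by the benchmark tool is not shown in this thread). `mul_mat_flops` and `tflops` are hypothetical helper names.

```python
def mul_mat_flops(m: int, n: int, k: int) -> int:
    # Assumption: one multiply and one add per (m, n, k) triple.
    return 2 * m * n * k


def tflops(flops: int, us_per_run: float) -> float:
    # Convert microseconds to seconds, then FLOP/s to TFLOPS (1e12 FLOP/s).
    seconds = us_per_run * 1e-6
    return flops / seconds / 1e12


# Shape and timing taken from the iq2_xxs line above.
flops = mul_mat_flops(m=4096, n=512, k=14336)
print(f"{flops / 1e9:.2f} GFLOP/run")
print(f"{tflops(flops, 23378.84):.2f} TFLOPS")
```

Since every test uses the same shape, the GFLOP/run value is identical across rows and only the time per run differs.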

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.65 ± 0.32 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.98 ± 0.13 |

Master:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    40 runs - 25567.17 us/run -  60.13 GFLOP/run -   2.35 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     38 runs - 26685.89 us/run -  60.13 GFLOP/run -   2.25 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      34 runs - 31151.91 us/run -  60.13 GFLOP/run -   1.93 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    38 runs - 26525.08 us/run -  60.13 GFLOP/run -   2.27 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24953.29 us/run -  60.13 GFLOP/run -   2.41 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 26897.55 us/run -  60.13 GFLOP/run -   2.24 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23591.05 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27017.13 us/run -  60.13 GFLOP/run -   2.23 TFLOPS

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 172.03 ± 0.38 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 174.84 ± 0.36 |
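To compare the two runs type by type, the relative gain can be computed directly from the figures above. This is a rough Python sketch. The numbers are copied from the PR and Master results in this comment, and `gain_pct` is a hypothetical helper name.

```python
# us/run from the MUL_MAT results above, as (PR, master). Lower is better.
us_per_run = {
    "iq2_xxs": (23378.84, 25567.17),
    "iq2_xs": (23188.05, 26685.89),
    "iq2_s": (27454.82, 31151.91),
    "iq3_xxs": (24248.86, 26525.08),
    "iq1_s": (23257.50, 24953.29),
    "iq1_m": (23611.98, 26897.55),
    "iq4_nl": (23516.11, 23591.05),
    "iq3_s": (24849.00, 27017.13),
}

# pp512 tokens/s from the tables above, as (PR, master). Higher is better.
pp512 = {"IQ1_M": (200.65, 172.03), "IQ2_S": (200.98, 174.84)}


def gain_pct(new_rate: float, old_rate: float) -> float:
    # Percentage throughput gain of new over old.
    return (new_rate / old_rate - 1.0) * 100.0


# Throughput is inversely proportional to time, so the ratio is master / PR.
mat_gains = {t: gain_pct(master, pr) for t, (pr, master) in us_per_run.items()}
pp_gains = {m: gain_pct(pr, master) for m, (pr, master) in pp512.items()}

for name, pct in {**mat_gains, **pp_gains}.items():
    print(f"{name:8s} {pct:+.1f}%")
```

In this calculation iq4_nl comes out essentially flat, which is consistent with the commit title naming only IQ1, IQ2 and IQ3.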

Commit taken from remyoudompheng's PR ggml-org#12260

Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>

0cc4m

I see improvements on all my devices, thank you @remyoudompheng and @netrunnereve

vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3)

5712c2f

Commit taken from remyoudompheng's PR ggml-org#12260

Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>

github-actions bot added the Vulkan and ggml labels Jul 1, 2025

0cc4m approved these changes Jul 6, 2025

0cc4m merged commit 6491d6e into ggml-org:master Jul 6, 2025
48 checks passed

netrunnereve deleted the iquants branch July 6, 2025 15:52
