
vulkan: unpack more values at a time for iquants mat mul #14485


Merged
merged 1 commit into from
Jul 6, 2025


netrunnereve
Collaborator

This change was taken from @remyoudompheng's #12260 and rebased, as the original PR has been inactive for a while. @remyoudompheng, if you'd rather submit this yourself, please let me know.

On my RX 470 it's around 10-15% faster.

PR:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    44 runs - 23378.84 us/run -  60.13 GFLOP/run -   2.57 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23188.05 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27454.82 us/run -  60.13 GFLOP/run -   2.19 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    42 runs - 24248.86 us/run -  60.13 GFLOP/run -   2.48 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23257.50 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23611.98 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23516.11 us/run -  60.13 GFLOP/run -   2.56 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24849.00 us/run -  60.13 GFLOP/run -   2.42 TFLOPS

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.65 ± 0.32 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.98 ± 0.13 |
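
For readers checking the MUL_MAT numbers: the reported 60.13 GFLOP/run is consistent with counting 2·m·n·k floating-point operations per matrix multiply (one multiply and one add per term), and the TFLOPS column is that count divided by the run time. The sketch below is only a sanity check under that assumption; the helper names `matmul_flops` and `tflops` are hypothetical and not part of the benchmark tool.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOP count for an (m x k) by (k x n) product, assuming 2 ops per term."""
    return 2 * m * n * k


def tflops(flops: int, us_per_run: float) -> float:
    """Throughput in TFLOPS given a FLOP count and a run time in microseconds."""
    seconds = us_per_run * 1e-6
    return flops / seconds / 1e12


# Shape taken from the MUL_MAT lines above: m=4096, n=512, k=14336.
flops = matmul_flops(4096, 512, 14336)

# Run time taken from the iq2_xxs row of the PR results.
iq2_xxs_tflops = tflops(flops, 23378.84)

print(f"{flops / 1e9:.2f} GFLOP/run")
print(f"{iq2_xxs_tflops:.2f} TFLOPS")
```

With the iq2_xxs timing from the PR results, this reproduces the 60.13 GFLOP/run and 2.57 TFLOPS figures shown in that row.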

Master:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    40 runs - 25567.17 us/run -  60.13 GFLOP/run -   2.35 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     38 runs - 26685.89 us/run -  60.13 GFLOP/run -   2.25 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      34 runs - 31151.91 us/run -  60.13 GFLOP/run -   1.93 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    38 runs - 26525.08 us/run -  60.13 GFLOP/run -   2.27 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24953.29 us/run -  60.13 GFLOP/run -   2.41 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 26897.55 us/run -  60.13 GFLOP/run -   2.24 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23591.05 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27017.13 us/run -  60.13 GFLOP/run -   2.23 TFLOPS

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 172.03 ± 0.38 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 174.84 ± 0.36 |
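
To put the "around 10-15%" figure next to the raw data, here is a small sketch that computes the relative speedup per quant type from the timings and pp512 rates listed above. The values are copied from this description; the helper names and the choice to define speedup as old time over new time minus one are assumptions made for illustration.

```python
# MUL_MAT run times in us/run, copied from the PR and Master results above.
pr_us = {
    "iq2_xxs": 23378.84, "iq2_xs": 23188.05, "iq2_s": 27454.82,
    "iq3_xxs": 24248.86, "iq1_s": 23257.50, "iq1_m": 23611.98,
    "iq4_nl": 23516.11, "iq3_s": 24849.00,
}
master_us = {
    "iq2_xxs": 25567.17, "iq2_xs": 26685.89, "iq2_s": 31151.91,
    "iq3_xxs": 26525.08, "iq1_s": 24953.29, "iq1_m": 26897.55,
    "iq4_nl": 23591.05, "iq3_s": 27017.13,
}

# pp512 throughput in tokens/s, copied from the benchmark tables above.
pr_tps = {"IQ1_M": 200.65, "IQ2_S": 200.98}
master_tps = {"IQ1_M": 172.03, "IQ2_S": 174.84}


def speedup_from_time(old_us: float, new_us: float) -> float:
    """Relative speedup when the measurement is a duration (lower is better)."""
    return old_us / new_us - 1.0


def speedup_from_rate(old_rate: float, new_rate: float) -> float:
    """Relative speedup when the measurement is a rate (higher is better)."""
    return new_rate / old_rate - 1.0


matmul_speedup = {t: speedup_from_time(master_us[t], pr_us[t]) for t in pr_us}
pp_speedup = {m: speedup_from_rate(master_tps[m], pr_tps[m]) for m in pr_tps}

for name, s in matmul_speedup.items():
    print(f"MUL_MAT {name:8s} {s * 100:+.1f}%")
for name, s in pp_speedup.items():
    print(f"pp512   {name:8s} {s * 100:+.1f}%")
```

Computed this way, iq4_nl comes out essentially unchanged, which is worth keeping in mind when reading the headline number.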

Commit taken from remyoudompheng's PR ggml-org#12260

Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>
github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Jul 1, 2025
Collaborator

@0cc4m 0cc4m left a comment


I see improvements on all my devices, thank you @remyoudompheng and @netrunnereve

@0cc4m 0cc4m merged commit 6491d6e into ggml-org:master Jul 6, 2025
48 checks passed
@netrunnereve netrunnereve deleted the iquants branch July 6, 2025 15:52