llama.cpp: update PPL w.r.t. new k-quant optimization #172

Merged
merged 1 commit into main from llama-cpp-gguf on Aug 23, 2023

Conversation

@ymcui (Owner) commented on Aug 23, 2023

Description

This PR updates the PPLs of the GGUF k-quant models.
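For reference, the perplexity reported here follows the standard definition of exponentiated average negative log-likelihood over a held-out text (lower is better), which is also what llama.cpp's perplexity tool computes:

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)$$

where $N$ is the number of evaluated tokens and $p(x_i \mid x_{<i})$ is the model's probability of token $x_i$ given its context.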

Details

llama.cpp has introduced the new GGUF format, and a recent PR brings improvements to the k-quant model series. This PR mainly refreshes the k-quant models' stats to see whether our models also benefit from these improvements.
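For anyone reproducing these numbers, here is a minimal sketch using llama.cpp's stock quantize and perplexity tools (the model paths and the evaluation file are placeholders, and exact flags may differ across llama.cpp revisions):

```bash
# Quantize the F16 GGUF model to one k-quant type (Q2_K here),
# then measure its perplexity on a held-out text file.
# Assumes llama.cpp is already built and the F16 GGUF file exists.
./quantize models/chinese-llama-2-7b/ggml-model-f16.gguf \
           models/chinese-llama-2-7b/ggml-model-q2_k.gguf Q2_K

./perplexity -m models/chinese-llama-2-7b/ggml-model-q2_k.gguf \
             -f wiki.test.raw
```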

Chinese-LLaMA-2-7B

old:

| | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 9.128 | 13.640 | 9.910 | 9.476 | 9.576 | 9.257 | 9.156 | 9.213 | 9.141 | 9.143 | 9.129 |
| Size | 12.91G | 2.77G | 3.17G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |

new (changed results are in boldface):

| | F16 | 🆕Q2_K | 🆕Q3_K | Q4_0 | Q4_1 | 🆕Q4_K | Q5_0 | Q5_1 | 🆕Q5_K | 🆕Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 9.128 | **11.1073** | **9.5760** | 9.476 | 9.576 | **9.2397** | 9.156 | 9.213 | **9.1676** | **9.1329** | 9.129 |
| Size | 12.91G | **2.41G** | **3.18G** | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |

Chinese-LLaMA-2-13B

old:

| | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 8.810 | 14.84 | 9.834 | 9.371 | 9.549 | 8.958 | 8.988 | 8.924 | 8.850 | 8.817 | 8.811 |
| Size | 24.69G | 5.26G | 6.02G | 7.01G | 7.77G | 7.48G | 8.52G | 9.28G | 8.76G | 10.13G | 13.05G |
| CPU Speed | - | 75 | 90 | 76 | 80 | 80 | 91 | 99 | 92 | 104 | 125 |
| GPU Speed | - | 31 | 37 | 30 | 32 | 36 | x | x | 47 | 51 | x |

new (changed results are in boldface):

| | F16 | 🆕Q2_K | 🆕Q3_K | Q4_0 | Q4_1 | 🆕Q4_K | Q5_0 | Q5_1 | 🆕Q5_K | 🆕Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 8.810 | **12.8040** | **9.7383** | 9.371 | 9.549 | **8.9522** | 8.988 | 8.924 | **8.8581** | **8.820** | 8.811 |
| Size | 24.69G | **5.18G** | **6.04G** | 7.01G | 7.77G | 7.48G | 8.52G | 9.28G | 8.76G | 10.13G | 13.05G |
| CPU Speed | - | 75 | 90 | 76 | 80 | 80 | 91 | 99 | 92 | 104 | 125 |
| GPU Speed | - | 31 | 37 | 30 | 32 | 36 | x | x | 47 | 51 | x |

Observations:

  1. All k-quant models' PPLs have improved (decreased), except for Q5_K (and, marginally, the 13B Q6_K). The improvements for Q2_K and Q3_K are significant.
  2. The Q2_K models are also more compact than before. For example, the 7B Q2_K model shrank from 2.77G to 2.41G.

⚠️ WARNING: You should re-convert these models to avoid unexpected results and behavior.
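A sketch of the re-conversion step, assuming the convert.py script in the llama.cpp tree as of this PR's date (the model directory is a placeholder; the script and its flags have moved around in later llama.cpp revisions):

```bash
# Re-convert the original HF-format weights to the new GGUF format;
# the resulting F16 file is then re-quantized as shown above.
python convert.py models/chinese-llama-2-7b \
       --outtype f16 \
       --outfile models/chinese-llama-2-7b/ggml-model-f16.gguf
```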

Related Issue

None.

Explanation of Changes


- mainly updates the k-quant series stats
@ymcui ymcui merged commit 8f682ad into main Aug 23, 2023
1 check passed
@ymcui ymcui deleted the llama-cpp-gguf branch August 23, 2023 02:57
@ymcui ymcui restored the llama-cpp-gguf branch August 23, 2023 02:57
@ymcui ymcui deleted the llama-cpp-gguf branch August 30, 2023 09:06