
llama.cpp: update GGUF models (with imatrix) #510

Merged 1 commit into main from imatrix on Jan 23, 2024
Conversation

@ymcui (Owner) commented Jan 23, 2024

Description

Recently, llama.cpp introduced importance matrix (imatrix)-aware quantization, which yields further improvements in perplexity (PPL).
Before quantization, the importance matrix is computed with the imatrix tool. We use the PKU Chinese word segmentation training data and iterate over 100 batches to obtain the imatrix.
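The imatrix step described above might look roughly like the following; file names and paths here are illustrative placeholders, not the exact ones used in this PR:

```shell
# After building llama.cpp, compute an importance matrix from a text corpus.
# -f points at the calibration text (a placeholder for the PKU segmentation
# data); --chunks 100 limits the run to 100 batches, as described above.
./imatrix -m chinese-alpaca-2-7b-f16.gguf \
    -f pku-segmentation.txt \
    -o chinese-alpaca-2-7b.imatrix \
    --chunks 100
```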

During quantization, pass the generated imatrix file via --imatrix to enable im-aware quantization. Note that quantization takes longer with an imatrix than without.
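A hedged sketch of the quantization step (model and imatrix file names are illustrative); the extra --imatrix argument is the only change relative to a plain quantization run:

```shell
# Quantize an f16 GGUF model to Q4_K using the previously generated
# importance matrix.
./quantize --imatrix chinese-alpaca-2-7b.imatrix \
    chinese-alpaca-2-7b-f16.gguf \
    chinese-alpaca-2-7b-q4_k-im.gguf Q4_K
```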

Currently, we have converted all available models (K-quants only). You can download them directly from our Hugging Face model hub; model names with the -im suffix denote the newly converted im-aware models. These models can be used directly without further action.

The following are several PPL benchmarks. Generally speaking, im-quantized models are better, but not always.

Chinese-Alpaca-2-7B-RLHF-GGUF

| Quant | original | imatrix (-im) |
|-------|----------|---------------|
| Q2_K | 10.5211 +/- 0.14139 | 11.9331 +/- 0.16168 |
| Q3_K | 8.9748 +/- 0.12043 | 8.8238 +/- 0.11850 |
| Q4_0 | 8.7843 +/- 0.11854 | - |
| Q4_K | 8.4643 +/- 0.11341 | 8.4226 +/- 0.11302 |
| Q5_0 | 8.4563 +/- 0.11353 | - |
| Q5_K | 8.3722 +/- 0.11236 | 8.3336 +/- 0.11192 |
| Q6_K | 8.3207 +/- 0.11184 | 8.3047 +/- 0.11159 |
| Q8_0 | 8.3100 +/- 0.11173 | - |

Chinese-LLaMA-2-13B-GGUF

| Quant | original | imatrix (-im) |
|-------|----------|---------------|
| Q2_K | 14.4701 +/- 0.26107 | 17.4275 +/- 0.31909 |
| Q3_K | 10.1620 +/- 0.18277 | 9.7486 +/- 0.17744 |
| Q4_0 | 9.8633 +/- 0.17792 | - |
| Q4_K | 9.2735 +/- 0.16793 | 9.2734 +/- 0.16792 |
| Q5_0 | 9.3553 +/- 0.16945 | - |
| Q5_K | 9.1767 +/- 0.16634 | 9.1594 +/- 0.16590 |
| Q6_K | 9.1326 +/- 0.16546 | 9.1478 +/- 0.16583 |
| Q8_0 | 9.1394 +/- 0.16574 | - |
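The "better but not always" pattern is easy to quantify from the Chinese-LLaMA-2-13B numbers above. A minimal sketch (plain Python, using only the PPL means from the table; negative means the im-aware quant improves PPL):

```python
# Relative PPL change from im-aware quantization for Chinese-LLaMA-2-13B-GGUF
# (original PPL, imatrix PPL), taken from the table above.
ppl = {
    "Q2_K": (14.4701, 17.4275),
    "Q3_K": (10.1620, 9.7486),
    "Q4_K": (9.2735, 9.2734),
    "Q5_K": (9.1767, 9.1594),
    "Q6_K": (9.1326, 9.1478),
}

for quant, (orig, im) in ppl.items():
    change = (im - orig) / orig * 100.0  # percent; negative = improvement
    print(f"{quant}: {change:+.2f}%")
```

Running this shows clear gains at Q3_K, essentially a tie at Q4_K, and regressions at Q2_K and Q6_K, matching the caveat above.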

Related Issue

None.

@ymcui ymcui merged commit ae46bcd into main Jan 23, 2024
1 check passed
@ymcui ymcui deleted the imatrix branch January 23, 2024 05:50