llama.cpp: update PPL w.r.t. new k-quant optimization #172

Merged
merged 1 commit into main from llama-cpp-gguf on Aug 23, 2023

Conversation

@ymcui (Owner) commented on Aug 23, 2023

Description

This PR updates the PPLs of the GGUF k-quant models.
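For reference, the perplexity reported here follows the standard definition of exponentiated average negative log-likelihood over a held-out text (lower is better), which is also what llama.cpp's perplexity tool computes:

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(x_i \mid x_{<i}\right)\right)$$

where $N$ is the number of evaluated tokens and $p(x_i \mid x_{<i})$ is the model's probability of token $x_i$ given its context.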

Details

llama.cpp has introduced the new GGUF format, and a recent PR brings improvements to the k-quant model series. This PR mainly refreshes the k-quant models' stats to see whether our models also benefit from these improvements.
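For anyone reproducing these numbers, here is a minimal sketch using llama.cpp's stock quantize and perplexity tools (the model paths and the evaluation file are placeholders, and exact flags may differ across llama.cpp revisions):

```bash
# Quantize the F16 GGUF model to one k-quant type (Q2_K here),
# then measure its perplexity on a held-out text file.
# Assumes llama.cpp is already built and the F16 GGUF file exists.
./quantize models/chinese-llama-2-7b/ggml-model-f16.gguf \
           models/chinese-llama-2-7b/ggml-model-q2_k.gguf Q2_K

./perplexity -m models/chinese-llama-2-7b/ggml-model-q2_k.gguf \
             -f wiki.test.raw
```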

Chinese-LLaMA-2-7B

old:

| | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 9.128 | 13.640 | 9.910 | 9.476 | 9.576 | 9.257 | 9.156 | 9.213 | 9.141 | 9.143 | 9.129 |
| Size | 12.91G | 2.77G | 3.17G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |

new (changed results are in boldface):

| | F16 | 🆕Q2_K | 🆕Q3_K | Q4_0 | Q4_1 | 🆕Q4_K | Q5_0 | Q5_1 | 🆕Q5_K | 🆕Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 9.128 | **11.1073** | **9.5760** | 9.476 | 9.576 | **9.2397** | 9.156 | 9.213 | **9.1676** | **9.1329** | 9.129 |
| Size | 12.91G | **2.41G** | **3.18G** | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |

Chinese-LLaMA-2-13B

old:

| | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 8.810 | 14.84 | 9.834 | 9.371 | 9.549 | 8.958 | 8.988 | 8.924 | 8.850 | 8.817 | 8.811 |
| Size | 24.69G | 5.26G | 6.02G | 7.01G | 7.77G | 7.48G | 8.52G | 9.28G | 8.76G | 10.13G | 13.05G |
| CPU Speed | - | 75 | 90 | 76 | 80 | 80 | 91 | 99 | 92 | 104 | 125 |
| GPU Speed | - | 31 | 37 | 30 | 32 | 36 | x | x | 47 | 51 | x |

new (changed results are in boldface):

| | F16 | 🆕Q2_K | 🆕Q3_K | Q4_0 | Q4_1 | 🆕Q4_K | Q5_0 | Q5_1 | 🆕Q5_K | 🆕Q6_K | Q8_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PPL | 8.810 | **12.8040** | **9.7383** | 9.371 | 9.549 | **8.9522** | 8.988 | 8.924 | **8.8581** | **8.820** | 8.811 |
| Size | 24.69G | **5.18G** | **6.04G** | 7.01G | 7.77G | 7.48G | 8.52G | 9.28G | 8.76G | 10.13G | 13.05G |
| CPU Speed | - | 75 | 90 | 76 | 80 | 80 | 91 | 99 | 92 | 104 | 125 |
| GPU Speed | - | 31 | 37 | 30 | 32 | 36 | x | x | 47 | 51 | x |

Observations:

  1. All k-quant models' PPLs have improved (decreased), except for Q5_K (and, marginally, the 13B Q6_K). The improvements for Q2_K and Q3_K are significant.
  2. The Q2_K models are also more compact than before. For example, the 7B Q2_K model shrank from 2.77G to 2.41G.

⚠️ WARNING: You should re-convert these models to avoid unexpected results and behavior.
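A sketch of the re-conversion step, assuming the convert.py script in the llama.cpp tree as of this PR's date (the model directory is a placeholder; the script and its flags have moved around in later llama.cpp revisions):

```bash
# Re-convert the original HF-format weights to the new GGUF format;
# the resulting F16 file is then re-quantized as shown above.
python convert.py models/chinese-llama-2-7b \
       --outtype f16 \
       --outfile models/chinese-llama-2-7b/ggml-model-f16.gguf
```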

Related Issue

None.

Explanation of Changes


- mainly updates the k-quant series stats
@ymcui ymcui merged commit 8f682ad into main Aug 23, 2023
1 check passed
@ymcui ymcui deleted the llama-cpp-gguf branch August 23, 2023 02:57
@ymcui ymcui restored the llama-cpp-gguf branch August 23, 2023 02:57
@ymcui ymcui deleted the llama-cpp-gguf branch August 30, 2023 09:06