Error when running the 13B model: expected [5120 x 49953], got [5120 x 49952] #133

Leo4zhou · 2023-04-12T03:29:26Z

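I have already merged and run chinese-llama-lora-7b successfully. When I tested the 13B model with the same steps, I got the following error at the final step, when running the model: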

main: seed = 1681268819
llama.cpp: loading model from ./models_cn/13B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 49953
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: f16        = 2
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =  73.73 KB
llama_model_load_internal: mem required  = 9917.04 MB (+ 1608.00 MB per state)
error loading model: llama.cpp: tensor 'output.weight' has wrong shape; expected [5120 x 49953], got [5120 x 49952]
llama_init_from_file: failed to load model
main: error: failed to load model './models_cn/13B/ggml-model-q4_0.bin'

The command I ran:
./main -m ./models_cn/13B/ggml-model-q4_0.bin --color -ins -t 8 --temp 0.2 -n 256 --repeat_penalty 1.3 -i -r "用户：" -p "对话"
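The error says 'output.weight' has the wrong shape, so I looked at the output of the previous quantize step. At step 1 (tok_embeddings.weight) the shape is [5120 x 49953], but by the last step output.weight has become [5120 x 49952]. The value 49953 is set in config.json; I noticed that the 7B directory contains a config.json, but the 13B directory does not.
The quantize output is as follows: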

llama.cpp: loading model from ./models_cn/13B/ggml-model-f16.bin
llama.cpp: saving model to ./models_cn/13B/ggml-model-q4_0.bin
[1/363]                tok_embeddings.weight - [5120 x 49953], type =    f16, quantizing .. size =   487.82 MB ->   152.44 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 
[2/363]         layers.0.attention.wq.weight - [5120 x 5120], type =    f16, quantizing .. 50.00 MB ->    15.62 MB | hist: 0.000 0.021 0.016 0.027 0.045 0.071 0.104 0.138 0.157 0.138 0.103 0.071 0.045 0.027 0.016 0.021 
[3/363]         layers.0.attention.wk.weight - [5120 x 5120], type =    f16, quantizing .. size =    50.00 MB ->    15.62 MB | hist: 0.000 0.021 0.016 0.027 0.045 0.071 0.103 0.138 0.157 0.138 0.103 0.071 0.045 0.027 0.016 0.021

(intermediate steps omitted)

[361/363]            layers.39.ffn_norm.weight - [5120], type =    f32, size =    0.020 MB
[362/363]                          norm.weight - [5120], type =    f32, size =    0.020 MB
[363/363]                        output.weight - [5120 x 49952], type =    f16, quantizing .. size =   487.81 MB ->   152.44 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 
llama_model_quantize_internal: model size  = 25177.22 MB
llama_model_quantize_internal: quant size  =  7868.97 MB
llama_model_quantize_internal: hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 

main: quantize time = 140065.95 ms
main:    total time = 140065.95 ms
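The comparison made above (the vocabulary dimension of tok_embeddings.weight versus output.weight in the quantize log) can be automated. Below is a minimal Python sketch; the function names are hypothetical, and it assumes only the log line format shown above. `n_vocab` is the value the loader reports (49953 for this model).

```python
import re

# Matches 2-D tensor lines from the quantize log, e.g.
#   [1/363]   tok_embeddings.weight - [5120 x 49953], type = f16, ...
# 1-D tensors such as "norm.weight - [5120]" are not matched.
TENSOR_RE = re.compile(r"\[\d+/\d+\]\s+(\S+)\s+-\s+\[(\d+) x (\d+)\]")


def second_dims(log_text):
    """Map each 2-D tensor name in the log to its second dimension."""
    return {m.group(1): int(m.group(3)) for m in TENSOR_RE.finditer(log_text)}


def vocab_mismatches(log_text, n_vocab):
    """Return (name, dim) for vocabulary-sized tensors whose dim != n_vocab."""
    dims = second_dims(log_text)
    return [
        (name, dims[name])
        for name in ("tok_embeddings.weight", "output.weight")
        if name in dims and dims[name] != n_vocab
    ]


log = """\
[1/363]                tok_embeddings.weight - [5120 x 49953], type =    f16, quantizing ..
[2/363]         layers.0.attention.wq.weight - [5120 x 5120], type =    f16, quantizing ..
[362/363]                          norm.weight - [5120], type =    f32, size =    0.020 MB
[363/363]                        output.weight - [5120 x 49952], type =    f16, quantizing ..
"""
print(vocab_mismatches(log, 49953))  # [('output.weight', 49952)]
```

Only the two vocabulary-sized tensors are checked, so the square attention matrices never produce false alarms.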

What could be causing this?


airaria · 2023-04-12T04:26:57Z

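Are you quantizing the LLaMA model? The LLaMA model's vocabulary size is 49953, and I suspect the problem is related to 49953 not being divisible by 2.
If you quantize the Alpaca 13B model instead, whose vocabulary size is 49954, it should work. Could you try that?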

Leo4zhou · 2023-04-12T07:55:55Z

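Are you quantizing the LLaMA model? The LLaMA model's vocabulary size is 49953, and I suspect the problem is related to 49953 not being divisible by 2. If you quantize the Alpaca 13B model instead, whose vocabulary size is 49954, it should work. Could you try that?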

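I tried the Alpaca 7B and 13B models and both work; the vocabulary dimension of output.weight is 49954 in both.
I also retried the LLaMA 7B model: the vocabulary dimension of output.weight is 49953, no problem.
But for the LLaMA 13B model, output.weight is [5120 x 49952], which causes this error.
The steps should be exactly the same, so this is strange.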

airaria · 2023-04-12T08:04:25Z

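Are you quantizing the LLaMA model? The LLaMA model's vocabulary size is 49953, and I suspect the problem is related to 49953 not being divisible by 2. If you quantize the Alpaca 13B model instead, whose vocabulary size is 49954, it should work. Could you try that?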

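I tried the Alpaca 7B and 13B models and both work; the vocabulary dimension of output.weight is 49954 in both. I also retried the LLaMA 7B model: the vocabulary dimension of output.weight is 49953, no problem. But for the LLaMA 13B model, output.weight is [5120 x 49952], which causes this error. The steps should be exactly the same, so this is strange.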

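That's because the 13B model is stored as two pth files, each holding half of a matrix along one dimension.
For the lm_head.weight parameter, the two pth files hold 24976 and 24977 rows respectively (24976 + 24977 = 49953). However, llama.cpp's conversion script probably infers the merged size by multiplying the dimension found in a single file by 2, so it computes 24976*2 = 49952, which is exactly the dimension in your error message.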
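The arithmetic in this explanation can be checked with a small sketch. It assumes, as described above, that the converter takes one shard's row count and multiplies it by the number of shards; the function names are illustrative only. The shard order is also an assumption: the 24976-row shard is listed first because doubling it gives the observed 49952, and the thread does not say which file that is. The Alpaca 13B split is assumed to be even.

```python
def naive_total(shard_rows):
    """Assumed converter behaviour: one shard's size (here the first)
    multiplied by the number of shards."""
    return shard_rows[0] * len(shard_rows)


def actual_total(shard_rows):
    """Correct merged size: the sum of the rows held by every shard."""
    return sum(shard_rows)


cases = {
    # Single .pth file: nothing to infer, both totals agree.
    "LLaMA 7B": [49953],
    # Two .pth files splitting 49953 rows unevenly (order assumed).
    "LLaMA 13B": [24976, 24977],
    # Two .pth files splitting 49954 rows evenly (assumed).
    "Alpaca 13B": [24977, 24977],
}

for name, shards in cases.items():
    print(name, naive_total(shards), actual_total(shards))
# LLaMA 7B 49953 49953
# LLaMA 13B 49952 49953
# Alpaca 13B 49954 49954

print(49953 % 2, 49954 % 2)  # 1 0: only an even vocabulary splits evenly in two
```

The two totals diverge only when the shards have different sizes, which matches the observation that LLaMA 7B (one file) and Alpaca 13B (even vocabulary) convert correctly while LLaMA 13B does not.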

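There's no good solution yet. If you have some NLP or deep-learning experience, you can manually pad the mismatched dimension in the two pth files, or add one row to the embedding in the LoRA weights, forcing the total size to 49954, and then use the result with alpaca_tokenizer.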
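The padding idea can be sketched as follows. This is a minimal illustration under several assumptions: `pad_last_shard` is a hypothetical helper, nested Python lists stand in for the real .pth tensors, and zeros are used as the fill value. Padding is appended only to the last shard, because a row inserted at the end of an earlier shard would shift every later token id; the thread does not say which file holds the shorter half, so the helper refuses to pad when the short shard is not the last one. At real scale, padding a 24976-row last shard to 24977 gives 2 × 24977 = 49954 rows, the size that alpaca_tokenizer expects.

```python
def pad_last_shard(shards, n_cols, fill=0.0):
    """Pad the last shard up to the row count of the first one.

    Padding rows land at the very end of the merged vocabulary, so
    existing token ids are unchanged. Raises if an earlier shard is the
    short one, since padding it would shift all later token ids.
    """
    target = len(shards[0])
    if any(len(s) != target for s in shards[:-1]) or len(shards[-1]) > target:
        raise ValueError("only a shorter *last* shard can be padded safely")
    head = [[list(row) for row in s] for s in shards[:-1]]
    last = [list(row) for row in shards[-1]]
    last += [[fill] * n_cols for _ in range(target - len(last))]
    return head + [last]


# Toy stand-ins, scaled down from the 24977- and 24976-row files to 4 and 3 rows.
shard_0 = [[1.0, 1.0]] * 4
shard_1 = [[2.0, 2.0]] * 3
padded = pad_last_shard([shard_0, shard_1], n_cols=2)
print([len(s) for s in padded], padded[-1][-1])  # [4, 4] [0.0, 0.0]
```

This covers only the row bookkeeping for the output matrix; the embedding side would also need a matching extra row for the total to be consistent, which is what the LoRA-embedding alternative above addresses.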

yungangwu · 2023-04-13T06:42:42Z

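I ran into the same problem. But even without quantizing, loading the unquantized model gives the same error. I'm not sure whether the problem lies in the vocabulary-expanded weights.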

airaria · 2023-04-13T07:19:51Z

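I ran into the same problem. But even without quantizing, loading the unquantized model gives the same error. I'm not sure whether the problem lies in the vocabulary-expanded weights.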

Are you loading the pth model files?

yungangwu · 2023-04-13T07:22:27Z

No, I loaded ggml-model-f16.bin directly.

airaria · 2023-04-13T07:24:18Z

No, I loaded ggml-model-f16.bin directly.

I see. Then the cause is probably the same one I described above: a problem in llama.cpp's torch-to-ggml conversion.

niupi21 · 2023-04-13T14:01:25Z

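I think this is a llama.cpp problem. My workaround is rather brute-force: on line 956 of llama.cpp, change model.output = ml->get_tensor("output.weight", {n_embd, n_vocab}); to model.output = ml->get_tensor("output.weight", {n_embd, n_vocab-1});
Note: there is no need to reconvert to ggml or requantize to 4-bit; in my tests it runs fine. This is only a temporary workaround; revert the change for other models.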
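Why the one-line edit helps, and why it must be reverted, can be illustrated with a Python stand-in for the loader's shape check. This is not llama.cpp's code: `check_shape` and `expected_output_shape` are hypothetical names, and only the error-message format is taken from the log above. The edit changes the shape the loader asks for; it does not change the data stored in the file.

```python
def format_dims(dims):
    """Render a shape the way the error message does, e.g. [5120 x 49953]."""
    return "[" + " x ".join(str(d) for d in dims) + "]"


def check_shape(name, expected, actual):
    """Hypothetical Python stand-in for the loader's shape check."""
    if list(expected) != list(actual):
        raise ValueError(
            f"tensor '{name}' has wrong shape; "
            f"expected {format_dims(expected)}, got {format_dims(actual)}"
        )


def expected_output_shape(n_embd, n_vocab, hack=False):
    """Shape the loader asks for; the temporary edit requests n_vocab - 1."""
    return [n_embd, n_vocab - 1 if hack else n_vocab]


# LLaMA 13B file from this thread: n_vocab = 49953, output.weight has 49952.
file_shape = [5120, 49952]
check_shape("output.weight", expected_output_shape(5120, 49953, hack=True), file_shape)

# Without the edit, the same check reproduces the original error.
try:
    check_shape("output.weight", expected_output_shape(5120, 49953), file_shape)
except ValueError as err:
    print(err)
# tensor 'output.weight' has wrong shape; expected [5120 x 49953], got [5120 x 49952]

# With the edit left in place, a correctly converted model (Alpaca 13B, 49954) fails.
try:
    check_shape("output.weight", expected_output_shape(5120, 49954, hack=True), [5120, 49954])
except ValueError as err:
    print(err)
# tensor 'output.weight' has wrong shape; expected [5120 x 49953], got [5120 x 49954]
```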

b3601993 · 2023-04-22T03:10:22Z

There is no line model.output = ml->get_tensor("output.weight", {n_embd, n_vocab}); in ggml.c. So frustrating...

niupi21 · 2023-04-22T07:49:53Z

There is no line model.output = ml->get_tensor("output.weight", {n_embd, n_vocab}); in ggml.c. So frustrating...

Sorry, it's in the source file llama.cpp, not ggml.c.

airaria mentioned this issue Apr 12, 2023

Size([49952, 5120]) from checkpoint V.S. Size([49953, 5120]). from model #91

Closed

airaria closed this as completed Apr 15, 2023
