Some weights of the model checkpoint at ./save_model/ were not used #17

Open

zhangtaochn opened this issue Mar 27, 2023 · 2 comments

@zhangtaochn
When I run infer, the log prints the messages below. Since these weights could not be loaded into the model, I suspect the predictions are exactly the same as they were before training.
But I don't know how to change this so that these parameters get loaded properly. @yuanzhoulvpi2017

sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
load model: ./save_model/
Some weights of the model checkpoint at ./save_model/ were not used when initializing ChatGLMForConditionalGeneration: ['transformer.layers.5.attention.query_key_value.lora_B.weight', 'transformer.layers.26.attention.query_key_value.lora_B.weight', 'transformer.layers.13.attention.query_key_value.lora_A.weight', 'transformer.layers.7.attention.query_key_value.lora_A.weight', 'transformer.layers.4.attention.query_key_value.lora_A.weight', 'transformer.layers.6.attention.query_key_value.lora_A.weight', 'transformer.layers.10.attention.query_key_value.lora_B.weight', 'transformer.layers.13.attention.query_key_value.lora_B.weight', 'transformer.layers.11.attention.query_key_value.lora_B.weight', 'transformer.layers.27.attention.query_key_value.lora_A.weight', 'transformer.layers.2.attention.query_key_value.lora_A.weight', 'transformer.layers.1.attention.query_key_value.lora_B.weight', 'transformer.layers.19.attention.query_key_value.lora_B.weight', 'transformer.layers.18.attention.query_key_value.lora_A.weight', 'transformer.layers.20.attention.query_key_value.lora_A.weight', 'transformer.layers.9.attention.query_key_value.lora_B.weight', 'transformer.layers.20.attention.query_key_value.lora_B.weight', 'transformer.layers.22.attention.query_key_value.lora_A.weight', 'transformer.layers.21.attention.query_key_value.lora_B.weight', 'transformer.layers.22.attention.query_key_value.lora_B.weight', 'transformer.layers.6.attention.query_key_value.lora_B.weight', 'transformer.layers.24.attention.query_key_value.lora_B.weight', 'transformer.layers.0.attention.query_key_value.lora_B.weight', 'transformer.layers.17.attention.query_key_value.lora_A.weight', 'transformer.layers.21.attention.query_key_value.lora_A.weight', 'transformer.layers.25.attention.query_key_value.lora_B.weight', 'transformer.layers.1.attention.query_key_value.lora_A.weight', 'transformer.layers.18.attention.query_key_value.lora_B.weight', 'transformer.layers.14.attention.query_key_value.lora_B.weight', 'transformer.layers.16.attention.query_key_value.lora_B.weight', 'transformer.layers.15.attention.query_key_value.lora_A.weight', 'transformer.layers.17.attention.query_key_value.lora_B.weight', 'transformer.layers.23.attention.query_key_value.lora_B.weight', 'transformer.layers.14.attention.query_key_value.lora_A.weight', 'transformer.layers.9.attention.query_key_value.lora_A.weight', 'transformer.layers.8.attention.query_key_value.lora_A.weight', 'transformer.layers.25.attention.query_key_value.lora_A.weight', 'transformer.layers.10.attention.query_key_value.lora_A.weight', 'transformer.layers.23.attention.query_key_value.lora_A.weight', 'transformer.layers.15.attention.query_key_value.lora_B.weight', 'transformer.layers.5.attention.query_key_value.lora_A.weight', 'transformer.layers.26.attention.query_key_value.lora_A.weight', 'transformer.layers.4.attention.query_key_value.lora_B.weight', 'transformer.layers.12.attention.query_key_value.lora_A.weight', 'transformer.layers.8.attention.query_key_value.lora_B.weight', 'transformer.layers.11.attention.query_key_value.lora_A.weight', 'transformer.layers.2.attention.query_key_value.lora_B.weight', 'transformer.layers.19.attention.query_key_value.lora_A.weight', 'transformer.layers.3.attention.query_key_value.lora_B.weight', 'transformer.layers.7.attention.query_key_value.lora_B.weight', 'transformer.layers.24.attention.query_key_value.lora_A.weight', 'transformer.layers.27.attention.query_key_value.lora_B.weight', 'transformer.layers.12.attention.query_key_value.lora_B.weight', 
'transformer.layers.0.attention.query_key_value.lora_A.weight', 'transformer.layers.16.attention.query_key_value.lora_A.weight', 'transformer.layers.3.attention.query_key_value.lora_A.weight']

  • This IS expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
@yuanzhoulvpi2017
Owner

My understanding at the time was that these LoRA layers are only there to assist training, i.e. to help update the model parameters, and that they are no longer used at prediction/inference time.

I will check later whether these layers are actually useful and whether they need to be loaded.

Thanks for the heads-up.
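
If the adapters do turn out to matter at inference, one way to make them take effect is to fold them back into the dense weights before loading. The snippet below is only a minimal sketch, not this repository's code: it assumes the standard LoRA update W' = W + (alpha/r) · B · A, that ./save_model/pytorch_model.bin also contains the base query_key_value.weight tensors, and that LORA_R / LORA_ALPHA match the values used during training.

```python
import os
import torch

CKPT = "./save_model/pytorch_model.bin"  # path assumed from the log above
LORA_R = 8        # assumption: rank used when the adapters were trained
LORA_ALPHA = 16   # assumption: alpha used when the adapters were trained
scaling = LORA_ALPHA / LORA_R

state_dict = torch.load(CKPT, map_location="cpu")
merged = dict(state_dict)

for name in list(state_dict):
    if name.endswith("query_key_value.lora_A.weight"):
        base_name = name.replace(".lora_A.weight", ".weight")
        lora_a = state_dict[name]                               # (r, in_features)
        lora_b = state_dict[name.replace("lora_A", "lora_B")]   # (out_features, r)
        # fold the low-rank update into the dense weight (compute in fp32 for safety)
        delta = scaling * (lora_b.float() @ lora_a.float())
        merged[base_name] = state_dict[base_name] + delta.to(state_dict[base_name].dtype)
        # drop the adapter tensors so the "were not used" warning goes away
        del merged[name]
        del merged[name.replace("lora_A", "lora_B")]

os.makedirs("./save_model_merged", exist_ok=True)
torch.save(merged, "./save_model_merged/pytorch_model.bin")
```

After copying the config and tokenizer files into ./save_model_merged/ and pointing the infer script at that directory, from_pretrained should load without the warning, provided the assumptions above hold.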

@yanchaoguo

+1, weight initialization failed for every layer of the model.
