
Convergence issues during model training #31

Closed
Bobo1204 opened this issue Sep 16, 2022 · 5 comments

Comments

@Bobo1204

I'd like to ask you a question.
I ran comparison tests on a few tasks (classification, sequence labeling, text generation, etc.), using the tokenizer and model loading from bert4torch versus Hugging Face:

the Hugging Face version reaches fairly good results after about 5-6 epochs
bert4torch needs more than 20 epochs to get there
and the final evaluation score with Hugging Face is about 1-2 points higher

I compared the code but haven't found the cause yet. Quite puzzled.

@Tongjilibo
Owner

In the example code I provide, the learning rate is fixed, there is no weight_decay, and the extra network layers I add are randomly initialized. Are these settings the same on both sides? I also ran experiments myself and the results are in performance; are your numbers similar? If it's convenient, could you send me your Hugging Face code so I can investigate on my end too, in case the problem is in my code?
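(To make the comparison concrete, here is a minimal sketch of what an aligned setup could look like on the Hugging Face side; the checkpoint name, learning rate, and label count are placeholders, not values taken from the example scripts.)

```python
import torch
from transformers import BertForSequenceClassification

# Placeholder checkpoint and hyperparameters -- match them to whatever the
# bert4torch example scripts actually use before comparing results.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2
)

# Fixed learning rate, no weight_decay, no scheduler; the classification head
# added on top of the encoder is randomly initialized by from_pretrained.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=0.0)
```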

@Tongjilibo
Owner

My guess is that some weights of your pretrained model were not loaded correctly. Did you get any warnings when loading it with bert4torch?
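(As an illustration, one way to surface this explicitly in plain PyTorch is to load the checkpoint non-strictly and inspect which keys failed to match; the model and checkpoint path below are placeholders, and this is not bert4torch's own loading code.)

```python
import torch
from transformers import BertConfig, BertModel

# Placeholder model and checkpoint path -- swap in the model your framework
# builds and the actual pretrained weight file.
model = BertModel(BertConfig())
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# strict=False returns the names that failed to match instead of raising,
# which makes silent partial loading easy to spot.
result = model.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)        # left at random init
print("unexpected keys:", result.unexpected_keys)  # ignored checkpoint entries
```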

@Bobo1204
Author

This morning I pulled the latest bert4torch code and tested it on an open-source sentiment dataset and a classification model on our own labeled data. It converges quickly now, basically reaching its best performance within 3-6 epochs.

The issue above came up when I tested bert4torch around May or June; since then I have been using the Hugging Face tokenizer and model at work. Embarrassing... great work!

About pretrained model loading: with bert-base from Hugging Face, 'LayerNorm.gamma' and 'LayerNorm.beta' don't match the framework's naming, so there is a warning. I originally changed the mapping in bert4torch/models.py; later I saw convert_scripts and renamed the keys in the weight file instead. After the change, the score improved by 2-3 points.
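(Below is a minimal sketch of that kind of key renaming, not the actual convert_scripts code; the paths are placeholders, and it assumes the framework expects the standard nn.LayerNorm '.weight' / '.bias' parameter names.)

```python
import torch

# Placeholder paths -- point these at the real bert-base checkpoint.
src = "bert-base-chinese/pytorch_model.bin"
dst = "bert-base-chinese/pytorch_model_converted.bin"

state_dict = torch.load(src, map_location="cpu")

# Older Hugging Face BERT checkpoints store LayerNorm parameters as
# '...LayerNorm.gamma' / '...LayerNorm.beta'; rename them to the
# '.weight' / '.bias' names that nn.LayerNorm uses.
renamed = {
    key.replace("LayerNorm.gamma", "LayerNorm.weight")
       .replace("LayerNorm.beta", "LayerNorm.bias"): value
    for key, value in state_dict.items()
}

torch.save(renamed, dst)
```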

@Tongjilibo
Owner

Right, some of the pretrained weight names don't match; at the moment the convert script is what renames them for compatibility.

@Tongjilibo
Owner

I just found a bug in the tokenizer's vocab.txt loading that was introduced by a change a couple of days ago; it has just been fixed. Please pull the latest code again before further use.
