Question: has ELECTRA been observed to converge faster than BERT/RoBERTa? #2
Comments
If you mean Figure 4 in the paper (the performance comparison with BERT at different checkpoints), we don't have that yet.
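For anyone who wants to run that comparison themselves, a minimal sketch of the loop might look like this: fine-tune from each saved pretraining checkpoint and track the downstream dev score against the pretraining step. `finetune_and_eval` and the checkpoint layout are hypothetical placeholders, not this repo's scripts.

```python
from typing import Dict, Tuple

def finetune_and_eval(checkpoint_dir: str, task: str) -> float:
    """Hypothetical placeholder: fine-tune the model loaded from
    `checkpoint_dir` on `task` and return its dev-set score."""
    return 0.0  # stand-in; a real run would do task-specific fine-tuning

steps = [100_000, 250_000, 500_000, 1_000_000]
results: Dict[Tuple[str, int], float] = {}
for model in ("electra-base", "roberta-base"):
    for step in steps:
        ckpt = f"checkpoints/{model}/step-{step}"  # hypothetical layout
        results[(model, step)] = finetune_and_eval(ckpt, task="cmrc2018")

# Plotting score vs. step for both models reproduces a Figure-4-style curve.
for (model, step), score in sorted(results.items()):
    print(f"{model} @ {step:,} steps: dev score = {score:.2f}")
```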
That would validate the core of the strategy~
ELECTRA is quite interesting.
Compared with the results of RoBERTa-base at the 1M-step checkpoint, ELECTRA performs somewhat better. For example, on the CMRC 2018 dev set, …
👍. How many total training steps did this RoBERTa small use? I'd guess the two end up about the same in final performance.
The total number of steps was fixed at 1M.
Thanks for sharing!
I'm asking about comparing checkpoints at different pretraining steps. In the paper, at the same step, ELECTRA's advantage of learning from 100% of the tokens makes its fine-tuning results improve significantly faster than BERT's.
Does your reproduction confirm this? We are working on a similar strategy, and the convergence speedup is not as significant as in the paper.
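For context, the "100% label" advantage refers to ELECTRA's replaced-token-detection (RTD) objective producing a training signal at every input position, whereas MLM only gets signal from the ~15% of positions that were masked. A minimal PyTorch sketch of the contrast (tensor names are illustrative, not the paper's or this repo's code):

```python
import torch
import torch.nn.functional as F

def mlm_loss(mlm_logits, labels, masked_positions):
    # BERT-style MLM: loss is computed only at the ~15% of positions
    # that were masked; the remaining tokens contribute no gradient.
    logits = mlm_logits[masked_positions]   # (n_masked, vocab_size)
    targets = labels[masked_positions]      # (n_masked,)
    return F.cross_entropy(logits, targets)

def rtd_loss(rtd_logits, is_replaced):
    # ELECTRA-style RTD: a binary original-vs-replaced decision at
    # *every* position, so 100% of tokens contribute each step.
    return F.binary_cross_entropy_with_logits(
        rtd_logits.view(-1), is_replaced.view(-1).float()
    )
```

The denser per-step signal is the usual explanation for ELECTRA's faster convergence at matched pretraining steps.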