A question about the XLNet pretraining process #32

genggui001 · 2019-06-23T13:27:46Z

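Given a piece of text, select K of its words and mask only one of them at a time, producing K training examples. Then maximize the log-probability of the correct word across those K examples.

Wouldn't that achieve the same effect as XLNet?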

zihangdai · 2019-06-23T23:19:01Z

Objective-wise, this is the same as XLNet. However, this procedure requires K separate forward and backward passes, which makes it too expensive to use.

kimiyoung · 2019-06-24T03:00:41Z

@genggui001 That would take 85x more machines, which makes training almost infeasible. Also, given 85x more machines, simply scaling up XLNet will probably be better due to better data efficiency.
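
To make the cost argument above concrete, here is a minimal sketch of the one-mask-per-example scheme from the question, together with a count of forward passes. It is only an illustration, not XLNet's actual data pipeline: `MASK`, `single_mask_examples`, and `forward_passes` are hypothetical names, and using K = 85 in the count assumes the "85x" figure refers to the number of prediction targets per sequence, which the thread does not state explicitly.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, for illustration only


def single_mask_examples(tokens, k, seed=0):
    """Build k training examples, each masking exactly one of k sampled positions."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(tokens)), k))
    examples = []
    for pos in positions:
        masked = list(tokens)  # copy so the original sequence stays intact
        masked[pos] = MASK
        examples.append((masked, pos, tokens[pos]))
    return examples


def forward_passes(num_sequences, k, one_mask_per_example):
    """Count forward passes needed to cover k prediction targets per sequence."""
    # One pass per target if each example masks a single token;
    # one pass per sequence if all k targets are predicted together.
    return num_sequences * k if one_mask_per_example else num_sequences


tokens = "the quick brown fox jumps over the lazy dog".split()
examples = single_mask_examples(tokens, k=3)
for masked, pos, target in examples:
    print(" ".join(masked), "->", target)

# Assumed K = 85 targets per sequence (see the note above).
print(forward_passes(1000, 85, one_mask_per_example=True))
print(forward_passes(1000, 85, one_mask_per_example=False))
```

Each of the K examples needs its own forward and backward pass over the full sequence, so the compute grows linearly with K, whereas predicting all K targets from a single pass keeps the cost at one pass per sequence.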

guotong1988 · 2020-09-28T02:55:51Z

Thank you. Learned a lot here.

guotong1988 mentioned this issue Sep 28, 2020

XLNet support guotong1988/BERT-GPU#22

Closed
