How to output the embedding of each word #1

ECNU109 · 2019-11-26T13:09:55Z

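Hello, how should I modify your code so that it outputs the word embeddings from the final fine-tuned model?
I also noticed that you do not perform word segmentation and instead treat each Chinese character as a unit. If I want to derive a word's embedding from its character embeddings, is there a good way to do that? Thank you!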

yao8839836 · 2019-12-01T13:28:13Z

@ECNU109

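Hello. In my opinion, the simplest approach is to wrap the word in [CLS] and [SEP] (for example, [CLS]头痛[SEP], where 头痛 means "headache"), feed it to the pre-trained BERT, and use the last-layer embedding of [CLS] as the word's embedding.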
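A minimal sketch of the [CLS] approach, in plain Python. It assumes you already have the last-layer hidden states for the wrapped input as one vector per token; how you obtain them depends on your BERT implementation and is not shown. The helper names `wrap_word` and `cls_embedding` and the toy 3-dimensional vectors are hypothetical, used only for illustration.

```python
from typing import List, Sequence


def wrap_word(chars: Sequence[str]) -> List[str]:
    """Wrap a word's character tokens as [CLS] ... [SEP]."""
    return ["[CLS]", *chars, "[SEP]"]


def cls_embedding(tokens: Sequence[str],
                  last_layer: Sequence[Sequence[float]]) -> List[float]:
    """Return the last-layer vector at the [CLS] position (index 0)."""
    if len(tokens) != len(last_layer):
        raise ValueError("expected one hidden vector per token")
    if not tokens or tokens[0] != "[CLS]":
        raise ValueError("input must start with [CLS]")
    return list(last_layer[0])


# Chinese BERT vocabularies are character-based, so the word is split per character.
tokens = wrap_word("头痛")

# Toy stand-in for the model's last hidden layer (assumption: real vectors
# would come from BERT and have the model's hidden size, e.g. 768).
last_layer = [
    [0.1, 0.2, 0.3],  # [CLS]
    [1.0, 0.0, 0.0],  # 头
    [0.0, 1.0, 0.0],  # 痛
    [0.5, 0.5, 0.5],  # [SEP]
]

word_vec = cls_embedding(tokens, last_layer)
```

Here `word_vec` is simply the first row of the last layer, so the word's representation is whatever the model encoded at the [CLS] position.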

Alternatively, you can refer to how bert-as-service computes it:

Q: How do you get the fixed representation? Did you do pooling or something?
A: Yes, pooling is required to get a fixed representation of a sentence. In the default strategy REDUCE_MEAN, I take the second-to-last hidden layer of all of the tokens in the sentence and do average pooling.

The "sentence" here can also be a single word.
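The REDUCE_MEAN strategy quoted above can be sketched as follows. This is an illustrative plain-Python version, not bert-as-service's own code: it assumes you have the hidden states of every layer as `hidden_layers[layer][token][dim]`, and it averages over all token positions of the chosen layer. The function name `mean_pool_layer` and the toy values are hypothetical, and a real batch would also need to exclude padding positions, which this sketch ignores.

```python
from typing import List, Sequence


def mean_pool_layer(hidden_layers: Sequence[Sequence[Sequence[float]]],
                    layer_index: int = -2) -> List[float]:
    """Average the token vectors of one layer (default: second-to-last)."""
    layer = hidden_layers[layer_index]
    if not layer:
        raise ValueError("layer has no token vectors")
    dim = len(layer[0])
    # Mean over the token axis, one output value per hidden dimension.
    return [sum(vec[d] for vec in layer) / len(layer) for d in range(dim)]


# Toy stand-in: 3 layers, 4 tokens ([CLS] 头 痛 [SEP]), 2 dimensions each.
hidden_layers = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # first layer
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],  # second-to-last layer
    [[9.0, 9.0], [9.0, 9.0], [9.0, 9.0], [9.0, 9.0]],  # last layer
]

pooled = mean_pool_layer(hidden_layers)
```

With the toy values, `pooled` is the per-dimension mean of the second-to-last layer. Passing `layer_index=-1` would pool the last layer instead.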
