Efficient Estimation of Word Representations in Vector Space #2

shnakazawa · 2022-11-21T07:04:16Z

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv [cs.CL]. arXiv. http://arxiv.org/abs/1301.3781.

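The paper that proposed word2vec.
- Not a single model but a family of models.
- One method for learning distributed representations of words, i.e. representing words as vectors.
- A breakthrough in natural language processing that became the de facto standard for distributed word representations.
The second of three related papers. The name "word2vec" comes from the code released alongside this paper.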
- Mikolov, Tomas, Wen-Tau Yih, and Geoffrey Zweig. 2013. “Linguistic Regularities in Continuous Space Word Representations,” June, 746–51.
- Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, 3111–19. NIPS’13. Red Hook, NY, USA: Curran Associates Inc.
  - In practice, the third paper is probably the most important.
The learned representations are accurate enough that arithmetic on word vectors becomes meaningful.
- Example: King - man + woman ≈ Queen
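
The arithmetic in the example above can be sketched with a handful of hand-made vectors. This is a toy illustration only: the 2-d vectors and the helper names `cosine` and `analogy` are assumptions made up for this note, whereas real word2vec vectors are learned from a corpus and have hundreds of dimensions.

```python
import math

# Hypothetical 2-d vectors chosen by hand for illustration:
# axis 0 ~ "royalty", axis 1 ~ "gender". Real word2vec vectors are learned.
vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("man", "king", "woman", vectors))
```

Excluding the three question words from the candidates is a common convention in analogy tools and is assumed here; without it the nearest word is often one of the inputs.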

Abstract

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

Code

https://code.google.com/archive/p/word2vec/

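Problem solved / comparison with prior work

Distributed word representations (representing words as vectors) had been reported to outperform N-gram models on natural language processing tasks.
However, how to build high-quality representations from very large data sets remained an open problem.
- One major obstacle was the computational cost of training.
This paper proposes word2vec, a method for learning distributed word representations.
As a result, it obtains high-quality word vectors at low computational cost, good enough that arithmetic on word vectors works.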

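Key points of the technique

The paper introduces two word2vec models.
Continuous Bag-of-Words (CBOW) Model: predicts the current word from the surrounding words (the context).
- Similar to the feedforward neural network language model (NNLM), but the non-linear hidden layer is removed and the projection layer is shared by all words, so the context word vectors are averaged.
- Takes the n words before and after as input and learns to classify the current (middle) word.
- Its main selling point is training speed.
Continuous Skip-gram Model: predicts the surrounding words (the context) from the current word.
- The reverse of CBOW: the current word is the input, and the model predicts words within a certain range before and after it.
- Widening this range improves the quality of the resulting word vectors but also increases the computational cost.
- Its main selling point is the accuracy of the representations.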
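
The difference between the two models shows up most clearly in the training examples each one consumes. The sketch below is a minimal illustration: the function names `cbow_examples` and `skipgram_examples` are made up for this note, and a fixed window is assumed for simplicity (the paper samples the Skip-gram range at random up to a maximum distance).

```python
def cbow_examples(tokens, window):
    """Yield (context_words, target_word): the context predicts the center word."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            yield context, target

def skipgram_examples(tokens, window):
    """Yield (center_word, context_word): the center predicts each neighbour."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

tokens = "the quick brown fox jumps".split()
cbow = list(cbow_examples(tokens, window=1))
skip = list(skipgram_examples(tokens, window=1))

print(cbow[1])   # context on both sides of "quick", and "quick" as the target
print(skip[:3])  # one (center, neighbour) pair per context position
```

CBOW produces one example per position, while Skip-gram produces one per (position, neighbour) pair, which is one way to see why widening the range raises Skip-gram's cost.
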
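Both are three-layer neural networks (input, projection, output). With no non-linear hidden layer, the computational cost drops sharply.
The weight matrix between the input and projection layers, obtained by training these models, is the set of distributed word representations, i.e. the word2vec vectors.
- A clever way of looking at the problem.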
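
To make the "weight matrix = word vectors" point concrete, here is a minimal sketch of a CBOW forward pass. It rests on several assumptions: the weights are random and untrained, a full softmax is used instead of hierarchical softmax, and the names `W_in`, `W_out`, and `cbow_forward` are made up for this note.

```python
import math
import random

random.seed(0)
vocab = ["the", "quick", "brown", "fox", "jumps"]
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 3  # vocabulary size and embedding dimension (toy values)

# W_in: V x D input-to-projection weights; row i is the vector for word i.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
# W_out: V x D projection-to-output weights (full softmax for simplicity).
W_out = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

def cbow_forward(context_words):
    """Average the context vectors, then apply a softmax over the vocabulary."""
    ids = [word_to_id[w] for w in context_words]
    hidden = [sum(W_in[i][d] for i in ids) / len(ids) for d in range(D)]
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in W_out]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = cbow_forward(["quick", "fox"])      # distribution over the middle word
embedding_of_fox = W_in[word_to_id["fox"]]  # the row that is kept as the word vector
```

The prediction task is only a means to an end: after training, the output side is discarded and the rows of `W_in` are used as the word vectors.
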
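In addition, hierarchical softmax speeds things up.
- For both CBOW and Skip-gram, the full softmax is very expensive (a classification problem over the whole vocabulary).
- → Replace it with a sequence of binary decisions along a binary tree (hence "hierarchical"), which reduces the number of output evaluations from about V to about log2(V) for a vocabulary of size V.
- Further speed-ups → Mikolov et al., NIPS 2013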
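
The idea of replacing one V-way classification with a chain of binary decisions can be sketched as follows. This is an illustration under stated assumptions: a complete balanced binary tree over a toy vocabulary of 8 words is used (the paper uses a Huffman tree, which gives frequent words shorter paths), and `node_vecs`, `word_probability`, and the `hidden` values are made up for this note.

```python
import math
import random

random.seed(0)
V, D = 8, 4                # toy vocabulary size (a power of two) and dimension
DEPTH = int(math.log2(V))  # binary decisions per word: log2(8) = 3

# One weight vector per inner node of a complete binary tree (V - 1 nodes),
# stored heap-style: node 0 is the root, children of n are 2n+1 and 2n+2.
node_vecs = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(V - 1)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def word_probability(word_id, hidden):
    """P(word | hidden) as a product of DEPTH left/right decisions."""
    prob, node = 1.0, 0
    for level in reversed(range(DEPTH)):
        go_right = (word_id >> level) & 1  # bits of the word id encode its path
        p_right = sigmoid(sum(h * w for h, w in zip(hidden, node_vecs[node])))
        prob *= p_right if go_right else 1.0 - p_right
        node = 2 * node + 1 + go_right
    return prob

hidden = [0.3, -0.2, 0.5, 0.1]  # stand-in for the projection-layer output
probs = [word_probability(w, hidden) for w in range(V)]
print(sum(probs))  # the leaf probabilities form a valid distribution
```

Each word needs only `DEPTH` sigmoid evaluations instead of a normalisation over all V words, and the probabilities still sum to one because every inner node splits its mass between its two children.
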
The word vectors were trained on a Google News corpus.

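Evaluation

Task Description
- Build lists of word pairs that share a relation, for example American cities and the states they belong to.
- Form questions by connecting two pairs picked at random from the same list.
- Answer each question with vector arithmetic and take the nearest word. A question counts as correct only when that word exactly matches the expected answer, and accuracy is reported per relation type (e.g., city and state, man and woman, opposites).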
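
The question-building and scoring steps above can be sketched as follows. This is an illustration under assumptions: every ordered combination of two pairs is enumerated here (the paper samples pairs at random), and the names `build_questions`, `accuracy`, and the stand-in `predict` function are made up for this note.

```python
from itertools import permutations

def build_questions(pairs):
    """Connect every ordered choice of two distinct pairs into (a, b, c, expected)."""
    return [(a, b, c, d) for (a, b), (c, d) in permutations(pairs, 2)]

def accuracy(questions, predict):
    """Fraction of questions where predict(a, b, c) exactly equals the expected word."""
    correct = sum(1 for a, b, c, d in questions if predict(a, b, c) == d)
    return correct / len(questions)

# Small (city, state) relation list for illustration.
pairs = [("Chicago", "Illinois"), ("Houston", "Texas"), ("Boston", "Massachusetts")]
questions = build_questions(pairs)

# Stand-in predictor: a lookup that only knows two cities, so it misses the rest.
# A real evaluation would take the nearest word to vec(b) - vec(a) + vec(c).
known = {"Chicago": "Illinois", "Houston": "Texas"}

def predict(a, b, c):
    return known.get(c)

score = accuracy(questions, predict)
print(len(questions), score)
```

Because only an exact match counts, near misses such as synonyms score zero, which makes this a strict metric.
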
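Training speed.
- The recurrent neural network language model took about 8 weeks on a single CPU (320M training words, 82K vocabulary).
- CBOW finished in about one day (on a subset of the Google News data).
- Skip-gram took about three days (on the same Google News subset).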
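
These timings are in line with the per-example training complexity the paper gives for each architecture. The sketch below evaluates those formulas; the concrete values of `N`, `D`, `H`, `V`, and `C` are illustrative assumptions, and the result is an operation count, not a prediction of wall-clock time.

```python
import math

def nnlm_cost(N, D, H, V):
    """Feedforward NNLM with hierarchical softmax: N*D + N*D*H + H*log2(V)."""
    return N * D + N * D * H + H * math.log2(V)

def cbow_cost(N, D, V):
    """CBOW: N*D + D*log2(V) (no hidden layer, so the N*D*H term is gone)."""
    return N * D + D * math.log2(V)

def skipgram_cost(C, D, V):
    """Skip-gram: C * (D + D*log2(V)), where C is the maximum context distance."""
    return C * (D + D * math.log2(V))

# Illustrative values (assumptions): 8 context words, 640 dimensions,
# 640 hidden units, an 82K vocabulary, and a Skip-gram range of 10.
N, D, H, V, C = 8, 640, 640, 82_000, 10

costs = {
    "NNLM": nnlm_cost(N, D, H, V),
    "CBOW": cbow_cost(N, D, V),
    "Skip-gram": skipgram_cost(C, D, V),
}
for name, cost in costs.items():
    print(f"{name}: {cost:,.0f}")
```

With these values the N×D×H hidden-layer term dominates the NNLM cost, which is the term both new models remove.
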
Microsoft Sentence Completion Challenge
- On its own, the result is only moderate.
- A weighted combination with the recurrent neural network language model (the previous state of the art, SOTA) sets a new SOTA.

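Open issues / discussion

Although cheaper than previously reported methods, further speed-ups are needed for practical use.
- → Leads to Mikolov et al., NIPS 2013.
  - Training on the order of billions of words per hour; more than 1.4 million vectors trained on more than 100 billion words.
Applying the resulting high-quality word vectors to a variety of real tasks.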

Key references

Mikolov, Tomas, Wen-Tau Yih, and Geoffrey Zweig. 2013. “Linguistic Regularities in Continuous Space Word Representations,” June, 746–51.
- An earlier paper by the same first author. The method had no name yet, but the paper reported that arithmetic on word vectors works.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, 3111–19. NIPS’13. Red Hook, NY, USA: Curran Associates Inc.
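- An improvement on this paper's method. It extends Skip-gram with subsampling of frequent words and negative sampling (a simpler alternative to hierarchical softmax) and adds phrase vectors, training faster than the method in this paper, on the order of billions of words per hour.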
Mikolov, Tomas, M. Karafiát, L. Burget, J. Cernocký, and S. Khudanpur. 2010. “Recurrent Neural Network Based Language Model.” INTERSPEECH. https://www.semanticscholar.org/paper/9819b600a828a57e1cde047bbe710d3446b30da5.
- The paper that proposed the recurrent neural network language model.
Morin, Frederic, and Yoshua Bengio. 2005. “Hierarchical Probabilistic Neural Network Language Model.” In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, edited by Robert G. Cowell and Zoubin Ghahramani, R5:246–52. Proceedings of Machine Learning Research. PMLR.
- The origin of hierarchical softmax.

Additional information

shnakazawa added the "Natural language processing" label (Papers related to NLP) on Nov 21, 2022
