Borgeaud, Sebastian, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, et al. 2021. “Improving Language Models by Retrieving from Trillions of Tokens.” arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2112.04426.
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25x fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly Retrofit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
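As a rough illustration of the chunked cross-attention described in the abstract, here is a minimal PyTorch-style sketch, not the authors' implementation. The module and argument names (ChunkedCrossAttention, chunk_len, neighbours) are hypothetical, and the causal offset used in the paper (tokens attend only to neighbours retrieved for the preceding chunk) is omitted for brevity.

```python
import torch
import torch.nn as nn


class ChunkedCrossAttention(nn.Module):
    """Simplified RETRO-style chunked cross-attention (illustrative only)."""

    def __init__(self, d_model: int, n_heads: int, chunk_len: int):
        super().__init__()
        self.chunk_len = chunk_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden: torch.Tensor, neighbours: torch.Tensor) -> torch.Tensor:
        # hidden:     (batch, seq_len, d_model) with seq_len = n_chunks * chunk_len
        # neighbours: (batch, n_chunks, retrieved_len, d_model), i.e. encoder outputs
        #             for the nearest-neighbour chunks retrieved for each input chunk
        b, seq_len, d = hidden.shape
        n_chunks = seq_len // self.chunk_len
        # Fold chunks into the batch dimension so each chunk attends only to
        # the neighbours that were retrieved for it.
        q = hidden.reshape(b * n_chunks, self.chunk_len, d)
        kv = neighbours.reshape(b * n_chunks, -1, d)
        out, _ = self.attn(query=q, key=kv, value=kv)
        # Residual connection back into the decoder stream.
        return (q + out).reshape(b, seq_len, d)


# Toy usage: 2 chunks of 4 tokens, 2 neighbours of 8 tokens each per chunk.
if __name__ == "__main__":
    cca = ChunkedCrossAttention(d_model=64, n_heads=4, chunk_len=4)
    hidden = torch.randn(1, 8, 64)
    neighbours = torch.randn(1, 2, 16, 64)
    print(cca(hidden, neighbours).shape)  # torch.Size([1, 8, 64])
```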
Combining retrieval with language models has been explored before (Guu et al., 2020; Khandelwal et al., 2020; Lewis et al., 2020; Yogatama et al., 2021), but those works used comparatively small models and databases; this is the first report to use a retrieval database of trillions of tokens.
Guu, Kelvin, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. “Retrieval Augmented Language Model Pre-Training.” In Proceedings of the 37th International Conference on Machine Learning, edited by Hal Daumé III and Aarti Singh, 119:3929–38. Proceedings of Machine Learning Research. PMLR.
Khandelwal, Urvashi, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. “Generalization through Memorization: Nearest Neighbor Language Models.” https://openreview.net/pdf?id=HklBjCEKvH.
Lewis, Patrick, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, et al. 2020. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” In Proceedings of the 34th International Conference on Neural Information Processing Systems, 9459–74. NIPS’20 793. Red Hook, NY, USA: Curran Associates Inc.
Yogatama, Dani, Cyprien de Masson d’Autume, and Lingpeng Kong. 2021. “Adaptive Semiparametric Language Models.” Transactions of the Association for Computational Linguistics 9: 362–73.
Rae, Jack W., Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, et al. 2021. “Scaling Language Models: Methods, Analysis & Insights from Training Gopher.” arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2112.11446.
Discussion of the potential harms posed by large language models.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. FAccT ’21. New York, NY, USA: Association for Computing Machinery.
Weidinger, Laura, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, et al. 2021. “Ethical and Social Risks of Harm from Language Models.” arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2112.04359.
Code
Problem addressed / comparison with prior work
Key points of the technique / method
Evaluation metrics
Datasets used
Metrics compared
Models compared
Improvements over the baseline are seen on all of them, and with fine-tuning RETRO shows performance competitive with the state of the art on Q&A tasks as well. Even without retrieval, it reaches roughly baseline-level performance.
Remaining issues / discussion
Important citations