Add Chinese Mixtral paper (#20)
paper: https://arxiv.org/abs/2403.01851

---------

Co-authored-by: ymcui <16095339+ymcui@users.noreply.github.com>
ymcui and ymcui committed Mar 5, 2024
1 parent 7f32e79 commit a0e4b8c
Showing 3 changed files with 46 additions and 12 deletions.
24 changes: 24 additions & 0 deletions CITATION.cff
@@ -0,0 +1,24 @@
cff-version: 1.2.0
message: "Please cite our paper as below."
authors:
- family-names: "Cui"
given-names: "Yiming"
orcid: "https://orcid.org/0000-0002-2452-375X"
- family-names: "Yao"
given-names: "Xin"
title: "Chinese Mixtral"
version: 1.0
date-released: 2024-03-05
url: "https://github.com/ymcui/Chinese-Mixtral"
preferred-citation:
type: article
authors:
- family-names: "Cui"
given-names: "Yiming"
orcid: "https://orcid.org/0000-0002-2452-375X"
- family-names: "Yao"
given-names: "Xin"
title: "Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral"
journal: "arXiv pre-print"
year: 2024
url: "https://arxiv.org/abs/2403.01851"
17 changes: 11 additions & 6 deletions README.md
@@ -14,6 +14,8 @@

This project is developed based on the [Mixtral model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) released by Mistral.ai, which uses a Sparse Mixture of Experts (MoE) architecture. The project performed incremental Chinese training on large-scale unannotated Chinese data to obtain the **Chinese Mixtral** base model, and further instruction fine-tuning to obtain the **Chinese Mixtral-Instruct** instruction model. The model natively supports a **32K context (tested up to 128K)**, handles long texts effectively, and shows significant performance gains in areas such as mathematical reasoning and code generation. When using llama.cpp for quantized inference, as little as 16GB of RAM (or VRAM) is required.

**Technical Report**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
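
As a rough sketch of the low-memory quantized inference mentioned above (not the project's official usage example), the snippet below runs a GGUF-quantized build of the instruct model through the llama-cpp-python bindings. The GGUF file name is hypothetical; substitute whatever quantized file you have converted or downloaded.

```python
# Minimal sketch of quantized inference via llama-cpp-python.
# The GGUF file name below is hypothetical, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="chinese-mixtral-instruct-q4_k_m.gguf",  # hypothetical quantized file
    n_ctx=4096,       # working context; the model natively supports up to 32K
    n_gpu_layers=0,   # 0 = CPU only; raise this to offload layers to a GPU
)

# The prompt asks (in Chinese) for a one-sentence introduction to MoE models.
output = llm("请用一句话介绍混合专家(MoE)模型。", max_tokens=128)
print(output["choices"][0]["text"])
```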

#### Main Contents of This Project

- 🚀 Open-sourced the Chinese Mixtral base model, incrementally trained in Chinese on top of [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
@@ -29,7 +31,9 @@

## News

**[2024/01/29] 🚀 Official release of Chinese-Mixtral (base model) and Chinese-Mixtral-Instruct (instruction/chat model). For details, see: [📚 v1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)**
**[2024/03/05] Open-sourced the model training and fine-tuning code and released the technical report. For details, see: [📚 v1.1 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.1)**

[2024/01/29] 🚀 Official release of Chinese-Mixtral (base model) and Chinese-Mixtral-Instruct (instruction/chat model). For details, see: [📚 v1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)


## Content Guide
@@ -246,11 +250,12 @@ Mixtral is a sparse Mixture-of-Experts model. Compared with previous mainstream models such as LLaMA
## Citation

```tex
@misc{chinese-mixtral,
  title={Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  howpublished={\url{https://github.com/ymcui/Chinese-Mixtral}},
  year={2024}
@article{chinese-mixtral,
  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  journal={arXiv preprint arXiv:2403.01851},
  url={https://arxiv.org/abs/2403.01851},
  year={2024}
}
```

17 changes: 11 additions & 6 deletions README_EN.md
@@ -14,6 +14,8 @@

This project is developed based on the [Mixtral model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) released by Mistral.ai, which utilizes a Sparse Mixture of Experts (MoE) architecture. This project involves the use of large-scale Chinese unannotated data for incremental training in Chinese, resulting in the **Chinese Mixtral** base model. Further fine-tuning with instructions led to the creation of the **Chinese Mixtral-Instruct** instruction model. This model natively supports a **32K context (tested up to 128K)** and is capable of effectively processing long texts, while also showing significant performance improvements in areas like mathematical reasoning and code generation. When using llama.cpp for quantized inference, a minimum of only 16GB of memory (or VRAM) is required.

**Paper**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
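
For a complementary sketch, the snippet below loads the instruct model through Hugging Face transformers with 4-bit quantization to keep memory usage low. The repository id and the Mixtral-style `[INST]` prompt format are assumptions, not confirmed by this commit; check the project's model download section for the actual names and chat template.

```python
# Minimal sketch of 4-bit loading with transformers + bitsandbytes (needs a CUDA GPU).
# The repository id below is an assumption, not the official model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "hfl/chinese-mixtral-instruct"  # hypothetical Hugging Face repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# A Mixtral-style instruction format is assumed here.
prompt = "[INST] Briefly introduce the Mixture-of-Experts architecture. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```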

#### Main Contents of This Project

- 🚀 Open-sourced Chinese Mixtral base model, incrementally trained in Chinese on top of [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
@@ -29,7 +31,9 @@ This project is developed based on the [Mixtral model](https://huggingface.co/mi

## News

**[2024/01/29] 🚀 Official release of Chinese-Mixtral (Base Model), Chinese-Mixtral-Instruct (Instruction/Chat Model). For more details, see: [📚 Version 1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)**
**[2024/03/05] Released the pre-training and fine-tuning scripts; the technical report is also available. See: [📚 v1.1 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.1)**

[2024/01/29] 🚀 Official release of Chinese-Mixtral (Base Model), Chinese-Mixtral-Instruct (Instruction/Chat Model). For more details, see: [📚 v1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)


## Content Guide
@@ -246,11 +250,12 @@ Question 3: Is the downstream ecosystem of Mixtral supported?
## Citation

```tex
@misc{chinese-mixtral,
  title={Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  howpublished={\url{https://github.com/ymcui/Chinese-Mixtral}},
  year={2024}
@article{chinese-mixtral,
  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  journal={arXiv preprint arXiv:2403.01851},
  url={https://arxiv.org/abs/2403.01851},
  year={2024}
}
```
