Add Chinese Mixtral paper (#20)
paper: https://arxiv.org/abs/2403.01851

---------

Co-authored-by: ymcui <16095339+ymcui@users.noreply.github.com>
ymcui and ymcui committed Mar 5, 2024
1 parent 7f32e79 commit a0e4b8c
Showing 3 changed files with 46 additions and 12 deletions.
24 changes: 24 additions & 0 deletions CITATION.cff
@@ -0,0 +1,24 @@
cff-version: 1.2.0
message: "Please cite our paper as below."
authors:
- family-names: "Cui"
given-names: "Yiming"
orcid: "https://orcid.org/0000-0002-2452-375X"
- family-names: "Yao"
given-names: "Xin"
title: "Chinese Mixtral"
version: 1.0
date-released: 2024-03-05
url: "https://github.com/ymcui/Chinese-Mixtral"
preferred-citation:
type: article
authors:
- family-names: "Cui"
given-names: "Yiming"
orcid: "https://orcid.org/0000-0002-2452-375X"
- family-names: "Yao"
given-names: "Xin"
title: "Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral"
journal: "arXiv pre-print"
year: 2024
url: "https://arxiv.org/abs/2403.01851"
17 changes: 11 additions & 6 deletions README.md
@@ -14,6 +14,8 @@

This project is developed based on the [Mixtral model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) released by Mistral.ai, which uses a Sparse Mixture of Experts (MoE) architecture. The project performed incremental Chinese training on large-scale unannotated Chinese data to obtain the **Chinese Mixtral** base model, and further instruction fine-tuning to obtain the **Chinese Mixtral-Instruct** instruction model. The model natively supports a **32K context (tested up to 128K)**, handles long texts effectively, and shows significant performance gains in areas such as mathematical reasoning and code generation. When using llama.cpp for quantized inference, as little as 16GB of RAM (or VRAM) is required.

**Technical Report**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
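
As a rough sketch of the low-memory quantized inference mentioned above (not the project's official usage example), the snippet below runs a GGUF-quantized build of the instruct model through the llama-cpp-python bindings. The GGUF file name is hypothetical; substitute whatever quantized file you have converted or downloaded.

```python
# Minimal sketch of quantized inference via llama-cpp-python.
# The GGUF file name below is hypothetical, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="chinese-mixtral-instruct-q4_k_m.gguf",  # hypothetical quantized file
    n_ctx=4096,       # working context; the model natively supports up to 32K
    n_gpu_layers=0,   # 0 = CPU only; raise this to offload layers to a GPU
)

# The prompt asks (in Chinese) for a one-sentence introduction to MoE models.
output = llm("请用一句话介绍混合专家(MoE)模型。", max_tokens=128)
print(output["choices"][0]["text"])
```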

#### Main Contents of This Project

- 🚀 Open-sourced the Chinese Mixtral base model, incrementally trained in Chinese on top of [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
@@ -29,7 +31,9 @@

## News

**[2024/01/29] 🚀 Official release of Chinese-Mixtral (base model) and Chinese-Mixtral-Instruct (instruction/chat model). For details, see: [📚 v1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)**
**[2024/03/05] Open-sourced the model training and fine-tuning code and released the technical report. For details, see: [📚 v1.1 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.1)**

[2024/01/29] 🚀 Official release of Chinese-Mixtral (base model) and Chinese-Mixtral-Instruct (instruction/chat model). For details, see: [📚 v1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)


## Content Guide
@@ -246,11 +250,12 @@ Mixtral is a sparse Mixture-of-Experts model. Compared with previous mainstream models such as LLaMA
## Citation

```tex
@misc{chinese-mixtral,
  title={Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  howpublished={\url{https://github.com/ymcui/Chinese-Mixtral}},
  year={2024}
@article{chinese-mixtral,
  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  journal={arXiv preprint arXiv:2403.01851},
  url={https://arxiv.org/abs/2403.01851},
  year={2024}
}
```

17 changes: 11 additions & 6 deletions README_EN.md
@@ -14,6 +14,8 @@

This project is developed based on the [Mixtral model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) released by Mistral.ai, which utilizes a Sparse Mixture of Experts (MoE) architecture. This project involves the use of large-scale Chinese unannotated data for incremental training in Chinese, resulting in the **Chinese Mixtral** base model. Further fine-tuning with instructions led to the creation of the **Chinese Mixtral-Instruct** instruction model. This model natively supports a **32K context (tested up to 128K)** and is capable of effectively processing long texts, while also showing significant performance improvements in areas like mathematical reasoning and code generation. When using llama.cpp for quantized inference, a minimum of only 16GB of memory (or VRAM) is required.

**Paper**: [[Cui and Yao, 2024] Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)
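
For a complementary sketch, the snippet below loads the instruct model through Hugging Face transformers with 4-bit quantization to keep memory usage low. The repository id and the Mixtral-style `[INST]` prompt format are assumptions, not confirmed by this commit; check the project's model download section for the actual names and chat template.

```python
# Minimal sketch of 4-bit loading with transformers + bitsandbytes (needs a CUDA GPU).
# The repository id below is an assumption, not the official model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "hfl/chinese-mixtral-instruct"  # hypothetical Hugging Face repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# A Mixtral-style instruction format is assumed here.
prompt = "[INST] Briefly introduce the Mixture-of-Experts architecture. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```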

#### Main Contents of This Project

- 🚀 Open-sourced Chinese Mixtral base model, incrementally trained in Chinese on top of [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
@@ -29,7 +31,9 @@ This project is developed based on the [Mixtral model](https://huggingface.co/mi

## News

**[2024/01/29] 🚀 Official release of Chinese-Mixtral (Base Model), Chinese-Mixtral-Instruct (Instruction/Chat Model). For more details, see: [📚 Version 1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)**
**[2024/03/05] Released the pre-training and fine-tuning scripts; the technical report is also available. See: [📚 v1.1 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.1)**

[2024/01/29] 🚀 Official release of Chinese-Mixtral (Base Model), Chinese-Mixtral-Instruct (Instruction/Chat Model). For more details, see: [📚 v1.0 Release Notes](https://github.com/ymcui/Chinese-Mixtral/releases/tag/v1.0)


## Content Guide
@@ -246,11 +250,12 @@ Question 3: Is the downstream ecosystem of Mixtral supported?
## Citation

```tex
@misc{chinese-mixtral,
  title={Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  howpublished={\url{https://github.com/ymcui/Chinese-Mixtral}},
  year={2024}
@article{chinese-mixtral,
  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  journal={arXiv preprint arXiv:2403.01851},
  url={https://arxiv.org/abs/2403.01851},
  year={2024}
}
```
