update LongBench score (#259)

* Added a reply to saved_model size
* update longbench score
ymcui · Sep 11, 2023 · cedb6da
1 parent 043819a
commit cedb6da

Showing 2 changed files with 16 additions and 16 deletions.
diff --git a/README.md b/README.md
@@ -245,14 +245,14 @@
 
 | Models                       | 单文档QA | 多文档QA | 摘要 | Few-shot学习 | 代码补全 | 合成任务 | Avg  |
 | ---------------------------- | :------: | :------: | :--: | :----------: | :------: | :------: | :--: |
-| **Chinese-Alpaca-2-13B-16K** |   48.1   |   26.0   | 12.8 |     23.3     |   45.5   |   21.5   | 29.5 |
-| Chinese-Alpaca-2-13B         |   38.4   |   20.0   | 12.2 |     18.0     |   46.2   |   9.0    | 24.0 |
-| **Chinese-Alpaca-2-7B-16K**  |   46.6   |   23.6   | 14.5 |     29.0     |   47.1   |   9.0    | 28.3 |
-| Chinese-Alpaca-2-7B          |   32.0   |   17.2   | 11.5 |     21.5     |   48.8   |   5.0    | 22.7 |
-| **Chinese-LLaMA-2-13B-16K**  |   37.3   |   18.1   | 3.4  |     30.8     |   13.0   |   3.0    | 17.6 |
-| Chinese-LLaMA-2-13B          |   26.7   |   14.0   | 4.4  |     16.3     |   9.8    |   5.5    | 12.8 |
-| **Chinese-LLaMA-2-7B-16K**   |   33.7   |   16.5   | 5.3  |     24.3     |   9.9    |   4.2    | 15.6 |
-| Chinese-LLaMA-2-7B           |   20.7   |   14.5   | 6.5  |     12.8     |   11.5   |   5.3    | 11.9 |
+| **Chinese-Alpaca-2-13B-16K** |   47.9  |   26.7 | 13.0 |     22.3    |   46.6   |   21.5   | 29.7 |
+| Chinese-Alpaca-2-13B         |   38.4   |   20.0   | 11.9 |     17.3    |   46.5   |   8.0    | 23.7 |
+| **Chinese-Alpaca-2-7B-16K**  |   46.4  |   23.3  | 14.3 |     29.0     |   49.6   |   9.0    | 28.6 |
+| Chinese-Alpaca-2-7B          |   34.0   |   17.4   | 11.8 |     21.3    |   50.3  |   4.5    | 23.2 |
+| **Chinese-LLaMA-2-13B-16K**  |   36.7   |   17.7  | 3.1 |     29.8     |   13.8   |   3.0    | 17.3 |
+| Chinese-LLaMA-2-13B          |   28.3   |   14.4   | 4.6 |     16.3     |   10.4   |   5.4    | 13.2 |
+| **Chinese-LLaMA-2-7B-16K**   |   33.2   |   15.9   | 6.5 |     23.5     |   10.3    |   5.3    | 15.8|
+| Chinese-LLaMA-2-7B           |   19.0   |   13.9   | 6.4  |     11.0    |   11.0   |   4.7    | 11.0 |
 
 ### 量化效果评测
 

diff --git a/README_EN.md b/README_EN.md
@@ -236,14 +236,14 @@ In order to intuitively understand the generation performance of the model, this
 
 | Models                      | Single-doc QA | Multi-doc QA | Summarization | Few-shot Learning | Code Completion | Synthetic Task | Avg  |
 | --------------------------- | :-----------: | :----------: | :-----------: | :---------------: | :-------------: | :------------: | :--: |
-| **Chinese-Alpaca-2-13B-16K** |   48.1   |   26.0   | 12.8 |     23.3     |   45.5   |   21.5   | 29.5 |
-| Chinese-Alpaca-2-13B         |   38.4   |   20.0   | 12.2 |     18.0     |   46.2   |   9.0    | 24.0 |
-| **Chinese-Alpaca-2-7B-16K**  |   46.6   |   23.6   | 14.5 |     29.0     |   47.1   |   9.0    | 28.3 |
-| Chinese-Alpaca-2-7B          |   32.0   |   17.2   | 11.5 |     21.5     |   48.8   |   5.0    | 22.7 |
-| **Chinese-LLaMA-2-13B-16K**  |   37.3   |   18.1   | 3.4  |     30.8     |   13.0   |   3.0    | 17.6 |
-| Chinese-LLaMA-2-13B          |   26.7   |   14.0   | 4.4  |     16.3     |   9.8    |   5.5    | 12.8 |
-| **Chinese-LLaMA-2-7B-16K**   |   33.7   |   16.5   | 5.3  |     24.3     |   9.9    |   4.2    | 15.6 |
-| Chinese-LLaMA-2-7B           |   20.7   |   14.5   | 6.5  |     12.8     |   11.5   |   5.3    | 11.9 |
+| **Chinese-Alpaca-2-13B-16K** |   47.9  |   26.7 | 13.0 |     22.3    |   46.6   |   21.5   | 29.7 |
+| Chinese-Alpaca-2-13B         |   38.4   |   20.0   | 11.9 |     17.3    |   46.5   |   8.0    | 23.7 |
+| **Chinese-Alpaca-2-7B-16K**  |   46.4  |   23.3  | 14.3 |     29.0     |   49.6   |   9.0    | 28.6 |
+| Chinese-Alpaca-2-7B          |   34.0   |   17.4   | 11.8 |     21.3    |   50.3  |   4.5    | 23.2 |
+| **Chinese-LLaMA-2-13B-16K**  |   36.7   |   17.7  | 3.1 |     29.8     |   13.8   |   3.0    | 17.3 |
+| Chinese-LLaMA-2-13B          |   28.3   |   14.4   | 4.6 |     16.3     |   10.4   |   5.4    | 13.2 |
+| **Chinese-LLaMA-2-7B-16K**   |   33.2   |   15.9   | 6.5 |     23.5     |   10.3    |   5.3    | 15.8|
+| Chinese-LLaMA-2-7B           |   19.0   |   13.9   | 6.4  |     11.0    |   11.0   |   4.7    | 11.0 |
 
 ### Quantization Evaluation