FEAT: support Yi-1.5 series #1489

Merged · 5 commits · May 15, 2024
2 changes: 1 addition & 1 deletion doc/source/getting_started/installation.rst
@@ -43,7 +43,7 @@ Currently, supported models include:
- ``baichuan``, ``baichuan-chat``, ``baichuan-2-chat``
- ``internlm-16k``, ``internlm-chat-7b``, ``internlm-chat-8k``, ``internlm-chat-20b``
- ``mistral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``
- ``Yi``, ``Yi-chat``
- ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``
- ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
- ``c4ai-command-r-v01``, ``c4ai-command-r-v01-4bit``
- ``vicuna-v1.3``, ``vicuna-v1.5``
2 changes: 1 addition & 1 deletion doc/source/models/builtin/llm/codeqwen1.5-chat.rst
@@ -4,7 +4,7 @@
codeqwen1.5-chat
========================================

- **Context Length:** 32768
- **Context Length:** 65536
- **Model Name:** codeqwen1.5-chat
- **Languages:** en, zh
- **Abilities:** chat
29 changes: 25 additions & 4 deletions doc/source/models/builtin/llm/index.rst
@@ -108,7 +108,7 @@ The following is a list of built-in LLM in Xinference:

* - :ref:`codeqwen1.5-chat <models_llm_codeqwen1.5-chat>`
- chat
- 32768
- 65536
- CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data.

* - :ref:`codeshell <models_llm_codeshell>`
@@ -381,6 +381,11 @@ The following is a list of built-in LLM in Xinference:
- 8192
- Starcoderplus is an open-source LLM trained by fine-tuning Starcoder on RedefinedWeb and StarCoderData datasets.

* - :ref:`starling-lm <models_llm_starling-lm>`
- chat
- 4096
- Starling-7B is an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF), built on a new GPT-4-labeled ranking dataset.

* - :ref:`tiny-llama <models_llm_tiny-llama>`
- generate
- 2048
@@ -431,19 +436,29 @@ The following is a list of built-in LLM in Xinference:
- 4096
- The Yi series models are large language models trained from scratch by developers at 01.AI.

* - :ref:`yi-1.5 <models_llm_yi-1.5>`
- generate
- 4096
- Yi-1.5 is an upgraded version of Yi. It is continually pre-trained from Yi on a high-quality corpus of 500B tokens and fine-tuned on 3M diverse samples.

* - :ref:`yi-1.5-chat <models_llm_yi-1.5-chat>`
- chat
- 4096
- Yi-1.5 is an upgraded version of Yi. It is continually pre-trained from Yi on a high-quality corpus of 500B tokens and fine-tuned on 3M diverse samples.

* - :ref:`yi-200k <models_llm_yi-200k>`
- generate
- 204800
- 262144
- The Yi series models are large language models trained from scratch by developers at 01.AI.

* - :ref:`yi-chat <models_llm_yi-chat>`
- chat
- 204800
- 4096
- The Yi series models are large language models trained from scratch by developers at 01.AI.

* - :ref:`yi-vl-chat <models_llm_yi-vl-chat>`
- chat, vision
- 204800
- 4096
- Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.

* - :ref:`zephyr-7b-alpha <models_llm_zephyr-7b-alpha>`
@@ -607,6 +622,8 @@ The following is a list of built-in LLM in Xinference:

starcoderplus

starling-lm

tiny-llama

vicuna-v1.3
@@ -627,6 +644,10 @@ The following is a list of built-in LLM in Xinference:

yi

yi-1.5

yi-1.5-chat

yi-200k

yi-chat
60 changes: 60 additions & 0 deletions doc/source/models/builtin/llm/yi-1.5-chat.rst
@@ -0,0 +1,60 @@
.. _models_llm_yi-1.5-chat:

========================================
Yi-1.5-chat
========================================

- **Context Length:** 4096
- **Model Name:** Yi-1.5-chat
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** Yi-1.5 is an upgraded version of Yi. It is continually pre-trained from Yi on a high-quality corpus of 500B tokens and fine-tuned on 3M diverse samples.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 6 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 6
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** 01-ai/Yi-1.5-6B-Chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/01-ai/Yi-1.5-6B-Chat>`__, `ModelScope <https://modelscope.cn/models/01ai/Yi-1.5-6B-Chat>`__

Execute the following command to launch the model, remembering to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name Yi-1.5-chat --size-in-billions 6 --model-format pytorch --quantization ${quantization}
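Concretely, picking the ``4-bit`` option from the quantization list above yields a complete command. The sketch below assembles it in Python purely for illustration; the flag names come straight from the command shown above:

```python
# Assemble the launch command from this spec with a concrete
# quantization choice ("4-bit" here; "8-bit" and "none" are the
# other options listed above).
quantization = "4-bit"
cmd = [
    "xinference", "launch",
    "--model-name", "Yi-1.5-chat",
    "--size-in-billions", "6",
    "--model-format", "pytorch",
    "--quantization", quantization,
]
print(" ".join(cmd))
```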


Model Spec 2 (pytorch, 9 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 9
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** 01-ai/Yi-1.5-9B-Chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/01-ai/Yi-1.5-9B-Chat>`__, `ModelScope <https://modelscope.cn/models/01ai/Yi-1.5-9B-Chat>`__

Execute the following command to launch the model, remembering to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name Yi-1.5-chat --size-in-billions 9 --model-format pytorch --quantization ${quantization}


Model Spec 3 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** 01-ai/Yi-1.5-34B-Chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/01-ai/Yi-1.5-34B-Chat>`__, `ModelScope <https://modelscope.cn/models/01ai/Yi-1.5-34B-Chat>`__

Execute the following command to launch the model, remembering to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name Yi-1.5-chat --size-in-billions 34 --model-format pytorch --quantization ${quantization}
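Once launched, a chat model can be queried over Xinference's OpenAI-compatible HTTP API. The sketch below only builds the request body; the endpoint path, local port, and the use of the model name as the ``model`` field follow the usual OpenAI-compatible convention and are assumptions, not part of this PR:

```python
import json

# Sketch of an OpenAI-compatible chat request for a launched
# Yi-1.5-chat model. The URL is a placeholder for a local
# Xinference instance (port assumed).
endpoint = "http://127.0.0.1:9997/v1/chat/completions"
payload = {
    "model": "Yi-1.5-chat",
    "messages": [{"role": "user", "content": "Hello, Yi!"}],
    "max_tokens": 128,
}
body = json.dumps(payload)
print(body)
```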

60 changes: 60 additions & 0 deletions doc/source/models/builtin/llm/yi-1.5.rst
@@ -0,0 +1,60 @@
.. _models_llm_yi-1.5:

========================================
Yi-1.5
========================================

- **Context Length:** 4096
- **Model Name:** Yi-1.5
- **Languages:** en, zh
- **Abilities:** generate
- **Description:** Yi-1.5 is an upgraded version of Yi. It is continually pre-trained from Yi on a high-quality corpus of 500B tokens and fine-tuned on 3M diverse samples.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 6 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 6
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** 01-ai/Yi-1.5-6B
- **Model Hubs**: `Hugging Face <https://huggingface.co/01-ai/Yi-1.5-6B>`__, `ModelScope <https://modelscope.cn/models/01ai/Yi-1.5-6B>`__

Execute the following command to launch the model, remembering to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name Yi-1.5 --size-in-billions 6 --model-format pytorch --quantization ${quantization}


Model Spec 2 (pytorch, 9 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 9
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** 01-ai/Yi-1.5-9B
- **Model Hubs**: `Hugging Face <https://huggingface.co/01-ai/Yi-1.5-9B>`__, `ModelScope <https://modelscope.cn/models/01ai/Yi-1.5-9B>`__

Execute the following command to launch the model, remembering to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name Yi-1.5 --size-in-billions 9 --model-format pytorch --quantization ${quantization}


Model Spec 3 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 34
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** 01-ai/Yi-1.5-34B
- **Model Hubs**: `Hugging Face <https://huggingface.co/01-ai/Yi-1.5-34B>`__, `ModelScope <https://modelscope.cn/models/01ai/Yi-1.5-34B>`__

Execute the following command to launch the model, remembering to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name Yi-1.5 --size-in-billions 34 --model-format pytorch --quantization ${quantization}
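Unlike Yi-1.5-chat, the base Yi-1.5 models expose the ``generate`` ability, so a request would target a plain-completions route rather than a chat route. As above, the endpoint path and field names are assumptions based on the OpenAI-compatible convention; the sketch only builds the request body:

```python
import json

# Sketch of an OpenAI-compatible text-completion request for the
# base (generate-only) Yi-1.5 model; note "prompt" instead of
# "messages". URL and port are placeholders.
endpoint = "http://127.0.0.1:9997/v1/completions"
payload = {
    "model": "Yi-1.5",
    "prompt": "Large language models are",
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body)
```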

2 changes: 1 addition & 1 deletion doc/source/models/builtin/llm/yi-200k.rst
Expand Up @@ -4,7 +4,7 @@
Yi-200k
========================================

- **Context Length:** 204800
- **Context Length:** 262144
- **Model Name:** Yi-200k
- **Languages:** en, zh
- **Abilities:** generate
21 changes: 18 additions & 3 deletions doc/source/models/builtin/llm/yi-chat.rst
@@ -4,7 +4,7 @@
Yi-chat
========================================

- **Context Length:** 204800
- **Context Length:** 4096
- **Model Name:** Yi-chat
- **Languages:** en, zh
- **Abilities:** chat
@@ -29,7 +29,22 @@ chosen quantization method from the options listed above::
xinference launch --model-name Yi-chat --size-in-billions 34 --model-format gptq --quantization ${quantization}


Model Spec 2 (pytorch, 34 Billion)
Model Spec 2 (pytorch, 6 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 6
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** 01-ai/Yi-6B-Chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/01-ai/Yi-6B-Chat>`__, `ModelScope <https://modelscope.cn/models/01ai/Yi-6B-Chat>`__

Execute the following command to launch the model, remembering to replace ``${quantization}`` with your chosen quantization method from the options listed above::

xinference launch --model-name Yi-chat --size-in-billions 6 --model-format pytorch --quantization ${quantization}


Model Spec 3 (pytorch, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
@@ -44,7 +59,7 @@ chosen quantization method from the options listed above::
xinference launch --model-name Yi-chat --size-in-billions 34 --model-format pytorch --quantization ${quantization}


Model Spec 3 (ggufv2, 34 Billion)
Model Spec 4 (ggufv2, 34 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
2 changes: 1 addition & 1 deletion doc/source/models/builtin/llm/yi-vl-chat.rst
@@ -4,7 +4,7 @@
yi-vl-chat
========================================

- **Context Length:** 204800
- **Context Length:** 4096
- **Model Name:** yi-vl-chat
- **Languages:** en, zh
- **Abilities:** chat, vision
2 changes: 1 addition & 1 deletion doc/source/user_guide/backends.rst
@@ -50,7 +50,7 @@ Currently, supported models include:
- ``baichuan``, ``baichuan-chat``, ``baichuan-2-chat``
- ``internlm-16k``, ``internlm-chat-7b``, ``internlm-chat-8k``, ``internlm-chat-20b``
- ``mistral-v0.1``, ``mistral-instruct-v0.1``, ``mistral-instruct-v0.2``
- ``Yi``, ``Yi-chat``
- ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``
- ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
- ``c4ai-command-r-v01``, ``c4ai-command-r-v01-4bit``
- ``vicuna-v1.3``, ``vicuna-v1.5``