FEAT: Support qwen-1.5 series (#994)
Co-authored-by: codingl2k1 <codingl2k1@outlook.com>
aresnow1 and codingl2k1 authored Feb 6, 2024
1 parent b4cdc38 commit e903e05
Showing 11 changed files with 958 additions and 8 deletions.
7 changes: 7 additions & 0 deletions doc/source/models/builtin/llm/index.rst
@@ -216,6 +216,11 @@ The following is a list of built-in LLM in Xinference:
  - 4096
  - Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities.

* - :ref:`qwen1.5-chat <models_llm_qwen1.5-chat>`
  - chat, tools
  - 32768
  - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.

* - :ref:`skywork <models_llm_skywork>`
  - generate
  - 4096
@@ -401,6 +406,8 @@ The following is a list of built-in LLM in Xinference:

   qwen-vl-chat

   qwen1.5-chat

   skywork

   skywork-math
45 changes: 45 additions & 0 deletions doc/source/models/builtin/llm/llama-2-chat.rst
@@ -103,3 +103,48 @@ chosen quantization method from the options listed above::

   xinference launch --model-name llama-2-chat --size-in-billions 70 --model-format pytorch --quantization ${quantization}


Model Spec 7 (ggufv2, 7 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 7
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Model ID:** TheBloke/Llama-2-7B-Chat-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF>`__, `ModelScope <https://modelscope.cn/models/Xorbits/Llama-2-7b-Chat-GGUF>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-name llama-2-chat --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}
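The ``${quantization}`` placeholder must be one of the quantizations listed in the spec. A minimal Python sketch of that substitution — the ``launch_command`` helper and the validation step are illustrative assumptions, not part of the xinference CLI itself:

```python
# Quantization options from the ggufv2 spec above.
GGUF_QUANTIZATIONS = [
    "Q2_K", "Q3_K_S", "Q3_K_M", "Q3_K_L", "Q4_0", "Q4_K_S",
    "Q4_K_M", "Q5_0", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0",
]

def launch_command(size_in_billions: int, quantization: str) -> str:
    """Build the `xinference launch` invocation, substituting ${quantization}."""
    if quantization not in GGUF_QUANTIZATIONS:
        raise ValueError(f"unknown quantization: {quantization}")
    return (
        "xinference launch --model-name llama-2-chat "
        f"--size-in-billions {size_in_billions} "
        f"--model-format ggufv2 --quantization {quantization}"
    )

print(launch_command(7, "Q4_K_M"))
```

Passing a quantization outside the spec's list raises an error here, mirroring the fact that the server only has artifacts for the listed quantizations.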


Model Spec 8 (ggufv2, 13 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 13
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Model ID:** TheBloke/Llama-2-13B-chat-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF>`__, `ModelScope <https://modelscope.cn/models/Xorbits/Llama-2-13b-Chat-GGUF>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-name llama-2-chat --size-in-billions 13 --model-format ggufv2 --quantization ${quantization}


Model Spec 9 (ggufv2, 70 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 70
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Model ID:** TheBloke/Llama-2-70B-Chat-GGUF
- **Model Hubs**: `Hugging Face <https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGUF>`__

Execute the following command to launch the model. Remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

   xinference launch --model-name llama-2-chat --size-in-billions 70 --model-format ggufv2 --quantization ${quantization}
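Once launched, a model can be queried over Xinference's OpenAI-compatible REST endpoint (``/v1/chat/completions``). A minimal sketch of the request body — the helper name is illustrative, and the model identifier should be the uid returned at launch time:

```python
import json

def chat_request_body(model_uid: str, prompt: str) -> str:
    """Serialize an OpenAI-compatible chat completion request for a
    launched model. `model_uid` is the uid Xinference returned at launch;
    "max_tokens" of 256 is an arbitrary example value."""
    payload = {
        "model": model_uid,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return json.dumps(payload)

print(chat_request_body("llama-2-chat", "Hello!"))
```

The resulting JSON would be POSTed to the server's chat endpoint (by default the supervisor listens on port 9997, so e.g. ``http://localhost:9997/v1/chat/completions``).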
