33 changes: 33 additions & 0 deletions docs/Makefile
@@ -0,0 +1,33 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SPHINXINTL ?= sphinx-intl
SOURCEDIR = source
BUILDDIR = build

# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) $(SOURCEDIR)
I18NSPHINXLANGS = -l zh_CN

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile html_zh_cn html_ja_jp gettext

html_zh_cn:
	$(SPHINXBUILD) -b html $(SPHINXOPTS) -t zh_cn -D language='zh_CN' "$(SOURCEDIR)" $(BUILDDIR)/html_zh_cn

gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
$(SPHINXINTL) update -p $(BUILDDIR)/locale $(I18NSPHINXLANGS)
python $(SOURCEDIR)/norm_zh.py

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
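With these targets, a typical localization workflow is to regenerate the translation catalogs and then build the Chinese HTML output. A minimal sketch (run from the ``docs`` directory; assumes ``sphinx-build`` and ``sphinx-intl`` are on the PATH):

.. code-block:: bash

   # Extract translatable strings, update the zh_CN catalogs,
   # and normalize them with the project's norm_zh.py helper
   make gettext

   # Build the Chinese HTML docs into build/html_zh_cn
   make html_zh_cn

   # Any other target is routed to Sphinx via the catch-all rule
   make html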
42 changes: 42 additions & 0 deletions docs/make.bat
@@ -0,0 +1,42 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=source/_build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

REM Check if the first parameter is provided
if "%1" == "" goto help

REM Initialize command parameters
set CMD_PARAMS=%*

REM %1 is non-empty at this point (the empty case jumped to :help above),
REM so always build with the Chinese locale
set CMD_PARAMS=%CMD_PARAMS% -D language=zh_CN

%SPHINXBUILD% -M %CMD_PARAMS% %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
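On Windows, invoke the batch file with the Sphinx target as its first argument; the script appends the Chinese locale override automatically. A minimal usage sketch (run from the ``docs`` directory):

.. code-block:: console

   make.bat html
   REM expands to: sphinx-build -M html -D language=zh_CN source source/_build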
4 changes: 4 additions & 0 deletions docs/source/_build/.buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
config: b78d9368253bdee29cbc70295b940666
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added docs/source/_build/.nojekyll
Empty file.
38 changes: 38 additions & 0 deletions docs/source/_build/_sources/getting_started/environments.rst.txt
@@ -0,0 +1,38 @@
.. _environments:

========================
Environment Requirements
========================


Recommended Systems
~~~~~~~~~~~~~~~~~~~~

Xinference supports the following operating systems:

- **Ubuntu 20.04 / 22.04** (Recommended)
- **CentOS 7 / Rocky Linux 8**
- **Windows 10/11 with WSL2**


Recommended CUDA
~~~~~~~~~~~~~~~~~~~~

Xinference recommends the following NVIDIA driver and CUDA versions:

- **Driver Version 550.127.08** - `Download Driver <https://www.nvidia.cn/drivers/lookup/>`_
- **CUDA Version 12.4** - `Download CUDA <https://developer.nvidia.com/cuda-12-4-0-download-archive>`_


Recommended Docker
~~~~~~~~~~~~~~~~~~~~

Here are the recommended Docker versions for different environments:

- **Docker >= 19.03** (recommended; some distributions ship older versions of Docker, and the minimum supported version is 1.12)

- `How to install Docker <https://docs.docker.com/engine/install/>`_

- NVIDIA Container Toolkit >= 1.7.0

- `How to install NVIDIA Container Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.13.5/install-guide.html#install-guide>`_
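Once the driver, Docker, and the NVIDIA Container Toolkit are installed, a quick end-to-end check is to query the driver on the host and then from inside a CUDA container. A minimal sketch (the CUDA image tag is an assumption; any CUDA 12.4 base image works):

.. code-block:: bash

   # Verify the host driver and the CUDA version it reports
   nvidia-smi

   # Verify Docker can reach the GPUs through the NVIDIA Container Toolkit
   docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi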
17 changes: 17 additions & 0 deletions docs/source/_build/_sources/getting_started/index.rst.txt
@@ -0,0 +1,17 @@
.. _getting_started_index:

===============
Getting Started
===============


.. toctree::
:maxdepth: 2

installation
using_xinference
logging
using_docker_image
using_kubernetes
troubleshooting
environments
181 changes: 181 additions & 0 deletions docs/source/_build/_sources/getting_started/installation.rst.txt
@@ -0,0 +1,181 @@
.. _installation:

============
Installation
============

Xinference can be installed with ``docker`` on Nvidia, NPU, GCU, and DCU devices. To run models using Xinference, you will need to pull the image corresponding to the type of device you intend to serve.



Nvidia
-------------------

To pull the Nvidia image, run the following command:

.. code-block:: bash

docker login --username=qin@qinxuye.me registry.cn-hangzhou.aliyuncs.com
Password: cre.uwd3nyn4UDM6fzm
docker pull registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-nvidia


Run Command Example
^^^^^^^^^^^^^^^^^^^

To run the container, use the following command:

.. code-block:: bash

docker run -it \
--name Xinf \
--network host \
--gpus all \
--restart unless-stopped \
-v </your/home/path>/.xinference:/root/.xinference \
-v </your/home/path>/.cache/huggingface:/root/.cache/huggingface \
-v </your/home/path>/.cache/modelscope:/root/.cache/modelscope \
registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-nvidia /bin/bash

Start Xinference
^^^^^^^^^^^^^^^^^^^

After starting the container, navigate to the ``/opt/projects`` directory inside the container and run the following command:

.. code-block:: bash

./xinf-enterprise.sh --host 192.168.10.197 --port 9997 && \
XINFERENCE_MODEL_SRC=modelscope xinference-local --host 192.168.10.197 --port 9997 --log-level debug

The ``./xinf-enterprise.sh`` script is used to start the Nginx service and write the Xinf service startup address to the configuration file.

The Xinf service startup command can be adjusted according to actual requirements; set ``--host`` and ``--port`` according to your device's configuration.

Once the Xinf service is started, you can access the Xinf WebUI by visiting port 8000.
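To confirm the service is up before opening the WebUI, you can query the OpenAI-compatible endpoint that Xinference exposes; a minimal check, assuming the host and port used above:

.. code-block:: bash

   # Returns a JSON list of currently running models (empty right after startup)
   curl http://192.168.10.197:9997/v1/models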

MindIE Series
-------------------

Version Information
^^^^^^^^^^^^^^^^^^^
- Python version: 3.10
- CANN version: 8.0.RC2
- Operating system: Ubuntu 22.04
- MindIE version: 1.0.RC2


Dependencies
^^^^^^^^^^^^^^^^^^^
For 310I DUO:
- Driver: Ascend-hdk-310p-npu-driver_24.1.rc2_linux-aarch64.run - `Download <https://obs-whaicc-fae-public.obs.cn-central-221.ovaijisuan.com/cann/mindie/1.0.RC2/310p/Ascend-hdk-310p-npu-driver_24.1.rc2_linux-aarch64.run>`_
- Firmware: Ascend-hdk-310p-npu-firmware_7.3.0.1.231.run - `Download <https://obs-whaicc-fae-public.obs.cn-central-221.ovaijisuan.com/cann/mindie/1.0.RC2/310p/Ascend-hdk-310p-npu-firmware_7.3.0.1.231.run>`_

For 910B:
- Driver: Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run - `Download <https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Ascend%20HDK/Ascend%20HDK%2024.1.RC3/Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run?response-content-type=application/octet-stream>`_
- Firmware: Ascend-hdk-910b-npu-firmware_7.5.0.1.129.run - `Download <https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Ascend%20HDK/Ascend%20HDK%2024.1.RC3/Ascend-hdk-910b-npu-firmware_7.5.0.1.129.run?response-content-type=application/octet-stream>`_

Download the ``.run`` packages to the host machine, and then run the following commands to install the drivers and firmware:

.. code-block:: bash

   chmod +x Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run
   ./Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run --full

Once the installation is complete, the output should indicate "successfully," confirming the installation. The firmware installation method is the same.

If MindIE does not start properly, verify that the driver and firmware versions match. Both the driver and firmware must be installed on the host machine and made available to the Docker container via volume mounts.

For version upgrades, install the firmware first, then the driver.
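After installing the driver and firmware, you can confirm the NPUs are visible on the host with Ascend's ``npu-smi`` tool before moving on to Docker:

.. code-block:: bash

   # Lists each NPU, its health status, and the driver version
   npu-smi info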

Pull the Image
^^^^^^^^^^^^^^^^^^^
For 310I DUO:

.. code-block:: bash

docker login --username=qin@qinxuye.me registry.cn-hangzhou.aliyuncs.com
Password: cre.uwd3nyn4UDM6fzm
docker pull registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-310p

For 910B:

.. code-block:: bash

docker login --username=qin@qinxuye.me registry.cn-hangzhou.aliyuncs.com
Password: cre.uwd3nyn4UDM6fzm
docker pull registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-910b

Run Command Example
^^^^^^^^^^^^^^^^^^^
To run the container, use the following command:

.. code-block:: bash

docker run --name MindIE-Xinf -it \
-d \
--net=host \
--shm-size=500g \
--privileged=true \
-w /opt/projects \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
--entrypoint=bash \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/sbin:/usr/local/sbin \
-v /home:/home \
-v /root:/root/model \
-v /tmp:/tmp \
-v </your/home/path>/.xinference:/root/.xinference \
-v </your/home/path>/.cache/huggingface:/root/.cache/huggingface \
-v </your/home/path>/.cache/modelscope:/root/.cache/modelscope \
-e http_proxy=$http_proxy \
-e https_proxy=$https_proxy \
registry.cn-hangzhou.aliyuncs.com/xinference-prod/xinference-prod:0.0.10-910b
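Because the container is started detached (``-d``), attach a shell and confirm the mounted driver and device nodes are visible inside it before starting Xinference:

.. code-block:: bash

   # Open a shell in the running container
   docker exec -it MindIE-Xinf bash

   # Inside the container, the mounted npu-smi should list the NPUs
   npu-smi info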

Start Xinference
^^^^^^^^^^^^^^^^^^^
After starting the container, navigate to the ``/opt/projects`` directory inside the container and run the following command:

.. code-block:: bash

./xinf-enterprise.sh --host 192.168.10.197 --port 9997 && \
XINFERENCE_MODEL_SRC=modelscope xinference-local --host 192.168.10.197 --port 9997 --log-level debug

The ``./xinf-enterprise.sh`` script starts the Nginx service and writes the Xinf service startup address to the configuration file.

The Xinf service startup command can be adjusted according to your needs; set ``--host`` and ``--port`` according to your device's configuration.

Once the Xinf service is started, you can access the Xinf WebUI by visiting port 8000.

Supported Models
^^^^^^^^^^^^^^^^^^^

When selecting a model execution engine, we recommend the MindIE engine for faster inference; other engines may be slower and are not recommended.

Currently, MindIE supports the following large language models:

- baichuan-chat
- baichuan-2-chat
- chatglm3
- deepseek-chat
- deepseek-coder-instruct
- llama-3-instruct
- mistral-instruct-v0.3
- telechat
- Yi-chat
- Yi-1.5-chat
- qwen-chat
- qwen1.5-chat
- codeqwen1.5-chat
- qwen2-instruct
- csg-wukong-chat-v0.1
- qwen2.5 series (qwen2.5-instruct, qwen2.5-coder-instruct, etc.)

Embedding Models:

- bge-large-zh-v1.5

Rerank Models:

- bge-reranker-large
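As an example of launching one of the supported models once the service is running, the sketch below posts to the Xinference RESTful launch endpoint; the model name comes from the list above, while the host, port, and field values are placeholders to adapt to your deployment, and the field names are an assumption based on the open-source Xinference REST API:

.. code-block:: bash

   # Launch qwen2-instruct with the MindIE engine via the REST API
   # (host, port, size, and format are placeholders for your deployment)
   curl -X POST http://192.168.10.197:9997/v1/models \
     -H "Content-Type: application/json" \
     -d '{
       "model_name": "qwen2-instruct",
       "model_engine": "MindIE",
       "model_size_in_billions": 7,
       "model_format": "pytorch"
     }'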
17 changes: 17 additions & 0 deletions docs/source/_build/_sources/index.rst.txt
@@ -0,0 +1,17 @@
.. _index:

======================
Welcome to Xinference!
======================

.. toctree::
:maxdepth: 2
:hidden:

getting_started/index


Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration
of a wide array of AI models. With Xinference, you're empowered to run inference using any open-source LLMs,
embedding models, and multimodal models either in the cloud or on your own premises, and create robust
AI-driven applications.