Not planned
Description
System Info
docker
Running Xinference with Docker?
- docker
- pip install
- installation from source
Version info
0.15.2
The command used to start Xinference
Distributed deployment; the supervisor and worker are started normally.
The supervisor is started with supervisor-port specified.
A model is launched on the worker, for example bge-m3.
Reproduction
Restart the supervisor. The web UI no longer shows the running model bge-m3, and the model service is unavailable.
Expected behavior
- After the supervisor restarts, models that were already running keep working
- The model service remains available
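One way to check both the reproduction and the expected behavior is to compare the set of running models before and after the supervisor restart. The following is a minimal sketch and is not taken from the issue. It assumes the supervisor exposes an OpenAI-style `GET /v1/models` endpoint that returns a JSON object with a `data` list whose entries carry an `id`. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_model_ids(payload: dict) -> set:
    """Extract model ids from an OpenAI-style model list (assumed shape)."""
    return {entry["id"] for entry in payload.get("data", [])}


def list_running_models(base_url: str) -> set:
    """Ask the supervisor which models it currently knows about."""
    with urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return parse_model_ids(json.load(resp))


def lost_models(before: set, after: set) -> set:
    """Models that were listed before the restart but are missing afterwards."""
    return before - after
```

Calling `list_running_models` once before and once after restarting the supervisor, then passing both results to `lost_models`, should return an empty set if the expected behavior holds. With the behavior reported here it would contain `bge-m3`.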
Activity
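pkunight commented on Oct 10, 2024
I have run into this too. The supervisor has to be started first and the worker afterwards, and the connection must not be interrupted after that. Otherwise, even if the supervisor restarts successfully, the worker keeps reporting errors that it cannot connect to the supervisor's IP address.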
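paradin commented on Oct 10, 2024
If supervisor-port is specified when starting the supervisor, the worker can reconnect after the supervisor restarts (because the supervisor port is fixed).
The problem is that the supervisor is currently stateful: after a restart the worker can still report_status, but it does not report the status of its running models.
If the supervisor could be made stateless (for example, by sharing state through Redis), that would also solve the current problem of the supervisor being a single point of failure.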
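The gap described in this comment can be illustrated with a toy model. This is a sketch under assumptions, not Xinference's actual code: the `Supervisor` and `Worker` classes, their methods, and the worker address are all hypothetical. It only shows the idea that a worker which includes its running models in each status report lets a freshly restarted, empty supervisor rebuild its registry.

```python
from dataclasses import dataclass, field


@dataclass
class Supervisor:
    """Hypothetical supervisor that keeps its registry only in memory."""
    workers: set = field(default_factory=set)
    models: dict = field(default_factory=dict)  # model uid -> worker address

    def report_status(self, worker_addr, running_models=None):
        # Always record that the worker is alive.
        self.workers.add(worker_addr)
        # If the worker also reports its models, re-register them.
        for uid in running_models or []:
            self.models[uid] = worker_addr


@dataclass
class Worker:
    """Hypothetical worker that owns the actual model processes."""
    address: str
    running: list = field(default_factory=list)

    def heartbeat(self, supervisor, include_models):
        models = self.running if include_models else None
        supervisor.report_status(self.address, models)


worker = Worker("worker-1:30001", running=["bge-m3"])

# Behavior described in the thread: the restarted supervisor starts empty,
# the worker reconnects, but its running models are never reported again.
restarted = Supervisor()
worker.heartbeat(restarted, include_models=False)
print(restarted.workers, restarted.models)

# Possible remedy: the heartbeat also carries the running models.
recovered = Supervisor()
worker.heartbeat(recovered, include_models=True)
print(recovered.workers, recovered.models)
```

Re-reporting from the worker keeps the supervisor's state in memory but makes it recoverable. Moving the registry into a shared store such as Redis, as suggested in the comment, would be the alternative that also addresses the single-point-of-failure concern.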
ak47947 commented on Oct 12, 2024
I have hit this problem as well. Unless it is fixed, there is no way to really run Xinference as a cluster.
github-actions commented on Oct 19, 2024
This issue is stale because it has been open for 7 days with no activity.
github-actions commented on Oct 24, 2024
This issue was closed because it has been inactive for 5 days since being marked as stale.
github-actions commented on Dec 19, 2024
This issue is stale because it has been open for 7 days with no activity.
github-actions commented on Dec 25, 2024
This issue was closed because it has been inactive for 5 days since being marked as stale.