Worker cannot connect to master #988
Comments
It looks like it connected to 0.0.0.0. Try using the machine's actual IP instead, for example: xinference-worker -e http://10.100.108.220:9997 --log-level=debug
I was passing the worker's IP in as an argument. I have since hard-coded it in the sh script, but after running it I still get the same error:
Are you sure the IP is correct?
This is the current log on the server side. It should be correct, right?
The supervisor log does show Enter add_worker. Is the worker's error still the same as at the beginning?
Yes, still the same.
In distributed mode, use the worker's -H option to specify the current worker's IP.
It worked, thanks!
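For anyone hitting the same error, the fix above can be sketched roughly as follows. This is only a minimal sketch, not the poster's actual worker.sh: the supervisor address and port are taken from the earlier comment, WORKER_IP is a hypothetical placeholder for the worker machine's own reachable address, and worker_cmd is a helper introduced here purely for illustration. The script only prints the command so it can be checked before running it.

```shell
#!/bin/sh
# Assumed values: replace with the real addresses of your machines.
SUPERVISOR_IP="10.100.108.220"   # supervisor address from the earlier comment
PORT=9997                        # supervisor port from the earlier comment
WORKER_IP="10.100.108.221"       # hypothetical: this worker's own reachable IP

# Hypothetical helper: assemble the worker command line.
#   -e  endpoint of the supervisor the worker registers with
#   -H  the worker's own IP (a concrete address rather than 0.0.0.0)
worker_cmd() {
    echo "xinference-worker -e http://$1:$2 -H $3 --log-level=debug"
}

CMD=$(worker_cmd "$SUPERVISOR_IP" "$PORT" "$WORKER_IP")
echo "$CMD"

# Uncomment to actually start the worker once the printed command looks right:
# exec $CMD
```

The point is that both addresses are concrete, routable IPs: the supervisor must be able to reach the worker at the address given with -H.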
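I'd like to ask one more question. One of my machines has eight GPUs, and I used four of them to launch a qwen 72b model, but it ran out of memory (OOM) during launch. Each GPU has 80 GB of memory, which should certainly be enough. Is there anything else I need to set at startup? Here is the command from my sh script: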
After starting the master, the worker reports an error when connecting:
master.sh
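worker.sh (MASTER_IP is the same as the master's IP)
The error is as follows:
The xinference version is the latest source code, pulled and installed with pip install.
How should this problem be resolved?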