BUG: In a multi-GPU environment, launching the bge-reranker-base and bge-reranker-large models from the modelscope source fails with docker v0.9.2 #1132

Closed
bigzws opened this issue Mar 12, 2024 · 9 comments

bigzws commented Mar 12, 2024

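The Web UI shows bge-reranker-base or bge-reranker-large running on GPU1:
image
However, no model is actually running on GPU1.
Verified with dify: the model cannot be called.
In a single-GPU environment, the rerank model runs on GPU0 and starts normally.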
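
A quick way to cross-check the Web UI against what each GPU is really hosting is to list compute processes per GPU. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_compute_apps` and `list_compute_apps` are hypothetical and not part of xinference. Inside a container, `nvidia-smi` may not show processes because of PID namespace isolation, so running it on the host is likely more reliable.

```python
import subprocess
from collections import defaultdict

# Query fields accepted by `nvidia-smi --query-compute-apps`.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Group (pid, process_name, used_memory) tuples by GPU UUID."""
    by_gpu = defaultdict(list)
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        gpu_uuid, pid, name, mem = parts
        # used_memory is kept as a string because it can be "[N/A]".
        by_gpu[gpu_uuid].append((int(pid), name, mem))
    return dict(by_gpu)


def list_compute_apps():
    """Run nvidia-smi and return compute processes grouped by GPU UUID."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(out.stdout)
```

Calling `list_compute_apps()` on the affected machine and mapping the UUIDs to indices with `nvidia-smi -L` would show whether any process is bound to GPU1 at all.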

@XprobeBot XprobeBot added the gpu label Mar 12, 2024
@XprobeBot XprobeBot added this to the v0.9.3 milestone Mar 12, 2024
Contributor

qinxuye commented Mar 12, 2024

Is there an error stack trace for the failed call?

Author

bigzws commented Mar 12, 2024

Screenshot 2024-03-12 145107
bb88e6e284609d6e24a005946232a01

Contributor

qinxuye commented Mar 12, 2024

Are there any errors in the xinference log?

Author

bigzws commented Mar 12, 2024

imagedocker
No errors at all.

Contributor

qinxuye commented Mar 12, 2024

It looks like dify's request never reached xinference? Is your dify version the latest?
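
To rule dify out, the rerank endpoint can be called directly. This is a minimal sketch, not a confirmed reproduction: it assumes xinference exposes `POST /v1/rerank` accepting `model`, `query`, and `documents`, and that the server listens on `http://localhost:9997`. Both should be checked against the deployed version. The helper names are hypothetical.

```python
import json
import urllib.request


def build_rerank_request(base_url, model_uid, query, documents):
    """Build a POST request for the rerank endpoint (path is an assumption)."""
    payload = {"model": model_uid, "query": query, "documents": documents}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(base_url, model_uid, query, documents, timeout=30):
    """Send the request and return the decoded JSON response."""
    req = build_rerank_request(base_url, model_uid, query, documents)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `rerank("http://localhost:9997", "bge-reranker-base", "a query", ["doc one", "doc two"])` either returns scores or raises an error. An error here would mean the problem is on the xinference side rather than in dify.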

Author

bigzws commented Mar 12, 2024

dify v0.5.8. On a single GPU, the rerank model can be called. I also tested running qwen 1.5 on GPU1, and it can be called normally. Only bge-reranker has this problem.

Author

bigzws commented Mar 12, 2024

Found the cause: the machine where the rerank model fails to start has CUDA 12.2, while the machine where it starts successfully has CUDA 12.1.
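
The two versions can be compared on each machine with a short script. This is a minimal sketch, assuming the host version is read from the `CUDA Version:` field in the plain `nvidia-smi` header and the container's runtime version comes from something like `torch.version.cuda`. The helper names are hypothetical, and the thread does not establish why a 12.2 host would affect only the reranker models.

```python
import re


def parse_driver_cuda_version(nvidia_smi_text):
    """Extract 'CUDA Version: X.Y' from plain `nvidia-smi` output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def parse_version(text):
    """Turn a string such as '12.1' into a comparable tuple (12, 1)."""
    major, minor = text.split(".")[:2]
    return (int(major), int(minor))


def versions_differ(driver_version, runtime_version):
    """Return True when the host and runtime CUDA versions are not identical."""
    return driver_version != runtime_version


# Usage inside the container (requires PyTorch, not imported here):
#   import subprocess, torch
#   header = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
#   print(parse_driver_cuda_version(header), parse_version(torch.version.cuda))
```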

Contributor

qinxuye commented Mar 12, 2024

Our image should be based on CUDA 12.1.

Contributor

qinxuye commented Mar 12, 2024

Closing this for now. Feel free to reopen if you run into the problem again.

@qinxuye qinxuye closed this as completed Mar 12, 2024