About multi-GPU deployment #1199
Comments
For multi-GPU inference, which code are you running? Judging by the error, this looks like a driver/CUDA-level problem rather than a code bug. Please post the exact location of the official code you ran and the full traceback.
I found the cause of the bug: the error occurs when I import both the BLIP-2 and GLM-6B models in the same .py file; importing either model on its own works fine. The relevant code: tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True) Full traceback: Traceback (most recent call last):
Right — normally they should be loaded separately; otherwise device allocation can go wrong.
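One way to avoid the allocation conflict when two models must live in one process is to pin each model to its own GPU explicitly. This is a minimal sketch, not code from this repo: `device_map_for` is an illustrative helper (the `device_map={"": index}` form is the `transformers` convention for placing a whole model on one device), and the BLIP-2 loader class and checkpoint paths are assumptions.

```python
def device_map_for(gpu_index: int) -> dict:
    """Build a transformers-style device_map that places an entire
    model on a single GPU; the empty-string key maps the whole module."""
    return {"": gpu_index}

# Usage sketch (assumes two GPUs and local checkpoints; names are illustrative):
# from transformers import AutoModel, AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
# model_GLM = AutoModel.from_pretrained(
#     "./chatglm3-6b", trust_remote_code=True,
#     device_map=device_map_for(0)).eval()      # GLM pinned to GPU 0
# model_blip2 = Blip2ForConditionalGeneration.from_pretrained(
#     "./blip2", device_map=device_map_for(1))  # BLIP-2 pinned to GPU 1
```

With each model confined to its own device, the two loads no longer race for memory on the same GPU during initialization.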
System Info / 系統信息
Multiple 2080 Ti GPUs
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
Reproduction / 复现过程
When running GLM on multiple GPUs, the call generated_text_GLM, history = model_GLM.chat(tokenizer, prompt, history=[]) raises:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling
cublasCreate(handle)
I am sure the CUDA version is correct, and the batch size is only 1. Has anyone run into a similar problem?
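Since the maintainer suspects a driver/CUDA-level issue, one diagnostic sketch (an assumption, not project code) is to pin the environment before any CUDA context is created: restricting the process to one GPU rules out cross-device allocation, and synchronous launches make the real failing call appear in the traceback rather than a later cuBLAS call.

```python
import os

# These must be set before torch (or any CUDA library) is imported,
# because the CUDA context reads them at initialization time.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")   # synchronous kernel launches
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")   # expose only one 2080 Ti

# Only import torch after the environment is pinned, e.g.:
# import torch
# assert torch.cuda.device_count() == 1
```

If the error disappears with a single visible device, the problem is multi-GPU allocation rather than the CUDA install itself.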
Expected behavior / 期待表现