
Multi-GPU parallel training error #44

Closed

cywjava opened this issue Mar 30, 2023 · 5 comments



cywjava commented Mar 30, 2023

I'm using 8 P40 cards here and told the training program to use cards 1, 2, 3, and 4.
RuntimeError: Caught RuntimeError in replica 0 on device 0.

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

On the second run it hangs right after loading the model and makes no further progress, with no error reported. The process can't be killed either, and it also affected the text-generation application running on card 0, so the only option at that point was to reboot.
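For context, restricting a training process to specific cards is usually done with CUDA_VISIBLE_DEVICES before CUDA is initialized; a minimal sketch (not from this repo) of keeping card 0 free for the generation service:

```python
import os

# Must be set before torch initializes CUDA, otherwise it has no effect.
# Card 0 stays reserved for the text-generation service mentioned above.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3,4"

import torch

# Within this process the physical cards 1-4 are re-indexed as cuda:0..cuda:3,
# so "device 0" in a traceback would then refer to physical card 1.
print(torch.cuda.device_count())  # expected output: 4
```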

yuanzhoulvpi2017 (Owner) commented:

1. In the old version the model could train on multiple GPUs, but with the new version of the code it can no longer run in parallel. I'm still debugging the bug.

cywjava (Author) commented Mar 31, 2023

> 1. In the old version the model could train on multiple GPUs, but with the new version of the code it can no longer run in parallel. I'm still debugging the bug.

Aha, that explains why your earlier code worked and the current one doesn't. Also, the earlier version did run on multiple cards, but apart from the first card running at full load, the others all seemed to be sitting idle.

yuanzhoulvpi2017 (Owner) commented Mar 31, 2023 via email

Chenzongchao commented:

okok

yuanzhoulvpi2017 (Owner) commented:

I've added single-machine multi-GPU training code; the link is here: https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/Chatglm6b_ModelParallel

Code for training the chatglm6b model on a single machine with multiple GPUs using model parallelism.
By combining the LoRA algorithm, fp16 precision, and gradient checkpointing, it runs very comfortably on two T4 GPUs with a text length of 1024 and batch_size=4 (each card has at most 16 GB of memory, but in practice card 1 used 8 GB and card 2 used 11 GB), and the batch size could even be increased further.
