
RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. #29

Closed
1006076811 opened this issue Mar 29, 2023 · 6 comments

Comments

@1006076811

When I run training with the code in code02, it fails with the following error.
My CUDA version: 11.3; torch
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
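For anyone hitting this, a minimal way to get a usable stack trace is to set CUDA_LAUNCH_BLOCKING before CUDA is initialized, so kernel launches run synchronously and the reported location matches the kernel that actually asserted. This is a debugging sketch only; `train.py` below is a placeholder for whatever entry script you run:

```python
import os

# Must be set before the first CUDA call, otherwise it has no effect:
# it makes every kernel launch synchronous so the Python stack trace
# points at the operation that actually triggered the device-side assert.
# Equivalently, on the command line: CUDA_LAUNCH_BLOCKING=1 python train.py
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the environment variable is set

# ... build the model / run the training step that crashes ...
```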

@onlyfew

onlyfew commented Mar 29, 2023

I'm running into the same problem.

@yuanzhoulvpi2017
Owner

Does this problem only occur with multiple GPUs?

@RoyalArsenal

Does this problem only occur with multiple GPUs?

Yes, same situation here. By the way, which Python version are you using? I can run on a single GPU, but training gets stuck, so I suspect it's a version issue.

@1006076811
Author

Does this problem only occur with multiple GPUs?

Yes, I'm currently on 4 GPUs. Your earlier, older train script runs fine, so I also suspect the torch or CUDA version is the cause.

@xiaoweiweixiao

I ran into this problem too: it works on a single GPU but fails as soon as I use multiple GPUs.

@yuanzhoulvpi2017
Owner

#44 (comment)

I've added single-machine multi-GPU training code; the link is here: https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/Chatglm6b_ModelParallel

Code for training the chatglm6b model with model parallelism on a single machine with multiple GPUs.
Combined with the LoRA algorithm, fp16 precision, and gradient checkpointing, it runs happily at a text length of 1024 and batch size 4 on two T4 GPUs (each card has at most 16 GB of memory, but in practice card 1 uses 8 GB and card 2 uses 11 GB), and the batch size can even be raised further.
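For reference, here is a minimal sketch of the ingredients listed above (model parallelism via device_map, LoRA, fp16 weights, gradient checkpointing), assuming the Hugging Face transformers + peft stack. It is illustrative only, not the actual Chatglm6b_ModelParallel training script, and the LoRA hyperparameters are placeholders:

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Load ChatGLM-6B in fp16 and let accelerate split the layers across the
# visible GPUs (model parallelism); requires `accelerate` to be installed.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Gradient checkpointing trades extra compute for lower activation memory;
# enable_input_require_grads is needed so it works with a frozen base model.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

# LoRA adapters on the attention projection keep the trainable parameter
# count small; r / lora_alpha / lora_dropout here are illustrative values.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```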
