
RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. #29

Closed
1006076811 opened this issue Mar 29, 2023 · 6 comments

Comments

@1006076811

When I run training with the code in code02, it fails with the following error.
My CUDA version: 11.3; torch
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
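For anyone hitting this, a minimal way to get a usable stack trace is to set CUDA_LAUNCH_BLOCKING before CUDA is initialized, so kernel launches run synchronously and the reported location matches the kernel that actually asserted. This is a debugging sketch only; `train.py` below is a placeholder for whatever entry script you run:

```python
import os

# Must be set before the first CUDA call, otherwise it has no effect:
# it makes every kernel launch synchronous so the Python stack trace
# points at the operation that actually triggered the device-side assert.
# Equivalently, on the command line: CUDA_LAUNCH_BLOCKING=1 python train.py
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the environment variable is set

# ... build the model / run the training step that crashes ...
```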

@onlyfew

onlyfew commented Mar 29, 2023

I'm running into the same problem.

@yuanzhoulvpi2017
Owner

Does this problem only occur with multiple GPUs?

@RoyalArsenal

Does this problem only occur with multiple GPUs?

Yes, same situation here. By the way, which Python version are you using? I can run on a single GPU, but training gets stuck, so I suspect it's a version issue.

@1006076811
Author

Does this problem only occur with multiple GPUs?

Yes, I'm currently on 4 GPUs. Your earlier, older train script runs fine, so I also suspect the torch or CUDA version is the cause.

@xiaoweiweixiao

I ran into this problem too: it works on a single GPU but fails as soon as I use multiple GPUs.

@yuanzhoulvpi2017
Owner

#44 (comment)

I've added single-machine multi-GPU training code; the link is here: https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/Chatglm6b_ModelParallel

Code for training the chatglm6b model with model parallelism on a single machine with multiple GPUs.
Combined with the LoRA algorithm, fp16 precision, and gradient checkpointing, it runs happily at a text length of 1024 and batch size 4 on two T4 GPUs (each card has at most 16 GB of memory, but in practice card 1 uses 8 GB and card 2 uses 11 GB), and the batch size can even be raised further.
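For reference, here is a minimal sketch of the ingredients listed above (model parallelism via device_map, LoRA, fp16 weights, gradient checkpointing), assuming the Hugging Face transformers + peft stack. It is illustrative only, not the actual Chatglm6b_ModelParallel training script, and the LoRA hyperparameters are placeholders:

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Load ChatGLM-6B in fp16 and let accelerate split the layers across the
# visible GPUs (model parallelism); requires `accelerate` to be installed.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Gradient checkpointing trades extra compute for lower activation memory;
# enable_input_require_grads is needed so it works with a frozen base model.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()

# LoRA adapters on the attention projection keep the trainable parameter
# count small; r / lora_alpha / lora_dropout here are illustrative values.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```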
