New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/ #10
Comments
Hi, according to my experience, this is manily because the value of label is out of the range |
@ycszen Hi, I try Cityscapes dataset. And the value of the label is out of the range
Any suggestion? |
@hubutui Could you show more details? Did you do a correct remap? |
@ycszen Hi, I use
|
Hi, have you solved this problem? I have got the same question with above, the Error code is some thing as below, File "train.py", line 137, in Hope suggestions. Thanks. |
I have solved the question by mapping the labels with cityscapesscripts. Thanks a lot. |
@blueardour hi, I meet the same problem with you. Can you tell me the way to solve this question? Thanks a lot. |
@jhch1995 it might caused by the wrong match bewteen label ids. You can try createTrainIdLabelImgs.py to generate ***_gtFine_labelTrainIds.png files and use these files as training labels |
hi, @ycszen
Sorry to disturb you again. After some struggle on the code, I was stuck at the Criterion part. It gave RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128
I add the CUDA_LAUNCH_BLOCKING=1 before run the script to enable more accuracy message:
0] Assertion
t >= 0 && t < n_classesfailed. /pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [11,0,0], thread: [766,0,0] Assertion
t >= 0 && t < n_classesfailed. /pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [11,0,0], thread: [767,0,0] Assertion
t >= 0 && t < n_classesfailed. /pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [11,0,0], thread: [800,0,0] Assertion
t >= 0 && t < n_classes` failed.THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=128 error=59 : device-side assert triggered
Traceback (most recent call last):
File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/chenp/workspace/git/TorchSeg/model/dfn/voc.dfn.R101_v1c/network.py", line 137, in forward
loss0 = self.criterion(pred_out[0], label)
File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/functional.py", line 1792, in nll_loss
ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128
`
Do you have any experience or advise on it ?
The text was updated successfully, but these errors were encountered: