RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered #8
Comments
Hi, have you been modifying the code? Line 124 in "SLidR/pretrain/lightning_trainer.py" isn't supposed to look like that.
Thank you for your reply. I didn't modify the original code (only some comments), but when I run pretrain.py with slidr_minkunet.yaml, the RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered still occurs. I checked k = one_hot_P @ output_points[batch["pairing_points"]]: the shapes are one_hot_P (3600, 45771), output_points (44516, 64) and pairing_points (45771,). BTW, could you kindly share by email the code you used to finetune the object detection models from OpenPCDet? I'm still reproducing the results of SLidR, and it would help me a lot.
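As a hedged side note on those shapes: output_points[batch["pairing_points"]] gathers 45771 rows from a tensor that only has 44516, which is valid as long as every index is below 44516, and an out-of-range index would also surface on GPU as cudaErrorIllegalAddress. A minimal check, assuming output_points and pairing_points are the tensors named above:

```python
import torch

def check_pairing_indices(output_points: torch.Tensor, pairing_points: torch.Tensor) -> None:
    """Sanity-check the pairing indices before the matmul, since an
    out-of-bounds gather on GPU also raises cudaErrorIllegalAddress."""
    n_points = output_points.shape[0]              # 44516 in the report above
    min_idx = int(pairing_points.min())
    max_idx = int(pairing_points.max())
    assert min_idx >= 0, f"negative pairing index: {min_idx}"
    assert max_idx < n_points, (
        f"pairing index {max_idx} out of range for {n_points} output points"
    )

# hypothetical usage just before the line in question:
# check_pairing_indices(output_points, batch["pairing_points"])
# k = one_hot_P @ output_points[batch["pairing_points"]]
```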
I met this problem too; did you solve it? @JakeVander
Same problem here. |
Could you please tell me exactly what you are running? I will try to add a Dockerfile to set up a working environment, and see if I can either reproduce the issue or specify an environment in which you won't have any. However, I'm not sure a single Dockerfile will suit every configuration, since MinkowskiEngine can be particularly painful to compile.
I managed to fix this error but don't remember exactly how, sorry. Thanks for the amazing work, by the way!
Thanks for reopening the issue!
The problem does not occur when running with my environment, which is […], along with an NVIDIA A800.
Ok, I was actually able to reproduce the issue now. I'll see what I can do.
The issue is indeed a compatibility problem between MinkowskiEngine and some versions of PyTorch + CUDA (see for instance NVIDIA/MinkowskiEngine#299). I was able to run the code again using CUDA 11.3, torch 1.12.0, cuDNN 8, the latest commit of MinkowskiEngine, and pytorch_lightning 1.6.0. I will add a Dockerfile with this config to the repo.
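For comparing a local setup against that working configuration, a minimal sketch, assuming torch, MinkowskiEngine, and pytorch_lightning are all importable in the environment:

```python
# Compare the local setup against the configuration reported to work above:
# CUDA 11.3, torch 1.12.0, cuDNN 8, MinkowskiEngine at the latest commit,
# and pytorch_lightning 1.6.0.
import torch
import MinkowskiEngine as ME
import pytorch_lightning as pl

print("torch:", torch.__version__)                  # 1.12.0 in the working setup
print("CUDA (torch build):", torch.version.cuda)    # 11.3 in the working setup
print("cuDNN:", torch.backends.cudnn.version())     # a cuDNN 8 build
print("MinkowskiEngine:", ME.__version__)
print("pytorch_lightning:", pl.__version__)         # 1.6.0 in the working setup
print("CUDA available:", torch.cuda.is_available())
```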
It works! Thanks a lot!
Ok, I'm closing the issue. For future reference, if problems with MinkowskiEngine reappear, it could be easier to switch to an alternative such as torchsparse.
I have the same error when using MinkowskiEngine at tag v0.5.4.
The problem occurs when I run pretrain.py.
I have tried a lot of things but don't know how to fix it.
Could you help me?
I am running with 1 GPU, Ubuntu 18.04, cuDNN 8, CUDA 11.1, and the other requirements are the same as in requirements.txt.
Epoch 0: 0%| | 0/7033 [00:37<74:08:49, 37.95s/it]