
RuntimeError: CUDA error: an illegal memory access was encountered #24

Open
ditingdapeng opened this issue Sep 25, 2020 · 12 comments
Comments

@ditingdapeng

During training I ran into this error; where could the problem be? "RuntimeError: CUDA error: an illegal memory access was encountered"

@Sleepychord
Contributor

Hi, this seems to be caused by some other problem (likely your environment). Could you provide more information?

@ditingdapeng
Author

Thank you! Here is my conda and torch configuration:
python 3.7.0
conda 4.5.11
torch 1.0.1.post2
torchvision 0.2.2.post3

> Hi, this seems to be caused by some other problem (from the environment), could you provide more information?

@ditingdapeng
Author

I have set batch_size in train.py to 1. My machine has a 2080 Ti.

@Sleepychord
Contributor

Hi, can you tell me which line of code raises the error? The environment seems okay.

@ditingdapeng
Author

Yes: "batch = tuple(t.to(device) for t in batch)". I've now reinstalled the Ubuntu environment. May I ask whether your CUDA version must be 8?
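A step that sometimes helps localize this kind of error (a sketch, not advice from this thread): CUDA kernel launches are asynchronous, so the Python line that reports the error is often not the one that faulted. Setting `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, so the stack trace points at the real culprit. `train.py` here stands in for the project's training script:

```shell
# Run training with synchronous CUDA launches so the reported stack
# trace points at the kernel that actually faulted, not a later call
# such as batch = tuple(t.to(device) for t in batch).
CUDA_LAUNCH_BLOCKING=1 python train.py
```

This slows training down, so it is only worth enabling while hunting the bug.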

@ditingdapeng
Author

I have been stuck on this problem for 3 days and have ruled out memory overflow and batch_size. I couldn't resist reinstalling the system yesterday, and I noticed that the CUDA version didn't fit.

Which CUDA version are you using?
Thank you ~

@ditingdapeng
Author

I suspect the problem is that CUDA 10.0 doesn't match the torch version used by the code.

@Sleepychord
Contributor

No, but you need to ensure your torch build matches your CUDA version.
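One way to check this match (a minimal sketch, assuming PyTorch is installed): `torch.version.cuda` reports the CUDA version the installed torch wheel was built against, which must be compatible with the toolkit and driver on the machine.

```python
import torch

# The CUDA version this torch build was compiled against; None means a
# CPU-only build, which cannot run .to("cuda") at all.
print("torch version:", torch.__version__)
print("built for CUDA:", torch.version.cuda)

# True only if the driver/runtime on this machine can actually be used.
print("CUDA usable:", torch.cuda.is_available())
```

If `torch.version.cuda` disagrees with the toolkit installed on the system (e.g. a cu80 wheel on a CUDA 10.0 machine), reinstalling a matching torch wheel is usually the fix.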

@ditingdapeng
Author

I see. I think I know where the problem is. My CUDA version follows your requirements, but CUDA may not match torch.

@ditingdapeng
Author

Loading CUDA 8.0 crashed. I'm going to reinstall the system and install matching CUDA and torch versions.

@ditingdapeng
Author

Hello! I think I finally found the problem, but I don’t know how to solve it.
Hope to get your help.

The problem appears in the train.py file:
'hop_loss, ans_loss, pooled_output = model1(*batch)'

The reported error is: RuntimeError: CUDA error: an illegal memory access was encountered.

I suspect that the shapes of the tensors in batch don't match what model1 expects. Could you please explain the structure of model1? Thank you very much!!!
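A debugging trick that can disambiguate this (a sketch, not the maintainer's advice): run the same forward pass on CPU first. On CPU, shape mismatches and out-of-range indices raise readable Python errors, whereas on CUDA they can surface only as an illegal memory access. `model1` and `batch` below are hypothetical stand-ins, with an embedding layer because out-of-range token ids are a classic trigger:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the model and batch in train.py.
model1 = nn.Embedding(num_embeddings=100, embedding_dim=8)
batch = (torch.tensor([[1, 2, 150]]),)  # 150 is out of range on purpose

try:
    # On CPU this raises a clear IndexError; on CUDA the same bug can
    # appear later as "illegal memory access".
    out = model1(*tuple(t.cpu() for t in batch))
except (IndexError, RuntimeError) as e:
    print("caught on CPU:", type(e).__name__)
```

If the CPU run succeeds, the shapes and indices are fine and the suspicion shifts back to the CUDA/torch installation.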

@ditingdapeng
Author

I have found the problem. It was because my CUDA environment was not installed properly. Sorry for all the trouble~
