Segmentation fault #47

Closed
ASHWIN2605 opened this issue Jun 17, 2021 · 12 comments

Comments


ASHWIN2605 commented Jun 17, 2021

Hello,

I am trying to run fixed_baseline.sh with the lp_train configuration on my Linux machine, using the following setup:
Python = 3.8
CUDA = 11.0
PyTorch = 1.8.0

The code runs fine on the CPU, but when I change the device to CUDA I get `Segmentation fault (core dumped)`. If you have seen this before, please help me resolve it.


yhhu99 commented Jun 18, 2021

I also ran into this problem yesterday when passing a CUDA tensor to float_quantize, and I cannot figure out what causes it.

Tiiiger (Owner) commented Jun 19, 2021

Hi @ASHWIN2605 and @yhhu99,

Sorry to hear this, and thank you for letting me know. Would you kindly try v0.2.0 and check whether you encounter the same issue?

I will look into it in the meantime.


yhhu99 commented Jun 19, 2021


Thanks for your reply, but how can I change the version?

Tiiiger (Owner) commented Jun 19, 2021

Can you try `pip install qtorch==0.2.0`? Thank you!
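To confirm which qtorch release the interpreter actually sees after running the pip command, a small check with the standard library's `importlib.metadata` (available from Python 3.8, the version used in this thread) can help. This is a minimal sketch; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Expect "0.2.0" here if `pip install qtorch==0.2.0` targeted this interpreter.
    print("qtorch:", installed_version("qtorch"))
```

If this prints an unexpected version, pip may have installed into a different environment; running `python -m pip install qtorch==0.2.0` with the same interpreter avoids that.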


yhhu99 commented Jun 19, 2021


I'm sorry, but it doesn't work.
[screenshots]


yhhu99 commented Jun 19, 2021

But when I try to reproduce this in a separate script, it works fine:
[screenshot]

Tiiiger (Owner) commented Jun 22, 2021

Hi @ASHWIN2605 @yhhu99,

Please downgrade to v2.0.0 for now. I will look into fixing this.

Tiiiger closed this as completed on Jun 22, 2021
Tiiiger reopened this on Jun 22, 2021
Tiiiger (Owner) commented Jun 25, 2021

@ASHWIN2605 @yhhu99

Unfortunately, I cannot replicate the segmentation fault you are seeing. I am on torch v1.8.0 and CUDA 11.1.

I am not sure whether this is due to the difference in our CUDA versions. Can you provide a minimal example that reproduces the issue?

Also, try removing the cached compilation so the extension is rebuilt, e.g. `rm -rf /tmp/torch_extensions/quant_cuda /tmp/torch_extensions/quant_cpu` (note that your torch_extensions directory might live somewhere else).
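The cleanup above can also be scripted. The sketch below looks for cached build directories named `quant_cuda` and `quant_cpu` under a few assumed locations: the directory in the `TORCH_EXTENSIONS_DIR` environment variable if it is set, `<tmp>/torch_extensions` (the default in older PyTorch releases such as 1.8), and `~/.cache/torch_extensions` (used by newer releases). These locations are assumptions about your setup, and the hypothetical `clear_cached_builds` helper defaults to a dry run so you can inspect what it would delete first.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional

# Build directory names mentioned in the thread (assumed to match the extension names).
EXTENSION_NAMES = ("quant_cuda", "quant_cpu")


def candidate_roots() -> List[Path]:
    """Assumed locations of PyTorch's JIT extension cache that exist on this machine."""
    roots = []
    env_dir = os.environ.get("TORCH_EXTENSIONS_DIR")
    if env_dir:
        roots.append(Path(env_dir))
    # Default in older PyTorch releases such as 1.8.
    roots.append(Path(tempfile.gettempdir()) / "torch_extensions")
    # Default in newer releases (builds live in a per-Python/CUDA subdirectory).
    roots.append(Path.home() / ".cache" / "torch_extensions")
    return [root for root in roots if root.is_dir()]


def clear_cached_builds(
    dry_run: bool = True, roots: Optional[Iterable[Path]] = None
) -> List[Path]:
    """Find cached quant_* build directories and delete them unless dry_run is True."""
    search_roots = candidate_roots() if roots is None else list(roots)
    found = {}
    for root in search_roots:
        for name in EXTENSION_NAMES:
            # rglob matches at any depth, including per-Python/CUDA subdirectories.
            for path in root.rglob(name):
                if path.is_dir():
                    found[path.resolve()] = None  # dict keeps order and drops duplicates
    matches = list(found)
    if not dry_run:
        for path in matches:
            shutil.rmtree(path, ignore_errors=True)
    return matches


if __name__ == "__main__":
    for path in clear_cached_builds(dry_run=True):
        print("would remove:", path)
```

After reviewing the dry-run output, calling `clear_cached_builds(dry_run=False)` removes the directories so that the next import recompiles the extension.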


yhhu99 commented Jun 29, 2021


Thanks for your reply. I tried removing the cached quant_cuda and quant_cpu builds with `rm -rf`, but it didn't help. Here is a small example that reproduces the problem:
[screenshot]

When I run the script on the GPU, `Segmentation fault (core dumped)` occurs.
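Because a segmentation fault kills the Python process outright, no Python traceback is printed. One way to compare the CPU and CUDA paths without losing your session is to run each snippet in a child interpreter and inspect its exit status. In the sketch below, `QTORCH_CUDA_SNIPPET` is a hypothetical reproduction that assumes qtorch's `float_quantize(x, exp, man, rounding)` signature, and the helper names are illustrative only.

```python
import signal
import subprocess
import sys
from typing import Tuple

# Hypothetical reproduction; assumes qtorch.quant.float_quantize(x, exp, man, rounding).
# Replace device="cuda" with device="cpu" to compare the two code paths.
QTORCH_CUDA_SNIPPET = """
import torch
from qtorch.quant import float_quantize
x = torch.randn(4, 4, device="cuda")
print(float_quantize(x, exp=8, man=7, rounding="nearest"))
"""


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def run_isolated(code: str, timeout: float = 600.0) -> Tuple[str, str]:
    """Run `code` in a fresh interpreter; return (exit description, captured stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return describe_exit(proc.returncode), proc.stderr
```

On Linux, `run_isolated(QTORCH_CUDA_SNIPPET)[0]` returning `"killed by SIGSEGV"` while the CPU variant exits normally would isolate the crash to the CUDA extension.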

ASHWIN2605 (Author) commented

Hi,

It was my mistake. I had not updated the CUDA version of my PyTorch installation to 11.0; I had only updated NVCC on my Linux machine to 11.0. After reinstalling PyTorch built for CUDA 11.0, everything works without the segmentation fault. Thank you for the trials you made. @yhhu99, I hope this helps you as well.
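The resolution points to a mismatch between the CUDA toolkit that compiles the extension (`nvcc`, 11.0 here) and the CUDA version the installed PyTorch binary was built against. A quick way to compare the two is sketched below. It reads `torch.version.cuda` (which is `None` for CPU-only builds) and parses the `release X.Y` field from `nvcc --version`. Treating any major.minor difference as suspicious is a conservative heuristic assumed here, not a rule PyTorch itself enforces, and the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Extract "X.Y" from `nvcc --version` output such as "release 11.0, V11.0.194"."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def versions_match(torch_cuda: Optional[str], nvcc_release: Optional[str]) -> bool:
    """True only when both versions are known and agree on major.minor."""
    if torch_cuda is None or nvcc_release is None:
        return False
    return torch_cuda.split(".")[:2] == nvcc_release.split(".")[:2]


def check_cuda_toolchain() -> bool:
    """Print the PyTorch and nvcc CUDA versions and report whether they agree."""
    import torch  # imported lazily so the helpers above can be used without torch

    torch_cuda = torch.version.cuda  # e.g. "11.0"; None for CPU-only builds
    nvcc_path = shutil.which("nvcc")
    nvcc_release = None
    if nvcc_path is not None:
        result = subprocess.run([nvcc_path, "--version"], capture_output=True, text=True)
        nvcc_release = parse_nvcc_release(result.stdout)
    print(f"PyTorch {torch.__version__} built for CUDA {torch_cuda}")
    print(f"nvcc release: {nvcc_release}")
    ok = versions_match(torch_cuda, nvcc_release)
    if not ok:
        print("CUDA versions differ or are unknown; consider reinstalling PyTorch "
              "for the same CUDA version as nvcc.")
    return ok
```

Running `check_cuda_toolchain()` in the environment that runs fixed_baseline.sh should report matching versions once PyTorch has been reinstalled for the same CUDA release as NVCC.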


yhhu99 commented Jul 2, 2021


Thank you! I think I know where the problem is.

Tiiiger (Owner) commented Jul 9, 2021

Closing because this seems to be resolved.

Tiiiger closed this as completed on Jul 9, 2021

3 participants