
Extremely large pts_loss output when setting cudatoolkit=11.2 #7

Closed
Yang-Hao-Lin opened this issue Dec 3, 2021 · 3 comments

Comments

@Yang-Hao-Lin

Hi Junhwa, thanks a lot for the awesome work in the field of scene flow :)
When I tried to use your loss in my environment (cudatoolkit=11.2), the pts_loss(s_3) became extremely large, e.g., greater than 100000.
But when I set the cudatoolkit version to 10.2, the pts_loss output was normal (usually less than 10).
I cannot figure out the reason.
Have you ever encountered this kind of situation?
Again, thanks a lot.

@hurjunhwa
Collaborator

Hi, thank you for your interest in our work!
I think this happens when the disparity decoder is not properly trained, and there can be multiple reasons, such as training instability, invalid inputs, etc.
I would first test the pre-trained model with cudatoolkit 11.2 and check whether the scene flow accuracy matches the baseline.
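For example, a minimal first check on a pre-trained checkpoint (the path and checkpoint layout below are placeholders, not the repo's actual names) is to verify that nothing non-finite appears when loading it under the new toolkit:

```python
import torch

# Hypothetical checkpoint path -- substitute the repo's own pre-trained file.
checkpoint = torch.load("checkpoints/pretrained_model.ckpt", map_location="cuda")
state_dict = checkpoint.get("state_dict", checkpoint)

# Make sure no weight tensor contains NaN/Inf after loading under cudatoolkit 11.2.
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor) and tensor.is_floating_point():
        assert torch.isfinite(tensor).all(), f"non-finite values in {name}"
print("all checkpoint tensors are finite")
```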

Unit-testing the CUDA-dependent modules may also be necessary.
Please try the Python implementation of the correlation layer (--correlation_cuda_enabled=False) and also check whether softsplat works correctly; a rough generic check is sketched below.
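A minimal sanity-check sketch, assuming you wire in the repo's own modules yourself (the callables and tensor shapes below are placeholders, not the actual API):

```python
import torch

def check_module(cuda_impl, python_impl, *inputs, atol=1e-3):
    """Run a CUDA-dependent module and its pure-Python reference on the same
    inputs, then look for NaN/Inf and large mismatches between the two.
    `cuda_impl` / `python_impl` stand in for e.g. the CUDA correlation layer
    vs. the --correlation_cuda_enabled=False fallback, or the softsplat op."""
    with torch.no_grad():
        out_cuda = cuda_impl(*inputs)
        out_py = python_impl(*inputs)
    assert torch.isfinite(out_cuda).all(), "CUDA output contains NaN/Inf"
    assert torch.isfinite(out_py).all(), "Python output contains NaN/Inf"
    max_diff = (out_cuda - out_py).abs().max().item()
    print(f"max abs difference: {max_diff:.6f}")
    assert max_diff < atol, "CUDA and Python implementations disagree"

# Example with dummy feature maps (shapes are illustrative only):
# feat1 = torch.randn(1, 64, 32, 96, device="cuda")
# feat2 = torch.randn(1, 64, 32, 96, device="cuda")
# check_module(cuda_correlation, python_correlation, feat1, feat2)
```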

If it still doesn't work, please let me know!

@Yang-Hao-Lin
Author

Hi Junhwa, thanks a lot for your attention!
The server administrator of my lab upgraded my GPU from a Tesla P40 to an RTX A40 yesterday. I tested with cudatoolkit=11.2 again, and now the pts_loss output is within a reasonable range. So the situation is now:

Test on a P40 GPU, cudatoolkit=10.2 -> pts_loss is within a reasonable range;
Test on a P40 GPU, cudatoolkit=11.2 -> pts_loss is extremely large, whether correlation_cuda_enabled is False or True, even with a supervised pre-trained model;
Test on an A40 GPU, cudatoolkit=11.2 -> pts_loss is within a reasonable range.

It is weird, but there seems to be a compatibility issue between the Tesla P40 and cudatoolkit=11.2.
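For what it's worth, a quick way to check whether the installed PyTorch binary actually ships kernels for a given GPU's compute capability (the Tesla P40 is compute capability 6.1) is the snippet below; torch.cuda.get_arch_list() is only available in recent PyTorch versions:

```python
import torch

# Report the CUDA toolkit PyTorch was built against, the detected GPU,
# its compute capability, and the architectures the binary was compiled for.
print("torch CUDA version: ", torch.version.cuda)
print("device:             ", torch.cuda.get_device_name(0))
print("compute capability: ", torch.cuda.get_device_capability(0))  # P40 -> (6, 1)
print("compiled arch list: ", torch.cuda.get_arch_list())           # e.g. ['sm_37', 'sm_61', ...]
```

If sm_61 is missing from that list, or from the arch flags used when compiling the correlation CUDA extension locally, that could be consistent with the behaviour above.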

@hurjunhwa
Collaborator

Thank you for sharing them!
