
Extremely large pts_loss output when setting cudatoolkit=11.2 #7

Closed
Yang-Hao-Lin opened this issue Dec 3, 2021 · 3 comments

Comments

@Yang-Hao-Lin

Hi Junhwa, thanks a lot for the awesome work in the field of scene flow :)
When I tried to use your loss in my environment (cudatoolkit=11.2), the pts_loss(s_3) became extremely large, e.g., greater than 100000.
But when I set the cudatoolkit version to 10.2, the pts_loss output was normal (usually less than 10).
I cannot figure out the reason.
Have you ever encountered this kind of situation?
Again, thanks a lot.

@hurjunhwa
Collaborator

Hi, thank you for your interest in our work!
I think this happens when the disparity decoder is not properly trained, and there can be multiple reasons, such as training instability, invalid inputs, etc.
I would first test the pre-trained model with cudatoolkit 11.2 and check whether the scene flow accuracy matches the baseline.
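For example, a minimal first check on a pre-trained checkpoint (the path and checkpoint layout below are placeholders, not the repo's actual names) is to verify that nothing non-finite appears when loading it under the new toolkit:

```python
import torch

# Hypothetical checkpoint path -- substitute the repo's own pre-trained file.
checkpoint = torch.load("checkpoints/pretrained_model.ckpt", map_location="cuda")
state_dict = checkpoint.get("state_dict", checkpoint)

# Make sure no weight tensor contains NaN/Inf after loading under cudatoolkit 11.2.
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor) and tensor.is_floating_point():
        assert torch.isfinite(tensor).all(), f"non-finite values in {name}"
print("all checkpoint tensors are finite")
```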

Unit-testing the CUDA-dependent modules may also be necessary.
Please try the Python implementation of the correlation layer (--correlation_cuda_enabled=False) and also check whether softsplat works correctly; a rough generic check is sketched below.
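A minimal sanity-check sketch, assuming you wire in the repo's own modules yourself (the callables and tensor shapes below are placeholders, not the actual API):

```python
import torch

def check_module(cuda_impl, python_impl, *inputs, atol=1e-3):
    """Run a CUDA-dependent module and its pure-Python reference on the same
    inputs, then look for NaN/Inf and large mismatches between the two.
    `cuda_impl` / `python_impl` stand in for e.g. the CUDA correlation layer
    vs. the --correlation_cuda_enabled=False fallback, or the softsplat op."""
    with torch.no_grad():
        out_cuda = cuda_impl(*inputs)
        out_py = python_impl(*inputs)
    assert torch.isfinite(out_cuda).all(), "CUDA output contains NaN/Inf"
    assert torch.isfinite(out_py).all(), "Python output contains NaN/Inf"
    max_diff = (out_cuda - out_py).abs().max().item()
    print(f"max abs difference: {max_diff:.6f}")
    assert max_diff < atol, "CUDA and Python implementations disagree"

# Example with dummy feature maps (shapes are illustrative only):
# feat1 = torch.randn(1, 64, 32, 96, device="cuda")
# feat2 = torch.randn(1, 64, 32, 96, device="cuda")
# check_module(cuda_correlation, python_correlation, feat1, feat2)
```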

If it still doesn't work, please let me know!

@Yang-Hao-Lin
Author

Hi Junhwa, thanks a lot for your attention!
The server administrator of my lab upgraded my GPU from a Tesla P40 to an RTX A40 yesterday. I tested with cudatoolkit=11.2 again, and now the pts_loss output is within a reasonable range. So the situation is now:

Test on a P40 GPU, cudatoolkit=10.2 -> pts_loss is within a reasonable range;
Test on a P40 GPU, cudatoolkit=11.2 -> pts_loss is extremely large, whether correlation_cuda_enabled is False or True, even with a supervised pre-trained model;
Test on an A40 GPU, cudatoolkit=11.2 -> pts_loss is within a reasonable range.

It is weird, but there seems to be a compatibility issue between the Tesla P40 and cudatoolkit=11.2.
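For what it's worth, a quick way to check whether the installed PyTorch binary actually ships kernels for a given GPU's compute capability (the Tesla P40 is compute capability 6.1) is the snippet below; torch.cuda.get_arch_list() is only available in recent PyTorch versions:

```python
import torch

# Report the CUDA toolkit PyTorch was built against, the detected GPU,
# its compute capability, and the architectures the binary was compiled for.
print("torch CUDA version: ", torch.version.cuda)
print("device:             ", torch.cuda.get_device_name(0))
print("compute capability: ", torch.cuda.get_device_capability(0))  # P40 -> (6, 1)
print("compiled arch list: ", torch.cuda.get_arch_list())           # e.g. ['sm_37', 'sm_61', ...]
```

If sm_61 is missing from that list, or from the arch flags used when compiling the correlation CUDA extension locally, that could be consistent with the behaviour above.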

@hurjunhwa
Collaborator

Thank you for sharing them!
