Loss nonconvergence #6

Closed
rush9838465 opened this issue Jan 5, 2021 · 7 comments

Comments

@rush9838465

I trained on my own dataset. Do you know why the loss doesn't converge? The mAP is good, though.

[Attached images: 11, 22, 33]

@ybkscht
Owner

ybkscht commented Jan 5, 2021

Hi rush,

The jump in the classification and regression losses is strange, but the mAP is still near 1.0, so the 2D detection task is still doing well and I don't think this is the main problem here (the classification and regression losses come from the 2D detection part, and the mAP is a metric for it).
The transformation loss (which belongs to the 6D pose estimation part), by contrast, is decreasing, but the 6D pose estimation metrics like ADD only start to increase very slowly.
In my experience the absolute value of the transformation loss should be much higher (by a factor of 100x to 1000x), so I could imagine that your dataset uses a different unit than Linemod, which uses mm.
If this is the case, you need to change the translation_scale_norm parameter in the generator according to the unit of your dataset, and probably also raise the transformation loss weight (in train.py) from 0.02 to something greater, because otherwise the transformation loss becomes too small relative to the other losses.
EfficientPose works internally in meters, but the output is scaled with the translation_scale_norm parameter. For example, for Linemod, which is in mm, translation_scale_norm is set to 1000 (which is also the default).
But that's just a guess, and it is hard to say whether this is really the problem without more information about your dataset and training parameters.
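
To illustrate the unit issue, here is a minimal sketch. It assumes the generator divides the dataset-unit translation by translation_scale_norm to get meters; the helper name `to_internal_meters` and the pose values are made up for illustration and are not taken from the EfficientPose code.

```python
# Hypothetical helper for illustration; this name is not from the EfficientPose code.

def to_internal_meters(translation, translation_scale_norm):
    """Convert a translation from dataset units to meters (assumed convention)."""
    return [value / translation_scale_norm for value in translation]


# Linemod-style annotation in millimeters with the default scale of 1000.
t_linemod = to_internal_meters([50.0, -20.0, 800.0], 1000.0)

# The same pose annotated in meters (made-up values).
t_meters = [0.05, -0.02, 0.8]
t_wrong = to_internal_meters(t_meters, 1000.0)  # default scale: values end up 1000x too small
t_right = to_internal_meters(t_meters, 1.0)     # scale matched to the dataset unit

print(t_linemod)
print(t_wrong)
print(t_right)
```

With a mismatched scale the internal translations, and therefore the errors between them, shrink by the same factor, which would fit a transformation loss that looks far too small.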

Did you use the debug.py script to check whether the images and annotations of your custom dataset are loaded as expected? This can also help a lot in such scenarios.

I hope this helps you.

Sincerely,
Yannick

@rush9838465
Author

rush9838465 commented Jan 5, 2021 via email

@ghoshaw

ghoshaw commented Jan 10, 2021

Hi, I am also training on my own data, and my loss looks strange too.
[7 attached images]
BTW, my data is in meters, so I set translation_scale_norm to 1 and left the loss weight at 0.02, and I find that the difference in scale between the 2D detection loss and the pose estimation loss is not that large. I also used debug.py to visualize the data, and the result seems OK, except that the order of my 3D points differs from the original code. I chose the 3D point order arbitrarily, but I don't think this should be an issue, because the results saved during training plot the points correctly; it is just that the pose estimate is not accurate. Do you have any advice for my training? Thanks a lot!

@rush9838465
Author

@ghoshaw My previous problem was caused by an error in the extrinsic matrix. If debug.py shows that your 3D bounding boxes look normal, your extrinsic matrix should be fine. That said, I did change 0.02 to 5.
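
For a quick check of the extrinsics outside of debug.py, something like the sketch below can help. It is a plain pinhole projection, not the code from debug.py; the function name `project_points`, the camera matrix, and the pose values are made up for illustration.

```python
def project_points(points_3d, rotation, translation, camera_matrix):
    """Project 3D object points into the image with a pinhole camera model."""
    fx, fy = camera_matrix[0][0], camera_matrix[1][1]
    cx, cy = camera_matrix[0][2], camera_matrix[1][2]
    projected = []
    for point in points_3d:
        # Camera-frame coordinates: X_cam = R @ X_obj + t
        cam = [
            sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
            for row in range(3)
        ]
        if cam[2] <= 0:
            raise ValueError("point is behind the camera; check the extrinsic matrix")
        projected.append((fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy))
    return projected


# Made-up example: a 10 cm cube, 1 m in front of the camera, no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
half = 0.05
corners = [[x, y, z] for x in (-half, half) for y in (-half, half) for z in (-half, half)]
pixels = project_points(corners, identity, [0.0, 0.0, 1.0], camera)
print(pixels)
```

If the projected corners land far outside the image or nowhere near the object, the rotation, translation, or their units are the first things to check.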

@ghoshaw

ghoshaw commented Jan 11, 2021

@rush9838465, in your case the transformation loss is smaller than the 2D detection loss, but in my case the transformation loss is larger than the 2D detection loss. So should I change 0.02 to 0.002?
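
To compare the scales, I think it helps to look at each loss after weighting. A rough sketch: the raw loss values are made up, and the weights of 1.0 for classification and regression are my assumption (only the 0.02 is from this thread).

```python
def weighted_contributions(losses, weights):
    """Return the weighted losses, their sum, and each loss's share of the sum."""
    weighted = {name: losses[name] * weights[name] for name in losses}
    total = sum(weighted.values())
    shares = {name: value / total for name, value in weighted.items()}
    return weighted, total, shares


# Made-up raw loss values, for illustration only.
raw_losses = {"classification": 0.2, "regression": 0.3, "transformation": 1.5}
# 0.02 is the transformation weight mentioned above; the two 1.0 values are assumed.
loss_weights = {"classification": 1.0, "regression": 1.0, "transformation": 0.02}

weighted, total, shares = weighted_contributions(raw_losses, loss_weights)
for name in raw_losses:
    print(f"{name}: weighted={weighted[name]:.4f}, share={shares[name]:.1%}")
```

A raw transformation loss that is larger than the detection losses can still contribute only a small share once the weight is applied.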

@rush9838465
Author

@ghoshaw I think you can leave it unchanged; your loss is pretty low. Have you tried inference?

@ghoshaw

ghoshaw commented Jan 11, 2021

I have not tried inference yet, but the images saved during training are not that good.
And I just found a bug: I accidentally froze the backbone, so I will run a new experiment. Thanks for your answer!
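
For anyone hitting the same thing, a small check like this would have caught it. It is only a sketch: it relies on the Keras-style `layers`, `name`, and `trainable` attributes, and the stand-in model and layer names below are invented so the snippet runs without TensorFlow.

```python
from types import SimpleNamespace


def summarize_trainable(model):
    """Split layer names into (trainable, frozen) for an object with Keras-style .layers."""
    trainable = [layer.name for layer in model.layers if layer.trainable]
    frozen = [layer.name for layer in model.layers if not layer.trainable]
    return trainable, frozen


# Stand-in objects with invented layer names; with a real Keras model, pass the model itself.
fake_model = SimpleNamespace(layers=[
    SimpleNamespace(name="backbone_block1", trainable=False),
    SimpleNamespace(name="backbone_block2", trainable=False),
    SimpleNamespace(name="pose_head", trainable=True),
])

trainable_names, frozen_names = summarize_trainable(fake_model)
print("trainable:", trainable_names)
print("frozen:", frozen_names)
```

Printing the frozen list before training starts makes an unintended freeze easy to spot.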
