Loss nonconvergence #6
Comments
Hi rush,
The jump in the classification and regression loss is strange, but the mAP is still near 1.0, so the 2D detection task is still doing well and I think this shouldn't be the main problem here (the classification and regression losses come from the 2D detection part and the mAP is a metric for it).
The transformation loss (which is for the 6D pose estimation part) is decreasing, but the 6D pose estimation metrics like ADD only start to increase very slowly.
From my experience the absolute value of the transformation loss should be much higher (by a factor of 100x - 1000x), and I could imagine that your dataset uses a different unit than Linemod, which uses mm.
If this is the case, you need to change the translation_scale_norm parameter in the generator according to the unit of your dataset, and probably also increase the transformation loss weight (in train.py) from 0.02 to something greater, because otherwise the transformation loss becomes too small relative to the other losses.
EfficientPose works internally in meters, but the output is scaled with the translation_scale_norm parameter. For example, when using Linemod, which is in mm, the translation_scale_norm parameter is set to 1000 (which is also the default).
But that's just a guess, and it's hard to say whether this is really the problem without more information about your dataset and training parameters.
Did you use the debug.py script to check whether the images and annotations of your custom dataset are loaded as expected? This can also help a lot in such scenarios.
I hope this helps you.
Sincerely,
Yannick |
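To make the two suggested changes concrete, here is a minimal sketch; translation_scale_norm and the 0.02 weight in train.py are the names used in this thread, but the exact argument and dictionary key names in the EfficientPose code may differ, so treat the identifiers below as illustrative rather than the repo's exact API:

```python
# Sketch only: the two knobs discussed above, for a custom dataset annotated in meters.

# 1) Generator: translation_scale_norm scales EfficientPose's internal meters to the
#    dataset unit. Linemod is annotated in mm, hence the default of 1000.
translation_scale_norm = 1.0      # 1000 for mm, 100 for cm, 1 for m

# 2) train.py: the transformation loss weight. If the raw transformation loss is much
#    smaller than it is with Linemod (because of the unit change), raise it from 0.02
#    so the 6D pose term is not drowned out by the 2D detection losses.
loss_weights = {
    "regression": 1.0,            # 2D box regression
    "classification": 1.0,        # 2D classification
    "transformation": 2.0,        # instead of the default 0.02; tune on your own data
}
```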
You are quite right. I used meters as the unit, and I changed the 0.02 weight at noon; I will see the effect tomorrow. I'll also try debug.py later. Thank you very much!
|
@ghoshaw My previous problem was caused by an error in the extrinsic matrix. If debug.py shows that your 3D bounding box looks normal, your extrinsic matrix should be fine. In my case, I also changed the 0.02 weight to 5. |
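As a generic way to check the extrinsic annotations, similar in spirit to what debug.py visualizes, you can project the object's 3D bounding box into the image with the annotated rotation and translation and see whether the corners land on the object. The file path, intrinsics and box size below are placeholder values, not values from this repository:

```python
import cv2
import numpy as np

# Placeholder inputs: replace with one of your images and your own annotations.
image = cv2.imread("example.png")
if image is None:                                  # fall back to a blank canvas so the sketch runs
    image = np.zeros((480, 640, 3), dtype=np.uint8)

K = np.array([[572.4, 0.0, 325.3],                 # camera intrinsics (example values)
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                                      # annotated rotation (object -> camera)
t = np.array([0.0, 0.0, 1.0])                      # annotated translation, same unit as the box below
bbox_3d = np.array([[x, y, z]                      # 8 corners of a 10 cm cube, in meters
                    for x in (-0.05, 0.05)
                    for y in (-0.05, 0.05)
                    for z in (-0.05, 0.05)])

rvec, _ = cv2.Rodrigues(R)                         # rotation matrix -> rotation vector
points_2d, _ = cv2.projectPoints(bbox_3d, rvec, t.reshape(3, 1), K, None)
for p in points_2d.reshape(-1, 2):
    cv2.circle(image, (int(p[0]), int(p[1])), 3, (0, 255, 0), -1)

cv2.imwrite("extrinsics_check.png", image)         # the corners should lie on the object
```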
@rush9838465, in your case the transformation loss is smaller than the 2D detection loss, but in my case my transformation loss is larger than the 2D detection loss. So should I change 0.02 to 0.002? |
@ghoshaw I think it can be left unchanged, since your loss is already pretty low. Have you tried inference? |
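For a rough sense of why the weight is usually left alone or raised rather than lowered, here is a tiny illustration with made-up numbers, not values from either dataset:

```python
# Hypothetical loss values, for illustration only.
classification_loss = 0.8
regression_loss = 0.5

# If the raw transformation loss is large (e.g. translations on a mm-like scale),
# the default 0.02 weight brings it into the same range as the 2D losses:
print(0.02 * 60.0)   # 1.2 -> comparable to 0.8 + 0.5

# If the raw transformation loss is already small (e.g. translations in meters),
# the same weight makes the 6D pose term almost invisible, so the weight should be
# increased rather than decreased:
print(0.02 * 1.0)    # 0.02 -> the pose term barely contributes to the total loss
```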
I did not try inference yet, but the images saved during training are not that good.
I trained on my own dataset. Do you know why the loss doesn't converge, even though the mAP is good?