Loss nonconvergence #6

rush9838465 · 2021-01-05T04:20:32Z

I trained on my own dataset. Do you know why the loss doesn't converge? The mAP is good, though.

ybkscht · 2021-01-05T13:28:11Z

Hi rush,

The jump in the classification and regression loss is strange, but the mAP is still near 1.0, so the 2D detection task is still doing well and I don't think this is the main problem here (the classification and regression losses come from the 2D detection part, and the mAP is a metric for it).
The transformation loss (which belongs to the 6D pose estimation part), in contrast, is decreasing, but the 6D pose estimation metrics like ADD only start to increase very slowly.
In my experience the absolute value of the transformation loss should be much higher (by a factor of 100x - 1000x), so I could imagine that your dataset unit is different from Linemod, which uses mm.
If this is the case, you need to change the translation_scale_norm parameter in the generator according to the unit of your dataset, and probably also raise the transformation loss weight (in train.py) from 0.02 to something greater, because otherwise the transformation loss becomes too small relative to the other losses.
EfficientPose works internally in meters, but the output is scaled with the translation_scale_norm parameter. For example, when using Linemod, which is in mm, the translation_scale_norm parameter is set to 1000 (which is also the default).
But that's just a guess, and it is hard to say whether this is really the problem without more information about your dataset and training parameters.
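
As a rough illustration of the unit point above, here is a minimal sketch. It assumes the scale is applied by dividing the annotated translation by translation_scale_norm to get meters; the helper `to_meters` and the example vectors are hypothetical and not taken from the EfficientPose code.

```python
def to_meters(translation, translation_scale_norm):
    """Convert a translation vector from dataset units to meters.

    Assumption: the dataset value is divided by translation_scale_norm.
    This helper is illustrative only, not the EfficientPose implementation.
    """
    return [component / translation_scale_norm for component in translation]


# Linemod-style annotation in millimeters: a scale of 1000 maps it to meters.
t_mm = [100.0, -50.0, 800.0]
print(to_meters(t_mm, 1000.0))

# The same pose annotated in meters needs a scale of 1.
t_m = [0.1, -0.05, 0.8]
print(to_meters(t_m, 1.0))

# Keeping the default of 1000 on meter data shrinks every target by 1000x,
# which would also make a loss computed on those targets far smaller.
print(to_meters(t_m, 1000.0))
```

Under this assumption, a dataset in meters with the default scale would produce translation targets about 1000 times too small, which fits a transformation loss that is lower than expected by a similar factor.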

Did you use the debug.py script to check whether the images and annotations of your custom dataset are loaded as expected? This can also help a lot in such scenarios.

I hope this helps you.

Sincerely,
Yannick

rush9838465 · 2021-01-05T13:55:10Z

You are quite right. My data is in meters, and I changed the 0.02 value at noon. I will see the effect tomorrow. I'll try debug.py later. Thank you very much!


ghoshaw · 2021-01-10T03:36:21Z

Hi, I am also training on my own data, and my loss looks strange too.

BTW, my data is in meters, so I set translation_scale_norm to 1 and left the loss weight at 0.02. With these settings, the difference in scale between the 2D detection loss and the pose estimation loss is not that large. I also used debug.py to show the data, and the result seems OK, except that the order of the 3D points I used is different from the original code. I set the 3D point order randomly, but I don't think this should be an issue, because the results saved during training plot the points correctly; it is just that the pose estimate is not accurate. Do you have any advice for my training? Thanks a lot!

rush9838465 · 2021-01-11T02:20:27Z

@ghoshaw My previous problem was caused by an error in the extrinsic matrix. If debug.py shows that your 3D bounding box looks normal, your extrinsic matrix should be fine. But I also changed 0.02 to 5.

ghoshaw · 2021-01-11T02:25:30Z

@rush9838465, in your case the transformation loss is smaller than the 2D detection loss, but in my case the transformation loss is larger than the 2D detection loss. So I think I should change 0.02 to 0.002?
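
To make the trade-off in this question concrete, here is a small sketch of how the transformation weight changes the share of the total loss. It assumes the total is the classification loss plus the regression loss plus the weighted transformation loss, with the two detection losses weighted 1. The function name and all loss values are hypothetical, not numbers from this thread.

```python
def weighted_total(cls_loss, reg_loss, transformation_loss, transformation_weight):
    """Return (total loss, fraction of the total due to the transformation term).

    Assumption: total = cls + reg + weight * transformation, with the
    classification and regression losses weighted 1.
    """
    weighted = transformation_weight * transformation_loss
    total = cls_loss + reg_loss + weighted
    return total, weighted / total


# Hypothetical raw loss values, chosen only for illustration.
cls_loss, reg_loss, transformation_loss = 0.1, 0.2, 10.0

for weight in (0.002, 0.02, 5.0):
    total, share = weighted_total(cls_loss, reg_loss, transformation_loss, weight)
    print(f"weight={weight}: total={total:.3f}, transformation share={share:.1%}")
```

A smaller weight lowers the transformation term's share of the total and a larger one raises it, so the useful comparison is between the weighted terms rather than the raw loss values.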

rush9838465 · 2021-01-11T02:35:22Z

@ghoshaw I think it can be left unchanged; your loss is pretty low. Have you tried inference?

ghoshaw · 2021-01-11T02:43:22Z

I have not tried inference, but the images saved during training are not that good.
I also just found a bug: I accidentally froze the backbone, so I will run a new experiment. Thanks for your answer!

rush9838465 closed this as completed Jan 7, 2021
