
Terrible result during training #49

Closed
Colinsnow1 opened this issue Dec 20, 2018 · 4 comments

@Colinsnow1

Hi Xingyi,
When I was training stages 2 & 3, I found that the accuracy and MPJPE were terrible. I also noticed the accuracy dropped from 0.83 to 0.02 in the first epoch of stage 1! Is that a possible reason for this?

Here is the log (pytorch-gpu version 0.3.1):
[screenshot of the training log]

@xingyizhou
Owner

Hi,
Thanks for reporting. This is a known issue with the PyTorch cuDNN BatchNorm implementation (#16). If your PyTorch version is newer than 0.1.12, you will need to disable the cuDNN BatchNorm by following the instructions here.

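For reference, a minimal sketch of how cuDNN can be disabled globally in PyTorch, which also forces BatchNorm off the cuDNN path; this is only an illustration and may differ from the exact workaround linked above:

```python
import torch

# Disable the cuDNN backend globally so BatchNorm (and other ops) fall back
# to the native implementation, avoiding the cuDNN BN issue referenced in #16.
# Set this before building the model and starting training.
torch.backends.cudnn.enabled = False
```
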
@Colinsnow1
Author

Hi Xingyi,
Thanks for the reply. Actually, I had already noticed that known issue and set torch.backends.cudnn.enabled = False to disable the cuDNN BatchNorm, but it didn't help. Moreover, the log I submitted seems abnormal; could you release part of your training log to help me debug? Thanks again!

@xingyizhou
Owner

Hi,
I don't have the log with me on my current machine. As I remember, the training MPJPE goes down very fast, and the validation MPJPE goes down more slowly but drops a lot after the learning rate is decreased. The accuracy should always be > 0.9. I would suggest switching to PyTorch 0.1.12 as a safe option for reproducing the results.

@Colinsnow1
Author

Hi Xingyi,
It worked after I downgraded PyTorch to version 0.1.12 and changed the Upsample module to UpsamplingBilinear2d. Thanks for the help.

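For anyone applying the same fix, a minimal sketch of the module swap; the scale factor here is an illustrative assumption, not necessarily the value used in this repository:

```python
import torch.nn as nn

# nn.Upsample is not available in PyTorch 0.1.12, so the newer-style module:
#   up = nn.Upsample(scale_factor=2, mode='bilinear')
# is replaced with the equivalent module that 0.1.12 does provide:
up = nn.UpsamplingBilinear2d(scale_factor=2)
```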