
Terrible result during training #49

Closed
Colinsnow1 opened this issue Dec 20, 2018 · 4 comments

@Colinsnow1

Hi Xingyi,
When I was training stages 2 & 3, I found that the accuracy and MPJPE were terrible. I also noticed the accuracy dropped from 0.83 to 0.02 in the first epoch of stage 1! Is that a possible reason for this?

Here is the log (pytorch-gpu version 0.3.1):
[screenshot of the training log]

@xingyizhou
Owner

Hi,
Thanks for reporting. This is a known issue with the PyTorch cuDNN BatchNorm implementation (#16). If your PyTorch version is newer than 0.1.12, you will need to disable the cuDNN BatchNorm by following the instructions here.

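For reference, a minimal sketch of how cuDNN can be disabled globally in PyTorch, which also forces BatchNorm off the cuDNN path; this is only an illustration and may differ from the exact workaround linked above:

```python
import torch

# Disable the cuDNN backend globally so BatchNorm (and other ops) fall back
# to the native implementation, avoiding the cuDNN BN issue referenced in #16.
# Set this before building the model and starting training.
torch.backends.cudnn.enabled = False
```
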
@Colinsnow1
Author

Hi Xingyi,
Thanks for the reply. Actually, I had already noticed that known issue and set torch.backends.cudnn.enabled = False to disable the cuDNN BatchNorm, but it didn't help. Moreover, the log I submitted seems abnormal; could you release part of your training log to help me debug? Thanks again!

@xingyizhou
Owner

Hi,
I don't have the log with me on my current machine. As I remember, the training MPJPE goes down very fast, and the validation MPJPE goes down more slowly but drops a lot after the learning rate is decreased. The accuracy should always be > 0.9. I would suggest switching to PyTorch 0.1.12 as a safe option for reproducing the results.

@Colinsnow1
Author

Hi Xingyi,
It worked after I downgraded PyTorch to version 0.1.12 and changed the Upsample module to UpsamplingBilinear2d. Thanks for the help.

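For anyone applying the same fix, a minimal sketch of the module swap; the scale factor here is an illustrative assumption, not necessarily the value used in this repository:

```python
import torch.nn as nn

# nn.Upsample is not available in PyTorch 0.1.12, so the newer-style module:
#   up = nn.Upsample(scale_factor=2, mode='bilinear')
# is replaced with the equivalent module that 0.1.12 does provide:
up = nn.UpsamplingBilinear2d(scale_factor=2)
```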