networks weight decay #34
Cool! Thanks for pointing it out. I think you are right that the weight decay is not applied properly here. Have you tried playing with various weight decay values in your implementation?
Btw, would you like me to include a pointer to your pytorch implementation in the README? I think it would be helpful for people who prefer pytorch to find it.
I'm still not able to reach your score with my implementation. The weird thing is that when I set a smoothness loss weight of 0.5, everything goes to 0 pretty quickly and never comes back. With a smoothness loss weight of 0.1 I get an output pattern similar to what your hyperparameters produce, and convergence is nice, but the results are still not as good as yours. I tested different values of weight decay without much success so far; the search is still ongoing :) And for the pointer, that would be super cool, thanks! As the code is still moving, I'd be happy to have people test it and try some hyperparameters to see what converges best :)
Cool. Just added a pointer. I would love to be kept posted on your re-implementation effort as well :)
Done!
@ClementPinard About the weight decay configuration, I experimented a bit and found no big difference. A weight regularizer of more than 1e-4 deteriorates performance, while smaller values give no improvement.
@yzcjtr, thanks for sharing your finding!
Hello, I have been trying to replicate your results in my own pytorch implementation, but had some trouble converging with your hyperparameters.
In particular, the weight decay you use seems very large to me: https://github.com/tinghuiz/SfMLearner/blob/master/nets.py#L27
Weight decay is usually around 5e-5 ~ 5e-4, and here it is 0.05! When using it, my two networks just go to zero very quickly.
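To see why such a large coefficient is alarming, here is a minimal sketch (plain Python; the learning rate and step count are made up for the demonstration, not taken from the repo's training configuration) of how SGD-style weight decay multiplicatively shrinks a weight each step:

```python
# Illustrative only: compare how fast a single weight shrinks under
# weight decay alone, w <- w * (1 - lr * wd), ignoring the task gradient.
# The learning rate and step count below are hypothetical.
lr = 1e-2
steps = 10_000

def decayed(w0, wd):
    w = w0
    for _ in range(steps):
        w *= (1 - lr * wd)
    return w

w_large = decayed(1.0, 0.05)   # the 0.05 used in nets.py
w_usual = decayed(1.0, 5e-4)   # a more typical value

print(w_large)  # ~0.0067: the weight has all but vanished
print(w_usual)  # ~0.95: barely changed
```

With a typical value the weights are barely nudged, while 0.05 drives them toward zero, consistent with the "networks go to zero very quickly" behavior described above.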
As I am not very familiar with tf.slim, I have done some research, and I am not sure you actually apply the weight regularization, since apparently you have to call slim.losses.get_total_loss() for the regularization terms to be included in the training loss.
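For readers unfamiliar with the mechanism: in TF-Slim, a weights_regularizer only registers its penalty in a losses collection; nothing applies it unless the training op is built from the combined total (e.g. via slim.losses.get_total_loss()). A toy sketch of that registration pattern, written in plain Python rather than TensorFlow to keep it self-contained:

```python
# Toy model of TF-Slim's regularization bookkeeping (plain Python,
# not TensorFlow): creating a layer with a regularizer only *registers*
# an L2 penalty in a collection; it is not added to the loss you train on.
REGULARIZATION_LOSSES = []  # stands in for tf.GraphKeys.REGULARIZATION_LOSSES

def conv_layer(weight, weight_decay):
    # Register the L2 penalty, as slim.conv2d does when given
    # weights_regularizer=slim.l2_regularizer(weight_decay).
    REGULARIZATION_LOSSES.append(weight_decay * weight ** 2)
    return weight  # the "layer output" (irrelevant here)

task_loss = 1.0
conv_layer(3.0, 0.05)
conv_layer(2.0, 0.05)

# Training on task_loss alone: the penalties are silently ignored,
# which is why even an extreme weight decay changes nothing.
loss_without_total = task_loss

# What get_total_loss() would return: task loss + registered penalties.
total_loss = task_loss + sum(REGULARIZATION_LOSSES)

print(loss_without_total)  # 1.0
print(total_loss)          # 1.0 + 0.05*9 + 0.05*4 = 1.65
```

Training on the plain task loss thus behaves identically whatever the decay coefficient is, matching the observation below about extreme values.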
This also corroborates the fact that setting the l2 regularization to extreme values (like 50.0) doesn't change anything.
The good news is that if weight decay is indeed not applied to your network, you might have something interesting to work on if you want to improve your results even more!
Clément