networks weight decay #34

Closed
ClementPinard opened this issue Nov 23, 2017 · 7 comments

Comments

@ClementPinard
Contributor

Hello, I have been trying to replicate your results in my own PyTorch implementation, but had some trouble converging with your hyperparameters.

In particular, the weight decay you use seems very large to me: https://github.com/tinghuiz/SfMLearner/blob/master/nets.py#L27

Weight decay is usually around 5e-5 ~ 5e-4, and here it is 0.05! When I use that value, my two networks just go to zero very quickly.
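
For reference, this is roughly how weight decay enters the picture on the PyTorch side (a minimal sketch with placeholder model and learning rate, not my actual networks):

```python
import torch

# placeholder network -- stands in for the depth / pose nets
model = torch.nn.Linear(10, 10)

# weight decay in the usual 5e-5 ~ 5e-4 range; 0.05 would be two orders
# of magnitude larger (the lr here is illustrative, not from this repo)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=5e-4)
```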

As I am not very familiar with tf.slim, I did some research, and I am not sure you actually apply the weight regularization, since apparently you have to call slim.losses.get_total_loss() for it to be included in the training objective.

This is corroborated by the fact that setting the L2 regularization to extreme values (like 50.0) doesn't change anything.
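
To illustrate what I mean, here is a minimal TF 1.x sketch (shapes and the task loss are made up, not taken from nets.py): the regularizer only registers its penalty in a collection, and nothing happens unless it is summed into the objective, either via slim.losses.get_total_loss() or manually:

```python
import tensorflow as tf
slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [None, 128, 416, 3])
targets = tf.placeholder(tf.float32, [None, 128, 416, 3])

# the regularizer below only adds an L2 penalty to the
# REGULARIZATION_LOSSES collection; it does not change any loss by itself
with slim.arg_scope([slim.conv2d],
                    weights_regularizer=slim.l2_regularizer(0.05)):
    preds = slim.conv2d(inputs, 3, [3, 3], activation_fn=None)

task_loss = tf.reduce_mean(tf.abs(preds - targets))  # hand-built loss

# without this explicit sum, training never sees the penalty
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
total_loss = task_loss + tf.add_n(reg_losses)
```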

The good news is that if weight decay is indeed not applied to your network, you might have something interesting to work on if you want to improve your results even further!

Clément

@tinghuiz
Owner

Cool! Thanks for pointing it out. I think you are right that the weight decay is not applied properly here. Have you tried playing with various weight decay values in your implementation?

@tinghuiz
Owner

Btw, would you like me to include a pointer to your pytorch implementation in the README? I think it would be helpful for people who prefer pytorch to find it.

@ClementPinard
Contributor Author

I'm still not able to reach your score with my implementation. The weird thing is that when I set a smooth loss weight of 0.5, everything goes to 0 pretty quickly and never comes back. I get an output pattern similar to yours when I select a smooth loss weight of 0.1, and convergence is nice, but the results are still not as good as yours. I tested different values of weight decay without much success so far; the search is still ongoing :)
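
For clarity, by "smooth loss" I mean a gradient penalty on the predicted disparity, weighted into the total loss roughly like this (a sketch with made-up names, not the exact formulation of either codebase):

```python
import torch

def smooth_loss(disp):
    # first-order smoothness penalty on a [B, 1, H, W] disparity map
    dx = (disp[:, :, :, 1:] - disp[:, :, :, :-1]).abs().mean()
    dy = (disp[:, :, 1:, :] - disp[:, :, :-1, :]).abs().mean()
    return dx + dy

smooth_weight = 0.1                    # 0.5 makes everything collapse to zero
disp = torch.rand(4, 1, 128, 416)      # dummy disparity prediction
photometric_loss = torch.tensor(0.0)   # placeholder for the real term
total_loss = photometric_loss + smooth_weight * smooth_loss(disp)
```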

As for the pointer, that would be super cool, thanks! As the code is still moving, I'd be happy to have people test it and try some hyperparameters to see what converges best :)

@tinghuiz
Owner

tinghuiz commented Dec 6, 2017

Cool, just added a pointer. I would love to be kept posted on your re-implementation effort as well :)

@tinghuiz closed this as completed Dec 6, 2017
@ClementPinard
Contributor Author

Done!

@yzcjtr

yzcjtr commented Dec 9, 2017

@ClementPinard Regarding the weight decay configuration, I experimented a bit and found no big difference. A weight regularizer above 1e-4 deteriorates the performance, while smaller values give no improvement.

@tinghuiz
Owner

@yzcjtr , thanks for sharing your finding!
