networks weight decay #34
Cool! Thanks for pointing it out. I think you are right that the weight decay is not applied properly here. Have you tried playing with various weight decay values in your implementation?
Btw, would you like me to include a pointer to your pytorch implementation in the README? I think it would be helpful for people who prefer pytorch to find it.
I'm still not able to reach your score with my implementation. The weird thing is that when I set a smoothness loss weight of 0.5, everything goes to 0 pretty quickly and never comes back. With a smoothness loss weight of 0.1 I get an output pattern similar to what your hyperparameters produce, and convergence is nice, but the results are still not as good as yours. I tested different values of weight decay without much success so far; the search is still ongoing :) And for the pointer, that would be super cool, thanks! As the code is still moving, I'd be happy to have people test it and try some hyperparameters to see what converges best :)
Cool. Just added a pointer. I would love to be kept posted on your re-implementation effort as well :)
Done!
@ClementPinard About the weight decay configuration, I experimented a bit and found no big difference. A weight regularizer of more than 1e-4 deteriorates performance, while smaller values give no improvement.
@yzcjtr, thanks for sharing your finding!
Hello, I have been trying to replicate your results in my own pytorch implementation, but had some trouble converging with your hyperparameters.
In particular, the weight decay you use seems very large to me: https://github.com/tinghuiz/SfMLearner/blob/master/nets.py#L27
Weight decay is usually around 5e-5 ~ 5e-4, and here it is 0.05! When using it, my two networks just go to zero very quickly.
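To see why such a large coefficient is alarming, here is a minimal sketch (plain Python; the learning rate and step count are made up for the demonstration, not taken from the repo's training configuration) of how SGD-style weight decay multiplicatively shrinks a weight each step:

```python
# Illustrative only: compare how fast a single weight shrinks under
# weight decay alone, w <- w * (1 - lr * wd), ignoring the task gradient.
# The learning rate and step count below are hypothetical.
lr = 1e-2
steps = 10_000

def decayed(w0, wd):
    w = w0
    for _ in range(steps):
        w *= (1 - lr * wd)
    return w

w_large = decayed(1.0, 0.05)   # the 0.05 used in nets.py
w_usual = decayed(1.0, 5e-4)   # a more typical value

print(w_large)  # ~0.0067: the weight has all but vanished
print(w_usual)  # ~0.95: barely changed
```

With a typical value the weights are barely nudged, while 0.05 drives them toward zero, consistent with the "networks go to zero very quickly" behavior described above.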
As I am not very familiar with tf.slim, I have done some research, and I am not sure you actually apply the weight regularization, since apparently you have to call slim.losses.get_total_loss() for the regularization terms to be included in the training loss.
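For readers unfamiliar with the mechanism: in TF-Slim, a weights_regularizer only registers its penalty in a losses collection; nothing applies it unless the training op is built from the combined total (e.g. via slim.losses.get_total_loss()). A toy sketch of that registration pattern, written in plain Python rather than TensorFlow to keep it self-contained:

```python
# Toy model of TF-Slim's regularization bookkeeping (plain Python,
# not TensorFlow): creating a layer with a regularizer only *registers*
# an L2 penalty in a collection; it is not added to the loss you train on.
REGULARIZATION_LOSSES = []  # stands in for tf.GraphKeys.REGULARIZATION_LOSSES

def conv_layer(weight, weight_decay):
    # Register the L2 penalty, as slim.conv2d does when given
    # weights_regularizer=slim.l2_regularizer(weight_decay).
    REGULARIZATION_LOSSES.append(weight_decay * weight ** 2)
    return weight  # the "layer output" (irrelevant here)

task_loss = 1.0
conv_layer(3.0, 0.05)
conv_layer(2.0, 0.05)

# Training on task_loss alone: the penalties are silently ignored,
# which is why even an extreme weight decay changes nothing.
loss_without_total = task_loss

# What get_total_loss() would return: task loss + registered penalties.
total_loss = task_loss + sum(REGULARIZATION_LOSSES)

print(loss_without_total)  # 1.0
print(total_loss)          # 1.0 + 0.05*9 + 0.05*4 = 1.65
```

Training on the plain task loss thus behaves identically whatever the decay coefficient is, matching the observation below about extreme values.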
This also corroborates the fact that setting the l2 regularization to extreme values (like 50.0) doesn't change anything.
The good news is that if weight decay is indeed not applied to your network, you might have something interesting to work on if you want to improve your results even more!
Clément