
Difficult to train with LargeMargin_Softmax_Loss on cifar10 #10

Closed
qianxinchun opened this issue Mar 27, 2017 · 9 comments

Comments

@qianxinchun

I have tried to train examples/cifar10/model/cifar_train_test.prototxt with different settings (DOUBLE/TRIPLE/QUADRUPLE), but training always ends up like this:

I0327 02:22:00.515635 16177 solver.cpp:228] Iteration 12000, loss = 87.3365
I0327 02:22:00.515707 16177 solver.cpp:244] Train net output #0: lambda = 0.0624753
I0327 02:22:00.515720 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:22:00.586127 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:22:00.586163 16177 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0327 02:26:54.401607 16177 solver.cpp:228] Iteration 12200, loss = 87.3365
I0327 02:26:54.401752 16177 solver.cpp:244] Train net output #0: lambda = 0.0540467
I0327 02:26:54.401765 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:26:54.471928 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:26:54.471937 16177 sgd_solver.cpp:106] Iteration 12200, lr = 0.001
I0327 02:31:48.234402 16177 solver.cpp:228] Iteration 12400, loss = 87.3365
I0327 02:31:48.234601 16177 solver.cpp:244] Train net output #0: lambda = 0.0467769
I0327 02:31:48.234617 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:31:48.304947 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:31:48.304958 16177 sgd_solver.cpp:106] Iteration 12400, lr = 0.001
I0327 02:36:42.063432 16177 solver.cpp:228] Iteration 12600, loss = 87.3365
I0327 02:36:42.063588 16177 solver.cpp:244] Train net output #0: lambda = 0.0405035
I0327 02:36:42.063603 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:36:42.134166 16177 solver.cpp:244] Train net output #2: mean_length = inf

How can I tackle this problem?
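A side note on the specific number (not from the thread, but a useful diagnostic): Caffe's SoftmaxWithLoss clamps each predicted probability to at least FLT_MIN before taking the log, so a loss frozen at exactly 87.3365 means the softmax probabilities have underflowed to zero, i.e. the network has fully diverged:

```python
import math

# Caffe clamps probabilities to FLT_MIN = 2^-126 (smallest normal float)
# before log(), so a diverged net reports a loss of exactly -ln(FLT_MIN).
FLT_MIN = 2.0 ** -126
diverged_loss = -math.log(FLT_MIN)
print(round(diverged_loss, 4))  # 87.3365
```

Once the loss hits this plateau the gradients carry no signal, so training cannot recover; the fixes below all aim at preventing the divergence in the first place.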

@qianxinchun
Author

I tried setting "clip_gradients" in the solver.prototxt, but the loss still ended up stuck at 87.3365.
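For reference, `clip_gradients` is a solver-level setting in Caffe; a minimal sketch of where it sits in solver.prototxt (all values here are illustrative, not the poster's actual configuration):

```
# solver.prototxt (illustrative values)
net: "examples/cifar10/model/cifar_train_test.prototxt"
base_lr: 0.001
lr_policy: "fixed"
clip_gradients: 10    # rescale gradients whose global L2 norm exceeds 10
display: 200
max_iter: 64000
solver_mode: GPU
```

Gradient clipping caps the update magnitude but does not fix a bad loss surface, which may be why it did not help here.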

@xqpinitial

Firstly, please change the display interval from 200 to 10 to see how the loss changes.
Secondly, please reduce base_lr to 0.0001, or even 0.000001, and watch the loss.
Thirdly:
1. Check whether the data contains abnormal samples or abnormal labels that break data loading.
2. Shrink the initialization weights so that the features fed into the softmax are as small as possible.
3. Lower the learning rate to narrow the range in which the weights fluctuate, which reduces the chance of the weights blowing up. This is also the fix most commonly suggested online.
4. If the network has BN (batch normalization) layers, it is best not to freeze the BN parameters when fine-tuning; otherwise, when the data distributions differ, the outputs can easily become very large.
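Suggestions 2 and 3 translate into Caffe configuration roughly as follows (a sketch with illustrative values; the layer and blob names are hypothetical, not taken from the actual prototxt):

```
# Suggestion 2: smaller weight initialization, in the net prototxt
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  inner_product_param {
    num_output: 10
    weight_filler { type: "gaussian" std: 0.001 }  # reduced std
  }
}

# Suggestion 3: lower learning rate, in solver.prototxt
base_lr: 0.0001
```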

@wy1iu
Owner

wy1iu commented Mar 29, 2017

For CIFAR10 it should be easy to train. If the network diverges, consider decreasing lambda more smoothly, or simply lower the difficulty of the loss by setting a smaller m.
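The lambda values in the training log decay over iterations; a common annealing form for the L-Softmax lambda is an inverse polynomial of the iteration count. A minimal sketch, assuming that form (all parameter values below are illustrative, not from the repository):

```python
def lsoftmax_lambda(iteration, base=1000.0, gamma=0.00002,
                    power=35.0, lambda_min=0.0):
    """Assumed annealing schedule:
    lambda = max(lambda_min, base * (1 + gamma * iteration) ** (-power)).
    Default values are hypothetical, chosen only to illustrate the shape.
    """
    return max(lambda_min, base * (1.0 + gamma * iteration) ** (-power))

# lambda starts at `base` and decays toward `lambda_min`; making gamma or
# power smaller slows the decay, which is one way to read the advice to
# "decrease lambda more smoothly".
```

A higher lambda keeps the loss close to plain softmax early on, so slowing the decay gives the network more time before the full large-margin constraint kicks in.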

@wy1iu wy1iu closed this as completed Mar 29, 2017
@shenmanmiao

Same problem as @qianxinchun: the network diverges even when I set lambda_min=0.5 and m=2. @wy1iu, could you please share your training log (m=4)?

@wy1iu
Owner

wy1iu commented Jun 2, 2017

I believe you could train it using PReLU. Using ReLU may need more parameter tuning. @shenmanmiao
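In a Caffe prototxt, swapping ReLU for PReLU is a one-line layer-type change (the layer and blob names here are hypothetical):

```
layer {
  name: "relu1"
  type: "PReLU"   # was: type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```

PReLU learns the negative-side slope per channel instead of zeroing negative activations, which can make optimization less brittle on hard losses like L-Softmax.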

@shenmanmiao

PReLU works well on Cifar10, thanks @wy1iu for your reply.

@billhyde

billhyde commented Jul 1, 2017

Hi, thank you for sharing. I trained a model on CASIA-WebFace with A-Softmax (from the SphereFace paper). The model converged, and the accuracy on LFW is 97.5%, but it is really hard to push the accuracy above 99%. I would be grateful for any suggestions. My QQ is 729512518.

@yfllllll

@shenmanmiao Have you reproduced the result on CIFAR10? Could you share the train_val.prototxt?
