
Difficult to train with LargeMargin_Softmax_Loss on cifar10 #10

Closed
qianxinchun opened this issue Mar 27, 2017 · 9 comments

Comments

@qianxinchun

I have tried to train examples/cifar10/model/cifar_train_test.prototxt with different settings (DOUBLE/TRIPLE/QUADRUPLE), but training always ends up like this:

I0327 02:22:00.515635 16177 solver.cpp:228] Iteration 12000, loss = 87.3365
I0327 02:22:00.515707 16177 solver.cpp:244] Train net output #0: lambda = 0.0624753
I0327 02:22:00.515720 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:22:00.586127 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:22:00.586163 16177 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0327 02:26:54.401607 16177 solver.cpp:228] Iteration 12200, loss = 87.3365
I0327 02:26:54.401752 16177 solver.cpp:244] Train net output #0: lambda = 0.0540467
I0327 02:26:54.401765 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:26:54.471928 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:26:54.471937 16177 sgd_solver.cpp:106] Iteration 12200, lr = 0.001
I0327 02:31:48.234402 16177 solver.cpp:228] Iteration 12400, loss = 87.3365
I0327 02:31:48.234601 16177 solver.cpp:244] Train net output #0: lambda = 0.0467769
I0327 02:31:48.234617 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:31:48.304947 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:31:48.304958 16177 sgd_solver.cpp:106] Iteration 12400, lr = 0.001
I0327 02:36:42.063432 16177 solver.cpp:228] Iteration 12600, loss = 87.3365
I0327 02:36:42.063588 16177 solver.cpp:244] Train net output #0: lambda = 0.0405035
I0327 02:36:42.063603 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:36:42.134166 16177 solver.cpp:244] Train net output #2: mean_length = inf

How can I tackle this problem?
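A side note on the specific number (not from the thread, but a useful diagnostic): Caffe's SoftmaxWithLoss clamps each predicted probability to at least FLT_MIN before taking the log, so a loss frozen at exactly 87.3365 means the softmax probabilities have underflowed to zero, i.e. the network has fully diverged:

```python
import math

# Caffe clamps probabilities to FLT_MIN = 2^-126 (smallest normal float)
# before log(), so a diverged net reports a loss of exactly -ln(FLT_MIN).
FLT_MIN = 2.0 ** -126
diverged_loss = -math.log(FLT_MIN)
print(round(diverged_loss, 4))  # 87.3365
```

Once the loss hits this plateau the gradients carry no signal, so training cannot recover; the fixes below all aim at preventing the divergence in the first place.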

@qianxinchun
Author

I tried setting "clip_gradients" in the solver.prototxt, but the loss still ended up stuck at 87.3365.
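For reference, `clip_gradients` is a solver-level setting in Caffe; a minimal sketch of where it sits in solver.prototxt (all values here are illustrative, not the poster's actual configuration):

```
# solver.prototxt (illustrative values)
net: "examples/cifar10/model/cifar_train_test.prototxt"
base_lr: 0.001
lr_policy: "fixed"
clip_gradients: 10    # rescale gradients whose global L2 norm exceeds 10
display: 200
max_iter: 64000
solver_mode: GPU
```

Gradient clipping caps the update magnitude but does not fix a bad loss surface, which may be why it did not help here.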

@xqpinitial

Firstly, please change the display interval from 200 to 10 to see how the loss changes.
Secondly, please reduce base_lr to 0.0001, or even 0.000001, and watch the loss.
Thirdly:
1. Check whether the data contains abnormal samples or abnormal labels that break data loading.
2. Shrink the initialization weights so that the features fed into the softmax are as small as possible.
3. Lower the learning rate to narrow the range in which the weights fluctuate, which reduces the chance of the weights blowing up. This is also the fix most commonly suggested online.
4. If the network has BN (batch normalization) layers, it is best not to freeze the BN parameters when fine-tuning; otherwise, when the data distributions differ, the outputs can easily become very large.
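Suggestions 2 and 3 translate into Caffe configuration roughly as follows (a sketch with illustrative values; the layer and blob names are hypothetical, not taken from the actual prototxt):

```
# Suggestion 2: smaller weight initialization, in the net prototxt
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  inner_product_param {
    num_output: 10
    weight_filler { type: "gaussian" std: 0.001 }  # reduced std
  }
}

# Suggestion 3: lower learning rate, in solver.prototxt
base_lr: 0.0001
```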

@wy1iu
Owner

wy1iu commented Mar 29, 2017

For CIFAR10 it should be easy to train. If the network diverges, consider decreasing lambda more smoothly, or simply lower the difficulty of the loss by setting a smaller m.
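The lambda values in the training log decay over iterations; a common annealing form for the L-Softmax lambda is an inverse polynomial of the iteration count. A minimal sketch, assuming that form (all parameter values below are illustrative, not from the repository):

```python
def lsoftmax_lambda(iteration, base=1000.0, gamma=0.00002,
                    power=35.0, lambda_min=0.0):
    """Assumed annealing schedule:
    lambda = max(lambda_min, base * (1 + gamma * iteration) ** (-power)).
    Default values are hypothetical, chosen only to illustrate the shape.
    """
    return max(lambda_min, base * (1.0 + gamma * iteration) ** (-power))

# lambda starts at `base` and decays toward `lambda_min`; making gamma or
# power smaller slows the decay, which is one way to read the advice to
# "decrease lambda more smoothly".
```

A higher lambda keeps the loss close to plain softmax early on, so slowing the decay gives the network more time before the full large-margin constraint kicks in.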

@wy1iu wy1iu closed this as completed Mar 29, 2017
@shenmanmiao

Same problem as @qianxinchun: the network diverges even when I set lambda_min=0.5 and m=2. @wy1iu, could you please share your training log (m=4)?

@wy1iu
Owner

wy1iu commented Jun 2, 2017

I believe you could train it using PReLU. Using ReLU may need more parameter tuning. @shenmanmiao
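In a Caffe prototxt, swapping ReLU for PReLU is a one-line layer-type change (the layer and blob names here are hypothetical):

```
layer {
  name: "relu1"
  type: "PReLU"   # was: type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```

PReLU learns the negative-side slope per channel instead of zeroing negative activations, which can make optimization less brittle on hard losses like L-Softmax.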

@shenmanmiao

PReLU works well on Cifar10, thanks @wy1iu for your reply.

@billhyde

billhyde commented Jul 1, 2017

Hi, thank you for sharing. I trained a model on CASIA-WebFace with A-Softmax (from the SphereFace paper). The model converged, and the accuracy on LFW is 97.5%, but it is really hard to push the accuracy above 99%. I would be grateful for any suggestions. My QQ is 729512518.

@yfllllll

@shenmanmiao Have you reproduced the result on CIFAR10? Could you share the train_val.prototxt?
