Difficult to train with LargeMargin_Softmax_Loss on cifar10 #10
I tried "clip_gradients" in the solver.prototxt, but the loss still ended up stuck at 87.3365.
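For reference, gradient clipping is a standard Caffe solver option set in solver.prototxt; a minimal sketch of the relevant fields (the values here are illustrative, not taken from this thread):

```
# solver.prototxt (fragment) -- illustrative values
net: "examples/cifar10/model/cifar_train_test.prototxt"
base_lr: 0.001
clip_gradients: 10    # rescale gradients whose global L2 norm exceeds this threshold
display: 10           # print the training loss every 10 iterations
```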
Firstly, please change the display interval from 200 to 10 to see how the loss changes.
For CIFAR-10 it should be easy to train. If the network diverges, consider decreasing lambda more smoothly, or simply lower the difficulty of the loss, i.e. set a smaller m.
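For context, the lambda printed in the training log below is annealed by the margin layer itself, so "decreasing lambda more smoothly" and "a smaller m" are both layer parameters. A hedged sketch of such settings, assuming the `largemargin_inner_product_param` fields from this repo's CIFAR-10 example prototxt (field names and values here are illustrative; check them against your local caffe.proto):

```
layer {
  name: "ip_margin"
  type: "LargeMarginInnerProduct"
  bottom: "ip2"
  bottom: "label"
  top: "ip_margin"
  top: "lambda"
  largemargin_inner_product_param {
    num_output: 10
    type: DOUBLE      # m = 2, the easiest margin setting mentioned in this thread
    base: 1000        # lambda starts large, so early training behaves like plain softmax
    gamma: 0.00002    # smaller gamma -> slower lambda decay, i.e. smoother annealing
    power: 45
    lambda_min: 0.5   # floor on lambda, so the margin never fully takes over
  }
}
```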
Same problem as @qianxinchun: the network diverges even if I set lambda_min=0.5 and m=2. @wy1iu, could you please share your training log (m=4)?
I believe you could train it using PReLU; using ReLU may need more parameter tuning. @shenmanmiao
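Swapping ReLU for PReLU in a Caffe prototxt is a one-line change of the layer type (layer and blob names here are illustrative):

```
# before: plain ReLU activation
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }

# after: PReLU, with a learnable negative slope (one per channel by default)
layer { name: "relu1" type: "PReLU" bottom: "conv1" top: "conv1" }
```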
PReLU works well on CIFAR-10, thanks @wy1iu for your reply.
Hi, thank you for sharing. I trained a model on CASIA-WebFace with A-Softmax (the SphereFace paper). The model converged and the accuracy on LFW is 97.5%, but it is really hard to get above 99%. I would be grateful for any suggestions. My QQ is 729512518.
@shenmanmiao, have you reproduced the result on CIFAR-10? Can you share the train_val.prototxt?
I have tried to train my examples/cifar10/model/cifar_train_test.prototxt with different settings (DOUBLE/TRIPLE/QUADRUPLE), but it always goes like this:
I0327 02:22:00.515635 16177 solver.cpp:228] Iteration 12000, loss = 87.3365
I0327 02:22:00.515707 16177 solver.cpp:244] Train net output #0: lambda = 0.0624753
I0327 02:22:00.515720 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:22:00.586127 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:22:00.586163 16177 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0327 02:26:54.401607 16177 solver.cpp:228] Iteration 12200, loss = 87.3365
I0327 02:26:54.401752 16177 solver.cpp:244] Train net output #0: lambda = 0.0540467
I0327 02:26:54.401765 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:26:54.471928 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:26:54.471937 16177 sgd_solver.cpp:106] Iteration 12200, lr = 0.001
I0327 02:31:48.234402 16177 solver.cpp:228] Iteration 12400, loss = 87.3365
I0327 02:31:48.234601 16177 solver.cpp:244] Train net output #0: lambda = 0.0467769
I0327 02:31:48.234617 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:31:48.304947 16177 solver.cpp:244] Train net output #2: mean_length = inf
I0327 02:31:48.304958 16177 sgd_solver.cpp:106] Iteration 12400, lr = 0.001
I0327 02:36:42.063432 16177 solver.cpp:228] Iteration 12600, loss = 87.3365
I0327 02:36:42.063588 16177 solver.cpp:244] Train net output #0: lambda = 0.0405035
I0327 02:36:42.063603 16177 solver.cpp:244] Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I0327 02:36:42.134166 16177 solver.cpp:244] Train net output #2: mean_length = inf
How can I tackle this problem?