Are you able to explain a couple of bits about the implementation of the margin_inner_product_layer?
What is lambda_? It looks like it's a constant that decreases w/ the iterations, but doesn't seem to be mentioned in the paper. ** Edit: Looks like you mix x'w and margin(x'w) via (margin(x'w) + lambda_ * x'w) / (1 + lambda_) where lambda_ decreases exponentially w/ iterations. Is that right? **
What is the type parameter? (eg. SINGLE, DOUBLE, TRIPLE, QUADRUPLE) I'm guessing this is how you set the value of m from the paper? ** Edit: I gather these are ways of implementing the margin for m={1,2,3,4}. Any particular reason why you implemented this way? Numerical stability? **
Thanks
I have been analysing the code, so I will try to answer your questions:
Unfortunately, this idea is not mentioned in the SphereFace paper, but it is explained in the Large-Margin Softmax paper. The idea is to start with pure Softmax and, at every iteration, increase the weight of A-Softmax while lowering the weight of Softmax, until the Softmax weight reaches 0 and only pure A-Softmax remains.
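To make the blending concrete, here is a minimal sketch of the mix quoted in the question, `(margin(x'w) + lambda_ * x'w) / (1 + lambda_)`. The function names, the decay schedule, and all default values are assumptions for illustration only and are not taken from the layer's code.

```python
def blended_logit(plain, margin, lam):
    """Mix the plain logit x'w with its margin version.

    Large lam -> almost pure Softmax; lam == 0 -> pure margin (A-Softmax) logit.
    """
    return (margin + lam * plain) / (1.0 + lam)


def annealed_lambda(iteration, base=1000.0, gamma=0.1, power=1.0, lambda_min=0.0):
    """Hypothetical decay schedule: lam shrinks as training progresses.

    All parameter names and defaults here are illustrative assumptions.
    """
    return max(lambda_min, base * (1.0 + gamma * iteration) ** (-power))


if __name__ == "__main__":
    plain, margin = 0.8, 0.2  # example target-class logits before/after the margin
    for it in (0, 100, 10000, 1000000):
        lam = annealed_lambda(it)
        print(it, lam, blended_logit(plain, margin, lam))
```

Early in training the blended value stays close to the plain logit, and it drifts toward the margin logit as `lam` shrinks, which matches the gradual hand-over described above.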
Your guess is right. I think the implementation looks like that for speed. A much cleaner version of Large-Margin Softmax (the predecessor of A-Softmax; the main difference is that A-Softmax normalizes the weights) is here. There, the functions simply take 'margin' as an argument and compute the values from it, with no hard-coded values. It is also much cleaner because it looks like pure numpy code. But it is also much slower (up to 10x).
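As a rough illustration of the "margin as an argument" style, the sketch below computes `cos(m*theta)` from `cos(theta)` with the Chebyshev recurrence, then builds the monotonic margin function `(-1)^k * cos(m*theta) - 2k` for `theta` in `[k*pi/m, (k+1)*pi/m]`. It is not the code from either repository, and the function names are hypothetical. For m = 1..4 the recurrence reproduces the closed forms `c`, `2c^2 - 1`, `4c^3 - 3c`, and `8c^4 - 8c^2 + 1`, which a hard-coded SINGLE/DOUBLE/TRIPLE/QUADRUPLE switch could evaluate directly.

```python
import math


def cos_m_theta(cos_theta, m):
    """cos(m*theta) from cos(theta) via T_{n+1}(c) = 2*c*T_n(c) - T_{n-1}(c)."""
    prev, cur = 1.0, cos_theta  # T_0 and T_1
    if m == 0:
        return prev
    for _ in range(m - 1):
        prev, cur = cur, 2.0 * cos_theta * cur - prev
    return cur


def psi(cos_theta, m):
    """Monotonic margin function: (-1)^k * cos(m*theta) - 2k."""
    c = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding error
    theta = math.acos(c)
    k = min(int(theta * m / math.pi), m - 1)  # index of the interval containing theta
    return (-1) ** k * cos_m_theta(c, m) - 2 * k


if __name__ == "__main__":
    for m in (1, 2, 3, 4):
        print(m, [round(psi(math.cos(t), m), 4) for t in (0.0, 0.5, 1.0, 2.0, 3.0)])
```

The generic version pays for an `acos` and a loop per element, which is one plausible reason a hard-coded variant would be faster.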