Implementation questions #4

Closed
bkj opened this issue Jul 22, 2017 · 2 comments

Comments

bkj commented Jul 22, 2017

Are you able to explain a couple of bits about the implementation of the margin_inner_product_layer?

  1. What is lambda_? It looks like it's a constant that decreases w/ the iterations, but doesn't seem to be mentioned in the paper. **Edit:** Looks like you mix x'w and margin(x'w) via (margin(x'w) + lambda_ * x'w) / (1 + lambda_) where lambda_ decreases exponentially w/ iterations. Is that right?

  2. What is the type parameter (e.g. SINGLE, DOUBLE, TRIPLE, QUADRUPLE)? I'm guessing this is how you set the value of m from the paper? **Edit:** I gather these are ways of implementing the margin for m={1,2,3,4}. Any particular reason why you implemented it this way? Numerical stability?
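For readers following along, the mixing formula quoted in the edit to question 1 can be sketched in a few lines of plain Python. This is only an illustration of that formula, not the layer's code: the function names, the decay schedule, and its default values (`base`, `decay`, `lambda_min`) are assumptions made up for the example.

```python
def blend_logits(plain_logit, margin_logit, lambda_):
    """Mix the plain inner product x'w with its margin version.

    Large lambda_ -> result is close to the plain logit (ordinary softmax).
    lambda_ == 0  -> result is the pure margin logit.
    """
    return (margin_logit + lambda_ * plain_logit) / (1.0 + lambda_)


def lambda_schedule(iteration, base=1000.0, decay=0.999, lambda_min=0.0):
    """Hypothetical exponential decay; names and defaults are assumptions."""
    return max(lambda_min, base * decay ** iteration)


if __name__ == "__main__":
    # Early iterations stay near the plain logit, later ones near the margin logit.
    for it in (0, 1000, 10000):
        lam = lambda_schedule(it)
        print(it, lam, blend_logits(plain_logit=0.8, margin_logit=-0.3, lambda_=lam))
```

With a large `lambda_` the margin term is almost invisible, which is presumably why such a blend makes the start of training behave like ordinary softmax.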

Thanks

melgor commented Jul 24, 2017

I have been analysing the code, so I will answer your questions:

  1. This idea is not mentioned in SphereFace (unfortunately), but it is explained in Large-Margin Softmax. The idea is to use pure Softmax at the beginning and, at every iteration, increase the weight of A-Softmax and lower the weight of Softmax, until the Softmax weight reaches 0 and only pure A-Softmax remains.

  2. Your guess is right. I think the implementation looks like that for the sake of speed. A much cleaner version of Large-Margin Softmax (an older version of A-Softmax; the main difference is that A-Softmax normalizes the weights) is here. It has functions where you just pass 'margin' as an argument and the values are calculated from it, with no hard-coded values etc. It is also much cleaner because it looks like pure numpy code. But it is also much slower (even 10x).
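To make the trade-off in point 2 concrete, here is a minimal sketch in plain Python, assuming the margin function from the papers, psi(theta) = (-1)^k cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]. `psi_generic` takes `m` as an argument, while `psi_quadruple` hard-codes m = 4 with the identity cos(4*theta) = 8cos^4(theta) - 8cos^2(theta) + 1 and avoids `acos`. Both function names are hypothetical and neither is copied from the Caffe layer or the linked implementation.

```python
import math


def psi_generic(cos_theta, m):
    """Margin function for any integer m >= 1, computed via acos."""
    c = max(-1.0, min(1.0, cos_theta))           # guard against rounding outside [-1, 1]
    theta = math.acos(c)
    k = min(int(theta * m / math.pi), m - 1)     # index of the interval containing theta
    return (-1) ** k * math.cos(m * theta) - 2 * k


def psi_quadruple(cos_theta):
    """Same function with m = 4 hard-coded: polynomial plus comparisons only."""
    c = max(-1.0, min(1.0, cos_theta))
    cos_4theta = 8 * c ** 4 - 8 * c ** 2 + 1     # cos(4*theta) written in terms of cos(theta)
    # k counts how many interval boundaries cos(j*pi/4), j = 1..3, have been passed.
    boundaries = (math.sqrt(0.5), 0.0, -math.sqrt(0.5))
    k = sum(1 for b in boundaries if c <= b)
    return (-1) ** k * cos_4theta - 2 * k


if __name__ == "__main__":
    for c in (0.9, 0.5, -0.5, -0.9):
        print(c, psi_generic(c, 4), psi_quadruple(c))
```

The two versions should agree up to floating-point rounding. The hard-coded one trades generality for a cheaper evaluation, which fits the speed explanation given above.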

Owner

wy1iu commented Jul 31, 2017

Thanks @melgor for answering the questions. :)

wy1iu closed this as completed Jul 31, 2017