question about gradient calculation with respect to weight #2
Same question here. Why is the update of the weight the same for m = 1, 2, 3, 4?
Also, since you normalize the weight by overwriting it in forward() instead of keeping a copy of the original weight, every time after the parameter update weight = weight + grad_w, what gets updated is simply the normalized weight rather than the original one.
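For context, here is a minimal sketch of the two options being contrasted; the class and attribute names are hypothetical illustrations, not the repository's actual code:

```python
import numpy as np

class MarginInnerProductInPlace:
    """Hypothetical layer that normalizes its weight by overwriting it
    inside forward(), as described in the comment above."""

    def __init__(self, weight):
        self.weight = weight  # shape (in_dim, num_classes)

    def forward(self, x):
        # Overwrite the parameter: each column becomes unit length,
        # so the original (unnormalized) weight is lost after one call.
        self.weight /= np.linalg.norm(self.weight, axis=0, keepdims=True)
        return x @ self.weight


class MarginInnerProductKeepOriginal:
    """Alternative that keeps the raw parameter and only normalizes a
    copy, so weight = weight + grad_w acts on the original parameter."""

    def __init__(self, weight):
        self.weight = weight

    def forward(self, x):
        w_norm = self.weight / np.linalg.norm(self.weight, axis=0, keepdims=True)
        return x @ w_norm
```

In the first variant, after weight = weight + grad_w the next forward() renormalizes the sum, so the effective parameter is always the normalized weight plus one gradient step.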
+1, the backward pass does not seem to correspond to the forward pass.
It is actually a normalized version of the gradient, which can help the training converge more stably. The direction is the same as before; what we do here is simply rescale the gradient (the learning rate then decides the scale). A similar idea and intuition appears in https://arxiv.org/pdf/1707.04822.pdf. If you use the original gradient to do the backprop, you can still make it work and obtain similar results, but it may not be as stable as the normalized version.
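As an illustration of the rescaling described above (a minimal sketch assuming a plain SGD step; weight, grad_w, and lr are illustrative names, not the repository's):

```python
import numpy as np

def sgd_step_normalized(weight, grad_w, lr=0.1, eps=1e-12):
    """SGD update with a length-normalized gradient: same direction
    as grad_w, but the learning rate alone decides the step size."""
    unit_grad = grad_w / (np.linalg.norm(grad_w) + eps)
    return weight - lr * unit_grad

def sgd_step_raw(weight, grad_w, lr=0.1):
    """Plain SGD with the original gradient; also valid, but the
    effective step size varies with the gradient magnitude."""
    return weight - lr * grad_w
```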
In MarginInnerProduct, the gradient with respect to the weight is very simple, but in large-margin softmax the gradient calculation is much more complex...
Can you please tell me how to simplify the gradient calculation?
I failed to derive it...
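For what it's worth, here is a sketch of why the two cases differ, assuming the A-Softmax formulation with normalized weights (my own reading, not the authors' derivation):

```latex
% Plain inner product: the weight gradient is just the input.
f_j = W_j^\top x_i
  \quad\Longrightarrow\quad
  \frac{\partial f_j}{\partial W_j} = x_i

% Margin term for the target class, with \|W_{y_i}\| = 1:
f_{y_i} = \|x_i\| \, \psi(\theta_{y_i}), \qquad
\psi(\theta) = (-1)^k \cos(m\theta) - 2k,
  \quad \theta \in \Big[\tfrac{k\pi}{m}, \tfrac{(k+1)\pi}{m}\Big]

% Here \cos\theta = W_{y_i}^\top x_i / \|x_i\|, so the chain rule gives
\frac{\partial f_{y_i}}{\partial W_{y_i}}
  = \|x_i\| \, \psi'(\theta_{y_i}) \,
    \frac{\partial \theta_{y_i}}{\partial W_{y_i}}
% and \partial\theta/\partial W couples W_{y_i} and x_i through the
% polynomial expansion of \cos(m\theta) in \cos\theta, which is what
% makes the weight gradient so much longer than the plain-case x_i.
```

Under these assumptions, all the extra complexity sits in the \partial\theta/\partial W factor; normalizing the resulting gradient, as in the answer above, removes the need to track its magnitude.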