Batch size and penalty terms #35
Gradients are not averaged; however, the losses are, if the loss is instantiated in such a way. The relevant part is here: https://github.com/uma-pi1/kge/blob/master/kge/job/train.py#L270

penalty_value is not averaged.
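A minimal sketch of the situation described above (simplified placeholder names, not the actual kge code): the example losses are averaged over the batch, while the penalty value is simply added on top without any averaging.

```python
import torch

def training_step(scores, targets, parameters, lamb=0.01):
    # per-example losses, averaged over the batch (this is the "loss average")
    example_losses = torch.nn.functional.cross_entropy(scores, targets, reduction="none")
    avg_loss = example_losses.mean()

    # penalty term (e.g., a norm of the embeddings), added without averaging
    penalty_value = lamb * sum(p.norm() ** 2 for p in parameters)

    total = avg_loss + penalty_value
    total.backward()
    return total
```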
Loss average implies gradient average. The relevant code piece is here:

7aa82ce should thus be reverted. If anything, the norms should be averaged over the number of training examples. But since this is constant (and can thus be viewed as part of lambda) and not meaningful for all penalties, we shouldn't do this.
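To illustrate the constant-factor argument: dividing the penalty by the number of training examples N only rescales the regularization weight, so it changes nothing that tuning lambda wouldn't already cover (illustrative numbers only).

```python
N = 10_000          # number of training examples (constant during training)
lamb = 0.1          # regularization weight lambda
penalty = 42.0      # some penalty value || T ||

# Averaging the penalty over N ...
averaged = lamb * penalty / N
# ... is the same as using a rescaled weight lambda' = lambda / N:
rescaled = (lamb / N) * penalty
assert abs(averaged - rescaled) < 1e-12
```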
So a smaller batch size (bs) means more penalty per epoch? With your suggestion we apply a penalty of N / bs * || T || per epoch, and with mine || T ||, correct? Another question on this topic: why didn't we implement the weighted norm like in the CP paper?
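A quick check of the per-epoch penalty in the two schemes from the question above (purely illustrative numbers; N examples and batch size bs give N / bs batches per epoch):

```python
N = 10_000      # training examples
bs = 100        # batch size
norm_T = 1.0    # || T ||

num_batches = N // bs   # batches per epoch

# Without the patch: || T || is added to every batch loss,
# i.e. N / bs * || T || penalty per epoch.
per_epoch_without_patch = num_batches * norm_T

# With 7aa82ce: || T || / num_batches is added to every batch loss,
# i.e. || T || penalty per epoch, independent of bs.
per_epoch_with_patch = num_batches * (norm_T / num_batches)

print(per_epoch_without_patch, per_epoch_with_patch)   # 100.0 1.0
```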
Perhaps
Yes, more penalty per epoch. But also more loss per epoch, since the gradient of every example now has more impact in every step. Again, without the patch:

E[gradient] = E[gradient of a random example] + gradient of penalty term

This is what we want: the expected gradient is independent of the batch size.
As for the weighted norm: that's a separate issue. If you mean that frequency-based weighting is not implemented: #20
7aa82ce introduces a patch that divides penalty terms by the number of batches to keep the penalty terms consistent. This needs discussion.

In particular: we average example losses/gradients over the batch. Thus, before 7aa82ce:

E[gradient] = E[gradient of a random example] + gradient of penalty term

That's independent of the number of batches. After 7aa82ce:

E[gradient] = E[gradient of a random example] + gradient of penalty term / num_batches

That's dependent on the number of batches. This patch thus seems to introduce what it tries to avoid.
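A small numerical sketch of this argument (a toy model, not kge code): with batch-averaged example losses, the mean step gradient of the data term is the same for any batch size, so dividing the penalty gradient by num_batches makes the expected step gradient depend on the batch size.

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)
data = torch.randn(1000, 5)

def step_gradient(batch, num_batches, use_patch):
    w.grad = None
    example_losses = (batch @ w) ** 2        # toy per-example loss
    loss = example_losses.mean()             # averaged over the batch
    penalty = w.norm() ** 2                  # toy penalty term || T ||
    if use_patch:
        penalty = penalty / num_batches      # the division introduced by 7aa82ce
    (loss + penalty).backward()
    return w.grad.clone()

for bs in (10, 100):
    num_batches = data.shape[0] // bs
    grads = [step_gradient(data[i * bs:(i + 1) * bs], num_batches, use_patch=True)
             for i in range(num_batches)]
    # mean step gradient over one epoch: the data part is identical for both
    # batch sizes, but the penalty part is grad(||w||^2) / num_batches and
    # therefore changes with bs
    print(bs, torch.stack(grads).mean(dim=0))
```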