
Batch size and penalty terms #35

Closed
rgemulla opened this issue Aug 13, 2019 · 6 comments

@rgemulla
Member

7aa82ce introduces a patch that divides the penalty terms by the number of batches to keep the penalty terms consistent. This needs discussion.

In particular: we average example losses/gradients over the batch. Thus, before 7aa82ce:

E[gradient] = E[gradient of a random example] + gradient of penalty term

That's independent of the number of batches. After 7aa82ce:

E[gradient] = E[gradient of a random example] + gradient of penalty term / num_batches

That's dependent on the number of batches. The patch thus seems to introduce the very dependence it tries to avoid.
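A minimal numerical sketch of this argument (toy quadratic loss and L2 penalty, made-up values, not kge code):

    import torch

    # Toy setup: loss_i(w) = (w - x_i)^2, penalty = lam * w^2,
    # example losses are averaged per batch, as in the training loop.
    torch.manual_seed(0)
    x = torch.randn(10000)
    w = torch.tensor(0.5, requires_grad=True)
    lam = 0.1

    def mean_step_gradient(bs, divide_penalty):
        num_batches = len(x) // bs
        grads = []
        for batch in x.split(bs):
            w.grad = None
            loss = ((w - batch) ** 2).mean()   # averaged example losses
            penalty = lam * w ** 2
            if divide_penalty:                 # behavior introduced by 7aa82ce
                penalty = penalty / num_batches
            (loss + penalty).backward()
            grads.append(w.grad.item())
        return sum(grads) / len(grads)

    # Without the division, the mean per-step gradient is the same for any
    # batch size; with it, the penalty contribution shrinks as the number
    # of batches grows.
    print(mean_step_gradient(1000, False), mean_step_gradient(10, False))
    print(mean_step_gradient(1000, True), mean_step_gradient(10, True))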

@samuelbroscheit
Member

Gradients are not averaged; the losses are, however, if the loss is instantiated that way.

The relevant part is here:

https://github.com/uma-pi1/kge/blob/master/kge/job/train.py#L270

            penalty_values = self.model.penalty(
                epoch=self.epoch, batch_index=batch_index, num_batches=len(self.loader)
            )
            for pv_index, pv_value in enumerate(penalty_values):
                penalty_value = penalty_value + pv_value
                if len(sum_penalties) > pv_index:
                    sum_penalties[pv_index] += pv_value.item()
                else:
                    sum_penalties.append(pv_value.item())
            sum_penalty += penalty_value.item()
            batch_forward_time += time.time()
            forward_time += batch_forward_time

            # backward pass
            batch_backward_time = -time.time()
            cost_value = loss_value + penalty_value
            cost_value.backward()

penalty_value is not averaged.
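
A minimal sketch of that structure (toy model, not the actual kge loop): the example losses are averaged per batch, while penalty_value is added as-is, so over an epoch the penalty enters once per batch.

    import torch

    # Made-up model and data, for illustration only.
    w = torch.zeros(1, requires_grad=True)
    data = torch.randn(1000)
    lam, bs = 0.01, 100

    for batch in data.split(bs):
        loss_value = ((w - batch) ** 2).mean()    # averaged over the batch
        penalty_value = lam * (w ** 2).sum()      # not averaged
        cost_value = loss_value + penalty_value   # as in the snippet above
        cost_value.backward()
        # optimizer step / zero_grad omitted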

@rgemulla
Member Author

Loss average implies gradient average. The relevant code piece is here:

            loss_value, batch_size, batch_prepare_time, batch_forward_time = self._compute_batch_loss(
                batch_index, batch
            )
            sum_loss += loss_value.item() * batch_size

_compute_batch_loss is thus supposed to average; that's why its result is multiplied by the batch size afterwards.

7aa82ce should thus be reverted. If anything, the norms should be averaged over the number of training examples. But since this is constant (and can thus be viewed as part of lambda) and not meaningful for all penalties, we shouldn't do this.
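
A quick check of the "loss average implies gradient average" point with a made-up loss (it is just linearity of differentiation):

    import torch

    # Made-up example: the gradient of the mean loss equals the mean of the
    # per-example gradients, so averaging losses averages gradients.
    w = torch.tensor([1.0], requires_grad=True)
    x = torch.tensor([1.0, 2.0, 3.0])
    losses = (w * x - 1.0) ** 2

    grad_of_mean = torch.autograd.grad(losses.mean(), w, retain_graph=True)[0]
    per_example = [torch.autograd.grad(l, w, retain_graph=True)[0] for l in losses]
    assert torch.allclose(grad_of_mean, sum(per_example) / len(per_example))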

@samuelbroscheit
Member

samuelbroscheit commented Aug 13, 2019

So a smaller batch size (bs) means more penalty per epoch? With your suggestion we apply a penalty of N/bs * ||T|| per epoch, and with mine ||T||, correct?
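
With made-up numbers to spell out the two schemes (taking ||T|| = 1):

    # Hypothetical numbers, just to compare the per-epoch penalty totals.
    N, penalty = 1000, 1.0                                        # examples, ||T||
    for bs in (100, 50):
        num_batches = N // bs
        per_epoch_without_patch = num_batches * penalty               # N/bs * ||T||
        per_epoch_with_patch = num_batches * (penalty / num_batches)  # ||T||
        print(bs, per_epoch_without_patch, per_epoch_with_patch)
    # bs=100 -> 10.0 vs 1.0; bs=50 -> 20.0 vs 1.0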

Another question on this topic: why didn't we implement the weighted norm as in the CP paper?

@rgemulla
Member Author

Perhaps _compute_batch_loss should be renamed to _compute_batch_loss_avg to make it easier to interpret.

@rgemulla
Member Author

rgemulla commented Aug 13, 2019

Yes, more penalty per epoch. But also more loss per epoch, since each example's gradient now has more impact in every step (it is weighted by 1/batch_size within its batch). Again, without the patch:

E[gradient] = E[gradient of a random example] + gradient of penalty term

This is what we want: the expected gradient is independent of the batch size.

@rgemulla
Member Author

rgemulla commented Aug 13, 2019

As for the weighted norm: that's a separate issue. If you mean that frequency-based weighting is not implemented, see #20.
