In [1]:
import torch

## Testing broadcasting rules to see why keepdim=True matters

In [20]:
a = torch.Tensor([
    [1, 2], 
    [3, 4]])

In [21]:
s1 = a.sum(1, keepdim=True)
print(s1)
print(s1.shape)

print(a / s1)

tensor([[3.],
        [7.]])
torch.Size([2, 1])
tensor([[0.3333, 0.6667],
        [0.4286, 0.5714]])


In [22]:
s2 = a.sum(1, keepdim=False)
print(s2)
print(s2.shape)

print(a / s2)

tensor([3., 7.])
torch.Size([2])
tensor([[0.3333, 0.2857],
        [1.0000, 0.5714]])


In this example, `a` is a 2x2 tensor.

`s1 = a.sum(1, keepdim=True)` gives a tensor with shape `[2, 1]`

`s2 = a.sum(1, keepdim=False)` gives a tensor with shape `[2]`

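`a / s1` broadcasts `s1` across the columns: the column `[[3], [7]]` is copied, so `[[1, 2], [3, 4]]` is divided by `[[3, 3], [7, 7]]` and each row is divided by its own sum.

`a / s2` broadcasts `s2` across the rows: the row `[3, 7]` is copied, so `[[1, 2], [3, 4]]` is divided by `[[3, 7], [3, 7]]` and column `j` is divided by the sum of row `j`, which is not what we want.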
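
To make the copying visible, here is a minimal sketch that materializes the two divisors with `expand_as`. The names `d1` and `d2` are only for illustration, and `a` is rebuilt with `torch.tensor` and float literals so the block runs on its own.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
s1 = a.sum(1, keepdim=True)   # shape [2, 1]
s2 = a.sum(1, keepdim=False)  # shape [2]

# expand_as builds explicitly the view that broadcasting uses implicitly.
d1 = s1.expand_as(a)  # the column [3, 7] repeated across columns
d2 = s2.expand_as(a)  # the row [3, 7] repeated down the rows

print(d1)
print(d2)
```

Dividing `a` by `d1` or `d2` should give the same results as `a / s1` and `a / s2` above.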

Another way to say this is that a one-dimensional tensor is treated like a row. Broadcasting aligns shapes starting from the trailing dimension, so a tensor of shape `[2]` behaves like one of shape `[1, 2]`, and that single row is copied down to fill every row of the result.
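
One way to check the "row" interpretation is to add the missing dimension by hand with `unsqueeze`. This is a small sketch; `row` and `col` are illustrative names that do not appear in the cells above.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
s2 = a.sum(1)  # shape [2], same as keepdim=False

row = s2.unsqueeze(0)  # shape [1, 2]: how broadcasting treats s2
col = s2.unsqueeze(1)  # shape [2, 1]: what keepdim=True would have produced

# Dividing by s2 matches dividing by the explicit row.
same_as_row = torch.equal(a / s2, a / row)
# Dividing by the explicit column matches the keepdim=True result.
same_as_keepdim = torch.equal(a / col, a / a.sum(1, keepdim=True))

print(same_as_row, same_as_keepdim)
```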

## Testing broadcasting rules with a 3x2 tensor

In [32]:
a2 = torch.Tensor([
    [1, 2], 
    [3, 4],
    [5, 6]])

In [33]:
s1 = a2.sum(1, keepdim=True)
print(s1.shape)

print(a2 / s1)


torch.Size([3, 1])
tensor([[0.3333, 0.6667],
        [0.4286, 0.5714],
        [0.4545, 0.5455]])


In [35]:
s2 = a2.sum(1, keepdim=False)
print(s2.shape)

print(a2 / s2)


torch.Size([3])


RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1

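With a 3x2 tensor, the broadcasting operation fails entirely. Shapes are matched starting from the trailing dimension, so the trailing size of `a2` (2) is compared with the only size of `s2` (3); they differ and neither is 1, so the shapes cannot be broadcast.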
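
The matching rule can be sketched in plain Python. `broadcast_shape` below is a hypothetical helper written for this note, not PyTorch's implementation: it walks both shapes from the trailing dimension and treats missing leading dimensions as size 1.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Compare sizes from the trailing dimension backwards.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        da = shape_a[-i] if i <= len(shape_a) else 1  # missing dims count as 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"sizes {da} and {db} do not match at dimension -{i}")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))

print(broadcast_shape((3, 2), (3, 1)))  # keepdim=True case: compatible
print(broadcast_shape((2, 2), (2,)))    # square case: compatible, but not what we meant
try:
    broadcast_shape((3, 2), (3,))       # keepdim=False case on a 3x2 tensor
except ValueError as err:
    print("incompatible:", err)
```

The square case is the one to watch: the shapes are compatible, so nothing signals that the sums were applied along the wrong axis.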

This also highlights that the operation with `keepdim=False` doesn't make sense for what we are trying to do. If we are trying to turn each row from a list of counts into a list of probabilities, a non-square tensor makes the operation fail outright, whereas a square tensor lets it succeed with silently wrong results.
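
As a closing sketch, the counts-to-probabilities step can be wrapped in a small function that always uses `keepdim=True`. `normalize_rows` is a hypothetical helper name, and the input is assumed to be a 2-D float tensor with no all-zero rows.

```python
import torch

def normalize_rows(counts):
    # keepdim=True keeps the row sums as a column of shape [n_rows, 1],
    # so each row is divided by its own sum for any number of columns.
    return counts / counts.sum(dim=1, keepdim=True)

counts = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
probs = normalize_rows(counts)

print(probs)
print(probs.sum(dim=1))  # each row should sum to 1, up to rounding
```

Checking that every row sums to 1 is a cheap way to catch the silent square-tensor bug.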