This notebook is a collection of examples of:
1. translating the descriptions of tensors found in articles into PyTorch code
2. tensor manipulation

In articles, the shape of a tensor is often described using the following notation: $\mathbb{R}^{b \times c \times h \times w}$. In this case $b$ represents the batch dimension, $c$ represents the channel dimension, $h$ represents the height dimension and $w$ represents the width dimension.

In [1]:
import torch

BATCH_SIZE = 32
N_CHANNELS = 3
HEIGHT = 128
WIDTH = 128

tensor = torch.randn(BATCH_SIZE, N_CHANNELS, HEIGHT, WIDTH) # Creates a 4-dimensional tensor. 
print(tensor.shape) # Tensor shape: [BATCH_SIZE, N_CHANNELS, HEIGHT, WIDTH]

torch.Size([32, 3, 128, 128])


### The dim= and keepdim= arguments
In functions like torch.mean(), which by default calculates the mean over all elements of a tensor, the 'dim=' argument specifies one or more dimensions to reduce: the mean is calculated along those dimensions, and they are collapsed in the result. 'keepdim=' takes True or False. If True, each reduced dimension is kept in the output with size 1, so the result has the same number of dimensions as the input (but not the same shape). If False, which is the default, the reduced dimensions are removed. 'keepdim=' is meant to be used together with 'dim='.
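As a quick illustration of how 'dim=' and 'keepdim=' change the output shape, here is a minimal sketch. The tensor `x` and its sizes are hypothetical and chosen only to keep the shapes easy to read.

```python
import torch

# Hypothetical small tensor: [batch, channels, height, width]
x = torch.randn(4, 3, 8, 8)

full_mean = torch.mean(x)                             # reduce over every dimension
channel_mean = torch.mean(x, dim=1)                   # reduce the channel dimension
channel_mean_keep = torch.mean(x, dim=1, keepdim=True)  # keep it with size 1
spatial_mean = torch.mean(x, dim=(2, 3))              # reduce height and width together

print(full_mean.shape)          # torch.Size([])
print(channel_mean.shape)       # torch.Size([4, 8, 8])
print(channel_mean_keep.shape)  # torch.Size([4, 1, 8, 8])
print(spatial_mean.shape)       # torch.Size([4, 3])
```

Passing a tuple to 'dim=' reduces several dimensions in one call, which is a common way to get one value per image and channel.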

In [5]:
# Using torch.mean() without the dim= and keepdim= arguments:

mean_tensor = torch.mean(tensor)
print(mean_tensor)
print(mean_tensor.shape)

tensor(0.0010)
torch.Size([])


In [10]:
# Using torch.mean(dim=-1) with keepdim=False

keepdim_false_mean_tensor = torch.mean(tensor, dim=-1, keepdim=False)
print(keepdim_false_mean_tensor)
print(keepdim_false_mean_tensor.shape)

tensor([[[ 0.0103, -0.0636, -0.0599,  ...,  0.0900, -0.0788, -0.0492],
         [ 0.0899,  0.1647, -0.0868,  ...,  0.2007, -0.0442,  0.0627],
         [ 0.1145, -0.0778, -0.2108,  ...,  0.0831,  0.0509,  0.1069]],

        [[ 0.1141, -0.0008,  0.0062,  ...,  0.1020, -0.0013, -0.0997],
         [ 0.1376,  0.1353,  0.0533,  ...,  0.0048, -0.0218, -0.1752],
         [ 0.1114,  0.0043, -0.0084,  ..., -0.1878, -0.0517, -0.0220]],

        [[-0.2787, -0.0900,  0.1750,  ...,  0.0444, -0.0831, -0.1077],
         [ 0.1114, -0.1210, -0.0892,  ...,  0.1100, -0.0670, -0.0466],
         [ 0.0324, -0.0396, -0.1429,  ...,  0.0389, -0.0826,  0.0812]],

        ...,

        [[ 0.0066,  0.0088,  0.0434,  ...,  0.1500,  0.0973,  0.0213],
         [ 0.1311, -0.0912,  0.0539,  ..., -0.0416,  0.1004,  0.0154],
         [-0.1867,  0.0043, -0.1352,  ...,  0.0136,  0.1593, -0.0230]],

        [[-0.1921,  0.0881, -0.0896,  ...,  0.0970,  0.1282,  0.0787],
         [-0.0669,  0.0340, -0.0762,  ..., -0.0529,  0.

From the example above we can see that the last dimension has disappeared, and we're left with a 3D tensor of shape [BATCH_SIZE, N_CHANNELS, HEIGHT]. Since we set 'dim=-1', which is the last dimension (the width dimension of our tensor), we are calculating the average value across the width of the images. For every row of pixels in every channel of every image in the batch, we take the average of that row. For each channel of each image, we are left with a vertical strip of HEIGHT values representing the mean intensity of each row.
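The row-wise averaging is easier to check by hand on a tiny tensor. The sketch below uses a hypothetical tensor `small` with one image, one channel, two rows and three columns; it is not part of the original example.

```python
import torch

# Hypothetical tiny "batch": 1 image, 1 channel, 2 rows, 3 columns
small = torch.tensor([[[[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]]]])

row_means = torch.mean(small, dim=-1)  # average each row across its columns

print(row_means)        # tensor([[[2., 5.]]])
print(row_means.shape)  # torch.Size([1, 1, 2])
```

Each row of three values collapses to a single mean, so the width dimension is gone and one value per row remains.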

In [11]:
# Using torch.mean(dim=-1) with keepdim=True

keepdim_true_mean_tensor = torch.mean(tensor, dim=-1, keepdim=True)
print(keepdim_true_mean_tensor)
print(keepdim_true_mean_tensor.shape)

tensor([[[[ 0.0103],
          [-0.0636],
          [-0.0599],
          ...,
          [ 0.0900],
          [-0.0788],
          [-0.0492]],

         [[ 0.0899],
          [ 0.1647],
          [-0.0868],
          ...,
          [ 0.2007],
          [-0.0442],
          [ 0.0627]],

         [[ 0.1145],
          [-0.0778],
          [-0.2108],
          ...,
          [ 0.0831],
          [ 0.0509],
          [ 0.1069]]],


        [[[ 0.1141],
          [-0.0008],
          [ 0.0062],
          ...,
          [ 0.1020],
          [-0.0013],
          [-0.0997]],

         [[ 0.1376],
          [ 0.1353],
          [ 0.0533],
          ...,
          [ 0.0048],
          [-0.0218],
          [-0.1752]],

         [[ 0.1114],
          [ 0.0043],
          [-0.0084],
          ...,
          [-0.1878],
          [-0.0517],
          [-0.0220]]],


        [[[-0.2787],
          [-0.0900],
          [ 0.1750],
          ...,
          [ 0.0444],
          [-0.0831],
          [-0.1077

If we instead set 'dim=0', we average over the batch dimension: for every position (channel, row, column), the mean is taken across all samples in the batch, and the result has shape [N_CHANNELS, HEIGHT, WIDTH].

Setting 'keepdim=True' keeps the number of dimensions unchanged, but the reduced dimension is still collapsed, to size 1. In the 'dim=-1' example above, the result has shape [BATCH_SIZE, N_CHANNELS, HEIGHT, 1]. This is crucial if we want to broadcast the result against tensors that have the same shape as the original tensor, for example to subtract each row's mean from that row.
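The broadcasting point can be sketched as follows. The tensor `images` and its sizes are hypothetical; the idea is to subtract each row's mean from that row, which works directly with 'keepdim=True' and fails for these shapes with 'keepdim=False'.

```python
import torch

# Hypothetical batch: [batch, channels, height, width]
images = torch.randn(4, 3, 8, 8)

# keepdim=True: shape [4, 3, 8, 1], which broadcasts against [4, 3, 8, 8]
row_mean_keep = images.mean(dim=-1, keepdim=True)
centered = images - row_mean_keep
print(centered.shape)  # torch.Size([4, 3, 8, 8])

# keepdim=False: shape [4, 3, 8], which does not line up with [4, 3, 8, 8]
row_mean_drop = images.mean(dim=-1)
try:
    images - row_mean_drop
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)  # True
```

Note that when two sizes happen to coincide (for example, if the number of channels equalled the height), the 'keepdim=False' version could broadcast without an error and silently give the wrong result, which is another reason to prefer 'keepdim=True' here.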

In [29]:
tensor2 = torch.rand(2, 2, 2, 2)
print(tensor2)

tensor([[[[0.3934, 0.0250],
          [0.9241, 0.0187]],

         [[0.0944, 0.6992],
          [0.8301, 0.0452]]],


        [[[0.4931, 0.5485],
          [0.1791, 0.3526]],

         [[0.3641, 0.5757],
          [0.9030, 0.2807]]]])


In [30]:
mean_tensor2 = torch.mean(tensor2)
print(mean_tensor2)

tensor(0.4204)


In [32]:
dim_0_mean_tensor2 = torch.mean(tensor2, dim=0)
print(dim_0_mean_tensor2)

tensor([[[0.4433, 0.2868],
         [0.5516, 0.1857]],

        [[0.2292, 0.6375],
         [0.8665, 0.1629]]])
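In the output above, each entry is the average of the corresponding entries of the two samples; for instance, 0.4433 is the mean of 0.3934 and 0.4931. A minimal sketch of the same check in code, using a fresh hypothetical random tensor `t` rather than the exact values printed above:

```python
import torch

# Hypothetical tensor with a batch of 2 samples, each of shape [2, 2, 2]
t = torch.rand(2, 2, 2, 2)

mean_over_batch = torch.mean(t, dim=0)  # average across the batch dimension
manual = (t[0] + t[1]) / 2              # element-wise average of the two samples

print(mean_over_batch.shape)                    # torch.Size([2, 2, 2])
print(torch.allclose(mean_over_batch, manual))  # True
```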
