# **Conv layers**

In [4]:
import torch
from torch import nn

In PyTorch, we can create a convolutional layer using nn.Conv2d:

In [2]:
conv = nn.Conv2d(in_channels=3,  # number of input channels
                 out_channels=7, # number of output channels
                 kernel_size=5)  # size of the kernel

The conv layer expects as input a tensor in the format "NCHW", meaning that the dimensions of the tensor should follow the order:

* batch size
* channel
* height
* width
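
Image libraries often hand back arrays in height-width-channel order, so a batch may arrive as "NHWC" instead. A minimal sketch of reordering such a batch into "NCHW" with `permute` (the tensor names are illustrative and not part of the original notebook):

```python
import torch

# Hypothetical batch in NHWC order: 32 images, 128x128 pixels, 3 channels.
x_nhwc = torch.randn(32, 128, 128, 3)

# Move the channel axis to position 1: (N, H, W, C) -> (N, C, H, W).
x_nchw = x_nhwc.permute(0, 3, 1, 2).contiguous()

print(x_nchw.shape)  # torch.Size([32, 3, 128, 128])
```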

For example, we can emulate a batch of 32 colour images, each of size 128x128, like this:

In [5]:
x = torch.randn(32, 3, 128, 128)
y = conv(x)
y.shape

torch.Size([32, 7, 124, 124])

The output tensor is also in the "NCHW" format. We still have 32 images, now with 7 channels (matching out_channels of conv), each of size 124x124: a 5x5 kernel fits in 128 - 5 + 1 = 124 positions along each axis. If we add the appropriate padding to conv, namely padding = kernel_size // 2 (for an odd kernel size), the output width and height match the input width and height:

In [6]:
conv2 = nn.Conv2d(in_channels=3,
                  out_channels=7,
                  kernel_size=5,
                  padding=2)

x = torch.randn(32, 3, 128, 128)
y = conv2(x)
y.shape

torch.Size([32, 7, 128, 128])
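
Both shapes follow from counting kernel positions: with stride 1, an axis of length n, a kernel of size k and padding p on each side give n + 2p - k + 1 outputs. A small helper reproduces the two sizes above (the name `conv_out_size` is made up for this sketch):

```python
def conv_out_size(n, kernel_size, padding=0):
    """Output length along one axis for a stride-1 convolution."""
    return n + 2 * padding - kernel_size + 1

print(conv_out_size(128, kernel_size=5))             # 124
print(conv_out_size(128, kernel_size=5, padding=2))  # 128
```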

# **Parameters of a Convolutional Layer**

Recall that the trainable parameters of a fully-connected layer include the network weights and biases. There is one weight for each connection, and one bias for each output unit:

In [7]:
fc = nn.Linear(100, 30)
fc_params = list(fc.parameters())
print("len(fc_params)", len(fc_params))
print("Weights:", fc_params[0].shape)
print("Biases:", fc_params[1].shape)

len(fc_params) 2
Weights: torch.Size([30, 100])
Biases: torch.Size([30])


In a convolutional layer, the trainable parameters include the convolutional kernels (filters) and also a set of biases:

In [8]:
conv2 = nn.Conv2d(in_channels=3,
                  out_channels=7,
                  kernel_size=5,
                  padding=2)
conv_params = list(conv2.parameters())
print("len(conv_params):", len(conv_params))
print("Filters:", conv_params[0].shape)
print("Biases:", conv_params[1].shape)

len(conv_params): 2
Filters: torch.Size([7, 3, 5, 5])
Biases: torch.Size([7])


There is one bias for each output channel. Each bias is added to every element in that output channel. Note that the bias computation was not shown in the above figures, and it is often omitted in other texts describing convolution arithmetic. Nevertheless, the biases are there.
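
As a rough check of both statements, the sketch below counts the parameters of a layer with the same configuration and compares its output with a bias-free convolution using the same filters. The input size of 16x16 and the batch size of 4 are arbitrary choices for illustration:

```python
import torch
from torch import nn
import torch.nn.functional as F

conv = nn.Conv2d(in_channels=3, out_channels=7, kernel_size=5, padding=2)

# 7 filters of shape 3x5x5, plus one bias per output channel.
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 7*3*5*5 + 7 = 532

# Compare the layer's output with a bias-free convolution using the same filters.
x = torch.randn(4, 3, 16, 16)
with torch.no_grad():
    no_bias = F.conv2d(x, conv.weight, bias=None, padding=2)
    diff = conv(x) - no_bias  # should equal the channel's bias at every position

expected = conv.bias.detach().view(1, 7, 1, 1).expand_as(diff)
print(torch.allclose(diff, expected, atol=1e-5))
```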

# **Pooling layers**

In [77]:
pool = nn.MaxPool2d(kernel_size=2, stride=2) # take the max over each 2-by-2 block
y = conv2(x)
z = pool(y)
x.shape, y.shape, z.shape

(torch.Size([32, 3, 128, 128]),
 torch.Size([32, 7, 128, 128]),
 torch.Size([32, 7, 64, 64]))

Usually, the kernel size and the stride length will be equal.

The pooling layer has no trainable parameters:

In [10]:
list(pool.parameters())

[]
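
To make the pooling operation itself concrete, here is a small sketch on a 4x4 input whose values (0 to 15) are chosen only for illustration. It also shows what changes when the stride is smaller than the kernel size:

```python
import torch
from torch import nn

# A single 4x4 "image" with values 0..15, shaped (N, C, H, W).
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Non-overlapping 2x2 blocks: each output is the max of one block.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x))
# tensor([[[[ 5.,  7.],
#           [13., 15.]]]])

# With stride < kernel_size the windows overlap and the output is larger.
overlapping = nn.MaxPool2d(kernel_size=2, stride=1)
print(overlapping(x).shape)  # torch.Size([1, 1, 3, 3])
```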

# **Additional**

Cross-correlation operation

In [19]:
def corr2d(X, K):
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # multiply the h-by-w window at (i, j) elementwise with K and sum
            Y[i,j] = (X[i:i+h, j:j+w] * K).sum()
    return Y

In [20]:
X = torch.tensor([[0.0, 1.0, 2.0],
                  [3.0, 4.0, 5.0],
                  [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0],
                  [2.0, 3.0]])
corr2d(X, K)

tensor([[19., 25.],
        [37., 43.]])
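
PyTorch's convolution routines compute this same cross-correlation (the kernel is not flipped). A quick check against `torch.nn.functional.conv2d`, repeating the definition of `corr2d` so that the snippet runs on its own:

```python
import torch
import torch.nn.functional as F

def corr2d(X, K):
    """2D cross-correlation of matrix X with kernel K (no padding, stride 1)."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i+h, j:j+w] * K).sum()
    return Y

X = torch.arange(9, dtype=torch.float32).reshape(3, 3)
K = torch.tensor([[0.0, 1.0],
                  [2.0, 3.0]])

# F.conv2d expects (N, C, H, W) input and (out_ch, in_ch, kH, kW) weights.
Y_builtin = F.conv2d(X.reshape(1, 1, 3, 3), K.reshape(1, 1, 2, 2)).reshape(2, 2)

print(torch.allclose(corr2d(X, K), Y_builtin))  # True
```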

Convolutional layer

In [21]:
class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.rand(1))
    
    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

## An experiment to show why a kernel is useful

In [22]:
X = torch.ones((6,8))
X[:, 2:6] = 0
X

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])

In [23]:
K = torch.tensor([[1.0, -1.0]])

In [24]:
Y = corr2d(X, K)
Y

tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])

In [25]:
corr2d(X.t(), K)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

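The key takeaway from this experiment is that each kernel detects a particular kind of pattern. Here, K = [[1, -1]] responds to changes between horizontally adjacent pixels, so it finds the vertical edges in X but produces all zeros on the transposed image, where the edges are horizontal.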
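
A different kernel picks up the pattern that K misses. The sketch below applies the transposed kernel to the transposed image, using `F.conv2d` in place of `corr2d` so that it is self-contained (the helper name `xcorr` is illustrative):

```python
import torch
import torch.nn.functional as F

# Same image as above, transposed: the edges now run horizontally.
X = torch.ones((6, 8))
X[:, 2:6] = 0
Xt = X.t()                        # shape (8, 6)

K = torch.tensor([[1.0, -1.0]])   # compares left/right neighbours
Kt = K.t()                        # shape (2, 1): compares up/down neighbours

def xcorr(img, kernel):
    # Thin wrapper around F.conv2d for 2-D inputs.
    return F.conv2d(img[None, None], kernel[None, None])[0, 0]

print(xcorr(Xt, K).abs().sum())   # tensor(0.): K sees no change
print(xcorr(Xt, Kt)[:, 0])        # tensor([ 0.,  1.,  0.,  0.,  0., -1.,  0.])
```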

## Learning a Kernel

Gradient descent can adjust the values of the kernel $K$ so that the prediction $\hat{Y} = \text{conv2d}(X, K)$ becomes close to the target $Y$.

In [26]:
conv2d = nn.Conv2d(1, 1, kernel_size=(1,2), bias=False)

X = X.reshape((1,1,6,8))
Y = Y.reshape((1,1,6,7))
lr = 3e-2

for i in range(10):
    Y_hat = conv2d(X)
    l = (Y_hat-Y)**2
    conv2d.zero_grad()
    l.sum().backward()

    conv2d.weight.data[:] -= lr * conv2d.weight.grad
    if (i+1) % 2 == 0:
        print(f'epoch {i+1}, loss {l.sum(): 3f}')

epoch 2, loss  1.667570
epoch 4, loss  0.280130
epoch 6, loss  0.047145
epoch 8, loss  0.007970
epoch 10, loss  0.001362


In [27]:
conv2d.weight.data.reshape((1,2))

tensor([[ 0.9926, -0.9940]])

The result is close to $[[1, -1]]$, the theoretical solution that we have seen.
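
The manual weight update can also be delegated to an optimizer. The following is a sketch of the same experiment using `torch.optim.SGD`, which is a substitution and not the notebook's own code. The 100 iterations are an arbitrary choice, and the data is rebuilt so that the snippet runs on its own:

```python
import torch
from torch import nn

# Rebuild the data: an image with two vertical edges and its edge map.
X = torch.ones((1, 1, 6, 8))
X[..., 2:6] = 0
Y = X[..., :-1] - X[..., 1:]   # same as cross-correlating with [[1, -1]]

conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
optimizer = torch.optim.SGD(conv2d.parameters(), lr=3e-2)

for epoch in range(100):
    optimizer.zero_grad()
    loss = ((conv2d(X) - Y) ** 2).sum()   # summed squared error, as above
    loss.backward()
    optimizer.step()

print(conv2d.weight.data.reshape(1, 2))   # expected to be close to [[1, -1]]
```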

## Padding
Padding adds a border (of zeros, by default) around the original image, which lets us control the shape of the output.

In [59]:
def comp_conv2d(conv2d, X):
    X = X.reshape((1,1) + X.shape)
    Y = conv2d(X)      # nn.Conv2d takes a 4-dimensional (N, C, H, W) tensor as its input
    return Y.reshape(Y.shape[2:])

conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)
X = torch.rand(size=(8,8))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

In [31]:
conv2d = nn.Conv2d(1, 1, kernel_size=3) # default padding=0
X = torch.rand(size=(8,8))
comp_conv2d(conv2d, X).shape

torch.Size([6, 6])

In [None]:
X = torch.rand(size=(8,8))
conv2d = nn.Conv2d(1,1,kernel_size=(5,3), padding=(2,1)) # different kernel size and padding for the vertical and horizontal directions
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

Padding is applied on every side: padding=1 adds one row or column of zeros at the top, bottom, left and right, so each output dimension grows by 2 (6 becomes 8 rather than 7).
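
For odd kernel sizes, this gives a simple rule for keeping the output the same size as the input. A short sketch, with the kernel sizes chosen only for illustration:

```python
import torch
from torch import nn

X = torch.rand(1, 1, 8, 8)

for k in (1, 3, 5, 7):
    # For odd k, padding k // 2 adds (k - 1) rows/columns in total,
    # exactly offsetting the (k - 1) lost to the kernel.
    conv = nn.Conv2d(1, 1, kernel_size=k, padding=k // 2)
    print(k, tuple(conv(X).shape[2:]))
```

Each line printed should report an 8x8 output.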

Let us now look at a concrete example.

In [72]:
# Without padding
conv2d = nn.Conv2d(1, 1, kernel_size=(2,2), bias=False) 
conv2d.weight = nn.Parameter(torch.tensor([[[[0.0, 1.0],
                                             [2.0, 3.0]]]]))

X = torch.tensor([[0.0, 1.0, 2.0],
                  [3.0, 4.0, 5.0],
                  [6.0, 7.0, 8.0]])
output = conv2d(X.reshape((1,1) + X.shape))
print(output)

tensor([[[[19., 25.],
          [37., 43.]]]], grad_fn=<ConvolutionBackward0>)


In [73]:
# With padding
conv2d = nn.Conv2d(1, 1, kernel_size=(2,2), bias=False, padding=1) 
conv2d.weight = nn.Parameter(torch.tensor([[[[0.0, 1.0],
                                             [2.0, 3.0]]]]))

X = torch.tensor([[0.0, 1.0, 2.0],
                  [3.0, 4.0, 5.0],
                  [6.0, 7.0, 8.0]])
output = conv2d(X.reshape((1,1) + X.shape))
print(output)

tensor([[[[ 0.,  3.,  8.,  4.],
          [ 9., 19., 25., 10.],
          [21., 37., 43., 16.],
          [ 6.,  7.,  8.,  0.]]]], grad_fn=<ConvolutionBackward0>)
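
The padded result can be reproduced by adding the zeros explicitly and then convolving without padding. A minimal sketch using `F.pad` and `F.conv2d`, assuming the default zero padding mode:

```python
import torch
import torch.nn.functional as F

X = torch.arange(9, dtype=torch.float32).reshape(1, 1, 3, 3)
K = torch.tensor([[[[0.0, 1.0],
                    [2.0, 3.0]]]])

# Pad one zero on the left, right, top and bottom, then convolve without padding.
X_padded = F.pad(X, (1, 1, 1, 1))
manual = F.conv2d(X_padded, K)

# Let conv2d apply the same zero padding itself.
builtin = F.conv2d(X, K, padding=1)

print(torch.allclose(manual, builtin))  # True
print(manual[0, 0])
```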


## Stride

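The stride is the number of positions the kernel moves between applications. With a stride greater than 1, some positions are skipped, so the output is smaller.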

In [75]:
conv2d = nn.Conv2d(1, 1, kernel_size=2, padding=1, stride=2, bias=False)
conv2d.weight = nn.Parameter(torch.tensor([[[[0.0, 1.0],
                                             [2.0, 3.0]]]]))

X = torch.tensor([[0.0, 1.0, 2.0],
                  [3.0, 4.0, 5.0],
                  [6.0, 7.0, 8.0]])
output = conv2d(X.reshape((1,1) + X.shape))
print(output)

tensor([[[[ 0.,  8.],
          [21., 43.]]]], grad_fn=<ConvolutionBackward0>)
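
One way to read this result: a stride of 2 keeps every second row and column of the stride-1 output. A small sketch with the same input and kernel, written with `F.conv2d` for brevity:

```python
import torch
import torch.nn.functional as F

X = torch.arange(9, dtype=torch.float32).reshape(1, 1, 3, 3)
K = torch.tensor([[[[0.0, 1.0],
                    [2.0, 3.0]]]])

dense = F.conv2d(X, K, padding=1)              # stride 1: 4x4 output
strided = F.conv2d(X, K, padding=1, stride=2)  # stride 2: 2x2 output

# Keeping every second row and column of the dense result gives the strided one.
print(torch.allclose(dense[..., ::2, ::2], strided))  # True
```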


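Theoretically, the number of output rows is $\lfloor (n + 2p - k + s)/s \rfloor = \lfloor (3 + 2 \cdot 1 - 2 + 2)/2 \rfloor = \lfloor 2.5 \rfloor = 2$, where $n$ is the input size, $p$ the padding, $k$ the kernel size and $s$ the stride.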
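
A short sketch that checks this count against `nn.Conv2d` for a few configurations (the helper `out_size` and the listed configurations are illustrative choices):

```python
import torch
from torch import nn

def out_size(n, k, p, s):
    # floor((n + 2p - k + s) / s), equivalently floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k + s) // s

# (input size, kernel size, padding, stride); values chosen arbitrarily.
configs = [(3, 2, 1, 2), (8, 3, 1, 2), (8, 3, 0, 3), (10, 5, 2, 4)]

for n, k, p, s in configs:
    conv = nn.Conv2d(1, 1, kernel_size=k, padding=p, stride=s)
    actual = conv(torch.rand(1, 1, n, n)).shape[-1]
    print((n, k, p, s), out_size(n, k, p, s), actual)
```

The last two numbers on each printed line should agree.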