# Exploring PyTorch nn.Conv2d
`nn.Conv2d()` is PyTorch's 2D convolutional layer. We will run a few small experiments to see what happens when we call it.

References:
* https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
* https://poloclub.github.io/cnn-explainer/

In [1]:
import torch
import torch.nn as nn

In [2]:
torch.__version__

'2.5.1+cpu'

In [3]:
torch.manual_seed(42)

# We will create 32 fake images with RGB color and 64x64 pixel size

# Create sample batch of random numbers with same size as image batch
images = torch.randn(size=(32, 3, 64, 64)) # [batch_size, color_channels, height, width]

# get a single image for testing
test_image = images[0] 

print(f"Image batch shape: {images.shape} -> [batch_size, color_channels, height, width]")
print(f"Single image shape: {test_image.shape} -> [color_channels, height, width]") 
print(f"Single image pixel values:\n{test_image}")

Image batch shape: torch.Size([32, 3, 64, 64]) -> [batch_size, color_channels, height, width]
Single image shape: torch.Size([3, 64, 64]) -> [color_channels, height, width]
Single image pixel values:
tensor([[[ 1.9269,  1.4873,  0.9007,  ...,  1.8446, -1.1845,  1.3835],
         [ 1.4451,  0.8564,  2.2181,  ...,  0.3399,  0.7200,  0.4114],
         [ 1.9312,  1.0119, -1.4364,  ..., -0.5558,  0.7043,  0.7099],
         ...,
         [-0.5610, -0.4830,  0.4770,  ..., -0.2713, -0.9537, -0.6737],
         [ 0.3076, -0.1277,  0.0366,  ..., -2.0060,  0.2824, -0.8111],
         [-1.5486,  0.0485, -0.7712,  ..., -0.1403,  0.9416, -0.0118]],

        [[-0.5197,  1.8524,  1.8365,  ...,  0.8935, -1.5114, -0.8515],
         [ 2.0818,  1.0677, -1.4277,  ...,  1.6612, -2.6223, -0.4319],
         [-0.1010, -0.4388, -1.9775,  ...,  0.2106,  0.2536, -0.7318],
         ...,
         [ 0.2779,  0.7342, -0.3736,  ..., -0.4601,  0.1815,  0.1850],
         [ 0.7205, -0.2833,  0.0937,  ..., -0.1002, -2.3609,

## Conv2D 1

In [4]:
conv_layer_1 = nn.Conv2d(
    in_channels=3,
    out_channels=10,
    kernel_size=3,
    stride=1,
    padding=0
)

conv_layer_1(test_image)

tensor([[[-2.8778e-01, -6.0596e-02, -5.6306e-02,  ...,  2.8654e-01,
           6.6224e-01, -2.3216e-01],
         [-9.8911e-01, -4.0099e-01,  4.1832e-01,  ...,  4.7459e-01,
          -1.8552e-01, -5.7622e-01],
         [-4.1340e-02, -2.3277e-01,  3.7418e-01,  ...,  2.8255e-02,
           1.4923e-01,  1.4236e-01],
         ...,
         [-8.0374e-01, -7.6687e-01, -5.9457e-02,  ...,  1.7452e-01,
           4.2594e-01, -4.8341e-01],
         [-1.4512e-01, -1.1566e-01,  6.1783e-01,  ...,  2.4126e-01,
          -3.6626e-01,  3.5645e-01],
         [ 3.6096e-02,  1.5214e-01,  2.3123e-01,  ...,  3.0904e-01,
          -4.9680e-01, -7.2258e-01]],

        [[-1.0853e+00, -1.6079e+00,  1.3346e-01,  ...,  2.1698e-01,
          -1.7643e+00,  2.5263e-01],
         [-8.2507e-01,  6.3866e-01,  1.8845e-01,  ..., -1.0936e-01,
           4.8068e-01,  8.4869e-01],
         [ 6.4927e-01, -4.2061e-03, -4.9991e-01,  ...,  5.8356e-01,
           2.4611e-01,  6.6233e-01],
         ...,
         [ 9.8860e-02,  1

In [5]:
test_image.unsqueeze(0).shape

torch.Size([1, 3, 64, 64])

In [6]:
# Add the batch dimension first, then pass the batched image through the layer
conv_layer_1(test_image.unsqueeze(0)).shape

torch.Size([1, 10, 62, 62])


### Breakdown:

The number of channels changes to 10 after the convolutional layer because of the `out_channels` parameter specified when defining the layer.

Here’s a breakdown of what happens:

- **Input**: We started with a tensor of shape `[1, 3, 64, 64]` (the single test image with a batch dimension added by `unsqueeze(0)`). This means:
  - Batch size = 1
  - 3 channels (e.g., RGB channels for an image)
  - Height and width = 64x64 pixels
  
- **Convolutional Layer**: We applied a `nn.Conv2d` layer with:
  - `in_channels=3`: the layer expects an input with 3 channels (as in the RGB image).
  - `out_channels=10`: this means the convolutional layer will produce 10 output channels after the convolution operation. 
  - `kernel_size=3`: the kernel size is 3x3.
  - `stride=1`: the stride is 1, so the kernel moves by 1 pixel at a time.
  - `padding=0`: no padding is applied, so the output spatial dimensions shrink.
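
To see the effect of `padding` in isolation, here is a minimal sketch that compares the layer above with a variant that uses `padding=1`. The names `no_pad` and `same_pad` are illustrative and not part of the notebook. With a 3x3 kernel and a stride of 1, one pixel of zero padding on each side should preserve the 64x64 size.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # one fake RGB image with a batch dimension

# Same layer settings as conv_layer_1, differing only in padding
no_pad = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=1, padding=0)
same_pad = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=1, padding=1)

no_pad_shape = no_pad(x).shape      # spatial size shrinks by kernel_size - 1
same_pad_shape = same_pad(x).shape  # spatial size is preserved

print(no_pad_shape)
print(same_pad_shape)
```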

### How the Output Shape is Computed:
- The convolution operation slides one filter per output channel over the input. Each filter spans all 3 input channels and produces one feature map.
- Since we set `out_channels=10`, the layer holds 10 such filters and generates 10 different feature maps. This is why the output has 10 channels.
- With no padding and no dilation, the spatial dimensions of the feature maps (height and width) are computed as follows (the division is rounded down when the stride does not divide evenly):

$\text{Output Height} = \frac{\text{Input Height} - \text{Kernel Height}}{\text{Stride}} + 1 = \frac{64 - 3}{1} + 1 = 62$

$\text{Output Width} = \frac{\text{Input Width} - \text{Kernel Width}}{\text{Stride}} + 1 = \frac{64 - 3}{1} + 1 = 62$

So, the output shape becomes `[1, 10, 62, 62]`, where:
- `1` is the batch size,
- `10` is the number of output channels (since we specified `out_channels=10`),
- `62` is the new height and width of the output after the convolution.
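
The formula above is the no-padding special case. As a sketch, the helper below follows the general shape formula given in the PyTorch `nn.Conv2d` documentation, which also accounts for padding and dilation. The function name `conv2d_output_size` is hypothetical and chosen only for this example.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (height or width).

    Floor division mirrors the rounding-down in the documented formula.
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


size_k3 = conv2d_output_size(64, kernel_size=3)                # settings of conv_layer_1
size_k5_s2 = conv2d_output_size(64, kernel_size=5, stride=2)   # larger kernel, stride 2
size_k3_p1 = conv2d_output_size(64, kernel_size=3, padding=1)  # padding keeps the size

print(size_k3, size_k5_s2, size_k3_p1)
```

With `padding=0` and `dilation=1`, the expression reduces to the simpler formula used in this section.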

### Exploring:
What happens if we change the kernel size to **5**?

$\text{Output Height} = \frac{\text{Input Height} - \text{Kernel Height}}{\text{Stride}} + 1 = \frac{64 - 5}{1} + 1 = 60$

$\text{Output Width} = \frac{\text{Input Width} - \text{Kernel Width}}{\text{Stride}} + 1 = \frac{64 - 5}{1} + 1 = 60$
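
A quick way to check this prediction is to build a layer with `kernel_size=5` and `stride=1` and inspect the output shape. This is only a sketch: `conv_k5` is an illustrative name, and the random input stands in for the test image.

```python
import torch
import torch.nn as nn

# Same settings as conv_layer_1, but with a 5x5 kernel
conv_k5 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=0)

k5_shape = conv_k5(torch.randn(1, 3, 64, 64)).shape
print(k5_shape)
```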

### Try this simple calculation

In [13]:
input_height, input_width = 64, 64

kernel = 5
stride = 2

# Floor division rounds down, matching how the output size is computed
output_height = (input_height - kernel) // stride + 1
print(f"output_height: {output_height}")

output_height: 30


## Conv2D 2

Now we change the kernel size to 5 and the stride to 2, and let PyTorch do the computation.

In [8]:
conv_layer_2 = nn.Conv2d(
    in_channels=3,
    out_channels=10,
    kernel_size=5,
    stride=2,
    padding=0
)


In [9]:
test_image.unsqueeze(0).shape

torch.Size([1, 3, 64, 64])

In [10]:
# Add the batch dimension first, then pass the batched image through the layer
conv_layer_2(test_image.unsqueeze(0)).shape

torch.Size([1, 10, 30, 30])

## Weight & Bias

In [11]:
# Get shapes of weight and bias tensors within conv_layer_2
print(f"conv_layer_2 weight shape: \n{conv_layer_2.weight.shape} -> [out_channels=10, in_channels=3, kernel_size=5, kernel_size=5]")
print(f"\nconv_layer_2 bias shape: \n{conv_layer_2.bias.shape} -> [out_channels=10]")

conv_layer_2 weight shape: 
torch.Size([10, 3, 5, 5]) -> [out_channels=10, in_channels=3, kernel_size=5, kernel_size=5]

conv_layer_2 bias shape: 
torch.Size([10]) -> [out_channels=10]
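
To connect these shapes to the computation itself, the sketch below reproduces two output values by hand: each is the elementwise product of a 5x5 patch (across all 3 input channels) with one filter, summed, plus that filter's bias. The names `layer`, `patch`, and `manual_00` are illustrative. The layer is freshly initialised here, so its weights differ from those of `conv_layer_2` in the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
layer = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=2, padding=0)
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    out = layer(x)  # shape [1, 10, 30, 30]

    # Output channel 0 at position (0, 0): top-left 5x5 patch over all 3 channels
    patch = x[0, :, 0:5, 0:5]
    manual_00 = (patch * layer.weight[0]).sum() + layer.bias[0]

    # Position (0, 1): with stride=2 the window moves 2 pixels to the right
    patch_next = x[0, :, 0:5, 2:7]
    manual_01 = (patch_next * layer.weight[0]).sum() + layer.bias[0]

match_00 = torch.allclose(manual_00, out[0, 0, 0, 0], atol=1e-4)
match_01 = torch.allclose(manual_01, out[0, 0, 0, 1], atol=1e-4)

# Learnable parameters: 10 * 3 * 5 * 5 weights plus 10 biases
n_params = sum(p.numel() for p in layer.parameters())

print(match_00, match_01, n_params)
```

The comparison uses a small tolerance because the library's convolution may sum the terms in a different order than the manual version.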
