# Max Pooling in Convolutional Neural Networks


Convolutional Neural Networks (CNNs) are a class of deep neural networks, most commonly applied to analyzing visual imagery. CNNs are known for their ability to automatically and adaptively learn spatial hierarchies of features from input images.

## Max Pooling: Theory

### What is Max Pooling?

Max pooling is a downsampling technique commonly used in convolutional neural networks to reduce the spatial dimensions (height and width) of feature maps. It works by sliding a window (kernel) across the input and taking the maximum value within the window at each position. Repeating this across the entire input produces a downsampled output.

### Purpose of Max Pooling

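1. **Reduction of Computational Load:** By shrinking the feature maps, max pooling reduces the computation required in subsequent layers and the number of parameters in any fully connected layers that follow, which speeds up training. The pooling operation itself has no learnable parameters.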
2. **Translation Invariance:** Max pooling helps the model to be robust against minor translations of the input data.
3. **Feature Highlighting:** It keeps the strongest activation in each window, which emphasizes the most prominent feature and discards weaker responses.
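The translation-invariance point can be illustrated with a small sketch. It assumes a toy 4x4 input with a single active pixel (the tensor names are made up for this example) and uses `F.max_pool2d` with a 2x2 window and stride 2.

```python
import torch
import torch.nn.functional as F

# A single strong activation in the top-left 2x2 window.
original = torch.zeros(1, 1, 4, 4)
original[0, 0, 0, 0] = 1.0

# The same activation moved one pixel right, still inside the same window.
shifted = torch.zeros(1, 1, 4, 4)
shifted[0, 0, 0, 1] = 1.0

# The activation moved two pixels right, into the neighbouring window.
crossed = torch.zeros(1, 1, 4, 4)
crossed[0, 0, 0, 2] = 1.0

pooled_original = F.max_pool2d(original, kernel_size=2, stride=2)
pooled_shifted = F.max_pool2d(shifted, kernel_size=2, stride=2)
pooled_crossed = F.max_pool2d(crossed, kernel_size=2, stride=2)

# A shift within one window leaves the pooled output unchanged.
print(torch.equal(pooled_original, pooled_shifted))  # True
# A shift across a window boundary changes it.
print(torch.equal(pooled_original, pooled_crossed))  # False
```

The invariance is therefore only local: it holds for shifts that keep the activation inside the same pooling window.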

## Mathematical Formulation

Given an input matrix $X$ of size $H \times W$, the max pooling operation with a kernel size of $k$ and stride $s$ produces an output matrix $Y$ where each element $y_{ij}$ is calculated as:

$$ y_{ij} = \max_{0 \le a,\, b \le k-1} x_{i \cdot s + a,\; j \cdot s + b} $$

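where $i$ and $j$ are zero-based indices over the rows and columns of the output. Without padding, $Y$ has size $\left(\lfloor (H - k)/s \rfloor + 1\right) \times \left(\lfloor (W - k)/s \rfloor + 1\right)$.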
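As a quick check of the output dimensions, the sketch below computes the size along each axis as `(size - k) // s + 1` and compares it with the shape returned by `F.max_pool2d`. The helper name `pooled_size` and the chosen values of `H`, `W`, `k` and `s` are illustrative assumptions, and no padding is used.

```python
import torch
import torch.nn.functional as F

def pooled_size(size, kernel_size, stride):
    """Output length along one dimension without padding: floor((size - k) / s) + 1."""
    return (size - kernel_size) // stride + 1

H, W, k, s = 7, 10, 3, 2
expected = (pooled_size(H, k, s), pooled_size(W, k, s))

# F.max_pool2d expects a (batch, channels, height, width) tensor here.
x = torch.randn(1, 1, H, W)
actual = tuple(F.max_pool2d(x, kernel_size=k, stride=s).shape[-2:])

print(expected, actual)  # (3, 4) (3, 4)
```

Rows or columns that do not fit a full window are dropped, which is why a width of 10 with a kernel of 3 and stride of 2 yields 4 outputs.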

## Setup

In [1]:
import torch
import torch.nn.functional as F
from torch import nn
import time
import matplotlib.pyplot as plt

## Max Pooling Operation

In [2]:
def maxpool2d_simple(input, kernel_size=2, stride=2):
    """
    Applies a simple max pooling operation to the input tensor.

    Args:
        input (torch.Tensor): The input tensor of shape (input_height, input_width).
        kernel_size (int): The size of the kernel used for pooling. Default is 2.
        stride (int): The stride value used for pooling. Default is 2.

    Returns:
        torch.Tensor: The output tensor after max pooling, of shape (output_height, output_width).
    """

    # Get the dimensions of the input tensor
    input_height, input_width = input.shape

    # Calculate the output height and width
    output_height = (input_height - kernel_size) // stride + 1
    output_width = (input_width - kernel_size) // stride + 1

    # Create an output tensor of zeros
    output = torch.zeros(output_height, output_width)

    # Apply max pooling
    for i in range(output_height):
        for j in range(output_width):
            # Take the maximum value within the kernel window
            output[i, j] = torch.max(
                input[
                    i * stride : i * stride + kernel_size,
                    j * stride : j * stride + kernel_size,
                ]
            )

    return output

In [3]:
input = torch.tensor([
    [1, 3, 2, 4],
    [5, 6, 8, 8],
    [9, 7, 5, 6],
    [8, 4, 3, 2]
])

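print("Input Matrix:")
print(input)
print("Max-Pooled Output:")
print(maxpool2d_simple(input))

Input Matrix:
tensor([[1, 3, 2, 4],
        [5, 6, 8, 8],
        [9, 7, 5, 6],
        [8, 4, 3, 2]])
Max-Pooled Output:
tensor([[6., 8.],
        [9., 6.]])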
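With a stride smaller than the kernel size, the windows overlap and the output shrinks less. The sketch below applies PyTorch's `F.max_pool2d` to the same 4x4 matrix with `kernel_size=2` and `stride=1`. Float values are used here as a precaution, on the assumption that integer inputs may not be supported by every PyTorch build.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([
    [1., 3., 2., 4.],
    [5., 6., 8., 8.],
    [9., 7., 5., 6.],
    [8., 4., 3., 2.],
])

# F.max_pool2d expects (N, C, H, W) or (C, H, W), so add batch and channel
# dimensions with x[None, None] and strip them again with [0, 0].
overlapping = F.max_pool2d(x[None, None], kernel_size=2, stride=1)[0, 0]
print(overlapping)
# tensor([[6., 8., 8.],
#         [9., 8., 8.],
#         [9., 7., 6.]])
```

The output is 3x3 rather than 2x2, in line with `(4 - 2) // 1 + 1 = 3`.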


## Extended Implementation

In [5]:
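def maxpool2d(input, kernel_size=2, stride=2):
    """
    Applies max pooling to a batched, multi-channel input tensor.

    Args:
        input (torch.Tensor): The input tensor of shape (batch_size, in_channels, input_height, input_width).
        kernel_size (int): The size of the kernel used for pooling. Default is 2.
        stride (int): The stride value used for pooling. Default is 2.

    Returns:
        torch.Tensor: The output tensor of shape (batch_size, in_channels, output_height, output_width).
    """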
    # Get the dimensions of the input tensor
    batch_size, in_channels, input_height, input_width = input.shape

    # Calculate the dimensions of the output tensor
    output_height = (input_height - kernel_size) // stride + 1
    output_width = (input_width - kernel_size) // stride + 1

    # Create an output tensor with zeros
    output = torch.zeros(batch_size, in_channels, output_height, output_width)

    # Perform max pooling operation
    for b in range(batch_size):
        for c in range(in_channels):
            for i in range(output_height):
                for j in range(output_width):
                    # Take the maximum value within the kernel window
                    output[b, c, i, j] = torch.max(
                        input[
                            b,
                            c,
                            i * stride : i * stride + kernel_size,
                            j * stride : j * stride + kernel_size,
                        ]
                    )

    return output
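The four nested loops above can also be avoided. The following is a minimal sketch of one loop-free alternative, not part of the original notebook: it uses `Tensor.unfold` to extract the windows and `amax` to reduce over them. The function name `maxpool2d_unfold` is hypothetical, and the sketch assumes no padding.

```python
import torch
import torch.nn.functional as F

def maxpool2d_unfold(x, kernel_size=2, stride=2):
    """Loop-free max pooling over a (batch, channels, height, width) tensor, no padding."""
    # Extract sliding windows along height (dim 2), then width (dim 3):
    # (B, C, H, W) -> (B, C, out_h, out_w, kernel_size, kernel_size)
    windows = x.unfold(2, kernel_size, stride).unfold(3, kernel_size, stride)
    # Take the maximum over the two window dimensions.
    return windows.amax(dim=(-2, -1))

x = torch.randn(5, 3, 28, 28)
out = maxpool2d_unfold(x)
reference = F.max_pool2d(x, kernel_size=2, stride=2)

print(out.shape)                    # torch.Size([5, 3, 14, 14])
print(torch.equal(out, reference))  # True
```

Because `unfold` returns a view of the input, the windows are not copied before the reduction.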

## Checking the implementations

In [8]:
# Generate a test input tensor
input = torch.randn(5, 3, 28, 28)  # Shape: batch_size=5, channels=3, height=28, width=28

# Apply the custom (loop-based) max pooling
start_custom = time.time()
custom_output = maxpool2d(input, kernel_size=2, stride=2)
end_custom = time.time()

# Apply PyTorch Max Pooling
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
start_pytorch = time.time()
pytorch_output = maxpool(input)
end_pytorch = time.time()

# Compare Outputs
are_close = torch.allclose(custom_output, pytorch_output, atol=1e-6)
print(f"Are the outputs close? {are_close}")

# Measure and Compare Performance
print(f"Custom Max Pooling Time: {end_custom - start_custom:.6f} seconds")
print(f"PyTorch Max Pooling Time: {end_pytorch - start_pytorch:.6f} seconds")

Are the outputs close? True
Custom Max Pooling Time: 0.081519 seconds
PyTorch Max Pooling Time: 0.000000 seconds


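Because PyTorch's built-in implementation is much faster than the nested Python loops, we will use it from now on. The reported time of 0.000000 seconds most likely means the call finished within the resolution of `time.time()` on the machine used, not that it took no time at all.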
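A single `time.time()` measurement can be too coarse for an operation this fast. As a hedged sketch, the snippet below averages several calls using `time.perf_counter()`, which is intended for short intervals. The helper `average_seconds` and the repeat count are assumptions for illustration, and the absolute numbers will vary by machine.

```python
import time
import torch
from torch import nn

def average_seconds(fn, x, repeats=20):
    """Average wall-clock time of fn(x) over several runs, after one warm-up call."""
    fn(x)  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn(x)
    return (time.perf_counter() - start) / repeats

x = torch.randn(5, 3, 28, 28)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

elapsed = average_seconds(pool, x)
print(f"nn.MaxPool2d: {elapsed:.6f} seconds per call")
```

The same helper can be passed any callable that takes one tensor, which makes it easy to time a custom implementation under the same conditions.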