# Convolutional Neural Networks (CNNs) - Basics

## 1. What is a CNN?

A Convolutional Neural Network (CNN) is a type of deep learning model especially effective for image-related tasks.

- **Input**: Labeled dataset of images `\(\boldsymbol{x}\)` and target values `\(y\)`
- **Goal**: Learn a mapping function `\(f(\boldsymbol{x}; \theta)\)` that minimises prediction error using a loss function `\(\mathcal{L}\)`
- **Training**: Optimises model parameters `\(\theta\)` by minimising the empirical loss:

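$$
\mathcal{D} = \{(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \dots, (\boldsymbol{x}_N, y_N)\}
$$

$$
\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f(\boldsymbol{x}_i; \theta), y_i\big)
$$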
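
As a rough illustration of the empirical loss above, the following sketch evaluates it for a toy problem. The three data points, the one-parameter model `f(x; theta) = theta * x`, and the squared-error loss are all assumptions chosen for readability; they are not part of the MNIST example later in this document.

```python
# Toy data and model are assumptions for illustration only.
dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # pairs (x_i, y_i), so N = 3


def f(x, theta):
    """Hypothetical one-parameter model f(x; theta) = theta * x."""
    return theta * x


def squared_error(prediction, target):
    """Per-example loss L; squared error is used only for simplicity."""
    return (prediction - target) ** 2


def empirical_loss(theta, data):
    """Average per-example loss: (1/N) * sum_i L(f(x_i; theta), y_i)."""
    return sum(squared_error(f(x, theta), y) for x, y in data) / len(data)


loss_at_1_5 = empirical_loss(1.5, dataset)  # theta that underestimates y
loss_at_2_0 = empirical_loss(2.0, dataset)  # theta that fits the data exactly
print(loss_at_1_5, loss_at_2_0)
```

Training searches for the `theta` with the smallest average loss; in this toy case that is `theta = 2.0`, where every prediction matches its target.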

---

## 2. Key Layers in a CNN

- **Conv2D Layer**: Applies filters (kernels) to extract features.
- **ReLU**: Applies non-linearity (activation function).
- **MaxPooling**: Downsamples feature maps to reduce spatial size.
- **Fully Connected Layer**: Final classifier that outputs prediction scores.

---

## 3. Simple CNN Architecture Example

```text
Input (e.g., 28x28 image)
↓
Conv2D → ReLU
↓
MaxPooling
↓
Conv2D → ReLU
↓
MaxPooling
↓
Flatten
↓
Fully Connected → Softmax
```

---

## 4. Sample Use Case: MNIST Digit Classification

See `cnn_basic.py` for a runnable PyTorch implementation.

```
%pip install torch torchvision
```

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define a simple CNN
class BasicCNN(nn.Module):
    """A simple Convolutional Neural Network for MNIST classification."""
    def __init__(self):
        super(BasicCNN, self).__init__()  # Initialize the parent class
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # Input channels = 1 (grayscale), output channels = 16
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # Input channels = 16, output channels = 32
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling
        self.fc1 = nn.Linear(32 * 7 * 7, 128)  # Fully connected layer
        self.fc2 = nn.Linear(128, 10)          # Output layer for 10 classes (digits 0-9)

    def forward(self, x):
        """Defines the forward pass of the CNN."""
        x = self.pool(F.relu(self.conv1(x)))  # [batch, 16, 14, 14]
        x = self.pool(F.relu(self.conv2(x)))  # [batch, 32, 7, 7]
        x = x.view(-1, 32 * 7 * 7)            # Flatten to [batch, 1568]
        x = F.relu(self.fc1(x))               # [batch, 128]
        x = self.fc2(x)                       # [batch, 10] raw logits
        return x

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])  # Convert images to PyTorch tensors
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)  # Training set
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)  # DataLoader for batching

# Instantiate model, loss, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Use GPU if available
model = BasicCNN().to(device)  # Move model to device
criterion = nn.CrossEntropyLoss()  # Loss function for multi-class classification
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Optimizer for training

# Training loop
for epoch in range(5):  # 5 epochs
    running_loss = 0.0
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)  # Move data to device

        optimizer.zero_grad()              # Zero the gradients
        outputs = model(images)            # Forward pass
        loss = criterion(outputs, labels)  # Compute loss
        loss.backward()                    # Backpropagation
        optimizer.step()                   # Update weights

        running_loss += loss.item()  # Accumulate the loss of this batch
    # running_loss is the sum of the per-batch losses over the epoch, not the average
    print(f"Epoch {epoch+1}, Loss: {running_loss:.4f}")
```

```text
Epoch 1, Loss: 210.2475
Epoch 2, Loss: 59.3162
Epoch 3, Loss: 41.2711
Epoch 4, Loss: 31.2467
Epoch 5, Loss: 24.5889
```


## 5. Explanation

### 1. `nn.Conv2d`: 2D Convolution Layer

```
nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
```

| Argument       | Description                                        |
|----------------|----------------------------------------------------|
| `in_channels`  | Number of input channels (e.g., 1 for grayscale)   |
| `out_channels` | Number of output channels (filters)                |
| `kernel_size`  | Size of the convolutional kernel (e.g., 3 for 3×3) |
| `stride`       | Step size of the convolution                       |
| `padding`      | Amount of zero-padding around input image          |

The output height `H_out` is calculated as:

```
H_out = floor((H_in + 2P - D(K - 1) - 1) / S + 1)
```

Where:

- `H_in`: input height
- `P`: padding
- `D`: dilation (default = 1)
- `K`: kernel size
- `S`: stride

The same formula gives the output width `W_out` from `W_in`.
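
The formula above can be checked against the shape comments in the MNIST model. The sketch below is plain Python; the helper name `conv_out_size` is hypothetical, and it assumes that max pooling follows the same size formula with `padding=0`, which matches the `28 -> 14 -> 7` progression shown in the example code.

```python
def conv_out_size(h_in, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: H_out = floor((H_in + 2P - D(K - 1) - 1) / S + 1)."""
    return (h_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Trace the height of a 28x28 MNIST image through the layers of BasicCNN.
after_conv1 = conv_out_size(28, kernel_size=3, padding=1)           # 3x3 conv, padding 1
after_pool1 = conv_out_size(after_conv1, kernel_size=2, stride=2)   # 2x2 pooling, stride 2
after_conv2 = conv_out_size(after_pool1, kernel_size=3, padding=1)  # 3x3 conv, padding 1
after_pool2 = conv_out_size(after_conv2, kernel_size=2, stride=2)   # 2x2 pooling, stride 2

print(after_conv1, after_pool1, after_conv2, after_pool2)
```

With `kernel_size=3` and `padding=1` the convolution preserves the spatial size, so only the pooling layers shrink the feature maps. The final size of 7 is where the `32 * 7 * 7` input width of the first fully connected layer comes from.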

---

### 2. `F.relu`: Activation Function (ReLU)

```
F.relu(x)
```

Formula:

```
ReLU(x) = max(0, x)
```

Explanation:

- The ReLU function adds non-linearity to the model. Without non-linear functions, no matter how many layers you stack, the entire network behaves like a single linear transformation. This limits the ability to learn complex patterns.

- ReLU outputs zero for any input less than or equal to zero, and outputs the input itself if it is greater than zero.  
  In other words, negative values are cut off to zero, while positive values pass through unchanged.

- This simple rule makes ReLU computationally efficient because it only requires a comparison and no complex math like exponentials.

- Compared to older activation functions like sigmoid or tanh, ReLU helps reduce the vanishing gradient problem because its gradient is exactly 1 for positive inputs instead of shrinking toward zero, making it easier for deep networks to learn.

Example:

If the input to ReLU is -3, the output is 0.  
If the input is 3, the output is 3.

This "cutting off" of negative values allows neural networks to model complex data while maintaining efficient training.
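
The two points above (the cut-off at zero, and stacked linear layers collapsing without a non-linearity) can be sketched as follows. The matrices `W1` and `W2` and the vector `v` are random placeholders introduced only for this illustration.

```python
import torch
import torch.nn.functional as F

# ReLU zeroes out non-positive entries and passes positive ones through.
x = torch.tensor([-3.0, 0.0, 3.0])
relu_out = F.relu(x)

# Two linear maps applied back to back, with no activation in between ...
torch.manual_seed(0)
W1 = torch.randn(4, 3)  # placeholder weights of a first layer
W2 = torch.randn(2, 4)  # placeholder weights of a second layer
v = torch.randn(3)      # placeholder input vector

stacked = W2 @ (W1 @ v)
# ... give the same result as one linear map with weights W2 @ W1.
collapsed = (W2 @ W1) @ v
same_without_relu = torch.allclose(stacked, collapsed, atol=1e-5)

# Inserting ReLU between the layers breaks this equivalence in general.
with_relu = W2 @ F.relu(W1 @ v)

print(relu_out, same_without_relu)
```

The `same_without_relu` check is the reason an activation function is needed between layers: without it, extra depth adds no expressive power.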


---

### 3. `nn.MaxPool2d`: Max Pooling Layer

```
nn.MaxPool2d(kernel_size=2, stride=2)
```

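- Reduces spatial dimensions (downsampling); with `kernel_size=2, stride=2`, height and width are halved
- Makes the output less sensitive to small shifts in the input (approximate translation invariance)
- Has no learnable parameters, and the smaller feature maps reduce computation and the number of weights in later fully connected layers, which can help limit overfitting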
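
A small sketch of the downsampling step, using a hand-made 4×4 input (an assumption for illustration) so that each 2×2 window is easy to follow:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# One image, one channel, 4x4 pixels holding the values 0..15 row by row.
x = torch.arange(16.0).reshape(1, 1, 4, 4)
pooled = pool(x)  # each non-overlapping 2x2 window is replaced by its maximum

print(x[0, 0])
print(pooled[0, 0])
```

Each output value is the largest entry of one 2×2 window, so the 4×4 map becomes 2×2 while the batch and channel dimensions stay unchanged.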

---

### 4. `view()`: Flattening Tensors

```
x = x.view(-1, 32 * 7 * 7)
```

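- Flattens the 4D tensor `[batch, 32, 7, 7]` into a 2D tensor `[batch, 1568]` for input into a fully connected layer
- `-1` tells PyTorch to infer that dimension (here, the batch size) from the total number of elements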
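
The flattening step can be sketched on a dummy tensor with the same shape as the output of the second pooling layer. The batch size of 64 and the zero-filled tensor are placeholders; `torch.flatten` and `x.view(x.size(0), -1)` are shown as alternatives, not as what the original code uses.

```python
import torch

x = torch.zeros(64, 32, 7, 7)  # placeholder for [batch, channels, height, width]

flat = x.view(-1, 32 * 7 * 7)             # as in the example: batch size is inferred
flat_alt = torch.flatten(x, start_dim=1)  # alternative: flatten everything after the batch dim
flat_safe = x.view(x.size(0), -1)         # alternative: fix the batch size, infer the feature count

print(flat.shape, flat_alt.shape, flat_safe.shape)
```

One design consideration: with `view(-1, 32 * 7 * 7)`, an input of unexpected spatial size can silently produce a different batch dimension instead of raising an error, whereas `x.view(x.size(0), -1)` keeps the batch size fixed and lets the mismatch surface in the following `nn.Linear` layer.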

---

### 5. `nn.Linear`: Fully Connected Layer

```
nn.Linear(in_features, out_features)
```

- Connects all input features to output neurons  
- Often used as the final classification layer  
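
As a minimal sketch of what "connects all input features to output neurons" means in terms of shapes, the layer below maps the 1568 flattened features to 10 class scores. The random input batch is a placeholder.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=32 * 7 * 7, out_features=10)

features = torch.randn(64, 32 * 7 * 7)  # placeholder batch of flattened feature maps
scores = layer(features)                # computes features @ weight.T + bias

# One weight per (input feature, output neuron) pair, plus one bias per output neuron.
num_params = sum(p.numel() for p in layer.parameters())

print(scores.shape, layer.weight.shape, layer.bias.shape, num_params)
```

Because every input feature connects to every output neuron, the parameter count grows with the product of the two sizes, which is one reason the feature maps are downsampled before this layer.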

---

### 6. `nn.CrossEntropyLoss`: Loss Function

```
criterion = nn.CrossEntropyLoss()
```

This function combines **LogSoftmax** and **Negative Log Likelihood Loss**, so the model should output raw logits and must not apply softmax itself.

Mathematically:

```
L(x, y) = -log( exp(x[y]) / sum_j exp(x[j]) )
```

Where:

- `x`: raw logits from the model  
- `y`: true class index  
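
The relationship above can be checked numerically. In this sketch the logits and the target are made-up values for a single example with three classes; it compares the built-in loss with the two-step LogSoftmax plus NLL computation and with the formula written out by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # x: raw scores for 3 classes, batch of 1
target = torch.tensor([0])                # y: index of the true class

# Built-in loss applied directly to the raw logits.
built_in = nn.CrossEntropyLoss()(logits, target)

# The same quantity as LogSoftmax followed by negative log likelihood.
two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

# The formula written out: -log( exp(x[y]) / sum_j exp(x[j]) ).
manual = -torch.log(torch.exp(logits[0, 0]) / torch.exp(logits[0]).sum())

print(built_in.item(), two_step.item(), manual.item())
```

All three values agree up to floating-point error, which is why a softmax layer applied before `nn.CrossEntropyLoss` would normalise the scores twice.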

---

### 7. Backpropagation and Parameter Update

```
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

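- `zero_grad()` clears the gradients from the previous step; PyTorch accumulates gradients across `backward()` calls by default
- `backward()` computes the gradient of the loss with respect to each parameter (backpropagation)
- `step()` updates the model parameters using those gradients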
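
The roles of the three calls can be sketched with a single scalar parameter. The "loss" `3 * w` and the plain SGD optimizer with `lr=0.1` are assumptions chosen so the numbers are easy to follow; the MNIST example uses Adam instead.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)  # a single trainable parameter
optimizer = torch.optim.SGD([w], lr=0.1)

# Calling backward() twice without clearing: the gradients add up.
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.item()  # d(3w)/dw = 3, summed over two calls

# zero_grad() discards the stale gradient before the next backward pass.
optimizer.zero_grad()
(3 * w).backward()
fresh = w.grad.item()

# step() applies the update rule; for plain SGD: w <- w - lr * grad.
optimizer.step()
updated = w.item()

print(accumulated, fresh, updated)
```

Skipping `zero_grad()` in a training loop would therefore mix gradients from earlier batches into each update.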

---

### 8. Loading the MNIST Dataset

```
trainset = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transforms.ToTensor()
)
```

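- Downloads the MNIST dataset to `./data` if it is not already present
- Each image is 28×28 grayscale
- `ToTensor()` converts each PIL image to a PyTorch tensor of shape `[1, 28, 28]` with pixel values scaled to `[0.0, 1.0]`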

---

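### 9. Full Example: Basic CNN in PyTorch

This compact variant of the model from Section 4 uses a single fully connected layer and trains on the CPU for one epoch.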

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define a simple CNN
class BasicCNN(nn.Module):
    def __init__(self):
        super(BasicCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 7 * 7, 10)  # Maps the flattened features directly to 10 class scores

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))  # 14x14 -> 7x7
        x = x.view(-1, 32 * 7 * 7)
        x = self.fc1(x)
        return x

# Data loading
transform = transforms.ToTensor()
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

# Model, loss, optimizer
model = BasicCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

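# Training loop (a single epoch keeps the example short)
for epoch in range(1):
    for images, labels in trainloader:
        outputs = model(images)            # Forward pass
        loss = criterion(outputs, labels)  # Compute loss

        optimizer.zero_grad()  # Clear gradients from the previous step
        loss.backward()        # Backpropagation
        optimizer.step()       # Update weights

    # Reports the loss of the last batch only, not an epoch average
    print(f"Epoch [{epoch+1}], Loss: {loss.item():.4f}")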
```
