# Convolutional Neural Networks (CNNs) - Basics

## 1. What is a CNN?

A Convolutional Neural Network (CNN) is a type of deep learning model especially effective for image-related tasks.

**Input:**  
A labeled dataset of images `x` with corresponding labels or target values `y`.

**Goal:**  
Learn a function `f(x; θ)`, parameterized by `θ`, that maps input images `x` to outputs as close as possible to the true labels `y`. The mismatch between predictions and true labels is measured by a loss function `L`.

**Training:**  
Given the dataset

$$
D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}
$$

training adjusts the parameters `θ` to minimize the empirical loss, that is, the average loss over the `N` samples:

$$
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i; \theta),\, y_i\big)
$$
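
As a toy illustration of this objective, the sketch below minimizes the average squared error of a one-parameter model `f(x; θ) = θ·x` with plain gradient descent. The data, the squared-error loss, and the names `empirical_loss` and `fit` are assumptions chosen for illustration; a CNN optimizes the same kind of objective with a far larger `θ`.

```python
# Toy dataset D = {(x_i, y_i)} generated by y = 2x, so the best theta is 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def f(x, theta):
    """Hypothetical one-parameter model."""
    return theta * x

def empirical_loss(theta, data):
    """(1/N) * sum_i L(f(x_i; theta), y_i) with squared-error L."""
    return sum((f(x, theta) - y) ** 2 for x, y in data) / len(data)

def fit(data, lr=0.05, steps=200):
    """Plain gradient descent on the empirical loss, starting from theta = 0."""
    theta = 0.0
    for _ in range(steps):
        # Gradient of the average squared error with respect to theta.
        grad = sum(2 * (f(x, theta) - y) * x for x, y in data) / len(data)
        theta -= lr * grad
    return theta

theta = fit(data)
print(f"theta = {theta:.4f}, loss = {empirical_loss(theta, data):.6f}")
```

The training loop in section 4 follows the same pattern, except that the gradient is computed by backpropagation on mini-batches and the update is done by the Adam optimizer.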

---

## 2. Key Layers in a CNN

- **Conv2D Layer**: Applies filters (kernels) to extract features.
- **ReLU**: Applies non-linearity (activation function).
- **MaxPooling**: Downsamples feature maps to reduce spatial size.
- **Fully Connected Layer**: Final classifier that outputs prediction scores.
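
To make the first three layers concrete, here is a minimal pure-Python sketch of what they compute on a single-channel input. The helper names (`conv2d`, `relu`, `max_pool2x2`), the 5x5 image, and the 2x2 kernel are hypothetical and chosen only for illustration; real Conv2D layers learn their kernels and operate on batches with multiple channels. The fully connected layer is omitted.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]

# 5x5 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1]] * 5
# Hypothetical 2x2 kernel that responds to a dark-to-bright step.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)   # 4x4 feature map, non-zero only at the edge
activated = relu(features)         # negative responses would be clipped to 0
pooled = max_pool2x2(activated)    # 2x2 map after downsampling
print(pooled)
```

The feature map is non-zero only in the column where the edge sits, and pooling keeps that response while halving each spatial dimension.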

---

## 3. Simple CNN Architecture Example

```text
Input (e.g., 28x28 image)
↓
Conv2D → ReLU
↓
MaxPooling
↓
Conv2D → ReLU
↓
MaxPooling
↓
Flatten
↓
Fully Connected → Softmax
```

---

## 4. Sample Use Case: MNIST Digit Classification
See `cnn_basic.py` for a runnable PyTorch implementation.

Install the dependencies (notebook cell):

```
%pip install torch torchvision
```

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define a simple CNN
class BasicCNN(nn.Module):
    """A simple Convolutional Neural Network for MNIST classification."""
    def __init__(self):
        super().__init__()  # Initialize the parent class
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # Input channels = 1 (grayscale), output channels = 16
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # Input channels = 16, output channels = 32
        self.pool = nn.MaxPool2d(2, 2)                            # 2x2 max pooling
        self.fc1 = nn.Linear(32 * 7 * 7, 128)                     # Fully connected layer
        self.fc2 = nn.Linear(128, 10)                             # Output layer for 10 classes (digits 0-9)

    def forward(self, x):
        """Defines the forward pass of the CNN."""
        x = self.pool(F.relu(self.conv1(x)))  # [batch, 16, 14, 14]
        x = self.pool(F.relu(self.conv2(x)))  # [batch, 32, 7, 7]
        x = x.view(-1, 32 * 7 * 7)            # flatten each sample to a vector
        x = F.relu(self.fc1(x))               # [batch, 128]
        x = self.fc2(x)                       # [batch, 10] raw logits (no softmax)
        return x

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])  # Convert images to PyTorch tensors
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)  # Training set
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)  # DataLoader for batching

# Instantiate model, loss, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Use GPU if available
model = BasicCNN().to(device)                               # Move model to device
criterion = nn.CrossEntropyLoss()                           # Loss function for multi-class classification
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Optimizer for training

# Training loop
for epoch in range(5):  # 5 epochs
    running_loss = 0.0
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)  # Move data to device

        optimizer.zero_grad()              # Zero the gradients
        outputs = model(images)            # Forward pass
        loss = criterion(outputs, labels)  # Compute loss (mean over the batch)
        loss.backward()                    # Backpropagation
        optimizer.step()                   # Update weights

        running_loss += loss.item()        # Accumulate the per-batch loss
    # Note: this prints the sum of per-batch losses, not the average
    print(f"Epoch {epoch+1}, Loss: {running_loss:.4f}")
```

Output (the printed value is the sum of per-batch losses over each epoch):

```text
Epoch 1, Loss: 210.2475
Epoch 2, Loss: 59.3162
Epoch 3, Loss: 41.2711
Epoch 4, Loss: 31.2467
Epoch 5, Loss: 24.5889
```

---

## 5. Math Concepts

### Conv2D Output Size

```
H_out = floor((H_in + 2P - D(K - 1) - 1) / S + 1)
```
 
- **H_in**: Input height (or width) of the image or feature map  
- **P (padding)**: Number of pixels added around the border of the input  
- **D (dilation)**: Spacing between kernel elements, usually 1 (no dilation)  
- **K (kernel size)**: Size of the convolutional filter (e.g., 3 means 3×3)  
- **S (stride)**: Step size for moving the filter across the input  

This formula calculates the size of the output after convolution.
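
As a worked example, the sketch below applies the formula to the layer sizes used by `BasicCNN` in section 4. The helper name `conv_out_size` is hypothetical; pooling is treated with the same formula, using the pool window as the kernel and a stride of 2.

```python
def conv_out_size(h_in, kernel, stride=1, padding=0, dilation=1):
    """Output height (or width) per the Conv2D output-size formula."""
    # Integer floor division implements the floor() in the formula.
    return (h_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

size = 28                                         # MNIST input: 28x28
size = conv_out_size(size, kernel=3, padding=1)   # conv1 keeps the size: 28
size = conv_out_size(size, kernel=2, stride=2)    # pool halves it: 14
size = conv_out_size(size, kernel=3, padding=1)   # conv2 keeps the size: 14
size = conv_out_size(size, kernel=2, stride=2)    # pool halves it: 7

# 32 output channels of 7x7 give the input width of the first Linear layer.
print(size, 32 * size * size)
```

This is where the `32 * 7 * 7` in `nn.Linear(32 * 7 * 7, 128)` comes from: a 3×3 kernel with padding 1 preserves the spatial size, and each 2×2 pooling step halves it.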

### ReLU

```
F.relu(x) = max(0, x)
```

- Outputs zero if input is less than or equal to zero, otherwise outputs the input.  
- Adds non-linearity so the network can learn complex patterns.


### CrossEntropyLoss

```
L(x, y) = -log(exp(x[y]) / sum_j exp(x[j]))
```

- **x**: Raw output scores (logits) from the model for each class  
- **y**: True class label index  

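CrossEntropyLoss is the negative log of the softmax probability that the model assigns to the true class: it is close to zero when the model puts high probability on the correct label and grows as that probability shrinks. `nn.CrossEntropyLoss` expects raw logits and applies the softmax internally, so the model's `forward` should not apply a softmax of its own.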

In PyTorch:

```python
criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, labels)
```
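
To check the formula in this section by hand, here is a minimal pure-Python sketch for a single sample. The function name `cross_entropy` and the example logits are made up for illustration; subtracting the maximum logit before exponentiating (the log-sum-exp trick) is a standard way to avoid overflow. PyTorch's `nn.CrossEntropyLoss` computes the same per-sample quantity and, by default, averages it over the batch.

```python
import math

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]), computed with the log-sum-exp trick."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

print(cross_entropy([0.0, 0.0, 0.0], 0))    # uniform over 3 classes: log(3)
print(cross_entropy([10.0, 0.0, 0.0], 0))   # confident and correct: near 0
print(cross_entropy([10.0, 0.0, 0.0], 1))   # confident and wrong: about 10
```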

---

## 6. Hyperparameters

- **Learning rate**  
  Controls how much the model's weights are updated during training.  
  - Typical values: `0.001`, `0.01`, `0.0001`  
  - **If too large**: Training may become unstable or diverge (loss jumps around).  
  - **If too small**: Training will be very slow and may get stuck in suboptimal solutions.

- **Batch size**  
  Number of training samples used to compute the gradient in one iteration.  
  - Typical values: `32`, `64`, `128`  
  - **Large batch size**: Faster training per epoch but may lead to worse generalization. Requires more memory.  
  - **Small batch size**: Noisier gradient estimates, which can help escape poor local minima, but more iterations per epoch and typically slower training.

- **Epochs**  
  Number of full passes through the entire training dataset.  
  - Typical values: `5`, `10`, `50`  
  - **Too few epochs**: Underfitting (model doesn’t learn enough).  
  - **Too many epochs**: Overfitting (model memorizes training data, poor on unseen data).

- **Weight decay (L2 regularization)**  
  Adds a penalty to large weights to reduce overfitting.  
  - Typical values: `1e-5`, `1e-4`, `1e-3`  
  - **Higher weight decay**: Stronger regularization, can prevent overfitting but may underfit.  
  - **Lower weight decay**: Less regularization, may overfit.

- **Dropout**  
  Randomly disables neurons during training to prevent over-reliance on any one feature.  
  - Typical dropout rates: `0.1` (10%), `0.3`, `0.5` (50%)  
  - **Higher dropout**: Stronger regularization, may slow down learning.  
  - **Lower dropout**: Less regularization, risk of overfitting.

- **Data augmentation**  
  Artificially increases dataset size by applying random transformations such as flipping, rotation, or cropping.  
  - Helps model generalize better by seeing more varied examples.  
  - Over-aggressive augmentation may create unrealistic samples that confuse the model.
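
The effect of the learning rate described above can be seen on a one-dimensional toy problem. The sketch below runs gradient descent on `f(w) = w²`; the function, the starting point, and the three learning rates are assumptions picked to show the slow, well-behaved, and diverging regimes, not recommended values for a CNN.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 (gradient 2w) and return the final w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

for lr in (0.001, 0.1, 1.1):
    # Too small: barely moves. Moderate: approaches 0. Too large: |w| grows.
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4g}")
```

Each step multiplies `w` by `1 - 2·lr`, so the iterate shrinks toward the minimum only when that factor has magnitude below 1; with `lr = 1.1` the factor is `-1.2` and the iterate oscillates with growing magnitude.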

  

---

## 7. Other Activation Functions

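| Function  | Formula                                     | Notes                                                        |
|-----------|---------------------------------------------|--------------------------------------------------------------|
| Sigmoid   | `1 / (1 + exp(-x))`                         | Output in (0, 1); saturates, can cause vanishing gradients   |
| Tanh      | `(exp(x) - exp(-x)) / (exp(x) + exp(-x))`   | Zero-centered, output in (-1, 1); also saturates             |
| ReLU      | `max(0, x)`                                 | Cheap to compute; units can "die" (stuck at 0, zero gradient) |
| LeakyReLU | `max(αx, x)`, with `0 < α < 1`              | Small slope `α` for `x < 0` mitigates dying ReLU             |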
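
The formulas in the table translate directly into code. The following is a minimal scalar sketch using only the standard library; the function names are illustrative, and `alpha=0.01` is an assumed slope (it matches the default `negative_slope` of `torch.nn.LeakyReLU`).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is the small negative-side slope; 0.01 is an assumed value here.
    return max(alpha * x, x)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```

For negative inputs ReLU returns exactly zero while LeakyReLU returns a small negative value, which keeps a non-zero gradient flowing through the unit.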

---

## 8. Evaluation

- **Accuracy**: correct predictions / total samples  
- **Precision, Recall, F1-score**: useful for imbalanced datasets  
- **Confusion Matrix**: counts of samples for each (true class, predicted class) pair; in the binary case these are the true/false positives/negatives  
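
These metrics can all be derived from the confusion matrix. The sketch below does so in plain Python for a made-up 3-class example; the labels and the helper names `confusion_matrix` and `per_class_metrics` are hypothetical, and in practice the predictions would come from `model(images).argmax(dim=1)` on a held-out test set.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_metrics(cm, c):
    """Precision, recall, and F1 for class c, treating it as the positive class."""
    tp = cm[c][c]
    fp = sum(cm[t][c] for t in range(len(cm))) - tp   # predicted c, actually other
    fn = sum(cm[c]) - tp                              # actually c, predicted other
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels for a 3-class problem, for illustration only.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = sum(cm[c][c] for c in range(3)) / len(y_true)
print(cm, accuracy, per_class_metrics(cm, 0))
```

Accuracy is the sum of the diagonal divided by the total, while precision and recall for a class read off its column and row respectively.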

---

## 9. Transfer Learning (Optional)

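Instead of training from scratch, start from a pretrained model such as **ResNet**, freeze its feature extractor, and train only a new output layer:

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights. `weights=` replaces the deprecated
# `pretrained=True` argument (torchvision >= 0.13).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pretrained layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer; the new layer is trainable by default
model.fc = nn.Linear(model.fc.in_features, 10)
```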

## References and Further Reading

- Andrew Ng, *Machine Learning*, Coursera (Stanford University)  
  [https://www.coursera.org/learn/machine-learning](https://www.coursera.org/learn/machine-learning)

- DeepLearning.AI, *Supervised Machine Learning: Regression and Classification* (Coursera)  
  [https://www.coursera.org/learn/machine-learning](https://www.coursera.org/learn/machine-learning)

- Kaggle, *Intro to Machine Learning Micro-course*  
  [https://www.kaggle.com/learn/intro-to-machine-learning](https://www.kaggle.com/learn/intro-to-machine-learning)

- scikit-learn Documentation, *Supervised Learning*  
  [https://scikit-learn.org/stable/supervised_learning.html](https://scikit-learn.org/stable/supervised_learning.html)

- Stanford University, *CS231n: Convolutional Neural Networks for Visual Recognition*  
  [http://cs231n.stanford.edu/](http://cs231n.stanford.edu/)

- PyTorch, *Official Documentation*  
  [https://pytorch.org](https://pytorch.org)

- Yann LeCun, *The MNIST Database of Handwritten Digits*  
  [http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/)