# Q3: Effect of Activation Functions on CNN Performance  
## Experiment: ReLU Activation

This notebook investigates how the **ReLU (Rectified Linear Unit)** activation function
affects the performance of a convolutional neural network (CNN) on the Fashion-MNIST dataset.

To ensure a fair comparison, the dataset, network architecture, optimiser, learning rate,
and training procedure are kept identical to the other Q3 experiments
(LeakyReLU, Sigmoid, and Tanh). Only the activation function differs.


## Experimental Setup

- **Dataset:** Fashion-MNIST (10 classes)
- **Input:** 28 × 28 grayscale images
- **Train / Validation split:** 80% / 20%
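
The 80% / 20% split is produced by the project helper `data_split_train_val`, whose implementation is not shown in this notebook. As a rough illustration only, a shuffled split with the same return order could look like the sketch below. The function name `split_train_val`, the `val_fraction` and `seed` parameters, and the dummy arrays are all assumptions made for this example, not the helper's actual code.

```python
import numpy as np

def split_train_val(X, y, val_fraction=0.2, seed=0):
    """Hypothetical shuffled split; returns X_train, X_val, y_train, y_val."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # random ordering of row indices
    n_val = int(round(len(X) * val_fraction))  # number of validation rows
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

# Dummy data with the same shape as the 10,000-row CSV used in this notebook
X_demo = np.zeros((10_000, 784), dtype=np.float32)
y_demo = np.arange(10_000) % 10

X_tr, X_va, y_tr, y_va = split_train_val(X_demo, y_demo)
print(X_tr.shape, X_va.shape)
```

Fixing the seed keeps the split reproducible, which matters when the same split is meant to be reused across the four activation experiments.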

### CNN Architecture
- Conv2D: 1 → 8 channels, kernel size 3
- ReLU activation
- MaxPooling: 2 × 2
- Conv2D: 8 → 16 channels, kernel size 3
- ReLU activation
- MaxPooling: 2 × 2
- Fully connected layer: 128 units
- ReLU activation
- Output layer: 10 units
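
The layer list above fixes the size of the flattened feature vector that feeds the fully connected layer. The sketch below traces the spatial size through the network, assuming stride 1 and no padding for the convolutions and floor rounding for the pooling layers (the PyTorch defaults). The helper names `conv_out` and `pool_out` are introduced here for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of max pooling (floor rounding) along one dimension."""
    return (size - kernel) // stride + 1

size = 28                  # input image is 28 x 28
size = conv_out(size, 3)   # after Conv2D 1 -> 8
size = pool_out(size)      # after first 2 x 2 max pool
size = conv_out(size, 3)   # after Conv2D 8 -> 16
size = pool_out(size)      # after second 2 x 2 max pool

flat_features = 16 * size * size  # 16 channels times remaining spatial area
print(size, flat_features)
```

Under these assumptions the spatial size goes 28, 26, 13, 11, 5, so the first fully connected layer receives 16 × 5 × 5 = 400 inputs.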

### Training Configuration
- **Loss function:** CrossEntropyLoss
- **Optimiser:** Adam
- **Learning rate:** 0.01
- **Epochs:** 20

Only the **activation function** differs between experiments.


In [103]:
import sys
sys.path.append("..")

from functions import get_data, data_split_train_val

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

import matplotlib.pyplot as plt

In [104]:
# Load the Fashion-MNIST test CSV (10,000 images); pixel values are normalised to [0, 1]
X, y = get_data("../data/fashion-mnist_test.csv")

# Train/validation split
X_train, X_val, y_train, y_val = data_split_train_val(X, y)

print("Train shape:", X_train.shape)
print("Validation shape:", X_val.shape)

Train shape: (8000, 784)
Validation shape: (2000, 784)


In [105]:
# convert numpy arrays to PyTorch tensors
X_train_tensor = torch.from_numpy(X_train).float()
y_train_tensor = torch.from_numpy(y_train).long()

X_val_tensor = torch.from_numpy(X_val).float()
y_val_tensor = torch.from_numpy(y_val).long()

In [106]:
print("Before reshape:")
print("X_train_tensor shape:", X_train_tensor.shape)
print("X_val_tensor shape:", X_val_tensor.shape)

Before reshape:
X_train_tensor shape: torch.Size([8000, 784])
X_val_tensor shape: torch.Size([2000, 784])


In [107]:
num_train = X_train_tensor.shape[0]
num_val = X_val_tensor.shape[0]

print(num_train, num_val)

8000 2000


In [108]:
X_train_tensor = X_train_tensor.reshape(num_train, 1, 28, 28)
X_val_tensor = X_val_tensor.reshape(num_val, 1, 28, 28)
print("X_train_tensor reshape:", X_train_tensor.shape)
print("X_val_tensor reshape:", X_val_tensor.shape)

X_train_tensor reshape: torch.Size([8000, 1, 28, 28])
X_val_tensor reshape: torch.Size([2000, 1, 28, 28])


In [109]:
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

## CNN Model with ReLU Activation

The CNN below uses ReLU activations after each convolutional layer and the first
fully connected layer. This matches the architecture used in the other activation
function experiments, allowing a direct comparison of learning behaviour and accuracy.


In [110]:
class CNN_ReLU(nn.Module):
    def __init__(self):
        super().__init__()

        self.activation = nn.ReLU()

        self.conv1 = nn.Conv2d(1, 8, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(8, 16, 3)

        self.fc1 = nn.Linear(16 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.activation(self.conv1(x))
        x = self.pool(x)
        
        x = self.activation(self.conv2(x))
        x = self.pool(x)

        x = torch.flatten(x, 1)

        x = self.activation(self.fc1(x))
        return self.fc2(x)

## Training Procedure

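The model is trained for 20 epochs using the Adam optimiser.
Each epoch performs a single full-batch update on the whole training set;
the `train_loader` and `val_loader` objects created earlier are not used by the loop.
Loss and accuracy are recorded for the training and validation sets at each epoch.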
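
The training cell in this notebook takes one full-batch step per epoch. For comparison, the sketch below shows how a mini-batch epoch over a `DataLoader` could be written. It is an alternative under stated assumptions, not the procedure used for the reported results: `run_epoch` is a hypothetical helper, and the tiny linear model and random tensors are stand-ins so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, criterion, optimizer=None):
    """One pass over `loader`; weights are updated only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)  # train mode when updating, eval mode otherwise
    total_loss, total_correct, total_seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for inputs, targets in loader:
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            batch_size = targets.size(0)
            total_loss += loss.item() * batch_size  # undo the per-batch mean
            total_correct += (outputs.argmax(dim=1) == targets).sum().item()
            total_seen += batch_size
    return total_loss / total_seen, total_correct / total_seen

# Stand-in data and model (assumptions for this demo only)
torch.manual_seed(0)
demo_x = torch.randn(32, 1, 28, 28)
demo_y = torch.randint(0, 10, (32,))
demo_loader = DataLoader(TensorDataset(demo_x, demo_y), batch_size=8)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
demo_opt = torch.optim.Adam(demo_model.parameters(), lr=0.01)
demo_criterion = nn.CrossEntropyLoss()

train_loss, train_acc = run_epoch(demo_model, demo_loader, demo_criterion, demo_opt)
val_loss, val_acc = run_epoch(demo_model, demo_loader, demo_criterion)
```

Switching to mini-batches would change the number of weight updates per epoch, so it would have to be applied to all four Q3 experiments to keep the comparison fair.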


In [111]:
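# Instantiate the model (note: the instance is named `data` throughout this notebook)
data = CNN_ReLU()

# CrossEntropyLoss expects raw logits, so the model has no softmax layer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(data.parameters(), lr=0.01)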

In [112]:
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []

In [None]:
epochs = 20

for epoch in range(epochs):
    # Full-batch step: the entire training set in one forward/backward pass
    optimizer.zero_grad()

    # Forward pass (training)
    outputs = data(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    
    # Backward pass
    loss.backward()
    optimizer.step()
    
    # Training accuracy
    train_preds = torch.argmax(outputs, 1)
    train_acc = (train_preds == y_train_tensor).float().mean().item()
    
    # Validation loss and accuracy
    with torch.no_grad():
        val_outputs = data(X_val_tensor)
        val_loss = criterion(val_outputs, y_val_tensor)
        val_preds = torch.argmax(val_outputs, 1)
        val_acc = (val_preds == y_val_tensor).float().mean().item()
    
    # Store history
    train_losses.append(loss.item())
    val_losses.append(val_loss.item())
    train_accuracies.append(train_acc)
    val_accuracies.append(val_acc)
    
    print(
        f"Epoch {epoch+1}/{epochs}, "
        f"Train Loss: {loss.item():.4f}, "
        f"Val Loss: {val_loss.item():.4f}, "
        f"Val Acc: {val_acc:.4f}"
    )

Epoch 1/20, Train Loss: 2.3024, Val Loss: 2.2764, Val Acc: 0.3075
Epoch 2/20, Train Loss: 2.2754, Val Loss: 2.1915, Val Acc: 0.2940
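
The first-epoch training loss of 2.3024 is close to ln(10) ≈ 2.3026, the cross-entropy of a classifier that assigns equal probability to all 10 classes. This is the value to expect from an untrained network and is a useful sanity check on the loss. The sketch below reproduces the arithmetic in plain Python; `cross_entropy` is a helper written for this example, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample, computed from raw logits via log-sum-exp."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

uniform_logits = [0.0] * 10  # equal scores for all 10 classes
chance_loss = cross_entropy(uniform_logits, target=3)
print(f"{chance_loss:.4f}")  # prints 2.3026, i.e. ln(10)
```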


In [None]:
print("Final validation accuracy:", val_accuracies[-1])

In [None]:
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label='Train Accuracy')
plt.plot(val_accuracies, label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()

plt.show()

## Results

The plots above show the training and validation loss and accuracy
over the 20 training epochs.


## Discussion

ReLU typically enables faster convergence than saturating activations such as
Sigmoid and Tanh, because its gradient is exactly 1 for every positive input
and does not shrink as the pre-activation grows.
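
To make the saturation argument concrete, the sketch below compares the derivative of ReLU with the derivative of the sigmoid at a few positive inputs. It uses the standard closed-form derivatives in plain Python; the helper names are chosen for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in (0.5, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  relu'={relu_grad(x):.1f}  sigmoid'={sigmoid_grad(x):.6f}")
```

The sigmoid derivative peaks at 0.25 and decays towards zero as the input grows, so gradients passed back through several sigmoid layers shrink quickly, whereas ReLU passes them through unchanged for active units.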

From the results, the model reaches a final validation accuracy of approximately **72%**.
The loss decreases steadily across epochs, indicating stable learning.
Minor fluctuations in validation accuracy are expected. Because the loop takes full-batch steps,
they more likely reflect the relatively large Adam learning rate (0.01) than mini-batch noise.

These results will be compared directly with LeakyReLU, Sigmoid, and Tanh
in the final Q3 comparison.
