# CNN From Scratch using NumPy

**Goal:** Build and train a simple Convolutional Neural Network (CNN) from scratch (no deep learning libraries) using a toy image dataset.


## 1. Import Required Library

We use only **NumPy**, so every step of the network is written out explicitly.

In [1]:
import numpy as np

## 2. Generate Toy Dataset

We create 8×8 grayscale images, each containing a single line of ones on a zero background:
- Class 0: vertical line (one full column)
- Class 1: horizontal line (one full row)

In [2]:

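def generate_toy_data(n_samples=200):
    """Return (X, y): n_samples 8×8 images and their labels (0 = vertical, 1 = horizontal)."""
    X = []
    y = []

    for _ in range(n_samples):
        img = np.zeros((8, 8))

        # Pick a class with roughly equal probability
        if np.random.rand() > 0.5:
            # randint(1, 7) returns 1..6, so the line never sits on the border
            col = np.random.randint(1, 7)
            img[:, col] = 1
            y.append(0)
        else:
            row = np.random.randint(1, 7)
            img[row, :] = 1
            y.append(1)

        X.append(img)

    return np.array(X), np.array(y)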

X, y = generate_toy_data()
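
A quick sanity check can confirm that the data looks as intended. The sketch below repeats the generator in compact form so that it runs on its own. The fixed seed (`np.random.seed(0)`) is an assumption added for reproducibility and is not part of the original cell.

```python
import numpy as np

def generate_toy_data(n_samples=200):
    """Compact copy of the generator above: label 0 = vertical, 1 = horizontal."""
    X, y = [], []
    for _ in range(n_samples):
        img = np.zeros((8, 8))
        if np.random.rand() > 0.5:
            img[:, np.random.randint(1, 7)] = 1   # one full column
            y.append(0)
        else:
            img[np.random.randint(1, 7), :] = 1   # one full row
            y.append(1)
        X.append(img)
    return np.array(X), np.array(y)

np.random.seed(0)  # assumption: fixed seed so reruns produce the same data
X, y = generate_toy_data()

print(X.shape, y.shape)               # (200, 8, 8) (200,)
print(np.bincount(y, minlength=2))    # number of samples per class

# Every image should contain exactly one line, i.e. 8 lit pixels
pixels_per_image = X.sum(axis=(1, 2))
print(pixels_per_image.min(), pixels_per_image.max())
```

The class counts will be close to, but usually not exactly, 100 each, because every label is drawn independently.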


## 3. Convolution Layer

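Performs a **sliding window dot-product** using a 3×3 filter: the filter is placed at every position where it fits inside the image (no padding, stride 1), so an 8×8 input produces a 6×6 feature map.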

In [3]:

class Conv2D:
    def __init__(self, filter_size=3):
        # Single square filter with small random initial weights
        self.filter = np.random.randn(filter_size, filter_size) * 0.1

    def forward(self, x):
        self.last_x = x  # cache the input for the backward pass
        h, w = x.shape
        f = self.filter.shape[0]
        out = np.zeros((h - f + 1, w - f + 1))

        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # Element-wise product of the f×f window with the filter, then sum
                region = x[i:i+f, j:j+f]
                out[i, j] = np.sum(region * self.filter)
        return out

    def backward(self, d_out, lr):
        f = self.filter.shape[0]
        d_filter = np.zeros_like(self.filter)

        # Each output position contributes its input window, scaled by its gradient
        for i in range(d_out.shape[0]):
            for j in range(d_out.shape[1]):
                region = self.last_x[i:i+f, j:j+f]
                d_filter += d_out[i, j] * region

        # No gradient w.r.t. the input is returned: this is the first layer
        self.filter -= lr * d_filter
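
The double loop in `forward` is easy to read but slow for larger images. As a sketch of one alternative, the same result can be computed without Python loops using `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). The helper names `conv2d_loop` and `conv2d_vectorized` are hypothetical and not part of the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_loop(x, filt):
    """Reference implementation: the same double loop as Conv2D.forward."""
    f = filt.shape[0]
    out = np.zeros((x.shape[0] - f + 1, x.shape[1] - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+f, j:j+f] * filt)
    return out

def conv2d_vectorized(x, filt):
    """Hypothetical helper: gather all windows, then one weighted sum per window."""
    windows = sliding_window_view(x, filt.shape)     # shape (H-f+1, W-f+1, f, f)
    return np.einsum("ijkl,kl->ij", windows, filt)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
filt = rng.standard_normal((3, 3))

out_loop = conv2d_loop(x, filt)
out_vec = conv2d_vectorized(x, filt)

print(out_loop.shape)                   # (6, 6)
print(np.allclose(out_loop, out_vec))   # True
```

For 8×8 images the speed difference hardly matters, so the explicit loops remain the clearer choice for learning.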


## 4. ReLU Activation

Introduces non-linearity by zeroing negative values.

In [4]:

class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, d_out):
        return d_out * self.mask
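
A tiny example may make the role of the mask clearer. The sketch below copies the class so that it runs on its own; the input values are arbitrary.

```python
import numpy as np

class ReLU:
    """Copy of the layer above."""
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, d_out):
        return d_out * self.mask

relu = ReLU()
x = np.array([-2.0, 0.0, 3.0])

out = relu.forward(x)                                 # only 3.0 passes through
grad_in = relu.backward(np.array([10.0, 10.0, 10.0]))  # gradient survives only where x > 0

print(out)
print(grad_in)
```

Multiplying a negative float by `False` yields `-0.0`, which compares equal to `0.0`, so the first entry may print with a minus sign.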


## 5. Fully Connected Layer

Maps extracted features to class scores.

In [5]:

class Dense:
    def __init__(self, in_features, out_features):
        self.W = np.random.randn(in_features, out_features) * 0.1
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x
        return x @ self.W + self.b

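    def backward(self, d_out, lr):
        # Outer product of the cached input and the upstream gradient
        dW = self.x[:, None] @ d_out[None, :]
        db = d_out
        # Gradient w.r.t. the input, computed before W is updated
        dx = self.W @ d_out

        self.W -= lr * dW
        self.b -= lr * db
        return dx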
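
One common way to gain confidence in a hand-written backward pass is a finite-difference check. The sketch below copies the layer and compares the returned `dx` with a numerical estimate. Calling `backward` with `lr=0.0` is a trick used here to leave the weights untouched, and the scalar loss `g . out` is an assumption made only for this test.

```python
import numpy as np

class Dense:
    """Copy of the layer above so the sketch runs on its own."""
    def __init__(self, in_features, out_features):
        self.W = np.random.randn(in_features, out_features) * 0.1
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x
        return x @ self.W + self.b

    def backward(self, d_out, lr):
        dW = self.x[:, None] @ d_out[None, :]
        db = d_out
        dx = self.W @ d_out
        self.W -= lr * dW
        self.b -= lr * db
        return dx

np.random.seed(0)
fc = Dense(4, 2)
x = np.random.randn(4)
g = np.random.randn(2)          # stand-in for the upstream gradient dL/d(out)

fc.forward(x)
dx = fc.backward(g, lr=0.0)     # lr=0 leaves W and b unchanged

# Central differences on the scalar L(x) = g . (x W + b)
eps = 1e-6
dx_num = np.zeros_like(x)
for k in range(x.size):
    step = np.zeros_like(x)
    step[k] = eps
    dx_num[k] = (g @ fc.forward(x + step) - g @ fc.forward(x - step)) / (2 * eps)

print(np.allclose(dx, dx_num, atol=1e-6))   # True
```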


## 6. Softmax & Loss

Softmax converts the class scores into probabilities that sum to 1.
Cross-entropy measures the classification loss as the negative log-probability assigned to the true class.

In [6]:

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def cross_entropy(pred, target):
    return -np.log(pred[target] + 1e-9)
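
When softmax is followed by cross-entropy, the gradient of the loss with respect to the raw scores simplifies to the probabilities minus the one-hot target. The sketch below checks that identity numerically; the example scores and target are arbitrary.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def cross_entropy(pred, target):
    return -np.log(pred[target] + 1e-9)

scores = np.array([1.0, -0.5])
target = 1

# Analytic gradient: probabilities minus one-hot target
analytic = softmax(scores).copy()
analytic[target] -= 1

# Numerical gradient by central differences
eps = 1e-5
numeric = np.zeros_like(scores)
for k in range(scores.size):
    step = np.zeros_like(scores)
    step[k] = eps
    loss_plus = cross_entropy(softmax(scores + step), target)
    loss_minus = cross_entropy(softmax(scores - step), target)
    numeric[k] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```

This is why a training loop can start the backward pass directly from the softmax output with 1 subtracted at the true class.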


## 7. Training the CNN

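Each training step consists of:
- Forward pass
- Loss computation
- Backward pass (backpropagation), in which each layer updates its weights by gradient descent

Samples are processed one at a time, so this is stochastic gradient descent with a batch size of 1.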

In [7]:

conv = Conv2D()
relu = ReLU()
fc = Dense(36, 2)  # 6×6 feature map flattened to 36 features, 2 classes

lr = 0.01
epochs = 20

for epoch in range(epochs):
    loss = 0
    correct = 0

    for i in range(len(X)):
        out = conv.forward(X[i])
        out = relu.forward(out)
        out = out.flatten()
        out = fc.forward(out)
        probs = softmax(out)

        loss += cross_entropy(probs, y[i])
        if np.argmax(probs) == y[i]:
            correct += 1

        # Gradient of the loss w.r.t. the scores: softmax output minus one-hot target
        grad = probs.copy()
        grad[y[i]] -= 1

        grad = fc.backward(grad, lr)
        grad = grad.reshape(6, 6)
        grad = relu.backward(grad)
        conv.backward(grad, lr)

    print(f"Epoch {epoch+1}, Loss: {loss/len(X):.4f}, Accuracy: {correct/len(X):.2f}")


Epoch 1, Loss: 0.6955, Accuracy: 0.53
Epoch 2, Loss: 0.6750, Accuracy: 0.59
Epoch 3, Loss: 0.6017, Accuracy: 0.85
Epoch 4, Loss: 0.4403, Accuracy: 0.96
Epoch 5, Loss: 0.3057, Accuracy: 0.96
Epoch 6, Loss: 0.2369, Accuracy: 0.96
Epoch 7, Loss: 0.2021, Accuracy: 0.96
Epoch 8, Loss: 0.1830, Accuracy: 0.96
Epoch 9, Loss: 0.1716, Accuracy: 0.96
Epoch 10, Loss: 0.1645, Accuracy: 0.96
Epoch 11, Loss: 0.1599, Accuracy: 0.96
Epoch 12, Loss: 0.1567, Accuracy: 0.96
Epoch 13, Loss: 0.1544, Accuracy: 0.96
Epoch 14, Loss: 0.1528, Accuracy: 0.96
Epoch 15, Loss: 0.1516, Accuracy: 0.96
Epoch 16, Loss: 0.1508, Accuracy: 0.96
Epoch 17, Loss: 0.1501, Accuracy: 0.96
Epoch 18, Loss: 0.1495, Accuracy: 0.96
Epoch 19, Loss: 0.1491, Accuracy: 0.96
Epoch 20, Loss: 0.1488, Accuracy: 0.96
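
The accuracy above is measured on the training images themselves. To classify new images, the forward pass can be wrapped in a small function. The sketch below is one way to do that; `predict` is a hypothetical helper, and the filter and weights are hand-picked for illustration rather than taken from the trained model.

```python
import numpy as np

def predict(image, conv_filter, W, b):
    """Hypothetical helper: conv -> ReLU -> flatten -> dense -> argmax."""
    f = conv_filter.shape[0]
    h, w = image.shape
    fmap = np.zeros((h - f + 1, w - f + 1))
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            fmap[i, j] = np.sum(image[i:i+f, j:j+f] * conv_filter)
    features = np.maximum(fmap, 0).flatten()
    # Softmax is monotonic, so the argmax of the scores is the predicted class
    return int(np.argmax(features @ W + b))

# Hand-picked parameters (assumption, not learned values)
vertical_detector = np.array([[-1.0, 2.0, -1.0]] * 3)  # responds to vertical lines only
W = np.zeros((36, 2))
W[:, 0] = 1.0                 # class 0 score = total filter response
b = np.array([0.0, 1.0])      # class 1 wins when the response is zero

vertical = np.zeros((8, 8))
vertical[:, 3] = 1
horizontal = np.zeros((8, 8))
horizontal[4, :] = 1

print(predict(vertical, vertical_detector, W, b))     # 0
print(predict(horizontal, vertical_detector, W, b))   # 1
```

In the notebook, the trained parameters could be passed as `predict(img, conv.filter, fc.W, fc.b)`. Running it on freshly generated images gives an accuracy estimate that does not reuse the training set.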


## 8. Summary

- You implemented a CNN **entirely from scratch**
- You saw how convolution, ReLU, dense layers, and backpropagation work
- Ideal for **conceptual clarity and exams** ✅

**Next steps:** Add pooling, multiple filters or visualize feature maps.

In [9]:
# Add pooling layer and improve training loop
class MaxPool2D:
    """Max pooling over 4-D input shaped (batch, channels, height, width).

    The 2-D feature maps produced by Conv2D above would need to be
    expanded first, e.g. fmap[None, None].
    """

    def __init__(self, pool_size=2, stride=2):
        self.pool_size = pool_size
        self.stride = stride

    def forward(self, input):
        self.input = input  # cache the input, as the other layers do
        batch_size, channels, height, width = input.shape
        out_height = (height - self.pool_size) // self.stride + 1
        out_width = (width - self.pool_size) // self.stride + 1
        output = np.zeros((batch_size, channels, out_height, out_width))

        for b in range(batch_size):
            for c in range(channels):
                for i in range(out_height):
                    for j in range(out_width):
                        # Corners of the current pooling window
                        h_start = i * self.stride
                        h_end = h_start + self.pool_size
                        w_start = j * self.stride
                        w_end = w_start + self.pool_size
                        output[b, c, i, j] = np.max(input[b, c, h_start:h_end, w_start:w_end])

        return output
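
The `MaxPool2D` class above expects a 4-D batch, while the rest of this notebook works on single 2-D feature maps. As a sketch of what pooling could look like in that simpler setting, non-overlapping max pooling can be written with a reshape. The helper `max_pool_2d` is hypothetical and assumes both dimensions are divisible by the pool size.

```python
import numpy as np

def max_pool_2d(fmap, pool_size=2):
    """Hypothetical helper: non-overlapping max pooling on one 2-D feature map.

    Assumes height and width are divisible by pool_size.
    """
    h, w = fmap.shape
    # Split each axis into (number of blocks, position inside the block)
    blocks = fmap.reshape(h // pool_size, pool_size, w // pool_size, pool_size)
    return blocks.max(axis=(1, 3))

fmap = np.arange(36, dtype=float).reshape(6, 6)
pooled = max_pool_2d(fmap)

print(pooled.shape)   # (3, 3)
print(pooled[0])      # values 7, 9, 11: the maxima of the 2×2 blocks in the first two rows
```

With a 6×6 feature map pooled down to 3×3, the dense layer would take 9 inputs instead of 36. A backward pass that routes each gradient to the position of its maximum would also be needed and is not shown here.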
