<!-- Dense Layer -->
A dense layer (also called a fully connected layer) is a neural network layer in which every input neuron is connected to every output neuron. Its learnable parameters, the weights and biases, are updated during training to capture patterns in the data. Dense layers are common in feedforward neural networks and are usually followed by an activation function to introduce non-linearity.
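
As a rough illustration of the description above, a dense layer can be sketched in NumPy as a single matrix multiplication plus a bias. The layer sizes, the seed, and the helper name `dense` are hypothetical choices made for this sketch. Samples are stored as rows, so the transformation is written `x @ W + b`.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

n_in, n_out = 3, 2                      # hypothetical layer sizes
W = rng.standard_normal((n_in, n_out))  # one weight per (input, output) pair
b = np.zeros((1, n_out))                # one bias per output neuron

def dense(x, W, b):
    """Affine part of a dense layer (hypothetical helper): z = xW + b."""
    return x @ W + b

x = rng.standard_normal((4, n_in))      # batch of 4 samples, one per row
z = dense(x, W, b)

n_params = W.size + b.size              # 3 * 2 weights + 2 biases
print(z.shape)    # (4, 2)
print(n_params)   # 8
```

Because every input connects to every output, the parameter count grows as `n_in * n_out + n_out`.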

<!-- Activation Function -->
An activation function is a function applied to a layer's output that introduces non-linearity, so the network can learn curved (non-linear) relationships in the data rather than only linear ones.
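
One way to see why the non-linearity matters: without an activation, two stacked dense layers are equivalent to a single dense layer. The sketch below checks this numerically. The shapes and the seed are arbitrary assumptions, and ReLU is used only as an example activation.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
x = rng.standard_normal((5, 3))
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal((1, 4))
W2, b2 = rng.standard_normal((4, 2)), rng.standard_normal((1, 2))

# Two dense layers with no activation in between
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as one dense layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

collapses = np.allclose(two_layers, one_layer)

# With ReLU between the layers, the mapping is no longer that single linear layer
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
differs = not np.allclose(with_relu, one_layer)

print(collapses, differs)
```

The first check holds for any weights, since `(xW1 + b1)W2 + b2 = x(W1W2) + (b1W2 + b2)`.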

<!-- Forward and Backward pass -->
Forward pass - input data is fed through the network. Each layer applies an affine transformation z = Wx + b followed by an activation function a = f(z), and the result is passed to the next layer until the final output (the prediction) is produced. Running only the forward pass on a trained network is called inference.
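
The layer-by-layer flow of the forward pass can be sketched as a loop over layers. This is a minimal sketch, not a definitive implementation: the `forward` helper, the `(W, b, activation)` tuple layout, and the layer sizes are assumptions made for illustration. Samples are rows, so z = Wx + b becomes `a @ W + b`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def forward(x, layers):
    """Apply each (W, b, activation) in order; return the final activation."""
    a = x
    for W, b, f in layers:
        z = a @ W + b   # affine transformation
        a = f(z)        # activation
    return a

rng = np.random.default_rng(1)  # arbitrary seed
layers = [
    (rng.standard_normal((2, 3)), np.zeros((1, 3)), relu),     # hidden layer
    (rng.standard_normal((3, 1)), np.zeros((1, 1)), sigmoid),  # output layer
]

x = rng.standard_normal((4, 2))   # 4 samples, 2 features
y_hat = forward(x, layers)
print(y_hat.shape)  # (4, 1)
```

With a sigmoid on the last layer, each output lies between 0 and 1 and can be read as a predicted probability.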

Backward pass - after the forward pass, we compute the loss. The backward pass then uses backpropagation to compute the derivatives of the loss with respect to each weight and bias, applying the chain rule of calculus layer by layer in reverse order. These gradients are what the update step uses to adjust the parameters.
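
A common sanity check for backpropagation is to compare the chain-rule gradients with finite-difference estimates. The sketch below does this for a small ReLU + sigmoid network with binary cross-entropy loss. The helper names (`loss_fn`, `analytic_grads`, `numeric_grad`), the seed, and the step size `eps` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_fn(X, y, W1, b1, W2, b2):
    """Binary cross-entropy of a 2-layer network (ReLU hidden, sigmoid output)."""
    a1 = np.maximum(0, X @ W1 + b1)
    y_hat = sigmoid(a1 @ W2 + b2)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_grads(X, y, W1, b1, W2, b2):
    """Gradients of the loss w.r.t. W1 and W2 via the chain rule."""
    m = y.shape[0]
    z1 = X @ W1 + b1
    a1 = np.maximum(0, z1)
    y_hat = sigmoid(a1 @ W2 + b2)
    dz2 = y_hat - y                   # sigmoid + cross-entropy, per sample
    dW2 = a1.T @ dz2 / m
    dz1 = (dz2 @ W2.T) * (z1 > 0)     # back through W2, then through ReLU
    dW1 = X.T @ dz1 / m
    return dW1, dW2

def numeric_grad(f, P, eps=1e-6):
    """Central-difference gradient of f() w.r.t. P, perturbing P in place."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps
        plus = f()
        P[idx] = old - eps
        minus = f()
        P[idx] = old                  # restore the original value
        G[idx] = (plus - minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)  # arbitrary seed
X = rng.standard_normal((4, 2))
y = np.array([[0.0], [1.0], [0.0], [1.0]])
W1, b1 = rng.standard_normal((2, 2)), np.zeros((1, 2))
W2, b2 = rng.standard_normal((2, 1)), np.zeros((1, 1))

dW1, dW2 = analytic_grads(X, y, W1, b1, W2, b2)
f = lambda: loss_fn(X, y, W1, b1, W2, b2)
err1 = np.max(np.abs(dW1 - numeric_grad(f, W1)))
err2 = np.max(np.abs(dW2 - numeric_grad(f, W2)))
print(err1, err2)  # both should be very small if the chain rule was applied correctly
```

If the two estimates disagree by much more than the finite-difference error, the backward pass likely contains a mistake.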

In [1]:
import numpy as np

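np.random.seed(2)
# Toy dataset: 4 samples with 2 features each, and binary labels
X = np.random.randn(4, 2)
y = np.array([[0], [1], [0], [1]])

# Hidden layer (2 -> 2) and output layer (2 -> 1): small random weights, zero biases
W1 = np.random.randn(2, 2) * 0.01
b1 = np.zeros((1, 2))
W2 = np.random.randn(2, 1) * 0.01
b2 = np.zeros((1, 1))

lr = 0.1  # learning rate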

def sigmoid(z):
    return 1/(1+np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def relu_deriv(z):
    return (z > 0).astype(float)

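for epoch in range(1000):
    # Forward pass: affine transformation, then activation, layer by layer
    z1 = X @ W1 + b1
    a1 = relu(z1)
    z2 = a1 @ W2 + b2
    y_hat = sigmoid(z2)

    # Binary cross-entropy loss (the 1e-8 term avoids log(0))
    m = y.shape[0]
    loss = -np.mean(y * np.log(y_hat + 1e-8) + (1 - y) * np.log(1 - y_hat + 1e-8))

    # Backward pass, output layer: sigmoid + cross-entropy gives y_hat - y per sample
    # (the 1/m averaging is applied when forming dW2 and db2)
    dz2 = y_hat - y
    dW2 = (a1.T @ dz2) / m
    db2 = np.sum(dz2, axis=0, keepdims=True) / m

    # Backward pass, hidden layer: chain rule through W2, then through ReLU
    da1 = dz2 @ W2.T
    dz1 = da1 * relu_deriv(z1)
    dW1 = (X.T @ dz1) / m
    db1 = np.sum(dz1, axis=0, keepdims=True) / m

    # Gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")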




Epoch 0, Loss: 0.6932
Epoch 200, Loss: 0.5338
Epoch 400, Loss: 0.4804
Epoch 600, Loss: 0.4786
Epoch 800, Loss: 0.4781
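
The starting loss of about 0.693 is close to ln 2. With weights scaled by 0.01 and zero biases, the initial predictions are all near 0.5, and the binary cross-entropy of a constant 0.5 prediction is exactly ln 2. The short check below assumes predictions of exactly 0.5, which is an idealization of the initial state rather than the values the network actually produced.

```python
import numpy as np

y = np.array([[0], [1], [0], [1]])
y_hat = np.full(y.shape, 0.5)   # idealized "untrained" prediction

loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(f"{loss:.4f}")  # 0.6931
```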
