
# Deep Dive into Neural Networks: Understanding Forward and Backward Propagation with MLX

This notebook walks through the calculations a neural network performs during forward and backward propagation. Each step is broken into small operations that you can check by hand, so that by the end you understand not only what the network computes but why each gradient has the value it does.

## What You Will Learn
- **How a Neural Network Processes Inputs**: Understand the transformation from input to output in the forward pass.
- **How Gradients are Computed and Used**: Dive deep into the backpropagation algorithm to see how the network learns.
- **Detailed Mathematical Breakdown**: Manual calculations for each step to solidify your understanding.



# Understanding Neural Networks with MLX
In this notebook, we will explore how a simple neural network works using the MLX framework. We will walk through the process of:
- Initializing the network with weights and biases.
- Performing a forward pass to compute predictions.
- Calculating the loss.
- Performing backpropagation to compute gradients.
- Updating the weights and biases to reduce the loss.

Let's get started!


In [1]:

import mlx.core as mx
import mlx.nn as nn



## Defining the Neural Network
We will define a simple neural network with one hidden layer. The network will have the following architecture:
- Input layer: Takes in the input features.
- Hidden layer: Fully connected layer with a ReLU activation function.
- Output layer: Fully connected layer that outputs the prediction.

We will also define the weights and biases manually for clarity.
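
In equation form, the architecture above can be sketched as follows. This assumes the row-vector convention used by the code in the next cell, where an input $x$ of shape (batch, input_dim) multiplies each weight matrix from the left; the symbols $W_1, b_1, W_2, b_2$ are shorthand for `weights1`, `bias1`, `weights2`, and `bias2`, and $d_{in}$, $d_h$, $d_{out}$ for `input_dim`, `hidden_dim`, and `output_dim`.

$$
\begin{aligned}
z_1 &= x W_1 + b_1, & W_1 &\in \mathbb{R}^{d_{in} \times d_h},\; b_1 \in \mathbb{R}^{d_h} \\
a_1 &= \max(z_1, 0) \\
z_2 &= a_1 W_2 + b_2, & W_2 &\in \mathbb{R}^{d_h \times d_{out}},\; b_2 \in \mathbb{R}^{d_{out}}
\end{aligned}
$$

The output $z_2$ is returned without a further activation, which is the usual choice for a regression target.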


In [2]:

class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        
        # Manually initialize weights and biases for clarity
        self.weights1 = mx.random.uniform(-1, 1, (input_dim, hidden_dim))  # Weights from input to hidden layer
        self.bias1 = mx.zeros((hidden_dim,))  # Bias for hidden layer

        self.weights2 = mx.random.uniform(-1, 1, (hidden_dim, output_dim))  # Weights from hidden to output layer
        self.bias2 = mx.zeros((output_dim,))  # Bias for output layer

    def forward(self, x):
        # Forward pass: Compute hidden layer activations
        z1 = mx.matmul(x, self.weights1) + self.bias1
        a1 = mx.maximum(z1, 0)  # ReLU activation

        # Forward pass: Compute output layer activations
        z2 = mx.matmul(a1, self.weights2) + self.bias2
        return z2  # No activation function in output (e.g., for regression)

# Initialize the network
input_dim = 2  # Number of input features
hidden_dim = 3  # Number of neurons in the hidden layer
output_dim = 1  # Number of output neurons

model = SimpleNN(input_dim, hidden_dim, output_dim)


In [8]:
print(model.weights1)  # Inspect the input-to-hidden weight matrix

array([[-0.0746283, -0.750631, 0.238348],
       [-0.233977, -0.885689, -0.267174]], dtype=float32)



## Performing the Forward Pass
Next, we'll define a simple input and perform the forward pass through the network. We'll manually calculate the activations at each layer to understand how the network processes inputs.


In [3]:

# Define a single input sample with 2 features (batch size 1)
x = mx.array([[1.0, 2.0]])

# Perform the forward pass
output = model.forward(x)
print("Output of the network:", output)


Output of the network: array([[0]], dtype=float32)



## Manual Calculation of the Forward Pass
Let's manually calculate the activations step by step to see how the input moves through the network.


In [4]:

# Hidden layer calculation
z1_manual = mx.matmul(x, model.weights1) + model.bias1
a1_manual = mx.maximum(z1_manual, 0)  # ReLU activation

# Output layer calculation
z2_manual = mx.matmul(a1_manual, model.weights2) + model.bias2

print("Hidden layer pre-activations (z1):", z1_manual)
print("Hidden layer activations after ReLU (a1):", a1_manual)
print("Output layer activations (z2):", z2_manual)


Hidden layer pre-activations (z1): array([[-0.542583, -2.52201, -0.296]], dtype=float32)
Hidden layer activations after ReLU (a1): array([[0, 0, 0]], dtype=float32)
Output layer activations (z2): array([[0]], dtype=float32)
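
To make each multiplication visible, the hidden-layer values can also be reproduced in plain Python, without MLX. The sketch below assumes the 2×3 weight array printed after the model definition (copied here as displayed, so it is rounded) and the zero biases set in the constructor; the variable names are illustrative only.

```python
# Values copied from the printed 2x3 weight array (rounded as displayed).
W1 = [
    [-0.0746283, -0.750631, 0.238348],   # weights leaving input feature 1
    [-0.233977, -0.885689, -0.267174],   # weights leaving input feature 2
]
b1 = [0.0, 0.0, 0.0]  # hidden biases start at zero
x = [1.0, 2.0]        # the single input sample

# z1[j] = x[0] * W1[0][j] + x[1] * W1[1][j] + b1[j]
z1 = [sum(x[i] * W1[i][j] for i in range(2)) + b1[j] for j in range(3)]

# ReLU clamps every negative pre-activation to zero
a1 = [max(z, 0.0) for z in z1]

print("z1:", z1)
print("a1:", a1)
```

The three `z1` values should agree with the MLX result to about five decimal places; the small difference comes from the rounding of the displayed weights. Because every entry of `a1` is zero, $z_2 = a_1 W_2 + b_2$ reduces to $b_2 = 0$ whatever values `weights2` holds, which is why the network outputs exactly 0 for this input.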



## Defining a Loss Function
We will use a simple Mean Squared Error (MSE) loss function to measure the difference between the network's predictions and the true target values.


In [5]:

# Define a simple target output for the loss calculation
y_true = mx.array([[3.0]])

# Define the MSE loss function
def mse_loss(y_pred, y_true):
    return mx.mean(mx.square(y_pred - y_true))

# Calculate the loss
loss = mse_loss(output, y_true)
print("Loss:", loss)


Loss: array(9, dtype=float32)
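
The printed value can be checked by hand. Writing $\hat{y}$ for the prediction and $y$ for the target, and assuming $N$ counts every element of the prediction array (which is what `mx.mean` averages over), the loss for this single sample is:

$$
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 = \frac{1}{1} (0 - 3)^2 = 9
$$
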



## Backpropagation and Gradient Calculation
Now, we'll perform backpropagation to compute the gradients of the loss with respect to the network's weights and biases. These gradients will tell us how to adjust the parameters to reduce the loss.
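
For this network the gradients can be written out with the chain rule. The sketch below assumes a single sample and a single output, so that $L = (z_2 - y)^2$, and takes the ReLU derivative to be 0 at $z_1 = 0$ (a common convention, assumed here). The symbols $W_1, b_1, W_2, b_2$ stand for `weights1`, `bias1`, `weights2`, and `bias2`, and $z_1$, $a_1$, $z_2$ are the intermediate values computed in `forward`.

$$
\begin{aligned}
\delta_2 &= \frac{\partial L}{\partial z_2} = 2\,(z_2 - y) \\
\frac{\partial L}{\partial W_2} &= a_1^{\top} \delta_2, \qquad \frac{\partial L}{\partial b_2} = \delta_2 \\
\delta_1 &= \frac{\partial L}{\partial z_1} = \left(\delta_2 W_2^{\top}\right) \odot \mathbb{1}[z_1 > 0] \\
\frac{\partial L}{\partial W_1} &= x^{\top} \delta_1, \qquad \frac{\partial L}{\partial b_1} = \delta_1
\end{aligned}
$$

The indicator $\mathbb{1}[z_1 > 0]$ is the ReLU derivative: a hidden unit whose pre-activation is negative passes no gradient back to its incoming weights.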


In [6]:

# Define the loss function: a forward pass followed by the MSE loss
def loss_fn(x, y_true):
    y_pred = model.forward(x)  # Forward pass
    return mse_loss(y_pred, y_true)  # Loss calculation

# nn.value_and_grad wraps loss_fn so that a single call returns both the loss
# and the gradients with respect to the model's trainable parameters
loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
loss, grads = loss_and_grad_fn(x, y_true)

# Extract gradients for weights and biases
grads_w1 = grads["weights1"]
grads_b1 = grads["bias1"]
grads_w2 = grads["weights2"]
grads_b2 = grads["bias2"]

print("Gradients for weights1:", grads_w1)
print("Gradients for bias1:", grads_b1)
print("Gradients for weights2:", grads_w2)
print("Gradients for bias2:", grads_b2)


Gradients for weights1: array([[0, 0, 0],
       [0, 0, 0]], dtype=float32)
Gradients for bias1: array([0, -0, -0], dtype=float32)
Gradients for weights2: array([[0],
       [0],
       [0]], dtype=float32)
Gradients for bias2: array([-6], dtype=float32)
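
These values can be verified by applying the chain rule by hand. The plain-Python sketch below is an illustration rather than the way MLX computes gradients internally: `manual_backward` is a hypothetical helper, it assumes one sample and one output unit with $L = (z_2 - y)^2$, and the `W2` values are placeholders because `weights2` is never printed in this notebook.

```python
def manual_backward(x, y, W1, b1, W2, b2):
    """Hand-written backward pass for one sample and one output unit."""
    n_in, n_hid = len(W1), len(W1[0])

    # Forward pass (kept so the intermediate values are available)
    z1 = [sum(x[i] * W1[i][j] for i in range(n_in)) + b1[j] for j in range(n_hid)]
    a1 = [max(z, 0.0) for z in z1]
    z2 = sum(a1[j] * W2[j][0] for j in range(n_hid)) + b2[0]

    # Output layer: dL/dz2 for L = (z2 - y)^2
    d_z2 = 2.0 * (z2 - y)
    grad_W2 = [[a1[j] * d_z2] for j in range(n_hid)]
    grad_b2 = [d_z2]

    # Hidden layer: gradient flows back only through units with z1 > 0
    d_z1 = [d_z2 * W2[j][0] * (1.0 if z1[j] > 0 else 0.0) for j in range(n_hid)]
    grad_W1 = [[x[i] * d_z1[j] for j in range(n_hid)] for i in range(n_in)]
    grad_b1 = d_z1
    return grad_W1, grad_b1, grad_W2, grad_b2


W1 = [[-0.0746283, -0.750631, 0.238348],
      [-0.233977, -0.885689, -0.267174]]  # printed values from the notebook
W2 = [[0.5], [-0.3], [0.8]]  # HYPOTHETICAL placeholders: weights2 is never printed
manual_grads = manual_backward([1.0, 2.0], 3.0, W1, [0.0] * 3, W2, [0.0])
print(manual_grads)
```

The result does not depend on the placeholder `W2` here. All three hidden pre-activations are negative for this input, so ReLU outputs zeros, the hidden units pass no gradient back, and only `bias2` receives a non-zero gradient: $2\,(0 - 3) = -6$. This matches the MLX output above. A different random initialization or input would typically leave some hidden units active and produce non-zero gradients for the other parameters as well.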
