Step-by-Step Breakdown of Backpropagation

Let’s break it into four main steps:

1️⃣ Forward Propagation:

Compute the output using the current weights & biases

Calculate the loss (error) between predicted and actual values

2️⃣ Compute the Gradient (Partial Derivatives):

Use the Chain Rule to find how much each weight contributed to the error

3️⃣ Backpropagate the Error:

Propagate the error signal backward through the network, layer by layer, so that every weight and bias receives its gradient

4️⃣ Update Weights & Biases:

Use Gradient Descent to update weights and minimize the error
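
The four steps can be sketched for a single sigmoid neuron in plain Python. This is a minimal illustration rather than the notebook's own code: the helper `train_step`, the starting values of `w` and `b`, the single training sample, and the learning rate are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w, b, x, y, lr):
    """One backpropagation step for y_hat = sigmoid(w*x + b) with squared error.

    Hypothetical helper for illustration only.
    """
    # Step 1: forward propagation, then the loss
    z = w * x + b
    y_hat = sigmoid(z)
    loss = (y_hat - y) ** 2

    # Steps 2 and 3: chain rule, applied from the loss back to the parameters
    dloss_dyhat = 2.0 * (y_hat - y)    # dL/dy_hat
    dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of the sigmoid
    dloss_dz = dloss_dyhat * dyhat_dz  # error signal at the pre-activation
    grad_w = dloss_dz * x              # because dz/dw = x
    grad_b = dloss_dz                  # because dz/db = 1

    # Step 4: gradient descent update
    return w - lr * grad_w, b - lr * grad_b, loss


# Assumed starting point and a single training sample (x=1, y=1)
w, b = 0.5, 0.0
losses = []
for _ in range(5):
    w, b, loss = train_step(w, b, x=1.0, y=1.0, lr=0.5)
    losses.append(loss)

print(losses)  # each entry is smaller than the one before it
```

Each call repeats all four steps once, and the recorded loss shrinks from one call to the next because the update moves `w` and `b` against the gradient.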


Mathematical Formulation
Let's assume a simple one-layer neural network:

ŷ = f(WX + b)
Where:

X = Input

W = Weights

b = Bias

f = Activation function

ŷ = Predicted output
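
To connect this formula to the gradient step, here is a worked chain-rule derivation. It is a sketch under stated assumptions: a squared-error loss L on a single sample with target y, all quantities treated as scalars for readability, and two symbols not defined above, the pre-activation z and the learning rate η.

```latex
\begin{aligned}
z &= WX + b, \qquad \hat{y} = f(z), \qquad L = (\hat{y} - y)^2 \\
\frac{\partial L}{\partial W}
  &= \frac{\partial L}{\partial \hat{y}} \cdot
     \frac{\partial \hat{y}}{\partial z} \cdot
     \frac{\partial z}{\partial W}
   = 2(\hat{y} - y)\, f'(z)\, X \\
\frac{\partial L}{\partial b}
  &= \frac{\partial L}{\partial \hat{y}} \cdot
     \frac{\partial \hat{y}}{\partial z} \cdot
     \frac{\partial z}{\partial b}
   = 2(\hat{y} - y)\, f'(z) \\
W &\leftarrow W - \eta\, \frac{\partial L}{\partial W}, \qquad
b \leftarrow b - \eta\, \frac{\partial L}{\partial b}
\end{aligned}
```

For the sigmoid activation, f'(z) = ŷ(1 − ŷ), so both gradients can be computed from values already available after the forward pass.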

We'll build a single-layer neural network in PyTorch to demonstrate how backpropagation works.

Here’s how we’ll break it down:

Create a simple dataset.

Define a simple model with a single layer.

Use Mean Squared Error (MSE) as the loss function.

Backpropagate the error and update the weights.

In [8]:
import torch
import torch.nn as nn
import torch.optim as optim

In [9]:
# Step 1: Create a simple dataset (the XOR truth table)

X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=torch.float32)
y = torch.tensor([[0.0], [1.0], [1.0], [0.0]], dtype=torch.float32)

In [10]:
# Step 2: Define a simple neural network model (1 layer with Sigmoid activation)

class simpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 1)  # 2 input features, 1 output
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc(x)       # linear transform: WX + b
        x = self.sigmoid(x)  # squash the result into (0, 1)
        return x

In [11]:
# Instantiate the model

model = simpleNN()

In [12]:
# Step 3: Define a loss function (MSE) and an optimizer (SGD)

loss_fn = nn.MSELoss()

optimizer = optim.SGD(model.parameters(), lr=0.1)

In [13]:
# Step 4: Train the model (implementing backpropagation)

epochs = 10000  # Number of training iterations

for epoch in range(epochs):
    # Forward pass: compute predictions by passing X through the model
    predictions = model(X)

    # Compute the loss (error)
    loss = loss_fn(predictions, y)

    # Zero the gradients before the backward pass;
    # PyTorch accumulates gradients across backward() calls
    optimizer.zero_grad()

    # Backward pass: compute the gradient of the loss w.r.t. each parameter
    loss.backward()

    # Update weights and biases
    optimizer.step()

    # Print the loss every 1000 epochs to monitor progress
    if epoch % 1000 == 0:
        print(f'Epoch : {epoch + 1}/ {epochs}, Loss : {loss.item()}')

Epoch : 1/ 10000, Loss : 0.2904219627380371
Epoch : 1001/ 10000, Loss : 0.2500074505805969
Epoch : 2001/ 10000, Loss : 0.25000008940696716
Epoch : 3001/ 10000, Loss : 0.25
Epoch : 4001/ 10000, Loss : 0.2499999850988388
Epoch : 5001/ 10000, Loss : 0.25
Epoch : 6001/ 10000, Loss : 0.2499999850988388
Epoch : 7001/ 10000, Loss : 0.25
Epoch : 8001/ 10000, Loss : 0.25
Epoch : 9001/ 10000, Loss : 0.25


In [14]:
# Step 5: Final predictions after training

with torch.no_grad():  # Disable gradient computation for inference
    final_prediction = model(X)
    print('\n Final Predictions:')
    print(final_prediction)


 Final Predictions:
tensor([[0.5000],
        [0.5000],
        [0.5000],
        [0.5000]])
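
These results deserve a comment: the loss settles at 0.25 and every prediction is 0.5, so the model has not learned the targets. The dataset is the XOR truth table, which is not linearly separable, and a single linear layer followed by a sigmoid can only draw one straight decision boundary. Backpropagation itself ran correctly; it simply converged to a point where the gradient vanishes. Predicting 0.5 for every input corresponds to weights and bias near zero, and the plain-Python sketch below checks that the gradient is zero there. The helper `mse_and_grads` is hypothetical and written for this check; it is not part of the notebook.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# XOR dataset, same values as X and y in the notebook
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0.0, 1.0, 1.0, 0.0]


def mse_and_grads(w1, w2, b):
    """MSE and its gradients for y_hat = sigmoid(w1*x1 + w2*x2 + b).

    Hypothetical helper for illustration only.
    """
    n = len(inputs)
    loss = grad_w1 = grad_w2 = grad_b = 0.0
    for (x1, x2), y in zip(inputs, targets):
        y_hat = sigmoid(w1 * x1 + w2 * x2 + b)
        loss += (y_hat - y) ** 2 / n
        # Chain rule: dL/dz = (2/n) * (y_hat - y) * sigmoid'(z)
        dz = (2.0 / n) * (y_hat - y) * y_hat * (1.0 - y_hat)
        grad_w1 += dz * x1
        grad_w2 += dz * x2
        grad_b += dz
    return loss, (grad_w1, grad_w2, grad_b)


# Evaluate at zero weights and bias, where every prediction is 0.5
loss, grads = mse_and_grads(0.0, 0.0, 0.0)
print(loss, grads)
```

The loss comes out as 0.25 and all three gradients are zero, so gradient descent has nothing left to follow at this point. A hidden layer with a nonlinear activation is what gives a network the capacity to represent XOR.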
