# Backpropagation

Backpropagation is the core algorithm for training neural networks. This document explains how it works and provides a simple code example in Python.

Backpropagation (backward propagation of errors) is a supervised learning algorithm used to train artificial neural networks by minimizing the error between predicted and actual outputs. It works by:

1. Forward Pass: Computing the output of the network given an input
2. Backward Pass: Calculating the gradient of the loss function with respect to each weight by moving backwards through the network
3. Weight Update: Adjusting the weights using the gradients to reduce the error
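
To make the three steps concrete, here is a minimal sketch of one update on a single linear neuron with one weight. The values of `w`, `x`, `target`, and `learning_rate`, and the choice of a squared-error loss, are illustrative assumptions rather than values from any particular network.

```python
# One backpropagation step for a single linear neuron: prediction = w * x.
# All numeric values below are illustrative assumptions.
w = 0.5              # initial weight
x = 2.0              # input
target = 3.0         # desired output
learning_rate = 0.1

# 1. Forward pass: compute the prediction and the loss
prediction = w * x
loss = 0.5 * (prediction - target) ** 2

# 2. Backward pass (chain rule): dloss/dw = dloss/dprediction * dprediction/dw
grad_w = (prediction - target) * x

# 3. Weight update: step against the gradient
w_new = w - learning_rate * grad_w
new_loss = 0.5 * (w_new * x - target) ** 2

print(f"loss before: {loss:.2f}, loss after: {new_loss:.2f}")
# prints: loss before: 2.00, loss after: 0.72
```

The gradient is negative here, so the update increases the weight and the loss drops after one step.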

<img src="images/backpropagation.png" width="450" />

## Example: Single Neuron with Sigmoid Activation

```python
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid, written in terms of the sigmoid's output:
# if s = sigmoid(z), then d(sigmoid)/dz = s * (1 - s).
# Pass the already-activated output here, not the raw input z.
def sigmoid_derivative(s):
    return s * (1 - s)

# Training data: XOR problem
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

y = np.array([[0],
              [1],
              [1],
              [0]])

# Initialize weights and bias randomly (fixed seed for reproducibility)
np.random.seed(1)
weights = np.random.random((2, 1))  # 2 inputs, 1 output
bias = np.random.random((1, 1))

learning_rate = 0.1
epochs = 10000

# Training loop
for _ in range(epochs):
    # Forward pass: input layer -> output layer (single neuron)
    output = sigmoid(np.dot(X, weights) + bias)

    # Calculate error (target minus prediction)
    error = y - output

    # Backward pass: the chain rule through the sigmoid gives the negative
    # gradient of the squared error with respect to the pre-activation
    adjustments = error * sigmoid_derivative(output)

    # Update weights and bias (gradient descent step)
    weights += learning_rate * np.dot(X.T, adjustments)
    bias += learning_rate * np.sum(adjustments)

# Test the trained network
print("Final predictions:")
print(sigmoid(np.dot(X, weights) + bias))
```

```
Final predictions:
[[0.5]
 [0.5]
 [0.5]
 [0.5]]
```

The example breaks down as follows:

1. **Network Structure**: 
   - 2 input neurons
   - 1 output neuron with sigmoid activation
   - We're trying to learn the XOR function

2. **Forward Pass**:
   - Takes inputs (X)
   - Multiplies by weights
   - Adds bias
   - Applies sigmoid activation

3. **Backward Pass**:
   - Calculates error (difference between target and prediction)
   - Computes gradient using the derivative of sigmoid
   - Propagates error backwards to update weights

4. **Weight Update**:
   - Adjusts weights and bias using gradient descent
   - Learning rate controls step size
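
The update used in the code can be read off from the chain rule. The code never names its loss function, so the derivation below assumes half the summed squared error, which is the loss whose gradient matches the update the code performs. Here $x_i$ is the $i$-th input row, $\hat{y}_i$ the prediction, $\sigma$ the sigmoid, and $\eta$ the learning rate.

$$
L = \tfrac{1}{2}\sum_i (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \sigma(z_i), \qquad z_i = x_i^\top w + b
$$

$$
\frac{\partial L}{\partial w}
= \sum_i \frac{\partial L}{\partial \hat{y}_i}\,\frac{\partial \hat{y}_i}{\partial z_i}\,\frac{\partial z_i}{\partial w}
= -\sum_i (y_i - \hat{y}_i)\,\hat{y}_i(1 - \hat{y}_i)\,x_i,
\qquad
\frac{\partial L}{\partial b} = -\sum_i (y_i - \hat{y}_i)\,\hat{y}_i(1 - \hat{y}_i)
$$

$$
w \leftarrow w - \eta\,\frac{\partial L}{\partial w} = w + \eta\, X^\top\!\big[(y - \hat{y}) \odot \hat{y} \odot (1 - \hat{y})\big],
\qquad
b \leftarrow b + \eta \sum_i (y_i - \hat{y}_i)\,\hat{y}_i(1 - \hat{y}_i)
$$

The bracketed term is what the code calls `adjustments`. The two minus signs cancel, which is why the code adds to the weights rather than subtracting.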

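The final predictions all settle at 0.5, so the neuron has not learned the XOR function (0, 1, 1, 0). This is expected: XOR is not linearly separable, and a single sigmoid neuron can only place one linear decision boundary in the input space. Gradient descent drives the weights and bias toward zero, so every input produces sigmoid(0) = 0.5. Learning XOR requires at least one hidden layer.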

Key concepts in backpropagation:
- **Chain Rule**: Used to compute gradients layer by layer
- **Gradient Descent**: Optimizes weights to minimize error
- **Learning Rate**: Controls how quickly weights are updated
- **Activation Functions**: Introduce non-linearity (sigmoid in this case)
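
A gradient derived with the chain rule can be sanity-checked numerically. The sketch below compares the analytic gradient for a single sigmoid neuron against a central finite-difference estimate. The squared-error loss, the random toy data, and the helper names `loss` and `analytic_grad` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, b, X, y):
    # Half the summed squared error (assumed loss)
    return 0.5 * np.sum((y - sigmoid(X @ w + b)) ** 2)

def analytic_grad(w, b, X, y):
    # Chain rule: dL/dz = -(y - out) * out * (1 - out)
    out = sigmoid(X @ w + b)
    delta = -(y - out) * out * (1 - out)
    return X.T @ delta, np.sum(delta)

# Toy data (illustrative values only)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = rng.integers(0, 2, size=(5, 1)).astype(float)
w = rng.normal(size=(2, 1))
b = 0.1

grad_w, grad_b = analytic_grad(w, b, X, y)

# Central finite differences: perturb one parameter at a time
eps = 1e-6
num_grad_w = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i] = eps
    num_grad_w[i] = (loss(w + step, b, X, y) - loss(w - step, b, X, y)) / (2 * eps)
num_grad_b = (loss(w, b + eps, X, y) - loss(w, b - eps, X, y)) / (2 * eps)

print("max weight-gradient difference:", np.max(np.abs(grad_w - num_grad_w)))
print("bias-gradient difference:", abs(grad_b - num_grad_b))
```

Both differences should be tiny if the analytic gradient is right. A large gap usually points to a sign error or a missing factor in the backward pass.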

This is a simplified example. Real neural networks have:
- Multiple layers
- More neurons per layer
- Different activation functions (ReLU, tanh, etc.)
- More sophisticated optimization algorithms (Adam, RMSprop)
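
As a minimal sketch of the first two items, the network below adds one hidden layer of four sigmoid units and trains on the XOR data with the same forward, backward, update cycle. A hidden layer lets a network represent functions that are not linearly separable, such as XOR. The hidden size, the implicit learning rate of 1, the weight range, the iteration count, and the constant-1 input column standing in for a bias are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# XOR inputs; the constant third column of 1s acts as a bias term (assumption)
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(1)
# Weights drawn uniformly from [-1, 1); 4 hidden units is an arbitrary choice
w_hidden = 2 * np.random.random((3, 4)) - 1   # input -> hidden
w_output = 2 * np.random.random((4, 1)) - 1   # hidden -> output

for _ in range(20000):
    # Forward pass through both layers
    hidden = sigmoid(X @ w_hidden)
    output = sigmoid(hidden @ w_output)

    # Backward pass: compute the output-layer delta, then propagate it
    # back through w_output to get the hidden-layer delta (chain rule)
    output_delta = (y - output) * output * (1 - output)
    hidden_delta = (output_delta @ w_output.T) * hidden * (1 - hidden)

    # Weight update (learning rate of 1 assumed)
    w_output += hidden.T @ output_delta
    w_hidden += X.T @ hidden_delta

predictions = sigmoid(sigmoid(X @ w_hidden) @ w_output)
print(np.round(predictions, 2))
```

In this configuration the predictions are expected to round to 0, 1, 1, 0, although convergence in general depends on the initialization and the learning rate.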

The same principles apply: compute forward, calculate error, propagate backwards, and update weights.