# Neural Networks: Implementing Backpropagation

## Code explanation

### Description
This project is a simple implementation of a neural network built from scratch using NumPy. It is designed to demonstrate the core concepts of forward propagation, backpropagation, and gradient descent.

### Structure
**1. Helper Functions:**
- get_activation(name): Returns a tuple containing the specified activation function (func) and its gradient (grad).
- get_loss(name): Returns a tuple containing the loss function (loss) and its gradient (grad).

**2. MyNeuralNetwork Class**
- __init__(): Initializes the network. Sets up layer sizes, randomly initializes weights, sets biases to zero, and prepares the activation/loss functions. It also calls initialize_log_file().
- initialize_log_file(): Creates the CSV log file and writes the header row. The header includes epoch, loss, accuracy, and dynamic columns for the Z-value (pre-activation) and activation-value of each neuron in the hidden layer (e.g., z_hidden_0, a_hidden_0, ...).
- log_training_step(): Appends a new row of data to the CSV file for the current training step.
- forward(X): Performs a forward pass through the network, calculating and storing the hidden layer and output layer values. Returns the final predictions and the hidden layer's Z-values.
- backward(X, y, y_hat, ...): Performs backpropagation to calculate the gradients of the loss with respect to all weights and biases using the chain rule.
- train(X, y, ...): The main training loop. It iterates for the specified number of epochs, performs forward and backward passes, updates the weights and biases, and logs progress to the console (every 100 epochs) and the CSV file (every epoch).

**3. Training Log**
- The output CSV file (default: statistics.csv) provides a complete, epoch-by-epoch trace of the network's internal state.

### How to use it
1. Prepare your training data: The network needs input data (X) and target labels (y).
2. Initialize the Neural Network: Create an instance of the MyNeuralNetwork class. Define your network's architecture and settings.
3. Train the network: Call the .train() method to begin training. Pass your data (X_train, y_train) and specify the epochs (how many times to loop over the data) and the learning_rate.


In [15]:
# Import libraries
import numpy as np
import csv

# Activation and loss functions
def get_activation(name):
    if name == "sigmoid":
        def func(x):
            return 1 / (1 + np.exp(-x))
        def grad(x):
            return x * (1 - x)
    elif name == "relu":
        def func(x):
            return np.maximum(0, x)
        def grad(x):
            return (x > 0).astype(float)
    elif name == "tanh":
        def func(x):
            return np.tanh(x)
        def grad(x):
            return 1 - np.square(x)
    elif name == "softmax":
        def func(x):
            exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
            return exp_x / np.sum(exp_x, axis=1, keepdims=True)
        def grad(x):
            return x * (1 - x)
    else:
        raise ValueError(f"Unknown activation: {name}")
    return func, grad

def get_loss(name):
    if name == "mean_squared_error":
        def loss(y_true, y_pred):
            return np.mean((y_true - y_pred) ** 2)
        def grad(y_true, y_pred):
            # return 2 * (y_pred - y_true) / y_true.size
            return (y_pred - y_true)
    elif name == "cross_entropy":
        def loss(y_true, y_pred):
            eps = 1e-7
            y_pred = np.clip(y_pred, eps, 1 - eps)
            return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
        def grad(y_true, y_pred):
            # gradient for softmax + cross-entropy simplification
            # return (y_pred - y_true) / y_true.shape[0]
            return (y_pred - y_true)
    else:
        raise ValueError(f"Unknown loss function: {name}")
    return loss, grad

# Define the Neural Network class
class MyNeuralNetwork:

    def __init__(
        self, 
        input_size = 8, 
        hidden_size = 3, 
        output_size = 8, 
        hidden_activation="sigmoid",
        output_activation="softmax",
        loss_function="cross_entropy",
        output_file = 'statistics.csv'
    ):
        
        # Generating the weights
        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
        self.weights_hidden_output = np.random.randn(hidden_size, output_size)

        # Generating the biases
        self.bias_input_hidden = np.zeros((1, hidden_size))
        self.bias_hidden_output = np.zeros((1, output_size))
        
        # Get activation and loss functions
        self.hidden_act, self.hidden_grad = get_activation(hidden_activation)
        self.output_act, self.output_grad = get_activation(output_activation)
        self.loss_func, self.loss_grad = get_loss(loss_function)
        self.configuration = f'{hidden_activation}_{output_activation}_{loss_function}'

        # Defining the output file for the statistics
        self.output_file = output_file

        # Initialize log file
        self.initialize_log_file()

    def initialize_log_file(self):
        # Create headers for the CSV file
        headers = ['epoch', 'loss', 'accuracy', 'configuration']
        
        # Add columns for each hidden neuron's Z and activation
        for i in range(self.weights_input_hidden.shape[1]):  # For each hidden neuron
            headers.extend([f'z_hidden_{i}', f'a_hidden_{i}'])
        
        # Write headers to file
        with open(self.output_file, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(headers)

    def log_training_step(self, epoch, loss, accuracy, hidden_z, hidden_activations):
        # Prepare row data
        row = [epoch, loss, accuracy, self.configuration]
        # Add Z and activation for each hidden neuron
        for z, a in zip(hidden_z[0], hidden_activations[0]):
            row.extend([z, a])
        
        # Append to CSV
        with open(self.output_file, 'a', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(row)

    def forward(self, X):
        
        # Calculating the hidden layer
        # Store Z values
        self.z_hidden = np.dot(X, self.weights_input_hidden) + self.bias_input_hidden
        # Calculate activation
        self.hidden_layer = self.hidden_act(self.z_hidden)

        # Calculating the output layer
        self.z_output = np.dot(self.hidden_layer, self.weights_hidden_output) + self.bias_hidden_output
        self.output_layer = self.output_act(self.z_output)
        
        return self.output_layer, self.z_hidden

    def backward(self, X, y, y_hat, learning_rate = 0.01):
        m = X.shape[0]

        # ================================= [Output layer error] =================================
        # For loss function we use MSE
        # The loss function derivative with respect to the predicted values y_hat
        dL_dyhat = self.loss_grad(y, y_hat)
        # The derivative of the predicted values with respect to z (W * X + b)
        # is the derivative of sigmoid
        d_act_output = self.output_grad(y_hat)
        # The derivative of the loss function with respect to z 
        # is the product of the loss function derivative and the sigmoid derivative
        # This is the error term
        error_term_output = dL_dyhat * d_act_output

        # The derivative of the loss function with respect to weights
        # is the product of the error term and the hidden layer
        # and then we divide by the number of samples
        dW_hidden_output = np.dot(self.hidden_layer.T, error_term_output) / m
        # The derivative of the loss function with respect to bias
        # is the sum of the error term
        # and then we divide by the number of samples
        db_hidden_output = np.sum(error_term_output, axis=0, keepdims=True) / m

        # ================================= [Hidden layer error] =================================
        # The derivative of the loss function with respect to weights
        
        error_term_hidden = np.dot(error_term_output, self.weights_hidden_output.T) * self.hidden_grad(self.hidden_layer)

        # The derivative of the loss function with respect to weights
        dW_input_hidden = np.dot(X.T, error_term_hidden) / m
        # The derivative of the loss function with respect to bias
        db_input_hidden = np.sum(error_term_hidden, axis=0, keepdims=True) / m

        # ================================= [Update weights and biases] ==========================
        # Update weights and biases
        self.weights_hidden_output -= learning_rate * dW_hidden_output
        self.bias_hidden_output -= learning_rate * db_hidden_output

        self.weights_input_hidden -= learning_rate * dW_input_hidden
        self.bias_input_hidden -= learning_rate * db_input_hidden
        
    def train(self, X, y, epochs=10000, learning_rate=0.01):
        
        for epoch in range(epochs):
            # Forward pass
            y_hat, z_hidden = self.forward(X)

            self.backward(X, y, y_hat, learning_rate)

            if epoch % 100 == 0:
                # Calculate metrics
                loss = self.loss_func(y, y_hat)
                predictions = np.argmax(y_hat, axis=1)
                accuracy = np.mean(predictions == np.argmax(y, axis=1))
                
                # Log to console
                print(f"Epoch {epoch}, Loss: {loss:.4f}, Accuracy: {accuracy:.4f}, Configuration: {self.configuration}")
            
            # Log to CSV
            self.log_training_step(
                epoch=epoch,
                loss=loss,
                accuracy=accuracy,
                hidden_z=z_hidden,
                hidden_activations=self.hidden_layer
            )

## Analysis

In [2]:
# Define parameters
input_size = 8
hidden_size = 3
output_size = 8

X = np.array([
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 1]
])
y = np.array([
[1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 1]
])

### Sigmoid activation and MSE loss

In [13]:
file_name1 = 'statistics_sigmoid_mse.csv'
# Create and train the neural network
nn = MyNeuralNetwork(input_size=input_size, 
                     hidden_size=hidden_size, 
                     output_size=output_size,
                     hidden_activation="sigmoid",
                     output_activation="sigmoid",
                     loss_function="mean_squared_error",
                     output_file=file_name1
                     )
nn.train(X, y, epochs=10000, learning_rate=0.2)

Epoch 0, Loss: 0.3365, Accuracy: 0.1250, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 100, Loss: 0.1188, Accuracy: 0.2500, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 200, Loss: 0.1081, Accuracy: 0.5000, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 300, Loss: 0.1053, Accuracy: 0.5000, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 400, Loss: 0.1036, Accuracy: 0.5000, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 500, Loss: 0.1021, Accuracy: 0.5000, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 600, Loss: 0.1005, Accuracy: 0.5000, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 700, Loss: 0.0987, Accuracy: 0.3750, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 800, Loss: 0.0969, Accuracy: 0.5000, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 900, Loss: 0.0950, Accuracy: 0.5000, Configuration: sigmoid_sigmoid_mean_squared_error
Epoch 1000, Loss: 0.0931, Accuracy: 0.6250, Configuration: sigmoid_sigmo

### Softmax activation and Cross entropy

In [16]:
file_name2 = 'statistics_softmax_ce.csv'
# Create and train the neural network
nn = MyNeuralNetwork(input_size=input_size, 
                     hidden_size=hidden_size, 
                     output_size=output_size,
                     hidden_activation="sigmoid",
                     output_activation="softmax",
                     loss_function="cross_entropy",
                     output_file=file_name2
                     )
nn.train(X, y, epochs=10000, learning_rate=0.2)

Epoch 0, Loss: 2.1717, Accuracy: 0.1250, Configuration: sigmoid_softmax_cross_entropy
Epoch 100, Loss: 2.0257, Accuracy: 0.2500, Configuration: sigmoid_softmax_cross_entropy
Epoch 200, Loss: 1.9590, Accuracy: 0.2500, Configuration: sigmoid_softmax_cross_entropy
Epoch 300, Loss: 1.9029, Accuracy: 0.2500, Configuration: sigmoid_softmax_cross_entropy
Epoch 400, Loss: 1.8516, Accuracy: 0.2500, Configuration: sigmoid_softmax_cross_entropy
Epoch 500, Loss: 1.8036, Accuracy: 0.5000, Configuration: sigmoid_softmax_cross_entropy
Epoch 600, Loss: 1.7582, Accuracy: 0.5000, Configuration: sigmoid_softmax_cross_entropy
Epoch 700, Loss: 1.7147, Accuracy: 0.5000, Configuration: sigmoid_softmax_cross_entropy
Epoch 800, Loss: 1.6728, Accuracy: 0.5000, Configuration: sigmoid_softmax_cross_entropy
Epoch 900, Loss: 1.6320, Accuracy: 0.5000, Configuration: sigmoid_softmax_cross_entropy
Epoch 1000, Loss: 1.5921, Accuracy: 0.5000, Configuration: sigmoid_softmax_cross_entropy
Epoch 1100, Loss: 1.5528, Accurac

### Comparative

In [None]:
import pandas as pd
import plotly.express as px

# Read log file
df1 = pd.read_csv(file_name1)
df2 = pd.read_csv(file_name2)
df = pd.concat([df1, df2], ignore_index=True)

# Plot training loss over epochs
fig = px.line(
    df,
    x="epoch",
    y="loss",
    color="configuration",
    title="Training Loss over Epochs"
)
fig.update_layout(
    xaxis_title="Epochs",
    yaxis_title="Loss",
    template="plotly_white"
)
fig.show()


In [20]:
# Plot accuracy loss over epochs
fig = px.line(
    df,
    x="epoch",
    y="accuracy",
    color="configuration",
    title="Accuracy over Epochs"
)
fig.update_layout(
    xaxis_title="Epochs",
    yaxis_title="Accuracy",
    template="plotly_white"
)
fig.show()