<a href="https://colab.research.google.com/github/xzo1d/Softmax_ReLU_visualization/blob/main/SoftMax_Training_visualization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Visualization of Neural Network Training Process and Decision Boundaries

## Project Description

This project demonstrates the training process of a simple neural network for classifying a 3D dataset and visualizes the evolution of the network's decision boundaries during training.

**Key project stages:**

1.  **3D Dataset Generation:** Creation of a synthetic 3D dataset with three classes. The generator used here produces three nested arcs, which are not linearly separable; the number of points per class, the arc radii, and the noise level can be adjusted to demonstrate different training scenarios.
2.  **Neural Network Implementation:** Building a simple feedforward neural network with one hidden layer using the NumPy library.
3.  **Neural Network Training:** Training the neural network on the generated dataset using the ADAM optimizer. Network parameters are saved at each epoch during training for subsequent animation.
4.  **Visualization:**
    *   Displaying the initial 3D dataset.
    *   Creating an animation that shows how the network's decision boundaries change over training epochs.
    *   Visualizing the final decision boundaries after training completion.
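
Saving parameters at each epoch only gives a usable history if every snapshot is an independent copy of the arrays. The sketch below illustrates the difference, assuming the parameters live in a dictionary of NumPy arrays; the names `params`, `snapshots_shared`, and `snapshots_copied` are illustrative and not part of the notebook.

```python
import numpy as np

# Illustrative parameter dictionary; the name and shape are placeholders.
params = {"W": np.zeros((2, 2))}

snapshots_shared = []   # dictionaries that still reference the same arrays
snapshots_copied = []   # dictionaries holding independent array copies

for epoch in range(3):
    params["W"] += 1.0  # in-place update, as an optimizer step might perform
    snapshots_shared.append(dict(params))  # shallow copy: arrays are shared
    snapshots_copied.append({k: v.copy() for k, v in params.items()})

# Every shared snapshot shows the final state of the array.
print([float(s["W"][0, 0]) for s in snapshots_shared])
# The copied snapshots keep the value from their own epoch.
print([float(s["W"][0, 0]) for s in snapshots_copied])
```

The per-array `.copy()` is what lets an animation step back through earlier epochs.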

## Neural Network Architecture

The neural network used is a simple feedforward network with one hidden layer.

*   **Input Layer:** Receives 3D input data (3 features).
*   **Hidden Layer:** Consists of 25 neurons (the number can be adjusted). Uses the ReLU activation function (`max(0, x)`).
*   **Output Layer:** Has a number of neurons equal to the number of classes in the dataset (3 classes in this project). Uses the Softmax activation function to obtain class probabilities.

**Schematic network structure:**

Input Layer (3 neurons) -> Hidden Layer (25 neurons, ReLU) -> Output Layer (3 neurons, Softmax)
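
To make the schematic concrete, the following sketch traces array shapes through one forward pass. It assumes a batch of 4 random samples and small random weights; the variable names are chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features, n_hidden, n_classes = 4, 3, 25, 3

X_demo = rng.normal(size=(n_samples, n_features))       # (4, 3) input batch
W1 = rng.normal(size=(n_features, n_hidden)) * 0.01     # (3, 25)
b1 = np.zeros((1, n_hidden))                            # (1, 25), broadcast over rows
W2 = rng.normal(size=(n_hidden, n_classes)) * 0.01      # (25, 3)
b2 = np.zeros((1, n_classes))                           # (1, 3)

hidden = np.maximum(0, X_demo @ W1 + b1)                # ReLU output, (4, 25)
logits = hidden @ W2 + b2                               # (4, 3)

# Softmax with the row maximum subtracted for numerical stability.
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

n_params = W1.size + b1.size + W2.size + b2.size

print(hidden.shape, probs.shape)   # shapes after each layer
print(probs.sum(axis=1))           # each row of probabilities sums to 1
print(n_params)                    # total number of trainable parameters
```

With 3 inputs, 25 hidden neurons, and 3 outputs, the network has 3·25 + 25 + 25·3 + 3 = 178 trainable parameters.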

The neural network is trained by minimizing the cross-entropy loss with the ADAM optimizer. Each update uses the full dataset, so one epoch corresponds to one parameter update.
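
One property of ADAM is easy to verify numerically: on the very first step, bias correction makes the first-moment estimate equal to the gradient and the second-moment estimate equal to its square, so every parameter moves by roughly the learning rate, regardless of the gradient's magnitude. The sketch below uses arbitrary gradient values chosen for illustration and the commonly used default hyperparameters.

```python
import numpy as np

learning_rate, beta1, beta2, epsilon = 0.01, 0.9, 0.999, 1e-8

grad = np.array([5.0, -0.002, 0.3])   # arbitrary gradient values for illustration
v = np.zeros_like(grad)               # first moment estimate
s = np.zeros_like(grad)               # second moment estimate
t = 1                                 # first iteration

v = beta1 * v + (1 - beta1) * grad
s = beta2 * s + (1 - beta2) * grad**2

v_corrected = v / (1 - beta1**t)      # equals grad at t = 1
s_corrected = s / (1 - beta2**t)      # equals grad**2 at t = 1

# Amount subtracted from each parameter on this step.
step = learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
print(step)
```

Each entry of `step` is close to `learning_rate` times the sign of the corresponding gradient, which is why the first few ADAM updates change all weights by a similar amount.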



In [None]:
# This code cell generates a 3D dataset with nested arc shapes, suitable for visualizing how a neural network learns to classify non-linearly separable data.
# The `create_3d_dataset_complex` function is used to create the data points based on specified parameters like the number of points per class, radii of the arcs, and the amount of noise.
# The generated dataset is then visualized using Plotly to show the distribution of the three classes in 3D space.
import numpy as np
import plotly.graph_objects as go

def create_3d_dataset_complex(n_points_per_class=(200, 200, 200), radii=(5, 10, 15), noise=0.5):
    """
    Generates a 3D dataset with classes shaped like nested arcs.

    Args:
        n_points_per_class: A sequence with the number of points for each class.
        radii: A sequence with the arc radius for each class (same length as n_points_per_class).
        noise: Standard deviation of the Gaussian noise along y; x and z receive half of it.

    Returns:
        A tuple containing:
            - X: NumPy array of shape (n_points, 3) representing the features.
            - y: NumPy array of shape (n_points,) representing the labels.
    """
    n_classes = len(n_points_per_class)
    if n_classes != len(radii):
        raise ValueError("Number of classes and radii must match.")

    n_points = sum(n_points_per_class)
    X = np.zeros((n_points, 3))
    y = np.zeros(n_points, dtype='uint8')

    current_idx = 0

    for i in range(n_classes):
        num_points = n_points_per_class[i]
        radius = radii[i]
        start_idx = current_idx
        end_idx = current_idx + num_points

        # Generate points on an arc in the XZ plane and extend to 3D
        theta = np.linspace(0, np.pi, num_points) # Arc from 0 to pi
        x = radius * np.cos(theta)
        z = radius * np.sin(theta)
        y_coords = np.random.randn(num_points) * noise # Add noise in the y-direction

        X[start_idx:end_idx, 0] = x + np.random.randn(num_points) * noise * 0.5 # Add some noise in x
        X[start_idx:end_idx, 1] = y_coords
        X[start_idx:end_idx, 2] = z + np.random.randn(num_points) * noise * 0.5 # Add some noise in z
        y[start_idx:end_idx] = i
        current_idx = end_idx

    return X, y

# Generate the nested arcs dataset
X, y = create_3d_dataset_complex(n_points_per_class=[200, 200, 200], radii=[5, 10, 15], noise=0.5)

# Define a colorblind-friendly colorscale for Plotly
plotly_colorscale = [[0, '#1f77b4'], [0.5, '#ff7f0e'], [1, '#2ca02c']]

# Visualize the dataset
fig = go.Figure(data=[go.Scatter3d(
    x=X[:, 0],
    y=X[:, 1],
    z=X[:, 2],
    mode='markers',
    marker=dict(
        size=5,
        color=y,
        colorscale=plotly_colorscale,
        opacity=0.8
    )
)])

fig.update_layout(
    title='Synthetic 3D Dataset: Three Nested Arcs',
    scene=dict(
        xaxis_title='Feature 1',
        yaxis_title='Feature 2',
        zaxis_title='Feature 3'
    )
)

fig.show()
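
A quick numerical check can confirm what the plot suggests: the classes differ by their radius in the XZ plane rather than by position along any single axis. The sketch below rebuilds a comparable dataset in condensed form with a seeded generator (an assumption made for reproducibility; the cell above is unseeded) and summarizes the radius per class.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the check is reproducible

radii = [5, 10, 15]
n_per_class = 200
noise = 0.5

features, labels = [], []
for label, radius in enumerate(radii):
    theta = np.linspace(0, np.pi, n_per_class)
    x = radius * np.cos(theta) + rng.normal(size=n_per_class) * noise * 0.5
    y_coord = rng.normal(size=n_per_class) * noise
    z = radius * np.sin(theta) + rng.normal(size=n_per_class) * noise * 0.5
    features.append(np.column_stack([x, y_coord, z]))
    labels.append(np.full(n_per_class, label))

X_check = np.vstack(features)
y_check = np.concatenate(labels)

# Distance from the y-axis, i.e. the radius in the XZ plane.
r_xz = np.hypot(X_check[:, 0], X_check[:, 2])
for label, radius in enumerate(radii):
    r = r_xz[y_check == label]
    print(f"class {label}: target radius {radius}, mean {r.mean():.2f}, std {r.std():.2f}")
```

Because the gap between consecutive radii (5) is much larger than the in-plane noise (0.25), the radius ranges of the classes should not overlap, even though no single plane separates them.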

In [None]:
# This code cell defines the core components of the neural network, including activation functions (ReLU and Softmax),
# parameter initialization, the forward pass calculation, loss computation (Cross-Entropy), and one-hot encoding.
# It then initializes the network parameters and performs an initial forward pass to calculate the loss before training.
def relu(x):
    return np.maximum(0, x)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

def initialize_parameters(input_size, hidden_size, output_size):
    np.random.seed(42)
    weights_input_hidden = np.random.randn(input_size, hidden_size) * 0.01
    bias_hidden = np.zeros((1, hidden_size))
    weights_hidden_output = np.random.randn(hidden_size, output_size) * 0.01
    bias_output = np.zeros((1, output_size))
    return {
        'W1': weights_input_hidden, 'b1': bias_hidden,
        'W2': weights_hidden_output, 'b2': bias_output
    }

def forward_pass(X, parameters):
    Z1 = np.dot(X, parameters['W1']) + parameters['b1']
    A1 = relu(Z1)
    Z2 = np.dot(A1, parameters['W2']) + parameters['b2']
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

def compute_loss(y_true, y_pred):
    m = y_true.shape[0]
    loss = -np.sum(y_true * np.log(y_pred + 1e-9)) / m
    return loss

def one_hot_encode(y, num_classes):
    m = y.shape[0]
    one_hot_y = np.zeros((m, num_classes))
    one_hot_y[np.arange(m), y] = 1
    return one_hot_y

# Example usage (assuming X and y are already defined from the previous step)
input_size = X.shape[1]
hidden_size = 25 # Example hidden size
output_size = len(np.unique(y))

parameters = initialize_parameters(input_size, hidden_size, output_size)
y_one_hot = one_hot_encode(y, output_size)

Z1, A1, Z2, A2 = forward_pass(X, parameters)
loss = compute_loss(y_one_hot, A2)

print(f"Initial Loss: {loss}")
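
The initial loss has a useful reference value. With weights scaled by 0.01, the logits are close to zero, the softmax output is close to uniform, and the cross-entropy is therefore close to ln(3) ≈ 1.0986 for three classes. The sketch below reproduces this with stand-in inputs and labels; the uniform inputs in [-15, 15] are an assumption that only roughly matches the scale of the arcs dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples, n_features, n_hidden, n_classes = 600, 3, 25, 3

# Stand-in inputs and labels (assumption: similar scale to the arcs dataset).
X_demo = rng.uniform(-15, 15, size=(n_samples, n_features))
y_demo = rng.integers(0, n_classes, size=n_samples)

W1 = rng.normal(size=(n_features, n_hidden)) * 0.01
W2 = rng.normal(size=(n_hidden, n_classes)) * 0.01

hidden = np.maximum(0, X_demo @ W1)   # biases start at zero, so they are omitted
logits = hidden @ W2
exp_logits = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

# Cross-entropy of the true class, averaged over samples.
loss = -np.mean(np.log(probs[np.arange(n_samples), y_demo] + 1e-9))
print(f"initial loss: {loss:.4f}, ln(3): {np.log(n_classes):.4f}")
```

An initial loss far from ln(3) would point to a problem in the initialization or in the loss computation.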

In [None]:
# This code cell implements the ADAM update and the backward pass, then trains the network
# with full-batch updates, saving a copy of the parameters after every epoch for the animation.
def adam_optimizer(parameters, grads, v, s, t, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Updates parameters using the ADAM optimization algorithm.

    Args:
        parameters: Dictionary containing the weights and biases.
        grads: Dictionary containing the gradients of weights and biases.
        v: Dictionary containing the exponentially weighted averages of past gradients.
        s: Dictionary containing the exponentially weighted averages of past squared gradients.
        t: Iteration number.
        learning_rate: The learning rate.
        beta1: Exponential decay rate for the first moment estimates.
        beta2: Exponential decay rate for the second moment estimates.
        epsilon: A small number to prevent division by zero.

    Returns:
        A tuple containing:
            - parameters: Updated parameters.
            - v: Updated first moment estimates.
            - s: Updated second moment estimates.
    """
    updated_parameters = parameters.copy()
    updated_v = v.copy()
    updated_s = s.copy()

    for key in parameters.keys():
        updated_v[key] = beta1 * v[key] + (1 - beta1) * grads[key]
        updated_s[key] = beta2 * s[key] + (1 - beta2) * (grads[key] ** 2)

        v_corrected = updated_v[key] / (1 - beta1**t)
        s_corrected = updated_s[key] / (1 - beta2**t)

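        # Assign a new array instead of using `-=`, which would modify the caller's array in place
        updated_parameters[key] = parameters[key] - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)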

    return updated_parameters, updated_v, updated_s

def backward_pass(X, y_true, A1, A2, parameters):
    """
    Performs the backward pass to compute gradients.

    Args:
        X: Input features.
        y_true: True labels (one-hot encoded).
        A1: Activation from the hidden layer.
        A2: Activation from the output layer (softmax output).
        parameters: Dictionary containing the weights and biases.

    Returns:
        Dictionary containing the gradients of weights and biases.
    """
    m = X.shape[0]

    dZ2 = A2 - y_true
    dW2 = np.dot(A1.T, dZ2) / m
    db2 = np.sum(dZ2, axis=0, keepdims=True) / m

    dA1 = np.dot(dZ2, parameters['W2'].T)
    dZ1 = dA1 * (A1 > 0) # Derivative of ReLU
    dW1 = np.dot(X.T, dZ1) / m
    db1 = np.sum(dZ1, axis=0, keepdims=True) / m

    return {'W1': dW1, 'b1': db1, 'W2': dW2, 'b2': db2}

# Training loop
epochs = 100
learning_rate = 0.01
num_classes = len(np.unique(y))
y_one_hot = one_hot_encode(y, num_classes)

# Initialize ADAM parameters
v = {key: np.zeros_like(param) for key, param in parameters.items()}
s = {key: np.zeros_like(param) for key, param in parameters.items()}
t = 0

saved_parameters = []
losses = []

for i in range(epochs):
    t += 1
    Z1, A1, Z2, A2 = forward_pass(X, parameters)
    loss = compute_loss(y_one_hot, A2)
    losses.append(loss)

    grads = backward_pass(X, y_one_hot, A1, A2, parameters)
    parameters, v, s = adam_optimizer(parameters, grads, v, s, t, learning_rate=learning_rate)

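    # Report progress every 10 epochs (the loss is measured before this epoch's update)
    if (i + 1) % 10 == 0:
        print(f"Epoch {i+1}, Loss: {loss:.4f}")

    # Save a copy of the parameters after every epoch for the animation
    saved_parameters.append({key: param.copy() for key, param in parameters.items()})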

print("Training finished.")
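
The backward pass relies on the identity that, for softmax combined with cross-entropy, the gradient with respect to the logits is `(A2 - y_true) / m`. A finite-difference check is a common way to confirm such formulas. The sketch below re-implements the same equations on a small random problem; the network sizes, the helper names `loss_fn` and `analytic_grads`, and the omission of the `1e-9` term inside the logarithm are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_fn(params, X, Y):
    A1 = np.maximum(0, X @ params["W1"] + params["b1"])
    A2 = softmax(A1 @ params["W2"] + params["b2"])
    return -np.sum(Y * np.log(A2)) / X.shape[0]

def analytic_grads(params, X, Y):
    m = X.shape[0]
    A1 = np.maximum(0, X @ params["W1"] + params["b1"])
    A2 = softmax(A1 @ params["W2"] + params["b2"])
    dZ2 = A2 - Y
    dZ1 = (dZ2 @ params["W2"].T) * (A1 > 0)
    return {
        "W1": X.T @ dZ1 / m, "b1": dZ1.sum(axis=0, keepdims=True) / m,
        "W2": A1.T @ dZ2 / m, "b2": dZ2.sum(axis=0, keepdims=True) / m,
    }

# Small random problem; the sizes are arbitrary.
X_demo = rng.normal(size=(8, 3))
Y_demo = np.eye(3)[rng.integers(0, 3, size=8)]
params = {
    "W1": rng.normal(size=(3, 5)) * 0.5, "b1": rng.normal(size=(1, 5)) * 0.5,
    "W2": rng.normal(size=(5, 3)) * 0.5, "b2": rng.normal(size=(1, 3)) * 0.5,
}

grads = analytic_grads(params, X_demo, Y_demo)
h = 1e-6
max_error = 0.0
for key, value in params.items():
    for idx in np.ndindex(value.shape):
        original = value[idx]
        value[idx] = original + h
        loss_plus = loss_fn(params, X_demo, Y_demo)
        value[idx] = original - h
        loss_minus = loss_fn(params, X_demo, Y_demo)
        value[idx] = original  # restore the parameter
        numeric = (loss_plus - loss_minus) / (2 * h)  # central difference
        max_error = max(max_error, abs(numeric - grads[key][idx]))

print(f"largest absolute difference: {max_error:.2e}")
```

A difference many orders of magnitude below the gradient values themselves indicates that the analytic and numeric gradients agree.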

In [None]:
# This code cell visualizes the dataset together with the decision boundaries of the network
# at the earliest saved snapshot, saved_parameters[0], i.e. after the first training epoch.
# It defines a function `create_decision_boundary_mesh` that evaluates the network on a 3D grid
# and returns the predicted class at every grid point; isosurfaces of these labels approximate the decision boundaries.
import plotly.graph_objects as go
import numpy as np

def create_decision_boundary_mesh(parameters, x_range, y_range, z_range, num_points=30):
    """
    Evaluates the network on a regular 3D grid to locate its decision boundaries.

    Args:
        parameters: Dictionary containing the weights and biases to evaluate.
        x_range: Tuple (min, max) for the x-axis.
        y_range: Tuple (min, max) for the y-axis.
        z_range: Tuple (min, max) for the z-axis.
        num_points: Number of points to sample along each axis.

    Returns:
        A tuple containing:
            - X_grid, Y_grid, Z_grid: Meshgrid for the input space.
            - predicted_class: Predicted class for each point in the meshgrid.
    """
    x = np.linspace(x_range[0], x_range[1], num_points)
    y = np.linspace(y_range[0], y_range[1], num_points)
    z = np.linspace(z_range[0], z_range[1], num_points)
    X_grid, Y_grid, Z_grid = np.meshgrid(x, y, z)

    input_grid = np.c_[X_grid.ravel(), Y_grid.ravel(), Z_grid.ravel()]

    # Forward pass through the network, reusing forward_pass defined in an earlier cell
    _, _, _, A2 = forward_pass(input_grid, parameters)

    predicted_class = np.argmax(A2, axis=1)

    return X_grid, Y_grid, Z_grid, predicted_class.reshape(X_grid.shape)

# Determine the range of the dataset for visualization
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
z_min, z_max = X[:, 2].min() - 1, X[:, 2].max() + 1

# Create the decision boundary mesh from the first saved snapshot (parameters after epoch 1)
X_grid, Y_grid, Z_grid, predicted_class = create_decision_boundary_mesh(
    saved_parameters[0], (x_min, x_max), (y_min, y_max), (z_min, z_max)
)

# Define a colorblind-friendly colorscale for Plotly
plotly_colorscale = [[0, '#1f77b4'], [0.5, '#ff7f0e'], [1, '#2ca02c']]

# Create the initial 3D scatter plot of the dataset
scatter_plot = go.Scatter3d(
    x=X[:, 0], y=X[:, 1], z=X[:, 2],
    mode='markers',
    marker=dict(
        size=5,
        color=y,
        colorscale=plotly_colorscale, # Using the new colorscale
        opacity=0.8
    )
)

# Create the initial decision boundary surface plot
# We are visualizing the implicit surface where the predicted class changes
# This is a simplified representation of the decision boundary
boundary_surface = go.Isosurface(
    x=X_grid.flatten(),
    y=Y_grid.flatten(),
    z=Z_grid.flatten(),
    value=predicted_class.flatten(),
    isomin=0.5,  # level between class labels 0 and 1
    isomax=1.5,  # level between class labels 1 and 2
    surface_count=2,  # draw only the isomin and isomax surfaces (number of classes - 1)
    colorscale=plotly_colorscale,
    opacity=0.3,
    caps=dict(x_show=False, y_show=False, z_show=False),
    visible=True,
    showlegend=False
)

# Create the figure
fig = go.Figure(data=[scatter_plot, boundary_surface])

# Update layout
fig.update_layout(
    title="Neural Network Decision Boundaries (After Epoch 1)",
    scene=dict(
        xaxis_title='Feature 1',
        yaxis_title='Feature 2',
        zaxis_title='Feature 3',
        camera=dict(
            up=dict(x=0, y=0, z=1),
            center=dict(x=0, y=0, z=0),
            eye=dict(x=1.25, y=1.25, z=1.25)
        )
    ),
    margin=dict(l=0, r=0, b=0, t=40)
)

fig.show()
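
The mesh function flattens the grid, evaluates the network, and reshapes the predictions back to the grid shape. This works because coordinates and values are flattened in the same order. One detail to keep in mind: with its default `indexing="xy"`, `np.meshgrid` swaps the first two axes of the output, which only becomes visible when the axes have different sample counts. The sketch below uses deliberately unequal counts to show this; the stand-in `values` formula is arbitrary.

```python
import numpy as np

# Different sample counts per axis make the axis order visible.
x = np.linspace(0, 1, 2)   # 2 points along x
y = np.linspace(0, 1, 3)   # 3 points along y
z = np.linspace(0, 1, 4)   # 4 points along z

X_xy, Y_xy, Z_xy = np.meshgrid(x, y, z)                  # default indexing="xy"
X_ij, Y_ij, Z_ij = np.meshgrid(x, y, z, indexing="ij")

print(X_xy.shape)  # first two axes follow (y, x) order
print(X_ij.shape)  # axes follow the argument order (x, y, z)

# Flattening coordinates and values in the same order keeps them aligned,
# and reshaping with the grid's own shape restores the layout.
points = np.c_[X_xy.ravel(), Y_xy.ravel(), Z_xy.ravel()]
values = points[:, 0] + 10 * points[:, 1] + 100 * points[:, 2]
restored = values.reshape(X_xy.shape)
print(np.array_equal(restored, X_xy + 10 * Y_xy + 100 * Z_xy))
```

Since the notebook uses the same `num_points` on every axis and always reshapes with `X_grid.shape`, the default indexing causes no problem there.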

In [None]:
# This code cell generates the frames for the animation, capturing the state of the decision boundaries
# at each saved epoch during training. It iterates through the saved network parameters and creates
# an isosurface for the decision boundary for each epoch, combining it with the original scatter plot of the data.
# These frames will be used to create the animation.
# Define a colorblind-friendly colorscale for Plotly
plotly_colorscale = [[0, '#1f77b4'], [0.5, '#ff7f0e'], [1, '#2ca02c']]

# Create frames for the animation
frames = []
for i, params in enumerate(saved_parameters):
    X_grid_frame, Y_grid_frame, Z_grid_frame, predicted_class_frame = create_decision_boundary_mesh(
        params, (x_min, x_max), (y_min, y_max), (z_min, z_max)
    )
    frames.append(
        go.Frame(
            data=[
                go.Scatter3d( # Keep the original scatter plot data
                    x=X[:, 0], y=X[:, 1], z=X[:, 2],
                    mode='markers',
                    marker=dict(
                        size=5,
                        color=y,
                        colorscale=plotly_colorscale, # Updated colorscale
                        opacity=0.8
                    )
                ),
                go.Isosurface( # Update the isosurface data for the current frame
                    x=X_grid_frame.flatten(),
                    y=Y_grid_frame.flatten(),
                    z=Z_grid_frame.flatten(),
                    value=predicted_class_frame.flatten(),
                    isomin=0.5,  # level between class labels 0 and 1
                    isomax=1.5,  # level between class labels 1 and 2
                    surface_count=2,  # draw only the isomin and isomax surfaces
                    colorscale=plotly_colorscale,
                    opacity=0.3,
                    caps=dict(x_show=False, y_show=False, z_show=False),
                    visible=True,
                    showlegend=False
                )
            ],
            name=f'frame{i}' # Name the frame for animation
        )
    )
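
Each frame carries a full 30 × 30 × 30 grid (27,000 values), so 100 frames add up to 2.7 million values stored in the figure. If the animation becomes slow, one option is to build frames from a subset of the snapshots only. The helper below is a hypothetical utility, not part of the notebook; it picks every `stride`-th snapshot and always keeps the last one so the final decision boundary is still shown.

```python
def select_snapshot_indices(n_snapshots, stride):
    """Return every `stride`-th snapshot index, always including the last one."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if n_snapshots <= 0:
        return []
    indices = list(range(0, n_snapshots, stride))
    if indices[-1] != n_snapshots - 1:
        indices.append(n_snapshots - 1)
    return indices

# 100 saved epochs, keeping every 10th snapshot plus the final one.
indices = select_snapshot_indices(100, 10)
print(indices)

# Slider labels can still show the true epoch number of each kept snapshot.
labels = [f"Epoch {i + 1}" for i in indices]
print(labels[:3], labels[-1])
```

The frame loop could then iterate over `[saved_parameters[i] for i in indices]` and use `labels` for the slider steps instead of consecutive epoch numbers.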

In [None]:
# This code cell sets up and displays the interactive Plotly animation of the neural network training process.
# It uses the frames generated in the previous cell to create a figure with a play button and a slider
# that allow the user to visualize how the decision boundaries evolve over the training epochs.
# Create the main figure
fig = go.Figure(data=frames[0].data, layout=frames[0].layout)

# Define animation steps
animation_steps = []
for i in range(len(frames)):
    animation_steps.append(
        dict(
            method='animate',
            label=f'Epoch {i+1}',
            args=[
                [f'frame{i}'],
                dict(
                    mode='immediate',
                    frame=dict(duration=100, redraw=True),
                    transition=dict(duration=0)
                )
            ]
        )
    )

# Define animation layout
fig.update(frames=frames)
fig.update_layout(
    title="Neural Network Training Animation",
    scene=dict(
        xaxis_title='Feature 1',
        yaxis_title='Feature 2',
        zaxis_title='Feature 3',
        camera=dict(
            up=dict(x=0, y=0, z=1),
            center=dict(x=0, y=0, z=0),
            eye=dict(x=1.25, y=1.25, z=1.25)
        )
    ),
    sliders=[dict(
        steps=animation_steps,
        transition=dict(duration=0),
        x=0.1,
        len=0.9
    )],
    updatemenus=[dict(
        type='buttons',
        buttons=[dict(
            label='Play',
            method='animate',
            args=[None, dict(frame=dict(duration=100, redraw=True), transition=dict(duration=0, easing='linear'))]
        )]
    )],
    margin=dict(l=0, r=0, b=0, t=40)
)

# Display the animated figure
fig.show()