## `FeedforwardNeuralNetwork` class

The following code block contains a template for the class `FeedforwardNeuralNetwork`, which implements a feedforward neural network
$$
	\begin{aligned}
		{\rm NN}  & : \mathbb{R}^{d} \to \mathbb{R}^{p}, \\
		{\rm NN} (\mathbf{x}_0) & = \mathbf{A} \cdot \left( F_{L} \circ \ldots \circ F_{1} \right) (\mathbf{x}_0)
	\end{aligned}
$$
with 
$$
	\begin{aligned}
		F_i : \mathbb{R}^{n_{i-1}} & \to \mathbb{R}^{n_{i}}, \\
		\mathbf{x}_{i} & = \sigma\left( \mathbf{W}_i \mathbf{x}_{i-1} + \mathbf{b}_i \right)
	\end{aligned}
$$
for $i=1,\ldots,L$. Here, $\mathbf{x}_0 \in \mathbb{R}^{d}$ is the input, $\mathbf{W}_i \in \mathbb{R}^{n_{i} \times n_{i-1}}$ ($n_0 \coloneqq d$), $\mathbf{b}_i \in \mathbb{R}^{n_{i}}$, $\mathbf{A} \in \mathbb{R}^{p \times n_L}$, and $\sigma\left(x\right)$ is the activation function, which is applied element-wise.

The implementations of the methods of this class are missing and have to be filled in.
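
To make the dimensions in the definition above concrete, here is a minimal sketch (not part of the template) that builds parameter arrays for a hypothetical choice `layer_sizes = [2, 4, 3, 1]` and prints their shapes. The layer sizes, the seed, and the `0.01` scaling are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical sizes: d = 2 inputs, two hidden layers (4 and 3 neurons), 1 output.
layer_sizes = [2, 4, 3, 1]

rng = np.random.default_rng(0)  # fixed seed, assumption for reproducibility

# One matrix of shape (n_i, n_{i-1}) per pair of consecutive layers.
weights = [
    rng.standard_normal((n_out, n_in)) * 0.01
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
# One column vector of shape (n_i, 1) per non-input layer.
biases = [np.zeros((n_out, 1)) for n_out in layer_sizes[1:]]

weight_shapes = [w.shape for w in weights]
bias_shapes = [b.shape for b in biases]
print(weight_shapes)  # [(4, 2), (3, 4), (1, 3)]
print(bias_shapes)    # [(4, 1), (3, 1), (1, 1)]
```

Storing the biases as column vectors of shape `(n_i, 1)` lets them broadcast over inputs whose samples are stored as columns.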

In [8]:
import numpy as np

class FeedforwardNeuralNetwork:
    def __init__(self, layer_sizes):
        """
        Initialize the feedforward neural network.
        
        Parameters:
        layer_sizes (list): List containing the number of neurons in each layer.
        """
        self.layer_sizes = layer_sizes
        self.num_layers = len(layer_sizes)
        self.weights = []
        self.biases = []


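        # One weight matrix and one bias vector per pair of consecutive layers.
        for i in range(1, len(self.layer_sizes)):
            # Small random weights of shape (n_i, n_{i-1}).
            self.weights.append(np.random.randn(self.layer_sizes[i], self.layer_sizes[i - 1]) * 0.01)
            # Zero biases stored as column vectors of shape (n_i, 1).
            self.biases.append(np.zeros((self.layer_sizes[i], 1)))
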
    def activation(self, z):
        """
        Activation function.
        
        Parameters:
        z (numpy.ndarray): Input array.
        
        Returns:
        numpy.ndarray: Output array after applying the activation function.
        """
        
        return np.maximum(0, z)  # ReLU (default)
        # Alternative (tanh): return np.tanh(z)


    def activation_derivative(self, z):
        """
        Derivative of the activation function.
        
        Parameters:
        z (numpy.ndarray): Input array.
        
        Returns:
        numpy.ndarray: Output array after applying derivative of the activation function.
        """
        
        return (z > 0).astype(float)  # ReLU derivative (taken as 0 at z = 0)
        # Alternative (tanh): return 1 - np.tanh(z) ** 2
    
    def feedforward(self, x):
        """
        Perform a feedforward pass through the network.
        
        Parameters:
        x (numpy.ndarray): Input array.
        
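        Returns:
        tuple: (activations, zs), where activations is the list of layer
            outputs (starting with the input x) and zs is the list of
            pre-activation values, one entry per layer.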
        """
            
        activations = [x]  
        zs = [] 
        a = x
        
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, a) + b
            zs.append(z)
            a = self.activation(z)
            activations.append(a)
        return activations, zs


    def compute_cost(self, y_pred, y_train):
        """
        Compute the cost function.
        
        Parameters:
        y_pred (numpy.ndarray): Predicted labels.
        y_train (numpy.ndarray): True labels.
        
        Returns:
        float: Cost value.
        """

        return np.mean((y_pred - y_train)**2)
    
   

In [2]:
activation_function_text = """
Activation Function:
    The activation function introduces non-linearity, enabling the network to learn complex patterns and relationships in data.
    1. Default: ReLU (Rectified Linear Unit):
        Mathematical expression:
            𝑓(𝑧) = max(0,𝑧)
    2. Optional: tanh (Hyperbolic Tangent):
        Mathematical expression:
            𝑓(𝑧) = tanh(𝑧)
"""

print(activation_function_text)



In [None]:
def activation(self, z):
    """
    Activation function.

    Parameters:
    z (numpy.ndarray): Input array.

    Returns:
    numpy.ndarray: Output array after applying the activation function.
    """
    return np.maximum(0, z)  # ReLU (default)
    # Alternative (tanh): return np.tanh(z)
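
As a quick illustration of the two options described above, the following sketch evaluates ReLU and tanh on a small column vector. The helper name `relu` and the sample values are assumptions made for this example; they are not part of the class template.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z), same expression as in the method above.
    return np.maximum(0, z)

# Hypothetical pre-activation values stored as a column vector.
z = np.array([[-2.0], [0.0], [3.0]])

relu_out = relu(z)
tanh_out = np.tanh(z)

print(relu_out.ravel())  # [0. 0. 3.]
print(tanh_out.ravel())  # values strictly between -1 and 1
```

ReLU clips negative entries to zero and leaves positive entries unchanged, whereas tanh squashes every entry into the open interval (-1, 1).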

In [None]:
derivative_activation_function_text = """
Derivative of the Activation Function:
    Used during backpropagation to compute gradients.
    1. ReLU Derivative:
        Mathematical expression:
            𝑓'(𝑧) = {
                1 if 𝑧 > 0,
                0 if 𝑧 ≤ 0
            }
    2. tanh Derivative:
        Mathematical expression:
            𝑓'(𝑧) = 1 − tanh²(𝑧)
"""

print(derivative_activation_function_text)

In [None]:
def activation_derivative(self, z):
    """
    Derivative of the activation function.

    Parameters:
    z (numpy.ndarray): Input array.

    Returns:
    numpy.ndarray: Output array after applying the derivative of the activation function.
    """
    return (z > 0).astype(float)  # ReLU derivative (taken as 0 at z = 0)
    # Alternative (tanh): return 1 - np.tanh(z) ** 2
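
One way to gain confidence in the derivative formulas above is to compare them with a central finite difference. The sketch below does this for both ReLU and tanh. The helper names, the step size `h`, and the test points are assumptions for illustration; the point `z = 0` is deliberately left out because ReLU is not differentiable there.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return (z > 0).astype(float)

def tanh_derivative(z):
    return 1 - np.tanh(z) ** 2

def central_difference(f, z, h=1e-6):
    # Numerical approximation of f'(z), accurate to O(h^2) for smooth f.
    return (f(z + h) - f(z - h)) / (2 * h)

# Hypothetical test points, kept away from the kink of ReLU at z = 0.
z = np.array([-1.5, -0.3, 0.4, 2.0])

relu_ok = np.allclose(relu_derivative(z), central_difference(relu, z), atol=1e-6)
tanh_ok = np.allclose(tanh_derivative(z), central_difference(np.tanh, z), atol=1e-6)
print(relu_ok, tanh_ok)  # True True
```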

In [8]:
loss_function_text = """
Loss Function:
    The loss function quantifies the difference between the network’s predicted outputs (𝑦_pred) and the true values (𝑦_train).
    1. Mean Squared Error (MSE):
        Mathematical expression:
            MSE = (1 / 𝑛) * Σ(i=1 to 𝑛) (𝑦_pred,𝑖 − 𝑦_train,𝑖)²
"""

print(loss_function_text)



In [None]:
def compute_cost(self, y_pred, y_train):
    """
    Compute the cost function.

    Parameters:
    y_pred (numpy.ndarray): Predicted labels.
    y_train (numpy.ndarray): True labels.

    Returns:
    float: Cost value.
    """
    return np.mean((y_pred - y_train) ** 2)
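
The MSE formula above can be checked by hand on a tiny example. The sketch below uses a standalone function `mse` (a hypothetical name, mirroring the method body) and made-up values. It also shows a shape pitfall that is an assumption about how the method might be called: if the two arrays have different layouts, NumPy broadcasting can silently produce a matrix of pairwise differences instead of an error.

```python
import numpy as np

def mse(y_pred, y_train):
    # Mean of the squared element-wise differences.
    return np.mean((y_pred - y_train) ** 2)

# Hypothetical predictions and targets, both of shape (1, 4).
y_pred = np.array([[1.0, 2.0, 3.0, 4.0]])
y_train = np.array([[1.0, 1.0, 3.0, 6.0]])

# Squared differences: 0, 1, 0, 4, so the mean is 5 / 4.
cost = mse(y_pred, y_train)
print(cost)  # 1.25

# Pitfall: a (4, 1) column minus a (4,) vector broadcasts to a (4, 4) matrix.
mismatched = y_pred.reshape(-1, 1) - y_train.ravel()
print(mismatched.shape)  # (4, 4)
```

Checking that `y_pred.shape == y_train.shape` before computing the cost is a cheap way to guard against this.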

In [11]:
forward_propagation_text = """
Forward Propagation:
        Forward propagation passes input data through each layer of the network to generate output predictions.
        1. Layer-wise Computation:
            - Linear transformation:
                𝑧[𝑙] = 𝑊[𝑙]𝑎[𝑙−1] + 𝑏[𝑙]
            - Activation function:
                𝑎[𝑙] = 𝑓(𝑧[𝑙])
        2. Intermediate Storage:
            - During forward propagation, both the activations (𝑎[𝑙]) and linear transformations (𝑧[𝑙]) 
              are stored for each layer. These values are later used during backpropagation.
"""

print(forward_propagation_text)



In [12]:
def feedforward(self, x):
    """
    Perform a feedforward pass through the network.

    Parameters:
    x (numpy.ndarray): Input array.

    Returns:
    tuple: (activations, linear_outputs), where activations is the list of
        layer outputs (starting with the input x) and linear_outputs is the
        list of pre-activation values, one entry per layer.
    """
    activations = [x]    # Store activations for all layers
    linear_outputs = []  # Store pre-activation values for all layers
    a = x  # The input acts as the activation of layer 0

    for w, b in zip(self.weights, self.biases):
        z = np.dot(w, a) + b  # Linear transformation
        linear_outputs.append(z)
        a = self.activation(z)  # Apply activation function
        activations.append(a)
    return activations, linear_outputs
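
To see what the forward pass returns, the following sketch reimplements it as a standalone function and runs it on a small network. The function names, the layer sizes `[2, 3, 1]`, the seed, and the convention that samples are stored as columns of `x` are all assumptions made for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def feedforward(x, weights, biases):
    # Standalone version of the method above (hypothetical helper).
    activations = [x]
    linear_outputs = []
    a = x
    for w, b in zip(weights, biases):
        z = np.dot(w, a) + b  # b of shape (n_i, 1) broadcasts over columns
        linear_outputs.append(z)
        a = relu(z)
        activations.append(a)
    return activations, linear_outputs

rng = np.random.default_rng(0)
layer_sizes = [2, 3, 1]
weights = [
    rng.standard_normal((n_out, n_in)) * 0.01
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros((n_out, 1)) for n_out in layer_sizes[1:]]

x = rng.standard_normal((2, 5))  # 5 samples, one per column
activations, linear_outputs = feedforward(x, weights, biases)

activation_shapes = [a.shape for a in activations]
linear_shapes = [z.shape for z in linear_outputs]
print(activation_shapes)  # [(2, 5), (3, 5), (1, 5)]
print(linear_shapes)      # [(3, 5), (1, 5)]
```

The network prediction is `activations[-1]`. Note that this loop applies the activation to every layer, including the last one, so with ReLU the prediction is never negative. That differs from the linear output map $\mathbf{A}$ in the definition at the top of this section, which is worth keeping in mind when the targets can be negative.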
