In [None]:
# Scratch implementation of an artificial neural network (ANN) using toy datasets

# Task
Implement an Artificial Neural Network (ANN) from scratch by following the provided plan, which includes defining the network architecture, initializing parameters, implementing activation functions, forward propagation, loss function, backward propagation, parameter updates, creating a training loop, generating a toy dataset, and finally training and evaluating the ANN.

## Define Network Architecture

Outline the structure of the ANN, including the number of layers, neurons per layer, and activation functions for each layer. This will set the foundation for our network.


### Network Architecture Definition

For our scratch implementation of an Artificial Neural Network (ANN) using toy datasets, we will define a simple, yet illustrative, network architecture:

1.  **Input Layer**: The size of the input layer depends on the number of features in our toy dataset, typically 2 to 4 for datasets of this kind.
2.  **Hidden Layers**: We will implement **one hidden layer**.
    *   **Hidden Layer 1**: This layer will consist of **4 neurons**. A common rule of thumb is to start with a small hidden layer, often sized as a power of 2, and grow it only if the model underfits. The **activation function** for this layer will be the **sigmoid function**, a classic choice for hidden layers in simple ANNs.
3.  **Output Layer**: The size of the output layer will depend on the nature of the classification task.
    *   For a **binary classification** problem, the output layer will have **1 neuron**. The **activation function** for the output layer will also be the **sigmoid function**, as it naturally outputs values between 0 and 1, suitable for binary classification probabilities.
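
As a rough illustration of what this architecture implies, the sketch below counts the trainable parameters. It assumes a 2-feature input (the architecture allows 2 to 4); `layer_sizes` and `count_parameters` are hypothetical names introduced only for this example.

```python
# Hypothetical layer sizes for a 2-feature toy dataset (assumption for illustration).
layer_sizes = {"input": 2, "hidden": 4, "output": 1}

def count_parameters(n_x, n_h, n_y):
    """Return the number of trainable weights and biases in a 2-layer network."""
    hidden_params = n_h * n_x + n_h   # W1 has shape (n_h, n_x), b1 has shape (n_h, 1)
    output_params = n_y * n_h + n_y   # W2 has shape (n_y, n_h), b2 has shape (n_y, 1)
    return hidden_params + output_params

total = count_parameters(layer_sizes["input"], layer_sizes["hidden"], layer_sizes["output"])
print(f"Trainable parameters: {total}")  # 8 + 4 + 4 + 1 = 17
```

With only 17 parameters, the network is small enough that every weight can be inspected by hand while debugging.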

## Initialize Parameters

Write code to randomly initialize the weights and biases for all layers of the neural network.


The task is to initialize the weights and biases of a neural network. We need to define a Python function `initialize_parameters` that takes the network's architecture (input, hidden, output sizes) and uses `numpy` to generate random weights and zero biases, storing them in a dictionary.



In [12]:
import numpy as np

def initialize_parameters(input_size, hidden_size, output_size):
    """
    Initializes the weights and biases for a two-layer neural network.

    Arguments:
    input_size -- size of the input layer
    hidden_size -- size of the hidden layer
    output_size -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    "W1" -- weight matrix of shape (hidden_size, input_size)
                    "b1" -- bias vector of shape (hidden_size, 1)
                    "W2" -- weight matrix of shape (output_size, hidden_size)
                    "b2" -- bias vector of shape (output_size, 1)
    """

    # Initialize W1 with small random values
    W1 = np.random.randn(hidden_size, input_size) * 0.01
    # Initialize b1 with zeros
    b1 = np.zeros((hidden_size, 1))
    # Initialize W2 with small random values
    W2 = np.random.randn(output_size, hidden_size) * 0.01
    # Initialize b2 with zeros
    b2 = np.zeros((output_size, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

print("Parameters initialization function defined.")

Parameters initialization function defined.
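
One common motivation for random rather than all-zero weights is symmetry breaking: if every hidden unit starts with identical weights, every unit computes the same activation and receives the same gradient. The sketch below illustrates this under assumed shapes (2 features, 4 hidden units, 5 made-up examples); `hidden_activations` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def hidden_activations(W1, b1, X):
    """Sigmoid activations of the hidden layer for inputs X of shape (n_x, m)."""
    return 1 / (1 + np.exp(-(W1 @ X + b1)))

rng = np.random.default_rng(0)          # seeded generator, an assumption for reproducibility
X_demo = rng.normal(size=(2, 5))        # 2 features, 5 made-up examples
b1 = np.zeros((4, 1))

# All-zero weights: every hidden unit computes the same value for each example.
A1_zero = hidden_activations(np.zeros((4, 2)), b1, X_demo)
# Small random weights: hidden units differ, so their gradients can differ too.
A1_rand = hidden_activations(rng.normal(size=(4, 2)) * 0.01, b1, X_demo)

print("zero init, identical rows:", np.allclose(A1_zero, A1_zero[0]))
print("random init, identical rows:", np.allclose(A1_rand, A1_rand[0]))
```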


## Implement Activation Functions

Define common activation functions like sigmoid, ReLU, or tanh, which will be used in the network's layers.


First, we will execute the existing sigmoid function definition to ensure it's loaded into the environment. Then, we will define the ReLU activation function.



In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Now that the sigmoid function is executed, we will define the `relu` function.

In [3]:
def relu(x):
    return np.maximum(0, x)

print("ReLU function defined.")

ReLU function defined.
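
Backpropagation needs the derivatives of these activations. The following is a minimal sketch, assuming the standard identities sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)) and a ReLU subgradient of 0 at x = 0; `sigmoid_derivative` and `relu_derivative` are hypothetical helpers, and the sigmoid derivative is compared against a central finite difference.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1 - s)

def relu_derivative(x):
    """Subgradient of ReLU: 1 where x > 0, else 0 (the value at 0 is a convention)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 2.0])
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
print(np.allclose(numeric, sigmoid_derivative(x), atol=1e-8))
print(relu_derivative(x))
```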


## Implement Forward Propagation

Develop the forward pass mechanism, where input data is fed through the network to produce an output. This involves calculating weighted sums and applying activation functions.


We need to define the `forward_propagation` function as described in the instructions, which involves calculating weighted sums and applying activation functions for both the hidden and output layers, and storing intermediate values in a cache.



In [4]:
def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    LINEAR -> SIGMOID -> LINEAR -> SIGMOID

    Arguments:
    X -- input data of shape (input_size, number of examples)
    parameters -- python dictionary containing your parameters ("W1", "b1", "W2", "b2")

    Returns:
    A2 -- The sigmoid output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """

    # Retrieve parameters from the dictionary
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Linear combination for the first layer (hidden layer)
    Z1 = np.dot(W1, X) + b1
    # Apply sigmoid activation
    A1 = sigmoid(Z1)

    # Linear combination for the second layer (output layer)
    Z2 = np.dot(W2, A1) + b2
    # Apply sigmoid activation for the final output
    A2 = sigmoid(Z2)

    # Store intermediate values in cache for backpropagation
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache

print("Forward propagation function defined.")

Forward propagation function defined.
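
To make the shapes in the forward pass concrete, here is a small standalone sketch that repeats the same LINEAR -> SIGMOID -> LINEAR -> SIGMOID steps on made-up data. The sizes (2 features, 4 hidden units, 1 output, 5 examples) and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)     # seed chosen arbitrarily for the demo
n_x, n_h, n_y, m = 2, 4, 1, 5      # features, hidden units, outputs, examples (illustrative)

X_demo = rng.normal(size=(n_x, m))
W1, b1 = rng.normal(size=(n_h, n_x)) * 0.01, np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(n_y, n_h)) * 0.01, np.zeros((n_y, 1))

Z1 = W1 @ X_demo + b1              # (4, 2) @ (2, 5) -> (4, 5); b1 broadcasts across the 5 columns
A1 = 1 / (1 + np.exp(-Z1))
Z2 = W2 @ A1 + b2                  # (1, 4) @ (4, 5) -> (1, 5)
A2 = 1 / (1 + np.exp(-Z2))

for name, arr in [("Z1", Z1), ("A1", A1), ("Z2", Z2), ("A2", A2)]:
    print(name, arr.shape)
```

Because each example is a column, the bias vectors of shape `(n, 1)` are broadcast across all examples without an explicit loop.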


## Implement Loss Function

Choose and implement a suitable loss function (e.g., Mean Squared Error for regression or Cross-Entropy for classification) to measure the difference between predicted and actual outputs.


We need to implement the `compute_cost` function, which will calculate the binary cross-entropy loss between the predicted output `A2` and the true labels `Y`, handling potential `log(0)` errors and averaging the loss over all examples.



In [5]:
def compute_cost(A2, Y):
    """
    Computes the binary cross-entropy cost.

    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """

    m = Y.shape[1]  # Number of examples

    # Clip predictions away from exactly 0 and 1 to prevent log(0) errors
    epsilon = 1e-8
    A2 = np.clip(A2, epsilon, 1 - epsilon)

    # Compute the cross-entropy cost
    cost = - (1/m) * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

    # Ensure cost is a single number
    cost = np.squeeze(cost)

    return cost

print("Cost function defined.")

Cost function defined.
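
A short worked example can help calibrate what cost values to expect. The sketch below uses a hypothetical standalone helper, `binary_cross_entropy`, with made-up labels and predictions; it mirrors the clipping approach above but is not the notebook's `compute_cost`.

```python
import numpy as np

def binary_cross_entropy(probs, labels, eps=1e-8):
    """Mean binary cross-entropy for arrays of shape (1, m); eps is a clipping bound."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

Y_demo = np.array([[1, 0, 1, 0]])

uninformed = np.full((1, 4), 0.5)                 # always predicts 0.5
confident = np.array([[0.9, 0.1, 0.8, 0.2]])      # mostly right
print(binary_cross_entropy(uninformed, Y_demo))   # equals ln(2), about 0.6931
print(binary_cross_entropy(confident, Y_demo))    # lower cost for better predictions
```

A model that outputs 0.5 for every example scores ln(2), so a training cost near 0.693 usually means the network has not yet learned anything about the labels.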


## Implement Backward Propagation (Backpropagation)

Develop the backpropagation algorithm to calculate the gradients of the loss function with respect to the network's weights and biases. This is essential for learning.


We need to define the `backward_propagation` function as per the instructions, which involves calculating the gradients of the loss function with respect to the network's weights and biases (dW1, db1, dW2, db2) using the provided formulas and intermediate values from the cache.



In [6]:
def backward_propagation(parameters, cache, X, Y):
    """
    Implements the backward propagation for the model.

    Arguments:
    parameters -- python dictionary containing our parameters
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    X -- input data of shape (input_size, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters:
             "dW1" -- gradient of cost with respect to W1
             "db1" -- gradient of cost with respect to b1
             "dW2" -- gradient of cost with respect to W2
             "db2" -- gradient of cost with respect to b2
    """
    m = X.shape[1] # Number of examples

    # Retrieve W1 and W2 from parameters
    W1 = parameters["W1"]
    W2 = parameters["W2"]

    # Retrieve A1, A2, Z1, Z2 from cache
    A1 = cache["A1"]
    A2 = cache["A2"]
    Z1 = cache["Z1"]
    Z2 = cache["Z2"]

    # Backward propagation for the output layer
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)

    # Backward propagation for the hidden layer
    dA1 = np.dot(W2.T, dZ2)
    # Sigmoid gradient: A1 * (1 - A1)
    dZ1 = dA1 * (A1 * (1 - A1))
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads

print("Backward propagation function defined.")

Backward propagation function defined.
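
A common way to sanity-check hand-written gradients is numerical gradient checking. The sketch below is self-contained and re-implements the same forward and backward formulas in compact form; `cost_and_grads` and `max_gradient_error` are hypothetical helpers, and the data, seed, and unit-variance weights are assumptions chosen so that the gradients are not vanishingly small.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(params, X, Y):
    """Forward pass, cross-entropy cost, and analytic gradients for the 2-layer net."""
    m = X.shape[1]
    A1 = sigmoid(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    dZ2 = A2 - Y
    dZ1 = (params["W2"].T @ dZ2) * A1 * (1 - A1)
    grads = {"W1": dZ1 @ X.T / m, "b1": dZ1.sum(axis=1, keepdims=True) / m,
             "W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m}
    return cost, grads

def max_gradient_error(params, X, Y, h=1e-6):
    """Largest absolute gap between analytic and central-difference gradients."""
    _, grads = cost_and_grads(params, X, Y)
    worst = 0.0
    for key, value in params.items():
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + h
            cost_plus, _ = cost_and_grads(params, X, Y)
            value[idx] = original - h
            cost_minus, _ = cost_and_grads(params, X, Y)
            value[idx] = original                       # restore the parameter
            numeric = (cost_plus - cost_minus) / (2 * h)
            worst = max(worst, abs(numeric - grads[key][idx]))
    return worst

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2, 6))
Y_demo = rng.integers(0, 2, size=(1, 6))
params = {"W1": rng.normal(size=(4, 2)), "b1": np.zeros((4, 1)),
          "W2": rng.normal(size=(1, 4)), "b2": np.zeros((1, 1))}
print(f"max gradient error: {max_gradient_error(params, X_demo, Y_demo):.2e}")
```

If the analytic formulas are right, the reported error should be many orders of magnitude smaller than the gradients themselves; a large gap usually points to a sign or transpose mistake.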


## Implement Parameter Update (Optimization)

Write code to update the network's weights and biases using an optimization algorithm like Gradient Descent, based on the gradients computed during backpropagation.


We need to define the `update_parameters` function as instructed, which will apply the gradient descent update rule to the network's weights and biases using the calculated gradients and a specified learning rate.



In [7]:
def update_parameters(parameters, grads, learning_rate):
    """
    Updates parameters using the gradient descent update rule.

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients
    learning_rate -- the learning rate, a scalar

    Returns:
    parameters -- python dictionary containing your updated parameters
    """

    # Retrieve current parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Retrieve gradients
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # Update rule for each parameter
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    # Store updated parameters
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

print("Parameters update function defined.")

Parameters update function defined.


## Create Training Loop

Combine forward propagation, loss calculation, backpropagation, and parameter updates into an iterative training loop that runs for a specified number of epochs.


We need to define the `nn_model` function to encapsulate the entire training loop as described in the instructions. This function will call the previously defined functions for initialization, forward propagation, cost computation, backward propagation, and parameter updates, and will also record the cost over iterations.



In [8]:
def nn_model(X, Y, n_h, num_iterations, learning_rate, print_cost=False):
    """
    Implements a two-layer neural network model.

    Arguments:
    X -- input data of shape (input_size, number of examples)
    Y -- true "label" vector of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations to run for gradient descent
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- if True, print the cost every 100 iterations

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    costs -- list of costs recorded during training
    """

    n_x = X.shape[0]  # input layer size
    n_y = Y.shape[0]  # output layer size (fixed to 1 for binary classification)

    # 1. Initialize parameters
    parameters = initialize_parameters(n_x, n_h, n_y)

    # Lists to store costs for plotting
    costs = []

    # 2. Loop (gradient descent)
    for i in range(0, num_iterations):

        # a. Forward propagation
        A2, cache = forward_propagation(X, parameters)

        # b. Compute cost
        cost = compute_cost(A2, Y)

        # c. Backward propagation
        grads = backward_propagation(parameters, cache, X, Y)

        # d. Update parameters
        parameters = update_parameters(parameters, grads, learning_rate)

        # e. Record the cost every 100 iterations and optionally print it
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"Cost after iteration {i}: {cost}")

    return parameters, costs

print("Neural network model training function defined.")

Neural network model training function defined.


## Generate Toy Dataset

Create a simple synthetic dataset (e.g., for binary classification or regression) to test our ANN implementation. This will allow us to easily visualize and debug the network's performance.


We need to generate a synthetic dataset for binary classification using `make_moons`, reshape it to match the ANN's input requirements, and then print the shapes of the resulting `X` and `Y` to verify the transformation.



In [9]:
from sklearn.datasets import make_moons

# 1. Generate a dataset using make_moons
X, Y = make_moons(n_samples=200, noise=0.20, random_state=42)

# 2. Reshape X and Y to be compatible with the ANN input and output requirements
# X should be of shape (number_of_features, number_of_examples)
# Y should be of shape (1, number_of_examples)
X = X.T
Y = Y.reshape(1, Y.shape[0])

# 3. Print the shapes of X and Y to verify the reshaping
print(f"Shape of X: {X.shape}")
print(f"Shape of Y: {Y.shape}")

Shape of X: (2, 200)
Shape of Y: (1, 200)
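
Since examples are stored as columns, holding out part of the data for evaluation means splitting along axis 1. The sketch below shows one way this could be done with NumPy alone; `split_columns`, the 20% test fraction, and the stand-in arrays are assumptions for illustration and are not used elsewhere in this notebook.

```python
import numpy as np

def split_columns(X, Y, test_fraction=0.2, seed=0):
    """Shuffle examples (columns) and split them into train and test parts."""
    m = X.shape[1]
    order = np.random.default_rng(seed).permutation(m)
    n_test = int(round(m * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[:, train_idx], Y[:, train_idx], X[:, test_idx], Y[:, test_idx]

# Stand-in arrays with the same layout as the dataset above: (features, examples) and (1, examples).
X_demo = np.arange(400, dtype=float).reshape(2, 200)
Y_demo = (np.arange(200) % 2).reshape(1, 200)

X_train, Y_train, X_test, Y_test = split_columns(X_demo, Y_demo)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
```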




## Summary:

### Data Analysis Key Findings

*   **Network Architecture Defined**: A two-layer Artificial Neural Network (ANN) was defined, consisting of an input layer, one hidden layer with 4 neurons and a sigmoid activation, and an output layer with 1 neuron and a sigmoid activation for binary classification.
*   **Core ANN Components Implemented**: All essential components for an ANN were successfully implemented from scratch:
    *   **Parameter Initialization**: A function `initialize_parameters` was created to set initial weights with small random values and biases to zeros.
    *   **Activation Functions**: `sigmoid` and `relu` functions were defined, with `sigmoid` being used for both hidden and output layers in the final model.
    *   **Forward Propagation**: The `forward_propagation` function correctly calculates weighted sums and applies activation functions across layers, storing intermediate values in a `cache`.
    *   **Loss Function**: A `compute_cost` function was implemented for binary cross-entropy loss. For numerical stability, it clips predictions to the range `[1e-8, 1 - 1e-8]` before taking logarithms.
    *   **Backward Propagation**: The `backward_propagation` function accurately computes gradients for all weights and biases, utilizing the cached values and applying the sigmoid gradient.
    *   **Parameter Update**: An `update_parameters` function was developed to update network weights and biases using the gradient descent rule with a specified learning rate.
*   **Training Loop Established**: A comprehensive `nn_model` function was created, integrating all implemented components into an iterative training loop. This function handles parameter initialization, forward and backward propagation, cost computation, and parameter updates over a specified number of iterations.
*   **Toy Dataset Generated**: A synthetic binary classification dataset (200 samples, `make_moons` with $0.20$ noise) was successfully generated and reshaped to be compatible with the ANN's input (`X` of shape (2, 200)) and output (`Y` of shape (1, 200)) requirements.

### Insights or Next Steps

*   The foundational components of an ANN have been successfully implemented and integrated, providing a robust base for further development and experimentation.
*   The next logical step is to train the implemented ANN using the generated toy dataset and then evaluate its performance using appropriate metrics (e.g., accuracy, precision, recall) to ensure it's learning correctly and to identify potential areas for optimization.
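
The metrics named above can be computed directly from 0/1 arrays. The following is a minimal sketch with made-up probabilities and labels; `classification_metrics` is a hypothetical helper, and the 0.5 decision threshold and the zero-division conventions are assumptions.

```python
import numpy as np

def classification_metrics(predictions, Y):
    """Accuracy, precision, and recall for 0/1 arrays of shape (1, m)."""
    tp = int(np.sum((predictions == 1) & (Y == 1)))
    fp = int(np.sum((predictions == 1) & (Y == 0)))
    fn = int(np.sum((predictions == 0) & (Y == 1)))
    accuracy = float(np.mean(predictions == Y))
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # convention: 0 when nothing is predicted positive
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # convention: 0 when there are no positives
    return accuracy, precision, recall

probs_demo = np.array([[0.9, 0.4, 0.7, 0.2, 0.6]])     # made-up sigmoid outputs
Y_demo = np.array([[1, 1, 0, 0, 1]])
preds_demo = (probs_demo > 0.5).astype(int)            # assumed 0.5 decision threshold
print(classification_metrics(preds_demo, Y_demo))
```

In this made-up case there are 2 true positives, 1 false positive, and 1 false negative, giving an accuracy of 0.6 and a precision and recall of 2/3 each.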


## Train and Evaluate the ANN

Train the implemented ANN using the toy dataset and evaluate its performance using appropriate metrics (e.g., accuracy, precision, recall, or MSE) to ensure it's learning correctly.

We need to define a `predict` function that takes the trained parameters and input data `X` and returns class predictions. It runs forward propagation and thresholds the output `A2` at 0.5. After defining `predict`, we will run `nn_model` to train the network, make predictions on the same `X` used for training, and calculate the accuracy. Because no data is held out, the reported figure is training accuracy and not an estimate of generalization.

In [11]:
def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X

    Arguments:
    parameters -- python dictionary containing your parameters
    X -- input data of size (input_size, number of examples)

    Returns:
    predictions -- vector of predictions of our model (0 or 1)
    """

    # Forward propagation to get A2 (probabilities)
    A2, cache = forward_propagation(X, parameters)

    # Convert probabilities to actual predictions (0 or 1)
    predictions = (A2 > 0.5).astype(int)

    return predictions

# Set hyperparameters
input_size = X.shape[0]
hidden_size = 4 # As defined in the network architecture
output_size = Y.shape[0]
num_iterations = 10000
learning_rate = 0.05

print("Starting ANN training...")
# Train the model
parameters, costs = nn_model(X, Y, n_h=hidden_size, num_iterations=num_iterations, learning_rate=learning_rate, print_cost=True)

print("Training complete. Making predictions...")
# Make predictions
predictions = predict(parameters, X)

# Calculate accuracy as the percentage of predictions that match the labels
accuracy = np.mean(predictions == Y) * 100

print(f"Accuracy: {accuracy:.2f}%")

Starting ANN training...
Cost after iteration 0: 0.6931551143394612
Cost after iteration 100: 0.6931456321884061
Cost after iteration 200: 0.6931362854282024
Cost after iteration 300: 0.6931203760741177
Cost after iteration 400: 0.6930880440425858
Cost after iteration 500: 0.6930192223815069
Cost after iteration 600: 0.692871119565736
Cost after iteration 700: 0.6925517320849316
Cost after iteration 800: 0.6918639221017446
Cost after iteration 900: 0.6903905521471135
Cost after iteration 1000: 0.6872739467718904
Cost after iteration 1100: 0.6808569791753908
Cost after iteration 1200: 0.6683347425296517
Cost after iteration 1300: 0.646136950449808
Cost after iteration 1400: 0.612185851573386
Cost after iteration 1500: 0.5690696753796732
Cost after iteration 1600: 0.5236986548719255
Cost after iteration 1700: 0.4826664387840508
Cost after iteration 1800: 0.44897540964052096
Cost after iteration 1900: 0.4225778742743356
Cost after iteration 2000: 0.40216931781684573
Cost after iteration 2