# Assignment 3: Predicting Mapping Penalties with ANN
**Due:** June 5, 2025, 11:59 PM

**Author:** Tony Liang

**Student Number:** 20990204

In this assignment, a feed-forward artificial neural network (ANN) is implemented from scratch to predict the penalty score of a mapping between tasks and employees.

In this notebook we will:
1. Load the dataset of 100 mappings  
2. Preprocess and encode each mapping into a 110-dimensional vector  
3. Define the ANN architectures and implement the forward pass, backward pass, and parameter updates by hand  
4. Train via mini-batch SGD over a grid of hyperparameters  
5. Produce the eight required comparison plots  
6. Export results for report submission  



## Assignment Imports

In [62]:
!git clone https://github.com/tonyzrl/ANN_Assignment

fatal: destination path 'ANN_Assignment' already exists and is not an empty directory.


In [63]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time

# For reproducibility
np.random.seed(42)

# Task data: ID, Estimated Time, Difficulty, Deadline, Skill Required
tasks = [{"id": "T1", "estimated_time": 4, "difficulty": 3, "deadline": 8, "skill_required": "A"},
        {"id": "T2", "estimated_time": 6, "difficulty": 5, "deadline": 12, "skill_required": "B"},
        {"id": "T3", "estimated_time": 2, "difficulty": 2, "deadline": 6, "skill_required": "A"},
        {"id": "T4", "estimated_time": 5, "difficulty": 4, "deadline": 10, "skill_required": "C"},
        {"id": "T5", "estimated_time": 3, "difficulty": 1, "deadline": 7, "skill_required": "A"},
        {"id": "T6", "estimated_time": 8, "difficulty": 6, "deadline": 15, "skill_required": "B"},
        {"id": "T7", "estimated_time": 4, "difficulty": 3, "deadline": 9, "skill_required": "C"},
        {"id": "T8", "estimated_time": 7, "difficulty": 5, "deadline": 14, "skill_required": "B"},
        {"id": "T9", "estimated_time": 2, "difficulty": 2, "deadline": 5, "skill_required": "A"},
        {"id": "T10", "estimated_time": 6, "difficulty": 4, "deadline": 11, "skill_required": "C"},]

# Employee data: ID, Available hours, Skill level, Skills
employees = [{"id": "E1", "hours_avail": 10, "skill_level": 4, "skills": ["A", "C"]},
            {"id": "E2", "hours_avail": 12, "skill_level": 6, "skills": ["A", "B", "C"]},
            {"id": "E3", "hours_avail": 8, "skill_level": 3, "skills": ["A"]},
            {"id": "E4", "hours_avail": 15, "skill_level": 7, "skills": ["B", "C"]},
            {"id": "E5", "hours_avail": 9, "skill_level": 5, "skills": ["A", "C"]}]

## Data Loading & Preprocessing

In [64]:
def one_hot_encode(skills):
    """
    One-hot encode a list of skills, e.g. ['A','C'] -> [1,0,1].
    """
    mapping = {'A': 0, 'B': 1, 'C': 2}
    vec = [0, 0, 0]
    for s in skills:
        vec[mapping[s]] = 1
    return vec

def construct_input_vector(mapping_row):
    """
    Given one row of the mapping CSV (task→employee assignments + penalty),
    look up each task and employee in the module-level `tasks` and `employees`
    lists and construct the 110-dim input vector (10 tasks x 11 features).
    """
    input_vector = []
    # First 10 entries are employee assignments; last entry is penalty
    assignments = mapping_row[:10]

    for idx, emp_id in enumerate(assignments, start=1):
        task_id = f"T{idx}"
        # Find the task dict
        task = next(t for t in tasks if t["id"] == task_id)
        # Find the employee dict
        emp = next(e for e in employees if e["id"] == emp_id)

        # Task features: [time, difficulty, deadline] + one-hot(required skill)
        task_features = [
            task["estimated_time"],
            task["difficulty"],
            task["deadline"]
        ] + one_hot_encode(task["skill_required"])

        # Employee features: [hours_avail, skill_level] + one-hot(skills)
        emp_features = [
            emp["hours_avail"],
            emp["skill_level"],
        ] + one_hot_encode(emp["skills"])

        input_vector.extend(task_features + emp_features)

    return np.array(input_vector)
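
As a quick check of the 110-dimensional layout: each of the 10 tasks contributes 6 task features (3 numeric + 3 one-hot) and 5 features of its assigned employee (2 numeric + 3 one-hot), so 10 × 11 = 110. The sketch below rebuilds a single task/employee block for T1 and E1 from the tables above. The helper names `one_hot` and `encode_pair` are hypothetical and only used for illustration; they are not part of the assignment code.

```python
import numpy as np

SKILL_INDEX = {"A": 0, "B": 1, "C": 2}

def one_hot(skills):
    """Encode an iterable of skill letters, e.g. ['A', 'C'] -> [1, 0, 1]."""
    vec = [0, 0, 0]
    for s in skills:
        vec[SKILL_INDEX[s]] = 1
    return vec

def encode_pair(task, emp):
    """Hypothetical helper: 6 task features followed by 5 employee features."""
    task_part = [task["estimated_time"], task["difficulty"], task["deadline"]]
    task_part += one_hot(task["skill_required"])
    emp_part = [emp["hours_avail"], emp["skill_level"]] + one_hot(emp["skills"])
    return task_part + emp_part

t1 = {"id": "T1", "estimated_time": 4, "difficulty": 3, "deadline": 8, "skill_required": "A"}
e1 = {"id": "E1", "hours_avail": 10, "skill_level": 4, "skills": ["A", "C"]}

block = encode_pair(t1, e1)
print(block)            # [4, 3, 8, 1, 0, 0, 10, 4, 1, 0, 1]
print(len(block) * 10)  # 110
```

Note that a single required skill such as `"A"` is a one-character string, so iterating over it yields the same result as iterating over `["A"]`.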

In [65]:
data = pd.read_csv('/content/ANN_Assignment/data/task_assignment_data.csv')
data = data.values

assignment_inputs = []
penalties = []

for row in data:
    assignment_inputs.append(construct_input_vector(row))
    penalties.append(row[-1])

X = np.vstack(assignment_inputs)             # (N, 110)
Y = np.array(penalties).reshape(-1, 1) # (N,   1)

# Shuffle data and split 70/15/15
N = X.shape[0]
perm = np.random.permutation(N)
X, Y = X[perm], Y[perm]

X_train, Y_train = X[:70], Y[:70]
X_val,   Y_val   = X[70:85], Y[70:85]
X_test,  Y_test  = X[85:],   Y[85:]

# Transpose for Network
X_train, Y_train = X_train.T, Y_train.T   # (110, N_train), (1, N_train)
X_val, Y_val = X_val.T, Y_val.T     # (110, N_val),   (1, N_val)
X_test,  Y_test  = X_test.T,  Y_test.T    # (110, N_test),  (1, N_test)

print("Shapes:", X_train.shape, Y_train.shape, X_val.shape, Y_val.shape, X_test.shape, Y_test.shape)

Shapes: (110, 70) (1, 70) (110, 15) (1, 15) (110, 15) (1, 15)


## Model Definitions

**ReLU**

The Rectified Linear Unit (ReLU) is a simple, yet highly effective activation function commonly used in Neural Networks. It is defined as:

\begin{equation}
f(Z) = \max(0, Z)
\end{equation}

Where $Z$ is the input to the function. ReLU sets all negative values of $Z$ to zero, and leaves the positive values unchanged.

The derivative of the ReLU function can be computed as:

\begin{equation}
f'(Z) = \begin{cases}
0, & \text{if } Z \leq 0 \\
1, & \text{if } Z > 0
\end{cases}
\end{equation}
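
A minimal NumPy sketch of the two ReLU formulas above, evaluated on a small hand-picked input (the array `Z` is illustrative, not taken from the dataset):

```python
import numpy as np

Z = np.array([[-2.0, 0.0, 3.0]])

relu_out = np.maximum(0, Z)          # f(Z) = max(0, Z), applied element-wise
relu_grad = (Z > 0).astype(float)    # f'(Z) = 1 where Z > 0, otherwise 0

print(relu_out)   # [[0. 0. 3.]]
print(relu_grad)  # [[0. 0. 1.]]
```

The derivative at exactly $Z = 0$ is set to 0 here, matching the case split above.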

---

**Sigmoid**

The Sigmoid function is a common activation function used in Neural Networks, particularly for binary classification problems. It is represented by the following formula:

\begin{equation}
f(Z) = \frac{1}{1+e^{-Z}}
\end{equation}

Where $Z$ is the input to the function. The Sigmoid function maps any real-valued number to a value between 0 and 1, which can be interpreted as a probability.

The derivative of the Sigmoid function can be computed as:

\begin{equation}
f'(Z) = f(Z)(1-f(Z))
\end{equation}
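
The identity $f'(Z) = f(Z)(1 - f(Z))$ can be sanity-checked numerically. The sketch below compares it against a central finite difference; the function name `sigmoid_fn`, the test points, and the step size `eps` are illustrative choices, not part of the assignment code.

```python
import numpy as np

def sigmoid_fn(Z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-Z))

Z = np.array([-2.0, 0.0, 1.5])
eps = 1e-6

# Analytic derivative from the formula above
analytic = sigmoid_fn(Z) * (1.0 - sigmoid_fn(Z))
# Central finite-difference approximation of the same derivative
numeric = (sigmoid_fn(Z + eps) - sigmoid_fn(Z - eps)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff < 1e-8)  # True
```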

In [66]:
def sigmoid(Z):
    """
    Implement the Sigmoid function.

    Arguments:
    Z -- Output of the linear layer

    Returns:
    A -- Post-activation parameter
    cache -- the input Z, stored for backpropagation
    """
    A = 1/(1+np.exp(-Z))
    cache = Z
    return A, cache

def sigmoid_deriv(dA, cache):
    """
    Implement the backward propagation for a single sigmoid unit.

    Arguments:
    dA -- post-activation gradient
    cache -- 'Z' stored during forward pass

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    return dZ

def relu(Z):
    """
    Implement the ReLU function.

    Arguments:
    Z -- Output of the linear layer

    Returns:
    A -- Post-activation parameter
    cache -- used for backpropagation
    """
    A = np.maximum(0,Z)
    cache = Z
    return A, cache

def relu_deriv(dA, cache):
    """
    Implement the backward propagation for a single ReLU unit.

    Arguments:
    dA -- post-activation gradient
    cache -- 'Z'  stored for backpropagation

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True)
    # When z <= 0, dz is equal to 0 as well.
    dZ[Z <= 0] = 0

    return dZ

In [67]:
class NeuralNetwork:
    def __init__(self, layer_dims=[110, 256, 1], learning_rate=1e-5, activation='relu'):
        """
        layer_dims: list of layer sizes, e.g [110,256,1] or [110,128,128,1]
        learning_rate: step size for gradient descent
        activation: 'relu' or 'sigmoid' for HIDDEN layers
        """
        self.layer_dims    = layer_dims
        self.learning_rate = learning_rate
        # pick activation & its derivative
        if activation.lower() == 'relu':
            self.activation       = relu
            self.activation_deriv = relu_deriv
        else:
            self.activation       = sigmoid
            self.activation_deriv = sigmoid_deriv

        # number of layers (excluding input)
        self.L = len(layer_dims) - 1

        # initialise parameters for weights and biases
        for l in range(1, self.L + 1):
            n_in  = layer_dims[l-1]
            n_out = layer_dims[l]
            setattr(self, f'W{l}', np.random.randn(n_out, n_in) * 0.01)
            setattr(self, f'b{l}', np.zeros((n_out, 1)))

    def forward(self, X):
        """
        Performs a full forward pass.
        Returns:
          Y_hat: (1, m) predictions
          caches: list of ((A_prev,W,b), Z) tuples
        """
        caches = []
        A = X
        # hidden layers
        for l in range(1, self.L):
            W = getattr(self, f'W{l}')
            b = getattr(self, f'b{l}')
            A_prev = A                           # input to this layer, needed for dW in backprop
            Z = W @ A_prev + b
            A, Z_cache = self.activation(Z)      # returns (A, Z)
            caches.append(((A_prev, W, b), Z_cache))
        # output layer (linear)
        Wl = getattr(self, f'W{self.L}')
        bl = getattr(self, f'b{self.L}')
        ZL = Wl @ A + bl
        caches.append(((A, Wl, bl), ZL))
        return ZL, caches

    def compute_cost(self, Y_hat, Y):
        """
        Mean Squared Error:
          (1/m) * sum((Y_hat - Y)^2)
        Y_hat, Y both shape (1, m)
        """
        return np.mean((Y_hat - Y)**2)

    def back_layer(self, dZ, cache):
        """
        Backprop for a single layer given dZ = dL/dZ_l.
        cache: ((A_prev, W, b), Z)
        Returns dA_prev, dW, db.
        """
        (A_prev, W, b), Z = cache
        m = A_prev.shape[1]
        dW = (1/m) * (dZ @ A_prev.T)
        db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
        dA_prev = W.T @ dZ
        return dA_prev, dW, db

    def backward(self, Y_hat, Y, caches):
        """
        Performs backprop over the whole network.
        Returns a dict of grads {dW1, db1, …, dWL, dbL}.
        """
        grads = {}
        m = Y.shape[1]

        # MSE loss: d/dZ [ (1/m) ∑ (ZL - Y)^2 ] = 2*(ZL - Y)/m.
        # back_layer already applies the 1/m factor, so it is omitted here
        # to avoid dividing by m twice.
        dZ = 2 * (Y_hat - Y)

        # **Output layer** gradients
        cacheL = caches[-1]
        dA_prev, dWl, dbl = self.back_layer(dZ, cacheL)
        grads[f'dW{self.L}'] = dWl
        grads[f'db{self.L}'] = dbl

        # **Hidden layers** (L-1 .. 1)
        dA = dA_prev
        for l in reversed(range(1, self.L)):
            cache_l = caches[l-1]
            # first convert dA → dZ via activation derivative
            Z = cache_l[1]  # this is the stored Z
            dZ = self.activation_deriv(dA, Z)
            dA, dW, db = self.back_layer(dZ, cache_l)
            grads[f'dW{l}'] = dW
            grads[f'db{l}'] = db

        return grads

    def update_parameters(self, grads):
        """
        Applies gradient descent: W -= lr * dW,  b -= lr * db.
        """
        for l in range(1, self.L+1):
            W = getattr(self, f'W{l}')
            b = getattr(self, f'b{l}')
            dW = grads[f'dW{l}']
            db = grads[f'db{l}']
            setattr(self, f'W{l}', W - self.learning_rate * dW)
            setattr(self, f'b{l}', b - self.learning_rate * db)
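
Hand-written backpropagation is easy to get subtly wrong (for example, by applying the 1/m averaging factor twice), so a finite-difference gradient check is a useful sanity test. The sketch below is a standalone check on a tiny one-hidden-layer network with a sigmoid hidden layer, a linear output, and MSE loss. All sizes, names, and random data are hypothetical and chosen only for illustration; it does not use the `NeuralNetwork` class.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, m = 4, 3, 5
X = rng.normal(size=(n_in, m))      # examples stored as columns
Y = rng.normal(size=(1, m))
W1 = rng.normal(size=(n_hidden, n_in)) * 0.5
b1 = np.zeros((n_hidden, 1))
W2 = rng.normal(size=(1, n_hidden)) * 0.5
b2 = np.zeros((1, 1))

def sigmoid_fn(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def mse(W1, b1, W2, b2):
    """Forward pass followed by mean squared error."""
    A1 = sigmoid_fn(W1 @ X + b1)
    Y_hat = W2 @ A1 + b2
    return np.mean((Y_hat - Y) ** 2)

# Analytic gradient of the loss with respect to W1
A1 = sigmoid_fn(W1 @ X + b1)
Y_hat = W2 @ A1 + b2
dZ2 = 2 * (Y_hat - Y) / m           # the 1/m factor is applied once, here
dA1 = W2.T @ dZ2
dZ1 = dA1 * A1 * (1 - A1)           # sigmoid derivative
dW1 = dZ1 @ X.T                     # no further division by m

# Numerical gradient via central differences
eps = 1e-6
dW1_num = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        dW1_num[i, j] = (mse(W_plus, b1, W2, b2) - mse(W_minus, b1, W2, b2)) / (2 * eps)

max_err = np.max(np.abs(dW1 - dW1_num))
print(max_err < 1e-7)  # True
```

A sigmoid hidden layer is used in the check because ReLU is not differentiable at 0, which can make finite differences unreliable near that point.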

## Training Loop
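
The training function in this section walks over the shuffled training columns in steps of `batch_size`, so the final mini-batch of an epoch can be smaller than the others. A minimal sketch of that slicing, assuming 70 training examples as in the split above and placeholder random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_examples, batch_size = 110, 70, 32

X = rng.normal(size=(n_features, n_examples))   # placeholder data, examples as columns
perm = rng.permutation(n_examples)              # new random order for this epoch
X_sh = X[:, perm]

# Width of each mini-batch produced by stepping through the columns
batch_widths = [X_sh[:, i:i + batch_size].shape[1]
                for i in range(0, n_examples, batch_size)]
print(batch_widths)  # [32, 32, 6]
```

Because the gradients are averaged over the columns of each batch, a short final batch is handled without special casing.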

In [75]:
def training(layer_dims, X_train, y_train,
               X_val, y_val, X_test, y_test,
               learning_rates, batch_sizes,
               activations, epochs):
    """
    Train a NeuralNetwork with grid search over learning_rates, batch_sizes, activations.

    Returns:
      results: list of dicts with keys
        'learning_rate', 'batch_size', 'activation',
        'train_losses', 'val_losses', 'epoch_times', 'test_loss'
    """
    results = []

    for lr in learning_rates:
        for batch_size in batch_sizes:
            for activation in activations:
                # instantiate a fresh model for this config, using this grid point's learning rate
                model = NeuralNetwork(layer_dims, learning_rate=lr, activation=activation)

                train_losses = []
                val_losses   = []
                epoch_times  = []

                for epoch in range(epochs):
                    t0 = time.time()

                    # shuffle training examples
                    perm = np.random.permutation(X_train.shape[1])
                    X_sh, y_sh = X_train[:, perm], y_train[:, perm]

                    # mini-batch gradient descent
                    for i in range(0, X_sh.shape[1], batch_size):
                        xb = X_sh[:, i:i+batch_size]  # shape (110, <= batch_size)
                        yb = y_sh[:, i:i+batch_size]  # shape (1,   <= batch_size)

                        # forward / backward / update
                        y_pred, caches = model.forward(xb)
                        grads = model.backward(y_pred, yb, caches)
                        model.update_parameters(grads)

                    # record epoch losses & time
                    y_tr, _ = model.forward(X_train)
                    y_va, _ = model.forward(X_val)
                    train_losses.append(np.mean((y_tr - y_train)**2))
                    val_losses.append( np.mean((y_va - y_val)**2))
                    epoch_times.append(time.time() - t0)

                # final test-set evaluation
                y_te, _  = model.forward(X_test)
                test_loss = np.mean((y_te - y_test)**2)

                # store this configuration’s results
                results.append({
                    'learning_rate': lr,
                    'batch_size':   batch_size,
                    'activation':   activation,
                    'train_losses': train_losses,
                    'val_losses':   val_losses,
                    'epoch_times':  epoch_times,
                    'test_loss':    test_loss
                })

    return results


# ——— Example usage ———
layer_dims_A = [110, 256, 1]   # Model A
layer_dims_B = [110, 128, 128, 1]  # Model B

# Hyperparameter grid
learning_rates = [0.01, 0.001, 0.0001]
batch_sizes    = [8, 16, 32]
activations    = ['sigmoid', 'relu']
epochs         = 100  # or as required

results_A = training(layer_dims_A,
                       X_train, Y_train,
                       X_val,   Y_val,
                       X_test,  Y_test,
                       learning_rates,
                       batch_sizes,
                       activations,
                       epochs)

print("Example result:", results_A[0])



Example result: {'learning_rate': 0.01, 'batch_size': 8, 'activation': 'sigmoid', 'train_losses': [np.float64(nan), np.float64(nan), np.float64(nan), ...
(output truncated; every training loss shown is nan)