<a href="https://colab.research.google.com/github/tosha26-debug/Heat-Disease-MLP-Model/blob/main/MLP_Heart_Disease_ML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Heart Disease Classification using Multi-Layer Perceptron (MLP)**


---

\
This project implements a Multi-Layer Perceptron (MLP) neural network to classify heart disease severity based on patient health attributes. The model was trained on the UCI Heart Disease dataset, which includes features such as age, cholesterol, blood pressure, and chest pain type.

\
The MLP was built from scratch using NumPy, implementing all key components manually — including forward propagation, ReLU and Softmax activations, cross-entropy loss with L2 regularization, and backpropagation for weight updates.
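
For softmax followed by mean cross-entropy, the gradient with respect to the output logits reduces to `(P - Y) / N`, which is the expression the backpropagation code in this notebook starts from. The sketch below checks that identity numerically with central finite differences on random toy data. The helper names `softmax` and `mean_cross_entropy` and the toy shapes are assumptions made for illustration and are not part of the notebook's code.

```python
import numpy as np

def softmax(Z):
    # Row-wise softmax with the row maximum subtracted for numerical stability
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def mean_cross_entropy(Z, Y):
    # Average cross-entropy of softmax(Z) against one-hot labels Y
    P = softmax(Z)
    return -np.sum(Y * np.log(P + 1e-12)) / Z.shape[0]

rng = np.random.default_rng(0)
N, K = 4, 5                                   # toy batch: 4 samples, 5 classes
Z = rng.normal(size=(N, K))                   # random logits
Y = np.eye(K)[rng.integers(0, K, size=N)]     # random one-hot labels

# Analytic gradient with respect to the logits
grad_analytic = (softmax(Z) - Y) / N

# Central finite differences, one logit at a time
h = 1e-5
grad_numeric = np.zeros_like(Z)
for i in range(N):
    for k in range(K):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[i, k] += h
        Z_minus[i, k] -= h
        grad_numeric[i, k] = (
            mean_cross_entropy(Z_plus, Y) - mean_cross_entropy(Z_minus, Y)
        ) / (2 * h)

max_abs_diff = np.max(np.abs(grad_analytic - grad_numeric))
print("max abs difference:", max_abs_diff)
```

A very small difference suggests the analytic expression is right. The same finite-difference idea can be extended to the weight gradients of a full network, at a higher computational cost.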

\
After 200 epochs of training, the model achieved:

- Training Accuracy: 96%

- Validation Accuracy: 51%

- Test Accuracy: 56%

\
Despite strong training performance, the model overfits heavily: validation loss stops improving after roughly epoch 40 and rises steadily afterwards while training loss keeps falling, so the model generalizes poorly to unseen data, especially for minority classes.
Future improvements include stronger regularization (dropout, a larger weight-decay coefficient than the 1e-4 used here), class rebalancing techniques (SMOTE, class weights), hyperparameter tuning, and alternative model comparisons (e.g., Random Forest, XGBoost) to enhance predictive reliability.
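
As an illustration of the class-weight idea mentioned above, here is a minimal sketch that assumes inverse-frequency weighting, `weight_k = N / (K * count_k)`. The function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical, and the toy labels are made up; they are not the Cleveland class counts.

```python
import numpy as np

def inverse_frequency_weights(y, num_classes):
    # weight_k = N / (K * count_k): rarer classes get larger weights
    counts = np.bincount(y, minlength=num_classes).astype(float)
    counts[counts == 0] = 1.0  # guard against classes absent from y
    return len(y) / (num_classes * counts)

def weighted_cross_entropy(P, Y_onehot, class_weights, eps=1e-12):
    # Each sample is weighted by the weight of its true class
    sample_w = Y_onehot @ class_weights
    per_sample = -np.sum(Y_onehot * np.log(P + eps), axis=1)
    return np.sum(sample_w * per_sample) / np.sum(sample_w)

# Made-up, imbalanced toy labels (assumption for illustration only)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
K = 3
w = inverse_frequency_weights(y, K)
Y = np.eye(K)[y]
P_uniform = np.full((len(y), K), 1.0 / K)  # a model that predicts uniformly
loss = weighted_cross_entropy(P_uniform, Y, w)
print("class weights:", w)
print("weighted loss:", loss)
```

If a weighted loss like this were used for training, the output-layer gradient would need the same per-sample weights so that the loss and its gradient stay consistent.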














In [None]:

# --------------------------------------------------
#   Load packages:  NumPy, pandas and scikit-learn
# -------------------------------------------------

#-----------STEP 1 IMPORT NECESSARY LIBRARIES-----------------

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Shared random generator, used to shuffle the training data each epoch
# (the MLP class seeds its own generator for weight initialisation)
RNG = np.random.default_rng(42)






In [None]:

#-----------STEP 2 DOWNLOADING AND INSPECTING DATASET--------------
#-------------Heart Disease Dataset (UCI Cleveland)----------------


# Download the dataset from the UCI repository and save it locally
!wget -O processed.cleveland.data \
  "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"

# Show the file size, then print the first five lines to preview the raw format
!ls -l processed.cleveland.data
!head -n 5 processed.cleveland.data

# pandas is already imported in Step 1; repeated so this cell runs on its own
import pandas as pd

# Assign column names manually (the file has no header row)
# 'num' is the target label (heart disease severity, 0-4)
COLUMN_NAMES = [
    "age","sex","cp","trestbps","chol","fbs","restecg","thalach","exang","oldpeak","slope","ca","thal","num"
]

# Read the CSV and apply the column names defined above
df = pd.read_csv("processed.cleveland.data", header=None, names=COLUMN_NAMES)
# Show the number of rows and columns
print("Shape:", df.shape)
# Display the first 5 rows as a table
display(df.head())
# List the distinct values in the 'num' column
print("Unique target values:", sorted(df["num"].unique()))
# Check whether any missing-value markers ('?') are present
print("Any '?' present?:", df.isin(["?"]).any().any())


--2025-10-15 20:21:43--  https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘processed.cleveland.data’

processed.cleveland     [ <=>                ]  18.03K  --.-KB/s    in 0.07s   

2025-10-15 20:21:44 (268 KB/s) - ‘processed.cleveland.data’ saved [18461]

-rw-r--r-- 1 root root 18461 Oct 15 20:21 processed.cleveland.data
63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,2.0,2.0,7.0,1
37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,3.0,0
41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4,1.0,0.0,3.0,0
Shape: (303, 14)


    age  sex   cp  trestbps   chol  fbs  restecg  thalach  exang  oldpeak  slope   ca  thal  num
0  63.0  1.0  1.0     145.0  233.0  1.0      2.0    150.0    0.0      2.3    3.0  0.0   6.0    0
1  67.0  1.0  4.0     160.0  286.0  0.0      2.0    108.0    1.0      1.5    2.0  3.0   3.0    2
2  67.0  1.0  4.0     120.0  229.0  0.0      2.0    129.0    1.0      2.6    2.0  2.0   7.0    1
3  37.0  1.0  3.0     130.0  250.0  0.0      0.0    187.0    0.0      3.5    3.0  0.0   3.0    0
4  41.0  0.0  2.0     130.0  204.0  0.0      2.0    172.0    0.0      1.4    1.0  0.0   3.0    0


Unique target values: [np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4)]
Any '?' present?: True
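
The check above only reports whether a `'?'` appears anywhere. A minimal sketch of counting missing values per column follows, assuming the markers are converted to NaN at read time through `na_values`. The three sample rows use the same 14-column layout, but the `'?'` placements are made up for illustration, and `df_sample` is a hypothetical name.

```python
import io
import pandas as pd

COLUMN_NAMES = [
    "age","sex","cp","trestbps","chol","fbs","restecg",
    "thalach","exang","oldpeak","slope","ca","thal","num"
]

# Three illustrative rows; the '?' placements are made up
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,?,3.0,2\n"
    "41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4,1.0,?,?,0\n"
)

# na_values turns '?' into NaN while parsing, so the columns stay numeric
df_sample = pd.read_csv(sample, header=None, names=COLUMN_NAMES, na_values="?")

# Count NaN entries per column and show only the affected columns
missing_per_column = df_sample.isna().sum()
print(missing_per_column[missing_per_column > 0])
```

Knowing which columns are affected helps decide whether median imputation is reasonable for them, since it behaves differently on continuous and categorical features.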


In [None]:

# -------------------------------------------------
#   Cleaning / Preprocessing
# -------------------------------------------------

#-----------STEP 3 STANDARDISE DATA AND REPLACE MISSING VALUES------------



COLUMN_NAMES = [
    "age","sex","cp","trestbps","chol","fbs","restecg",
    "thalach","exang","oldpeak","slope","ca","thal","num"
]


class StandardScaler:
  """Standardises each feature to mean 0 and standard deviation 1."""

  def fit(self,X):
    self.mean = np.mean(X, axis=0)
    self.std = np.std(X, axis=0)
    # Avoid division by zero for constant columns
    self.std[self.std == 0] = 1.0

  def transform(self,X):
    return (X - self.mean) / self.std

  def fit_transform(self,X):
    self.fit(X)
    return self.transform(X)


# Loads the raw CSV and prepares features and labels
def load_and_preprocess(path,
                        categorical_cols = ["cp","restecg","slope","thal","ca"],
                        drop_na=True):

  """
  Load the UCI Cleveland CSV (no header) and preprocess it:
  - Replace '?' with NaN
  - Convert columns to numeric
  - Impute missing values with the column median
  - One-hot encode categorical_cols
  Standardisation is not done here; it is applied after the train/val/test
  split using training statistics only.
  The drop_na argument is currently unused.
  Returns the feature matrix X, the integer label vector y and the list of
  feature names (post-encoding).
  """


  df = pd.read_csv(path, header=None, names=COLUMN_NAMES)
  # Replace '?' with NaN and convert every column to numeric
  df.replace('?', np.nan, inplace=True)
  for col in df.columns:
    df[col] = pd.to_numeric(df[col], errors='coerce')
  # Impute missing values with the column median
  df.fillna(df.median(), inplace=True)

  # Make sure the target 'num' is an integer (0..4)
  df['num'] = df['num'].astype(int)

  # One-hot encode the selected categorical columns
  df = pd.get_dummies(df, columns=categorical_cols, prefix=categorical_cols, drop_first=False)

  # Separate features and label
  X = df.drop(columns=['num']).values.astype(float)
  y = df['num'].values.astype(int)
  return X, y, df.drop(columns=['num']).columns.tolist()
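
The function above computes the imputation medians over the whole dataset, before any train/validation/test split, so held-out rows influence the fill values. A minimal sketch of a split-aware alternative is shown below, assuming NaN-marked NumPy arrays. The helper names `fit_medians` and `apply_medians` are hypothetical and the toy values are made up.

```python
import numpy as np

def fit_medians(X_train):
    # Column medians computed from training rows only, ignoring NaN
    return np.nanmedian(X_train, axis=0)

def apply_medians(X, medians):
    # Replace each NaN with the training median of its column
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

# Toy data: the values are made up for illustration
X_train = np.array([[1.0, 10.0],
                    [3.0, np.nan],
                    [5.0, 30.0]])
X_test = np.array([[np.nan, 20.0],
                   [100.0, np.nan]])

medians = fit_medians(X_train)  # column medians: 3.0 and 20.0
X_train_filled = apply_medians(X_train, medians)
X_test_filled = apply_medians(X_test, medians)
print(medians)
print(X_test_filled)
```

This mirrors how the scaler in this notebook is fitted on the training split and then applied to the validation and test splits.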



In [None]:

# -------------------------------------------------
#   MLP Implementation (NumPy)
# -------------------------------------------------

#-----------STEP 4 DEFINE ACTIVATION FUNCTIONS------------



# ReLU activation: the non-linearity that lets the network learn non-linear decision boundaries
def relu(Z):
  return np.maximum(0, Z)

# Derivative of ReLU with respect to its input: 1 where Z > 0, else 0 (used in backpropagation)
def relu_deriv(Z):
  return (Z > 0).astype(float)

# Converts raw class scores (logits) to probabilities; used for the cross-entropy loss and for predictions
def stable_softmax(Z):
      # Z: (N, K)
  Z_shifted = Z - np.max(Z, axis=1, keepdims=True)
  expZ = np.exp(Z_shifted)
  probs = expZ / np.sum(expZ, axis=1, keepdims=True)
  return probs

# Converts integer class labels (0,1,2,3,4) to one-hot vectors (e.g. 0 -> [1,0,0,0,0])
def one_hot_encode(y, num_classes=None):
  if num_classes is None:
    classes = np.unique(y)
    num_classes = classes.max() + 1
  Y = np.zeros((len(y), num_classes))
  Y[np.arange(len(y)), y] = 1.0
  return Y



#-----------STEP 5 Transforms raw features into predictions ------------


class MLP:

  # Takes layer sizes [D_in, H1, H2, ..., K]
  # Sets learning rate
  # Sets L2 regularisation strength = how strongly model penalises large weights (prevent overfitting)
  def __init__(self, layer_sizes, lr=0.01, reg_lambda=0.0, seed=42):

    self.sizes = layer_sizes
    self.L = len(layer_sizes) - 1
    self.lr = lr
    self.reg_lambda = reg_lambda
    self.rng = np.random.default_rng(seed)
    self._init_weights()



  # Stores a weight matrix W[i] and a bias vector b[i] for each layer
  def _init_weights(self):

    self.W = []
    self.b = []
    for i in range(self.L):
      n_in = self.sizes[i]
      n_out = self.sizes[i+1]
      # Hidden layers: He initialisation, std = sqrt(2 / n_in), suited to ReLU
      if i < self.L - 1:
        std = np.sqrt(2.0 / n_in)
      else:
        # Output layer: fan-in scaled initialisation, std = sqrt(1 / n_in)
        # (a Xavier/LeCun-style choice) that keeps the initial logits small
        std = np.sqrt(1.0 / n_in)
      # Random weights drawn from a normal distribution with mean 0
      W_i = self.rng.normal(0, std, size=(n_in, n_out))
      # Bias vector initialised to 0
      b_i = np.zeros((1, n_out))
      self.W.append(W_i)
      self.b.append(b_i)



  # Passes inputs through each layer (weights, bias, activation); the last activation holds the class probabilities
  def forward(self, X):

    A = [X]
    Zs = [None]
    # Loops through each layer
    for i in range(self.L):
      # Mathematical calculation = matrix multiplication with weights and adds bias
      Z = A[-1] @ self.W[i] + self.b[i]
      Zs.append(Z)
      if i < self.L - 1:
        # For hidden layers, ReLU is applied (helps learn complex patterns)
        A_next = relu(Z)
      else:
        # For output layer, SoftMax is applied = probability
        A_next = stable_softmax(Z)
      A.append(A_next)
    return A, Zs



  # Computes the average cross-entropy loss between predicted probabilities and true labels
  # and adds the L2 regularisation penalty
  def compute_loss(self, Y_pred, Y_true):

    """
    Y_pred: predicted class probabilities from softmax, shape (N, K)
    Y_true: one-hot encoded true labels, shape (N, K)
    Returns average cross-entropy + L2 penalty
    """

    N = Y_true.shape[0]
    eps = 1e-12
    # Cross-entropy: measures how close the predicted probabilities are to the true labels
    # (eps avoids log(0))
    loss_ce = -np.sum(Y_true * np.log(Y_pred + eps)) / N
    # L2 regularisation
    l2 = 0.0
    if self.reg_lambda > 0:
      # Adds a penalty for large weights to reduce overfitting
      for W in self.W:
        l2 += 0.5 * self.reg_lambda * np.sum(W * W)
    # Total loss = prediction error + weight penalty
    return loss_ce + l2



  # Computes the gradient of the loss with respect to every W and b (used by update_params)
  def backward(self, A, Zs, Y_true):
    """
    Backpropagate and compute gradients.
    Returns grads dW, db lists aligned with W and b.
    """

    N = Y_true.shape[0]
    grads_W = [None] * self.L
    grads_b = [None] * self.L

    # Final output probabilities from softmax
    P = A[-1]
    # Gradient of the mean cross-entropy loss with respect to the output logits
    dZ = (P - Y_true) / N

    for i in reversed(range(self.L)):
      A_prev = A[i]
      dW = A_prev.T @ dZ  # gradient of weights
      db = np.sum(dZ, axis=0, keepdims=True)  # gradient of bias

      # add L2 regularization gradient (prevents using very large weight = avoid overfitting)
      if self.reg_lambda > 0:
        dW += self.reg_lambda * self.W[i]

      grads_W[i] = dW
      grads_b[i] = db

      if i > 0:
          # propagate to previous layer = so each previous layer knows how to adjust its weights
          dA_prev = dZ @ self.W[i].T
          dZ = dA_prev * relu_deriv(Zs[i])
    return grads_W, grads_b



  # Updates W and b with one gradient-descent step: parameter -= lr * gradient
  def update_params(self, grads_W, grads_b):
    for i in range(self.L):
      self.W[i] -= self.lr * grads_W[i]
      self.b[i] -= self.lr * grads_b[i]



  # Returns the final layer's softmax probabilities for each class
  def predict_proba(self, X):
    A, _ = self.forward(X)
    return A[-1]



  # Returns the predicted class: the one with the highest probability
  def predict(self, X):
    P = self.predict_proba(X)
    return np.argmax(P, axis=1)




# ----------------------------------------
# Training loop
# ----------------------------------------

#-----------STEP 6 Evaluates Accuracy and Loss ------------




# Trains neural network over multiple epochs using mini-batch gradient descent
# Evaluates accuracy and loss on both training and validation sets
def train_model(model, X_train, Y_train_onehot, X_val, Y_val_onehot,
                epochs=100, batch_size=32, verbose=True):
  N = X_train.shape[0]
  history = {'train_loss':[], 'val_loss':[], 'train_acc':[], 'val_acc':[]}


  # Repeats training for the requested number of epochs (default 100)
  for epoch in range(1, epochs+1):
    # shuffle --> randomise order of training sample
    perm = RNG.permutation(N)
    X_shuf = X_train[perm]
    Y_shuf = Y_train_onehot[perm]


    # mini-batches --> splits the shuffled data into chunks of batch_size (default 32)
    for i in range(0, N, batch_size):
      X_batch = X_shuf[i:i+batch_size]
      Y_batch = Y_shuf[i:i+batch_size]


      # Forward pass --> compute prediction
      # Backward pass --> compute gradients
      # Update --> adjusts W & b
      A, Zs = model.forward(X_batch)
      grads_W, grads_b = model.backward(A, Zs, Y_batch)
      model.update_params(grads_W, grads_b)


    # end of epoch: Measures how well model is doing in training and validation
    P_train = model.predict_proba(X_train)
    P_val = model.predict_proba(X_val)
    train_loss = model.compute_loss(P_train, Y_train_onehot)
    val_loss = model.compute_loss(P_val, Y_val_onehot)
    train_acc = accuracy_score(np.argmax(Y_train_onehot, axis=1), np.argmax(P_train, axis=1))
    val_acc = accuracy_score(np.argmax(Y_val_onehot, axis=1), np.argmax(P_val, axis=1))


    # Stores loss and accuracy for each epoch = plot and analyse
    history['train_loss'].append(train_loss)
    history['val_loss'].append(val_loss)
    history['train_acc'].append(train_acc)
    history['val_acc'].append(val_acc)

    # Prints progress every few epochs (every 10%)
    if verbose and (epoch % max(1, epochs//10) == 0 or epochs == 1):
      print(f"Epoch {epoch:03d}/{epochs} | train_loss {train_loss:.4f} train_acc {train_acc:.3f} | val_loss {val_loss:.4f} val_acc {val_acc:.3f}")

  return history
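
Because `train_model` records the validation loss every epoch, a patience-based early-stopping rule could be layered on top of it. The sketch below applies such a rule to the validation losses that the training run in this notebook prints every 20 epochs. The helper name `early_stopping_epoch` and its `patience` and `min_delta` parameters are hypothetical and are not part of the code above.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_index, stop_index) for a patience-based stopping rule."""
    best_loss = float("inf")
    best_index = -1
    waited = 0
    for i, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter
            best_loss, best_index, waited = loss, i, 0
        else:
            # No improvement: stop once patience is exhausted
            waited += 1
            if waited >= patience:
                return best_index, i
    return best_index, len(val_losses) - 1

# Validation losses printed every 20 epochs by the run logged in this notebook
logged_epochs = list(range(20, 201, 20))
logged_val_loss = [1.0473, 1.0350, 1.0772, 1.1267, 1.1879,
                   1.2724, 1.3735, 1.4492, 1.5600, 1.6618]

best_i, stop_i = early_stopping_epoch(logged_val_loss, patience=2)
print("best logged epoch:", logged_epochs[best_i])           # prints 40
print("would stop at logged epoch:", logged_epochs[stop_i])  # prints 80
```

In an actual training loop the check would run on every epoch rather than on the coarse 20-epoch log, and a copy of the best weights would be kept so they can be restored after stopping.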











In [None]:
# ----------------------------------------
# Run MLP on Cleveland Dataset
# ----------------------------------------



#-----------STEP 7 Loads Data, Builds Model, Trains it and Evaluates Results------------



if __name__ == "__main__":
  # Loads dataset from CSV
  path = "processed.cleveland.data"
  X, y, feature_names = load_and_preprocess(path)


  # If some classes are missing from the dataset (rare), remap the labels to 0..K-1
  classes = np.unique(y)
  class_map = {c: i for i,c in enumerate(classes)}
  y_mapped = np.array([class_map[c] for c in y])
  K = len(classes)
  print("Detected classes:", classes, "-> mapped to 0..", K-1)


  # Train/Val/Test split
  # First split --> 70% Train, 30% Temp (temporary)
  X_train, X_temp, y_train, y_temp = train_test_split(X, y_mapped, test_size=0.3, random_state=42, stratify=y_mapped)

  # Second split --> Temp --> 50% Val, 50% Test
  X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)


  # Standardise features using training-set statistics only (zero mean, unit variance)
  # Prevents features with large ranges from dominating smaller ones
  scaler = StandardScaler()
  X_train = scaler.fit_transform(X_train)
  X_val = scaler.transform(X_val)
  X_test = scaler.transform(X_test)


  # One-hot encode labels
  # Converts class labels (0,1,2,3,4) to one-hot vectors (e.g. 0 -> [1,0,0,0,0]) --> Needed for softmax + cross-entropy
  Y_train_onehot = one_hot_encode(y_train, num_classes=K)
  Y_val_onehot = one_hot_encode(y_val, num_classes=K)
  Y_test_onehot = one_hot_encode(y_test, num_classes=K)


  # Define model architecture --> number of input features
  D = X_train.shape[1]
  # 2 Hidden layers
  layer_sizes = [D, 32, 16, K]
  # Establishes Lr and Regularisation
  model = MLP(layer_sizes, lr=0.01, reg_lambda=1e-4, seed=42)


  # Train --> 200 epochs & batch size 16
  # Loss and accuracy are tracked per epoch in history
  history = train_model(model, X_train, Y_train_onehot, X_val, Y_val_onehot, epochs=200, batch_size=16, verbose=True)


  # Predicts class label
  # Prints accuracy, confusion matrices and precision/recall/F1 score
  y_pred = model.predict(X_test)
  print("\nTest Accuracy:", accuracy_score(y_test, y_pred))
  print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
  print("\nClassification Report:\n", classification_report(y_test, y_pred, zero_division=0))




Detected classes: [0 1 2 3 4] -> mapped to 0.. 4
Epoch 020/200 | train_loss 0.9101 train_acc 0.637 | val_loss 1.0473 val_acc 0.600
Epoch 040/200 | train_loss 0.7646 train_acc 0.698 | val_loss 1.0350 val_acc 0.533
Epoch 060/200 | train_loss 0.6653 train_acc 0.731 | val_loss 1.0772 val_acc 0.489
Epoch 080/200 | train_loss 0.5767 train_acc 0.769 | val_loss 1.1267 val_acc 0.511
Epoch 100/200 | train_loss 0.5000 train_acc 0.816 | val_loss 1.1879 val_acc 0.533
Epoch 120/200 | train_loss 0.4294 train_acc 0.868 | val_loss 1.2724 val_acc 0.556
Epoch 140/200 | train_loss 0.3687 train_acc 0.896 | val_loss 1.3735 val_acc 0.533
Epoch 160/200 | train_loss 0.3172 train_acc 0.925 | val_loss 1.4492 val_acc 0.511
Epoch 180/200 | train_loss 0.2739 train_acc 0.958 | val_loss 1.5600 val_acc 0.511
Epoch 200/200 | train_loss 0.2381 train_acc 0.962 | val_loss 1.6618 val_acc 0.511

Test Accuracy: 0.5652173913043478
Confusion Matrix:
 [[22  3  0  0  0]
 [ 6  0  2  1  0]
 [ 0  1  2  2  0]
 [ 0  0  3  2  0]
 [ 0 