## Model building - Neural network

In [16]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.metrics import AUC  # used in model.compile(metrics=[...]) below
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
import keras_tuner as kt

### Import data

In [6]:
df_train = pd.read_csv("./data/data_train_borderline_smote.csv")
df_val = pd.read_csv("./data/data_val.csv")
df_test = pd.read_csv("./data/data_test.csv")

In [7]:
X_train = df_train.iloc[:, 1:].values
y_train = df_train.iloc[:, 0].values
X_val = df_val.iloc[:, 1:].values
y_val = df_val.iloc[:, 0].values
X_test = df_test.iloc[:, 1:].values
y_test = df_test.iloc[:, 0].values

### Build baseline model

In [38]:
model = Sequential()
model.add(Input(shape=(X_train.shape[1],)))
model.add(Dense(64, activation="relu"))
model.add(Dense(32, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(
    loss="binary_crossentropy",
    metrics=["accuracy", AUC(name="roc_auc")],
    optimizer=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    )

In [39]:
history = model.fit(
    X_train,
    y_train,
    epochs=100,
    batch_size=32,
    validation_data=(X_val, y_val),
    verbose=0
    )

In [40]:
training_auc = history.history["roc_auc"][-1]
validation_auc = history.history["val_roc_auc"][-1]
training_accuracy = history.history["accuracy"][-1]
validation_accuracy = history.history["val_accuracy"][-1]
print("Baseline Model Results:")
print(f"Training AUC: {round(training_auc, 4)}, Accuracy: {round(training_accuracy, 4)}")
print(f"Validation AUC: {round(validation_auc, 4)}, Accuracy: {round(validation_accuracy, 4)}")

Baseline Model Results:
Training AUC: 0.9301, Accuracy: 0.841
Validation AUC: 0.7919, Accuracy: 0.799


* Xie et al. (2019) report that their neural network achieved an AUC of 0.7949 and an accuracy of 0.8241.
* Assume the optimal achievable AUC is 0.7949.
* Since the training AUC (0.9301) exceeds this optimal performance, the model does not suffer from high avoidable bias.
* Instead, the large gap between the training AUC (0.9301) and the validation AUC (0.7919) indicates high variance.
* Before adding regularisation to reduce the variance, the model architecture (number of layers and units) and the learning rate are tuned first. This can also help reduce variance.
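
The diagnosis in the bullets above comes down to two subtractions. The sketch below is illustrative only: the helper name `diagnose_bias_variance` is hypothetical, and it assumes, as the bullets do, that the AUC of 0.7949 reported by Xie et al. stands in for the optimal performance.

```python
def diagnose_bias_variance(optimal, train, val):
    """Return (avoidable_bias, variance) gaps for a higher-is-better metric."""
    avoidable_bias = optimal - train  # positive => training falls short of the reference
    variance = train - val            # positive => training outperforms validation
    return round(avoidable_bias, 4), round(variance, 4)


# AUC values reported above for the baseline model
bias_gap, variance_gap = diagnose_bias_variance(optimal=0.7949, train=0.9301, val=0.7919)
print(f"Avoidable bias gap: {bias_gap}, variance gap: {variance_gap}")
```

A negative avoidable-bias gap together with a variance gap of roughly 0.14 matches the reading above: the baseline fits the training data well but generalises less well.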

### Save baseline model

In [43]:
# model.save('./hide/training history/baseline_model_v2.keras')

### Model selection / tuning

#### Reduce avoidable bias

In [55]:
def build_model(hp):
    model = Sequential()
    model.add(Input(shape=(X_train.shape[1],)))
    for i in range(hp.Int("num_layers", 2, 4)):
        model.add(Dense(
            hp.Int(f"units_{i}", min_value=32, max_value=128, step=32),
            activation="relu"
        ))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(
        loss="binary_crossentropy",
        metrics=["accuracy", AUC(name="auc")],
        optimizer=Adam(
            learning_rate=hp.Float("learning_rate", 1e-4, 1e-2, sampling="log"),
            beta_1=0.9, beta_2=0.999, epsilon=1e-08
            )
    )
    return model

In [56]:
tuner = kt.RandomSearch(
    build_model,
    objective=kt.Objective("val_auc", direction="max"),
    max_trials=10,
    executions_per_trial=1,
    directory="hide/training history",
    project_name="train_bigger_model_v2"
)

In [57]:
tuner.search_space_summary()

Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 4, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 128, 'step': 32, 'sampling': 'linear'}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 128, 'step': 32, 'sampling': 'linear'}
learning_rate (Float)
{'default': 0.0001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.01, 'step': None, 'sampling': 'log'}


In [58]:
tuner.search(X_train, y_train, epochs=100, validation_data=(X_val, y_val))

Trial 10 Complete [00h 05m 33s]
val_auc: 0.8140186071395874

Best val_auc So Far: 0.8144325613975525
Total elapsed time: 01h 04m 32s


In [59]:
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"The best NN architecture: {best_hp.values}")

The best NN architecture: {'num_layers': 2, 'units_0': 128, 'units_1': 128, 'learning_rate': 0.004782203154550918, 'units_2': 96, 'units_3': 32}


In [60]:
tuner.results_summary(1)

Results summary
Results in hide/training history/train_bigger_model_v2
Showing 1 best trials
Objective(name="val_auc", direction="max")

Trial 02 summary
Hyperparameters:
num_layers: 2
units_0: 128
units_1: 128
learning_rate: 0.004782203154550918
units_2: 96
units_3: 32
Score: 0.8144325613975525


In [61]:
best_model = tuner.hypermodel.build(best_hp)
history = best_model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), verbose=0)

In [62]:
train_auc = history.history["auc"][-1]
val_auc = history.history["val_auc"][-1]
train_acc = history.history["accuracy"][-1]
val_acc = history.history["val_accuracy"][-1]

In [63]:
print(f"Training AUC: {round(train_auc, 4)}, Accuracy: {round(train_acc, 4)}")
print(f"Validation AUC: {round(val_auc, 4)}, Accuracy: {round(val_acc, 4)}")

Training AUC: 0.9226, Accuracy: 0.8326
Validation AUC: 0.8038, Accuracy: 0.8006


#### Reduce variance

In [68]:
def build_model(hp): 
    model = Sequential()
    model.add(Input(shape=(X_train.shape[1],)))
    # Layer widths (128, 128, 96, 32) are taken from best_hp.values of the architecture search.
    # Note: the best trial had num_layers=2, so units_2 and units_3 were inactive in that trial.
    # Each layer gets its own L2 coefficient, sampled on a log scale.
    model.add(Dense(128, activation="relu", kernel_regularizer=l2(hp.Float("l2_0", min_value=1e-5, max_value=1e-2, sampling="log"))))
    model.add(Dense(128, activation="relu", kernel_regularizer=l2(hp.Float("l2_1", min_value=1e-5, max_value=1e-2, sampling="log"))))
    model.add(Dense(96, activation="relu", kernel_regularizer=l2(hp.Float("l2_2", min_value=1e-5, max_value=1e-2, sampling="log"))))
    model.add(Dense(32, activation="relu", kernel_regularizer=l2(hp.Float("l2_3", min_value=1e-5, max_value=1e-2, sampling="log"))))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(
        loss="binary_crossentropy",
        metrics=["accuracy", AUC(name="auc")],
        optimizer=Adam(learning_rate=0.004782203154550918, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    )
    return model
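
For reference, a Keras `l2(lam)` kernel regulariser adds `lam * sum(w ** 2)` over that layer's kernel to the training loss. The sketch below reproduces this arithmetic in plain Python. The weights, the data-loss value and the helper `l2_penalty` are made up for illustration and are not taken from the trained model.

```python
def l2_penalty(weights, lam):
    """L2 penalty for one layer: lam times the sum of squared kernel weights."""
    return lam * sum(w * w for row in weights for w in row)


# Hypothetical 2x2 kernels for two layers, each paired with its own coefficient
layers = [
    ([[0.5, -1.0], [2.0, 0.0]], 1e-4),
    ([[1.0, 1.0], [-1.0, 3.0]], 1e-3),
]
data_loss = 0.40  # hypothetical binary cross-entropy value
total_loss = data_loss + sum(l2_penalty(w, lam) for w, lam in layers)
print(f"Total loss: {total_loss:.6f}")
```

Because the penalty grows with the squared weights, larger coefficients push the optimiser towards smaller weights, which is why each `l2_i` is searched on a log scale between 1e-5 and 1e-2.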

In [69]:
tuner = kt.RandomSearch(
    build_model,
    objective=kt.Objective("val_auc", direction="max"),
    max_trials=10,
    executions_per_trial=2,
    directory="hide/training history",
    project_name="l2_tuning_v2"
)

In [70]:
tuner.search_space_summary()

Search space summary
Default search space size: 4
l2_0 (Float)
{'default': 1e-05, 'conditions': [], 'min_value': 1e-05, 'max_value': 0.01, 'step': None, 'sampling': 'log'}
l2_1 (Float)
{'default': 1e-05, 'conditions': [], 'min_value': 1e-05, 'max_value': 0.01, 'step': None, 'sampling': 'log'}
l2_2 (Float)
{'default': 1e-05, 'conditions': [], 'min_value': 1e-05, 'max_value': 0.01, 'step': None, 'sampling': 'log'}
l2_3 (Float)
{'default': 1e-05, 'conditions': [], 'min_value': 1e-05, 'max_value': 0.01, 'step': None, 'sampling': 'log'}


In [71]:
tuner.search(X_train, y_train, epochs=50, validation_data=(X_val, y_val))

Trial 10 Complete [00h 09m 04s]
val_auc: 0.8137386441230774

Best val_auc So Far: 0.8168774545192719
Total elapsed time: 01h 23m 28s


In [72]:
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print("Best hyperparameters:", best_hp.values)

Best hyperparameters: {'l2_0': 9.315505072185872e-05, 'l2_1': 0.0008294173681456724, 'l2_2': 0.0001065019888303519, 'l2_3': 0.0006633603807097294}


In [73]:
tuner.results_summary(1)

Results summary
Results in hide/training history/l2_tuning_v2
Showing 1 best trials
Objective(name="val_auc", direction="max")

Trial 03 summary
Hyperparameters:
l2_0: 9.315505072185872e-05
l2_1: 0.0008294173681456724
l2_2: 0.0001065019888303519
l2_3: 0.0006633603807097294
Score: 0.8168774545192719


In [74]:
best_trial = tuner.oracle.get_best_trials(num_trials=1)[0]
training_accuracy = best_trial.metrics.get_last_value("accuracy")
validation_accuracy = best_trial.metrics.get_last_value("val_accuracy")
training_auc = best_trial.metrics.get_last_value("auc")
validation_auc = best_trial.metrics.get_last_value("val_auc")

In [75]:
print(f"Training AUC: {round(training_auc, 4)}, Accuracy: {round(training_accuracy, 4)}")
print(f"Validation AUC: {round(validation_auc, 4)}, Accuracy: {round(validation_accuracy, 4)}")

Training AUC: 0.8262, Accuracy: 0.7571
Validation AUC: 0.8169, Accuracy: 0.6785


* Observation: After adding L2 regularisation, the training AUC and accuracy dropped. The validation AUC increased slightly (0.8038 to 0.8169), but the validation accuracy decreased markedly (0.8006 to 0.6785).
* Possible reason: The training set was resampled with BorderlineSMOTE, which balances the classes, whereas the validation and test sets are stratified splits of the original data and may still be imbalanced. When regularisation makes the model's probability outputs less confident, the default decision threshold of 0.5 used to compute accuracy may misclassify more instances, especially in an imbalanced setting.
* Further action may not be needed, for the following reasons:
    1. AUC does not depend on the decision threshold and is less sensitive to class imbalance than accuracy, so AUC is chosen as the primary metric.
    2. The model without regularisation shows a large gap between training AUC (0.9226) and validation AUC (0.8038), which suggests overfitting. The regularised model narrows this gap (0.8262 vs. 0.8169), a sign of better generalisation, at the cost of lower training performance and lower validation accuracy.
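
The contrast between the two metrics can be illustrated with a toy example. The sketch below uses made-up labels and scores (not the notebook's data) and hypothetical helpers `auc` and `accuracy`. It squeezes every score into a narrow band around 0.55 with a strictly increasing transform, a stand-in for less confident outputs, and compares both metrics before and after.

```python
def auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical imbalanced validation set: 2 positives, 6 negatives
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
confident = [0.90, 0.60, 0.70, 0.45, 0.40, 0.20, 0.10, 0.10]
# Strictly increasing transform: the ranking is kept, the spread is reduced
squeezed = [0.55 + 0.3 * (s - 0.5) for s in confident]

for name, scores in [("confident", confident), ("squeezed", squeezed)]:
    print(f"{name}: AUC={auc(y_true, scores):.4f}, accuracy@0.5={accuracy(y_true, scores):.4f}")
```

In this toy case the AUC is identical for both score sets because the ranking is unchanged, while accuracy at the 0.5 threshold falls from 0.875 to 0.625 because two more negatives cross the threshold.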

### Model evaluation

In [80]:
best_model = tuner.hypermodel.build(best_hp)

In [81]:
history = best_model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_val, y_val),
    verbose=0
)

In [82]:
test_loss, test_accuracy, test_auc = best_model.evaluate(X_test, y_test, verbose=0)

In [83]:
print("Test Set Results for Best Model with L2 Regularization:")
print(f"Test AUC: {round(test_auc, 4)}, Accuracy: {round(test_accuracy, 4)}")

Test Set Results for Best Model with L2 Regularization:
Test AUC: 0.8098, Accuracy: 0.6886


### Conclusion

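* Compared with the neural network fitted by Xie et al. (2019), which achieved an AUC of 0.7949 and an accuracy of 0.8241, my NN has a slightly higher test AUC (0.8098) but a lower test accuracy (0.6886).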

### Reference

* Xie, Zidian, et al. "Building risk prediction models for type 2 diabetes using machine learning techniques." Preventing Chronic Disease 16 (2019): E130.