# EMC-EB Example
This Python notebook trains, tests and evaluates the EMC-EB model. In the first code cell, testing is done on a longitudinal data set (D2); in the second code cell, testing is done on a cross-sectional data set (D3).
The ADNI data sets D1 and D4 are used for training and evaluation, respectively.


Train model on data set D1 and test model on longitudinal data set D2

In [1]:
import pandas as pd
import datetime
from pathlib import Path
from tadpole_algorithms.models import EMCEB
from tadpole_algorithms.preprocessing.split import split_test_train_tadpole
# Load D1_D2 train and test data set
data_path_train_test = Path("data/TADPOLE_D1_D2.csv")
data_df_train_test = pd.read_csv(data_path_train_test)

# Load D4 evaluation data set
data_path_eval = Path("data/TADPOLE_D4_corr.csv")
data_df_eval = pd.read_csv(data_path_eval)

# Split data into test, train and evaluation data
train_df, test_df, eval_df = split_test_train_tadpole(data_df_train_test, data_df_eval)

# Indicate what data set is the training and testing dataset
train = "d1d2"
test = "d1d2"

# Define model
model = EMCEB()

# Preprocess and set data 
model.set_data(train_df, test_df, train, test)

# Train model
# Note to self: number of bootstraps set to 1 for computation speed. Should be 100 to compute CIs.
model.train()

# Predict forecast on the test set
forecast_df_d2 = model.predict()


100%|██████████| 1/1 [00:29<00:00, 29.09s/it]
100%|██████████| 1/1 [00:31<00:00, 31.26s/it]


Train model on data set D1 and test model on cross-sectional data set D3

In [1]:
import pandas as pd
import datetime
from pathlib import Path
from tadpole_algorithms.models import EMCEB
from tadpole_algorithms.preprocessing.split import split_test_train_d3
from tadpole_algorithms.preprocessing.rewrite_df import rewrite_d3

# Load D1_D2 train and possible test data set
data_path_train = Path("data/TADPOLE_D1_D2.csv")
data_df_train = pd.read_csv(data_path_train)

# Load D3 possible test set
data_path_test = Path("data/TADPOLE_D3.csv")
data_df_test = pd.read_csv(data_path_test)

# Load D4 evaluation data set
data_path_eval = Path("data/TADPOLE_D4_corr.csv")
data_df_eval = pd.read_csv(data_path_eval)

# Split data into test, train and evaluation data
train_df, test_df, eval_df = split_test_train_d3(data_df_train, data_df_test, data_df_eval)
test_df = rewrite_d3(test_df)

# Indicate what data set is the training and testing dataset
train = "d1d2"
test = "d3"

# Define model
model = EMCEB()

# Preprocess and set data 
model.set_data(train_df, test_df, train, test)

# Train model
# Note to self: number of bootstraps set to 1 for computation speed. Should be 100 to compute CIs.
model.train()

# Predict forecast on the test set
forecast_df_d3 = model.predict()

100%|██████████| 1/1 [00:01<00:00,  1.27s/it]
100%|██████████| 1/1 [00:01<00:00,  1.83s/it]


Evaluate model tested on D2 on ADNI data set D4

In [2]:
from tadpole_algorithms.evaluation import evaluate_forecast
from tadpole_algorithms.evaluation import print_metrics

# Evaluate the model 
dictionary = evaluate_forecast(eval_df, forecast_df_d2)

# Print metrics
print_metrics(dictionary)

[[75 11  0]
 [18 72  2]
 [ 2 15 15]]
mAUC (multiclass Area Under Curve):0.8825280973084624
bca (balanced classification accuracy):0.8929811024200491
adasMAE (ADAS13 Mean Aboslute Error):6.782891500645058
ventsMAE (Ventricles Mean Aboslute Error):0.005580346787601667
adasWES (ADAS13 Weighted Error Score):nan
ventsWES (Ventricles Weighted Error Score ):nan
adasCPA (ADAS13 Coverage Probability Accuracy for 50% Confidence Interval:0.5
ventsCPA (Ventricles Coverage Probability Accuracy for 50% Confidence Interval:0.5
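
To make the continuous-outcome metrics printed above easier to read, here is a minimal sketch of MAE, WES and CPA in plain Python. It follows the metric definitions as I understand them from the TADPOLE challenge (weights equal to the inverse width of the confidence interval, coverage counted with strict inequalities); it is not the code used by `evaluate_forecast`, and the function names and toy numbers are made up for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def weighted_error_score(actual, predicted, lower, upper):
    """Absolute errors weighted by 1 / (confidence interval width).

    Narrow intervals (confident forecasts) count more.
    Assumption of this sketch: return NaN if any interval has zero width.
    """
    widths = [u - l for l, u in zip(lower, upper)]
    if any(w <= 0 for w in widths):
        return math.nan
    weights = [1.0 / w for w in widths]
    weighted = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return weighted / sum(weights)


def coverage_probability_accuracy(actual, lower, upper, nominal=0.5):
    """|fraction of true values strictly inside the interval - nominal coverage|."""
    covered = sum(l < a < u for a, l, u in zip(actual, lower, upper))
    return abs(covered / len(actual) - nominal)


# Toy data (hypothetical values, not taken from ADNI)
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 19.0, 33.0]
lower = [9.0, 18.5, 31.0]
upper = [13.0, 20.5, 35.0]

print(mean_absolute_error(actual, predicted))
print(weighted_error_score(actual, predicted, lower, upper))
print(coverage_probability_accuracy(actual, lower, upper))

# Degenerate case: interval bounds equal to the point estimate
print(weighted_error_score(actual, predicted, predicted, predicted))
print(coverage_probability_accuracy(actual, predicted, predicted))
```

In the degenerate case the sketch yields `nan` for WES and `0.5` for CPA. That pattern is consistent with the output above, and may be a side effect of training with a single bootstrap, which would leave no spread from which to derive confidence intervals. This is an interpretation, not something the notebook confirms.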


Evaluate model tested on D3 on ADNI data set D4

In [2]:
from tadpole_algorithms.evaluation import evaluate_forecast
from tadpole_algorithms.evaluation import print_metrics

# Evaluate the model 
dictionary = evaluate_forecast(eval_df, forecast_df_d3)

# Print metrics
print_metrics(dictionary)

[[69 17  0]
 [16 74  2]
 [ 2 14 16]]
mAUC (multiclass Area Under Curve):0.8745286452059269
bca (balanced classification accuracy):0.8877634141304398
adasMAE (ADAS13 Mean Aboslute Error):6.580346139857704
ventsMAE (Ventricles Mean Aboslute Error):0.008598618430909475
adasWES (ADAS13 Weighted Error Score):nan
ventsWES (Ventricles Weighted Error Score ):nan
adasCPA (ADAS13 Coverage Probability Accuracy for 50% Confidence Interval:0.5
ventsCPA (Ventricles Coverage Probability Accuracy for 50% Confidence Interval:0.5
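
To compare the two test settings, the values printed by the two evaluation cells can be placed side by side. The sketch below copies the numbers by hand from the outputs above (rounded); the dictionary keys and the `compare` helper are labels chosen for this sketch, not names returned by `evaluate_forecast`.

```python
# Metrics copied from the two evaluation outputs (rounded)
d2_metrics = {"mAUC": 0.8825, "bca": 0.8930, "adasMAE": 6.7829, "ventsMAE": 0.005580}
d3_metrics = {"mAUC": 0.8745, "bca": 0.8878, "adasMAE": 6.5803, "ventsMAE": 0.008599}

# For these metrics a higher value is better; for the MAE metrics lower is better
HIGHER_IS_BETTER = {"mAUC", "bca"}


def compare(first, second):
    """Return {metric: (second - first, label of the better setting)}."""
    result = {}
    for name in first:
        diff = second[name] - first[name]
        if diff == 0:
            better = "tie"
        elif (diff > 0) == (name in HIGHER_IS_BETTER):
            better = "D3"
        else:
            better = "D2"
        result[name] = (diff, better)
    return result


comparison = compare(d2_metrics, d3_metrics)
for name, (diff, better) in comparison.items():
    print(f"{name:9s} D3 - D2 = {diff:+.6f}  better: {better}")
```

Since both models were trained with a single bootstrap, small differences between the two settings should be read with caution.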
