List of Python packages to be installed:

1. numpy
2. pytorch
3. pandas
4. pycox
5. scikit-learn (imported as sklearn)
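Before running the cells below, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list uses import names, which can differ from install names (PyTorch is imported as `torch`, scikit-learn as `sklearn`).

```python
import importlib.util

# Import names for the packages listed above (not necessarily the install names).
REQUIRED = ["numpy", "torch", "pandas", "pycox", "sklearn"]


def missing_packages(import_names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# An empty list means every package was found.
print(missing_packages(REQUIRED))
```

Any name printed by the last line still needs to be installed.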

In [7]:
import pandas as pd
import torch
import numpy as np

from dataset.syndata.syndata_v01 import DGP
from model.experiment import Experiment

Load dataset

In [8]:
# a minimum number of censored instances is needed to compare predictive performance with other methods
synthetic = DGP(lambda_1 = 1.5, lambda_2 = 0.5)
df =  synthetic.generate_samples(N = 3000, random_state = 13, p_censor = 0.1)
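The internals of `DGP.generate_samples` are not shown here, so the following is only a generic illustration of what right-censoring a fraction `p_censor` of samples can look like. It assumes that a censored subject is observed at a time drawn uniformly before its true event time; the function name `right_censor` and that censoring mechanism are assumptions for illustration, not the method used by `DGP`.

```python
import random


def right_censor(event_times, p_censor, seed=0):
    """Censor roughly a fraction p_censor of the samples (illustrative only).

    Returns (durations, events) with events coded 1 = observed, 0 = right-censored.
    """
    rng = random.Random(seed)
    durations, events = [], []
    for t in event_times:
        if rng.random() < p_censor:
            # Assumed mechanism: follow-up ends at a uniform time before the event.
            durations.append(rng.uniform(0.0, t))
            events.append(0)
        else:
            durations.append(t)
            events.append(1)
    return durations, events


rng = random.Random(13)
true_times = [rng.expovariate(1.0) for _ in range(3000)]
durations, events = right_censor(true_times, p_censor=0.1, seed=13)
print(f"censored fraction: {1 - sum(events) / len(events):.2f}")
```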

Specify which columns play each role in the generative process. The variables are:

1. t_cols: survival time
2. s_cols: (failure) event (0: right-censored, 1: death)
3. c_cols: (continuous) physiological measurements
4. x_cols: (binary) morbidity indicators
5. a_cols: (continuous) age
6. b_cols: (both continuous and discrete) personal background

In [9]:
t_cols = ['durations'] 
s_cols = ['events']
c_cols = ['c']
b_cols = ['b']
a_cols = ['a']
x_cols = ['x']

Specify the columns to normalise before the analysis, i.e. the continuous variables other than survival time.

In [10]:
cols_to_std = ['c']
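How `Experiment` normalises the columns in `cols_to_std` is not shown here. A common choice is z-score standardisation, sketched below with the standard library on a plain list of values. The helper name `standardise` and the use of the population standard deviation are assumptions.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


c_values = [0.2, 1.4, 0.9, 2.1, 1.0]
z_values = standardise(c_values)
print([round(z, 3) for z in z_values])
```

In practice the mean and standard deviation are usually computed on the training split only and reused for validation and test data, to avoid leaking information.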

In [11]:
directory = "syndata_v1/custom/"

Create an Experiment instance whose arguments include the main dataset (df) and the column sets defined above.

In [12]:
experiment = Experiment(df,
                        t_cols = t_cols, s_cols = s_cols, c_cols = c_cols,
                        x_cols = x_cols, a_cols = a_cols, b_cols = b_cols,
                        directory = directory)

Specify the hyperparameters, including:

1. hidden_dim: number of hidden neurons
2. lr: learning rate
3. n_epochs: number of epochs
4. batch_size: batch size
5. device: "cpu", "mps" or "gpu"

Performance evaluations

In [13]:
n_epochs = 1000
batch_size = 512
lr = 5e-3
hidden_dim = 64
alpha = 0.7
beta = 1.0
gamma = 1.0
z_dim = 2
k_peaks = 1

Train M4VAE and compare its performance against other baseline methods.

Key arguments:

return_baseline: if True, also fit the baselines $\textbf{standard Cox, DeepCox, Deep time-dependent Cox, DeepHit, DeSurv and SuMo-Net}$

return_metrics: if True, report $\textbf{c-index, Brier score, negative binomial log-likelihood and log-likelihood}$, the calibration score $\textbf{D-calibration}$, the $\textbf{Silverman test}$ and the $\textbf{diptest}$
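To make the first of these metrics concrete, here is a minimal pure-Python sketch of Harrell's concordance index (c-index). It is an illustration under stated assumptions, not the implementation used by `Experiment`, which may rely on a different (for example time-dependent) variant. The function name `concordance_index` and the convention that a higher risk score means an earlier expected event are assumptions.

```python
def concordance_index(durations, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before the recorded time of subject j.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event has the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


durations = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
print(concordance_index(durations, events, [4.0, 3.0, 2.0, 1.0]))  # 1.0
print(concordance_index(durations, events, [1.0, 2.0, 3.0, 4.0]))  # 0.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This double loop is quadratic in the number of samples, so it is meant for understanding rather than for large datasets.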

In [14]:
experiment.train(cols_to_std = cols_to_std, z_dim = z_dim, hidden_dim = hidden_dim, alpha = alpha, beta = beta,
                 gamma = gamma, lr = lr, n_epochs = n_epochs, batch_size = batch_size,
                 logging_freq = 10, max_wait = 20, device = 'cpu', k_peaks = k_peaks, verbose = True,
                 return_baseline = False, return_metrics = False)

Encoder: cpu specified, cpu used
Decoder: cpu specified, cpu used
Decoder: cpu specified, cpu used
	Data set size 2400, batch size 512.

	Epoch:  0. Total loss:    50576.93
best_epoch: 0
	Epoch:  0. Total val loss:     5070.46
	Epoch: 10. Total loss:     7773.11
best_epoch: 10
	Epoch: 10. Total val loss:      953.61
	Epoch: 20. Total loss:     7234.74
best_epoch: 20
	Epoch: 20. Total val loss:      934.89
	Epoch: 30. Total loss:     7114.64
best_epoch: 30
	Epoch: 30. Total val loss:      912.70
	Epoch: 40. Total loss:     7084.90
best_epoch: 40
	Epoch: 40. Total val loss:      887.45
	Epoch: 50. Total loss:     7198.91
	Epoch: 50. Total val loss:      899.36
	Epoch: 60. Total loss:     7057.06
best_epoch: 60
	Epoch: 60. Total val loss:      877.59
	Epoch: 70. Total loss:     7052.91
	Epoch: 70. Total val loss:      900.92
	Epoch: 80. Total loss:     7033.14
	Epoch: 80. Total val loss:      890.50
	Epoch: 90. Total loss:     7105.73
	Epoch: 90. Total val loss:      896.16
	Epoch: 100. T

	Epoch: 980. Total loss:     5477.85
	Epoch: 980. Total val loss:      695.91
	Epoch: 990. Total loss:     5475.11
	Epoch: 990. Total val loss:      691.55
loading low_
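The `best_epoch` lines in the log above appear whenever the validation loss reaches a new minimum. The sketch below reproduces that bookkeeping for the first ten validation losses printed above. It assumes that validation runs every `logging_freq` epochs and that `max_wait` counts consecutive validation checks without improvement before training stops; the exact semantics inside `Experiment.train` are not shown here, and the function name `track_validation` is hypothetical.

```python
def track_validation(val_losses, logging_freq=10, max_wait=20):
    """Return (epochs with a new best validation loss, epoch of early stop or None)."""
    best_loss = float("inf")
    best_epochs = []
    waited = 0
    stopped_at = None
    for check, loss in enumerate(val_losses):
        epoch = check * logging_freq
        if loss < best_loss:
            best_loss = loss
            best_epochs.append(epoch)
            waited = 0  # reset the patience counter on improvement
        else:
            waited += 1
            if waited >= max_wait:
                stopped_at = epoch
                break
    return best_epochs, stopped_at


# Validation losses for epochs 0, 10, ..., 90, copied from the log above.
val_losses = [5070.46, 953.61, 934.89, 912.70, 887.45,
              899.36, 877.59, 900.92, 890.50, 896.16]
best_epochs, stopped_at = track_validation(val_losses)
print(best_epochs)  # [0, 10, 20, 30, 40, 60]
print(stopped_at)   # None
```

The recovered epochs match the `best_epoch` lines printed for epochs 0 to 90.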


In [15]:

def summary_data(experiment):
    """Print event/censoring counts, covariate dimensions and time summaries."""
    # Look counts up by event label: value_counts() orders by frequency,
    # so positional unpacking would swap the two when censoring dominates.
    counts = experiment.df[experiment.events].value_counts()
    obs, cens = counts.get(1, 0), counts.get(0, 0)

    print(f"Observed: {obs} ({np.round(obs / (obs + cens), 2)} )")
    print(f"Censored: {cens}, ({np.round(cens / (obs + cens), 2)})")
    
    # Covariate dimensions; age and personal background are grouped as auxiliary.
    b_dim = len(experiment.bidx) + len(experiment.aidx)
    c_dim = experiment.c_dim
    x_dim = len(experiment.xidx)
    
    print(f"(Continuous) covariates: {c_dim}")
    print(f"(Binary) covariates: {x_dim}")
    print(f"Auxiliary covariates: {b_dim}")
    
    # Time summaries for subjects with an observed event (events == 1).
    event_subset = experiment.df[experiment.df[experiment.events] == 1]
    t_mean, t_max = event_subset[experiment.durations].mean(), event_subset[experiment.durations].max()
    
    print(f"Event time / Mean: {np.round(t_mean, 1)}")
    print(f"Event time / Max: {np.round(t_max, 1)}")
    
    # Time summaries for right-censored subjects (events == 0).
    censor_subset = experiment.df[experiment.df[experiment.events] == 0]
    s_mean, s_max = censor_subset[experiment.durations].mean(), censor_subset[experiment.durations].max()
    
    print(f"Censoring time / Mean: {np.round(s_mean, 1)}")
    print(f"Censoring time / Max: {np.round(s_max, 1)}")

In [16]:
summary_data(experiment)

Observed: 2699 (0.9 )
Censored: 301, (0.1)
(Continuous) covariates: 1
(Binary) covariates: 1
Auxiliary covariates: 2
Event time / Mean: 0.9
Event time / Max: 3.8
Censoring time / Mean: 0.5
Censoring time / Max: 2.4
