# Sepsis Classifier

The previous sections (data exploration, preprocessing, and feature engineering) produced a training set and a test set from the MIT MIMIC-III patient database for sepsis classification. In this section, we feed the training data to a series of classification algorithms from Amazon SageMaker, scikit-learn, and XGBoost to determine whether we can recognize sepsis from a patient's vitals and charts.

This notebook will:
* Upload training and test data.
* Define and train linear binary classification models.
* Compute the result measures: false omission rate, accuracy, recall, and precision.
* Adjust hyperparameters to maximize results.
* Select the best linear model as benchmark.
* Define and train non-linear binary classifiers. 
* Measure against the benchmark model.
* Evaluate the various classifiers and draw conclusions.

---

## Check notebook environment

In [1]:
from xgboost import XGBClassifier

# If xgboost is not installed, uncomment and run the pip install below, then restart the kernel.

#!pip install xgboost

## Load Data to S3

The `sepsis_data` directory created in the previous sections contains the files uploaded below: `train_bal.csv`, `validation.csv`, and `test.csv`, each holding the features and the sepsis/non-sepsis labels.

In [2]:
import numpy as np
import pandas as pd
import boto3
import sagemaker
import os

In [3]:
# session and role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# get the session's default S3 bucket (created if it does not exist yet)
bucket = sagemaker_session.default_bucket()

## Upload the training data to S3

Upload the training, validation, and test files to S3.

In [4]:
# should be the name of directory you created to save your features data
data_dir = 'sepsis_data'

# set prefix, a descriptive name for a directory  
prefix = 'sepsis'

# set output path for training
output_path = 's3://{}/{}/output'.format(bucket,prefix)

# upload all data to S3
#data_loc = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)
train_loc = sagemaker_session.upload_data(path=os.path.join(data_dir,'train_bal.csv'), bucket=bucket, key_prefix=prefix)
val_loc = sagemaker_session.upload_data(path=os.path.join(data_dir,'validation.csv'), bucket=bucket, key_prefix=prefix)
test_loc = sagemaker_session.upload_data(path=os.path.join(data_dir,'test.csv'), bucket=bucket, key_prefix=prefix)
print(train_loc)
print(val_loc)
print(test_loc)

s3://sagemaker-us-east-1-483763898801/sepsis/train_bal.csv
s3://sagemaker-us-east-1-483763898801/sepsis/validation.csv
s3://sagemaker-us-east-1-483763898801/sepsis/test.csv


In [5]:
s3_train = sagemaker.s3_input(s3_data=train_loc, content_type="text/csv")
s3_val = sagemaker.s3_input(s3_data=val_loc, content_type="text/csv")
s3_test = sagemaker.s3_input(s3_data=test_loc, content_type="text/csv")

#s3_train = sagemaker.s3_input(s3_data=train_loc)
#s3_val = sagemaker.s3_input(s3_data=val_loc)
#s3_test = sagemaker.s3_input(s3_data=test_loc)

### Read in data file

In [6]:
train_file = pd.read_csv(os.path.join(data_dir,'train_bal.csv'), header=None)
test_file = pd.read_csv(os.path.join(data_dir,'test.csv'), header=None)

print(train_file.shape, test_file.shape)

(55760, 13) (14321, 13)


In [7]:
y_train = train_file.iloc[:,0]
X_train = train_file.iloc[:,1:]

y_test = test_file.iloc[:,0]
X_test = test_file.iloc[:,1:]

print("Train set: ", X_train.shape, y_train.shape)
print("Test set: ", X_test.shape, y_test.shape)

Train set:  (55760, 12) (55760,)
Test set:  (14321, 12) (14321,)


### Train a LinearLearner model as a benchmark model

In [8]:
# import LinearLearner
from sagemaker import LinearLearner

# instantiate LinearLearner
linear = LinearLearner(role=role,
                      train_instance_count=1,
                      train_instance_type='ml.c4.xlarge',
                      predictor_type='binary_classifier',
                      output_path = 's3://{}/{}/output'.format(bucket,prefix),
                      sagemaker_session=sagemaker_session,
                      epochs=15)

### Convert data into RecordSet format before training

In [9]:
# create RecordSet of training data
train_features_np = X_train.values.astype('float32')
train_labels_np = y_train.values.astype('float32')

formatted_train_data = linear.record_set(train_features_np, labels=train_labels_np)

In [10]:
%%time
# train the estimator on formatted training data
linear.fit(formatted_train_data)


2020-02-08 08:49:47 Starting - Starting the training job...
2020-02-08 08:49:49 Starting - Launching requested ML instances.........
2020-02-08 08:51:22 Starting - Preparing the instances for training...
2020-02-08 08:52:13 Downloading - Downloading input data...
2020-02-08 08:52:37 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
[02/08/2020 08:52:54 INFO 140349995878208] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'feature_dim': u'auto', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_


2020-02-08 08:53:14 Uploading - Uploading generated training model
2020-02-08 08:53:14 Completed - Training job completed
Training seconds: 61
Billable seconds: 61
CPU times: user 432 ms, sys: 42.1 ms, total: 474 ms
Wall time: 3min 41s


In [11]:
%%time 
# deploy and create a predictor
linear_predictor = linear.deploy(initial_instance_count=1,
                                instance_type='ml.t2.medium')

---------------------!
CPU times: user 314 ms, sys: 33.5 ms, total: 348 ms
Wall time: 10min 32s


## Evaluate the benchmark model
The aim of our model is to optimize for a low false omission rate (FOR), where <br><b>
FOR = (false negatives)/(false negatives + true negatives).</b><br>
<br>
A low FOR minimizes the chance of sepsis cases going undetected.
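As a small worked example of the formula above, the sketch below computes FOR from raw counts. The function name `false_omission_rate` and the counts are hypothetical and chosen only for illustration; they are not taken from the models trained in this notebook.

```python
def false_omission_rate(false_negatives, true_negatives):
    """FOR = FN / (FN + TN): share of negative predictions that were actually positive."""
    predicted_negative = false_negatives + true_negatives
    if predicted_negative == 0:
        # the model made no negative predictions, so FOR is undefined
        return float("nan")
    return false_negatives / predicted_negative


# Hypothetical counts: 5 missed sepsis cases among 100 "non-sepsis" predictions
example_for = false_omission_rate(false_negatives=5, true_negatives=95)
print("False omission rate: {:.3f}".format(example_for))
```

Read this way, a FOR of 0.05 means that 5% of the patients the model cleared as non-septic were in fact septic.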

In [12]:
# test one prediction
test_x_np = X_test.values.astype('float32')
result = linear_predictor.predict(test_x_np[0])

print(result)

[label {
  key: "predicted_label"
  value {
    float32_tensor {
      values: 1.0
    }
  }
}
label {
  key: "score"
  value {
    float32_tensor {
      values: 0.9999866485595703
    }
  }
}
]


In [13]:
def calculate_measures(test_labels, test_preds, verbose=True):
    """
    Calculate the true positive, false positive,
    true negative, false negative, and false omission rate
    """
    # calculate true positives, false positives, true negatives, false negatives
    tp = np.logical_and(test_labels, test_preds).sum()
    fp = np.logical_and(1-test_labels, test_preds).sum()
    tn = np.logical_and(1-test_labels, 1-test_preds).sum()
    fn = np.logical_and(test_labels, 1-test_preds).sum()
    
    # calculate binary classification metrics
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    false_omission_rate = fn / (fn + tn)
    
    # printing a table of metrics
    if verbose:
        print(pd.crosstab(test_labels, test_preds, rownames=['actual (row)'], colnames=['prediction (col)']))
        print("\n{:<11} {:.3f}".format('Recall:', recall))
        print("{:<11} {:.3f}".format('Precision:', precision))
        print("{:<11} {:.3f}".format('Accuracy:', accuracy))
        print("{:<11} {:.3f}".format('False omission rate:', false_omission_rate))
        print()
        
       
    return {'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn, 
            'Precision': precision, 'Recall': recall, 'Accuracy': accuracy, 'FOR':false_omission_rate}

In [14]:
# code to evaluate the endpoint on test data
# returns a variety of model metrics
def evaluate(predictor, test_features, test_labels, verbose=True):
    """
    Evaluate a model on a test set given the prediction endpoint.  
    Return binary classification metrics.
    :param predictor: A prediction endpoint
    :param test_features: Test features
    :param test_labels: Class labels for test data
    :param verbose: If True, prints a table of all performance metrics
    :return: A dictionary of performance metrics.
    """
    
    # We have a lot of test data, so we split it into 100 batches
    # and evaluate each batch using the prediction endpoint
    prediction_batches = [predictor.predict(batch) for batch in np.array_split(test_features, 100)]
    
    # LinearLearner produces a `predicted_label` for each data point in a batch
    # get the 'predicted_label' for every point in a batch
    test_preds = np.concatenate([np.array([x.label['predicted_label'].float32_tensor.values[0] for x in batch]) 
                                 for batch in prediction_batches])
    
    # calculate true positives, false positives, true negatives, false negatives
    tp = np.logical_and(test_labels, test_preds).sum()
    fp = np.logical_and(1-test_labels, test_preds).sum()
    tn = np.logical_and(1-test_labels, 1-test_preds).sum()
    fn = np.logical_and(test_labels, 1-test_preds).sum()
    
    # calculate binary classification metrics
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    false_omission_rate = fn / (fn + tn)
    
    # printing a table of metrics
    if verbose:
        print(pd.crosstab(test_labels, test_preds, rownames=['actual (row)'], colnames=['prediction (col)']))
        print("\n{:<11} {:.3f}".format('Recall:', recall))
        print("{:<11} {:.3f}".format('Precision:', precision))
        print("{:<11} {:.3f}".format('Accuracy:', accuracy))
        print("{:<11} {:.3f}".format('False omission rate:', false_omission_rate))
        print()
        
    return {'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn, 
            'Precision': precision, 'Recall': recall, 'Accuracy': accuracy, 'FOR':false_omission_rate}
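To make the batching in `evaluate` concrete, the sketch below applies `np.array_split` to a stand-in array with the same shape as the test features (14321 rows, 12 columns). The array `dummy_features` is a hypothetical placeholder filled with zeros, not the real data. It shows that the second argument is the number of batches, not the batch size.

```python
import numpy as np

# Stand-in for the test feature matrix: same shape as X_test, filled with zeros
dummy_features = np.zeros((14321, 12), dtype="float32")

# Split into 100 batches along the first axis (rows)
batches = np.array_split(dummy_features, 100)
batch_sizes = [len(batch) for batch in batches]

# 14321 = 100 * 143 + 21, so 21 batches get 144 rows and the other 79 get 143
print(len(batches), max(batch_sizes), min(batch_sizes))
```

Each batch therefore holds about 143 rows, which keeps every request to the endpoint small.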


In [15]:
print('Metrics for simple, LinearLearner.\n')

# get metrics for linear predictor
metrics = evaluate(linear_predictor, 
                   X_test.values.astype('float32'), 
                   y_test, 
                   verbose=True) # verbose means we'll print out the metrics



Metrics for simple, LinearLearner.

prediction (col)   0.0    1.0
actual (row)                 
0.0               1044  12689
1.0                 37    551

Recall:     0.937
Precision:  0.042
Accuracy:   0.111
False omission rate: 0.034



In [16]:
# Deletes the endpoint behind a predictor
def delete_endpoint(predictor):
    try:
        boto3.client('sagemaker').delete_endpoint(EndpointName=predictor.endpoint)
        print('Deleted {}'.format(predictor.endpoint))
    except Exception:
        # most likely the endpoint no longer exists
        print('Already deleted: {}'.format(predictor.endpoint))

In [17]:
# delete the predictor endpoint 
delete_endpoint(linear_predictor)

Deleted linear-learner-2020-02-08-08-49-47-769


## Tune benchmark model
The precision of the first model is too low (0.042), so we tune the LinearLearner with a target precision of 80%, and then separately with a target recall of 80%.


In [18]:
# instantiate a LinearLearner
# tune the model for a higher precision
linear_precision = LinearLearner(role=role,
                              train_instance_count=1, 
                              train_instance_type='ml.c4.xlarge',
                              predictor_type='binary_classifier',
                              output_path=output_path,
                              sagemaker_session=sagemaker_session,
                              epochs=15,
                              binary_classifier_model_selection_criteria='recall_at_target_precision', # target precision
                              target_precision=0.8) # 80% precision
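
The idea behind a criterion such as `recall_at_target_precision` can be sketched as a threshold search: among all score thresholds whose precision meets the target, keep the one with the highest recall. The code below is a minimal illustration of that idea on toy data. The function and the toy labels and scores are hypothetical, and this is not a reproduction of how LinearLearner implements the criterion internally.

```python
import numpy as np


def recall_at_target_precision(y_true, scores, target_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least target_precision, or None."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best = None
    # every distinct score is a candidate threshold; predict 1 when score >= threshold
    for threshold in np.unique(scores):
        preds = scores >= threshold
        tp = np.sum(preds & y_true)
        fp = np.sum(preds & ~y_true)
        fn = np.sum(~preds & y_true)
        precision = tp / (tp + fp)  # at least one prediction is positive here
        recall = tp / (tp + fn)
        if precision >= target_precision and (best is None or recall > best[2]):
            best = (float(threshold), float(precision), float(recall))
    return best


# Toy data (hypothetical): three positives, three negatives
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

result = recall_at_target_precision(toy_labels, toy_scores, target_precision=0.65)
print(result)
```

On the toy data only the thresholds 0.7 and 0.9 reach the target precision, and 0.7 is kept because it recovers two of the three positives instead of one.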

In [19]:
%%time 
# train the estimator on formatted training data
linear_precision.fit(formatted_train_data)

2020-02-08 09:04:07 Starting - Starting the training job...
2020-02-08 09:04:08 Starting - Launching requested ML instances......
2020-02-08 09:05:10 Starting - Preparing the instances for training......
2020-02-08 09:06:23 Downloading - Downloading input data...
2020-02-08 09:07:05 Training - Training image download completed. Training in progress..
Docker entrypoint called with argument(s): train
[02/08/2020 09:07:08 INFO 140101488289600] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'feature_dim': u'auto', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma':

[2020-02-08 09:07:15.354] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 9, "duration": 1612, "num_examples": 56, "num_bytes": 5129920}
#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.750049296431108, "sum": 0.750049296431108, "min": 0.750049296431108}}, "EndTime": 1581152835.35422, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1581152835.354124}
#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.7641011075106534, "sum": 0.7641011075106534, "min": 0.7641011075106534}}, "EndTime": 1581152835.35431, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1581152835.354293}
#metrics {"Metrics": {"train_binary_classification_cross_entropy_objective": {"count": 1, "max": 0.


2020-02-08 09:07:28 Uploading - Uploading generated training model
2020-02-08 09:07:28 Completed - Training job completed
Training seconds: 65
Billable seconds: 65
CPU times: user 453 ms, sys: 9.85 ms, total: 463 ms
Wall time: 3min 41s


In [20]:
precision_predictor = linear_precision.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

---------------------!

In [21]:
print('Metrics for precision-tuned LinearLearner model.\n')

# get metrics for tuned predictor
metrics = evaluate(precision_predictor, 
                   X_test.values.astype('float32'), 
                   y_test, 
                   verbose=True) # verbose means we'll print out the metrics


Metrics for precision-tuned LinearLearner model.

prediction (col)  0.0    1.0
actual (row)                
0.0                89  13644
1.0                 8    580

Recall:     0.986
Precision:  0.041
Accuracy:   0.047
False omission rate: 0.082



In [22]:
# delete the predictor endpoint 
delete_endpoint(precision_predictor)

Deleted linear-learner-2020-02-08-09-04-06-893


In [23]:
# instantiate a LinearLearner
# tune the model for a higher recall
linear_recall = LinearLearner(role=role,
                              train_instance_count=1, 
                              train_instance_type='ml.c4.xlarge',
                              predictor_type='binary_classifier',
                              output_path=output_path,
                              sagemaker_session=sagemaker_session,
                              epochs=15,
                              binary_classifier_model_selection_criteria='precision_at_target_recall', # target recall
                              target_recall=0.8) # 80% recall

In [24]:
%%time 
# train the estimator on formatted training data
linear_recall.fit(formatted_train_data)

2020-02-08 09:18:25 Starting - Starting the training job...
2020-02-08 09:18:27 Starting - Launching requested ML instances......
2020-02-08 09:19:31 Starting - Preparing the instances for training......
2020-02-08 09:20:56 Downloading - Downloading input data
2020-02-08 09:20:56 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
[02/08/2020 09:21:11 INFO 140617057535808] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'feature_dim': u'auto', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_min


2020-02-08 09:21:31 Completed - Training job completed
Training seconds: 41
Billable seconds: 41
CPU times: user 454 ms, sys: 20.3 ms, total: 474 ms
Wall time: 3min 42s


In [25]:
recall_predictor = linear_recall.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

-----------------------!

In [26]:
print('Metrics for tuned LinearLearner - target 80% recall.\n')

# get metrics for tuned predictor
metrics = evaluate(recall_predictor, 
                   X_test.values.astype('float32'), 
                   y_test, 
                   verbose=True) # verbose means we'll print out the metrics


Metrics for tuned LinearLearner - target 80% recall.

prediction (col)   0.0    1.0
actual (row)                 
0.0               1044  12689
1.0                 37    551

Recall:     0.937
Precision:  0.042
Accuracy:   0.111
False omission rate: 0.034



In [27]:
# delete the predictor endpoint 
delete_endpoint(recall_predictor)

Deleted linear-learner-2020-02-08-09-18-25-799


### Support Vector Classifier

In [28]:
from sklearn.svm import SVC

# Instantiate
svc = SVC(kernel="linear", C=0.025)

In [29]:
%%time
svc.fit(X_train, y_train)

CPU times: user 1min 16s, sys: 103 ms, total: 1min 16s
Wall time: 1min 16s


SVC(C=0.025, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

In [30]:
svc_y_pred = svc.predict(X_test)

In [31]:
print("Metrics for Support Vector Classifier:\n")
calculate_measures(y_test, svc_y_pred)

Metrics for Support Vector Classifier:

prediction (col)   0.0   1.0
actual (row)                
0.0               6343  7390
1.0                213   375

Recall:     0.638
Precision:  0.048
Accuracy:   0.469
False omission rate: 0.032



{'TP': 375,
 'FP': 7390,
 'FN': 213,
 'TN': 6343,
 'Precision': 0.04829362524146812,
 'Recall': 0.6377551020408163,
 'Accuracy': 0.4691013197402416,
 'FOR': 0.03248932275777913}

### Linear model conclusions
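After running the default LinearLearner model and its versions tuned for target precision and target recall, we observe that the Support Vector Classifier is the best linear model by false omission rate. It detects 375 of the 588 sepsis cases in the test dataset and has the lowest false omission rate (0.032), with an accuracy of 46.9%. The LinearLearner variants reach higher recall (0.937 and 0.986) only by labeling almost every patient as septic, which leaves their accuracy at 11.1% or lower.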
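
The comparison can be re-derived from the confusion-matrix counts printed in the outputs above. The sketch below copies those counts by hand into a dictionary and ranks the linear models by false omission rate. The names `linear_results` and `summarize` are introduced here for illustration only.

```python
# Confusion-matrix counts copied from the evaluation outputs of the linear models
linear_results = {
    "LinearLearner (default)":          {"TP": 551, "FP": 12689, "FN": 37,  "TN": 1044},
    "LinearLearner (target precision)": {"TP": 580, "FP": 13644, "FN": 8,   "TN": 89},
    "LinearLearner (target recall)":    {"TP": 551, "FP": 12689, "FN": 37,  "TN": 1044},
    "SVC (linear kernel)":              {"TP": 375, "FP": 7390,  "FN": 213, "TN": 6343},
}


def summarize(counts):
    """Compute FOR, recall and accuracy from one set of confusion-matrix counts."""
    tp, fp, fn, tn = counts["TP"], counts["FP"], counts["FN"], counts["TN"]
    return {
        "FOR": fn / (fn + tn),
        "Recall": tp / (tp + fn),
        "Accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


linear_summary = {name: summarize(counts) for name, counts in linear_results.items()}

# Rank by false omission rate, lowest (best) first
for name in sorted(linear_summary, key=lambda n: linear_summary[n]["FOR"]):
    m = linear_summary[name]
    print("{:<34} FOR {:.3f}  Recall {:.3f}  Accuracy {:.3f}".format(
        name, m["FOR"], m["Recall"], m["Accuracy"]))

best_linear = min(linear_summary, key=lambda n: linear_summary[n]["FOR"])
```

Ranking on FOR alone puts the linear-kernel SVC first, which is why it serves as the benchmark for the non-linear models.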

## Train with non-linear binary classifiers
Now we train the dataset with non-linear binary classifiers to see if there are better results.

### K-Nearest Neighbors Classifier

In [32]:
from sklearn.neighbors import KNeighborsClassifier

## Instantiate a K-Neighbor classifier
knn = KNeighborsClassifier(weights='distance',leaf_size=500,n_neighbors=2)

In [33]:
%%time
knn.fit(X_train, y_train)

CPU times: user 1.8 s, sys: 36 µs, total: 1.8 s
Wall time: 1.79 s


KNeighborsClassifier(algorithm='auto', leaf_size=500, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=2, p=2,
           weights='distance')

In [34]:
knn_y_pred = knn.predict(X_test)

In [35]:
print("Metrics for KNeighborsClassifier:\n")
calculate_measures(y_test, knn_y_pred)

Metrics for KNeighborsClassifier:

prediction (col)    0.0  1.0
actual (row)                
0.0               12936  797
1.0                 534   54

Recall:     0.092
Precision:  0.063
Accuracy:   0.907
False omission rate: 0.040



{'TP': 54,
 'FP': 797,
 'FN': 534,
 'TN': 12936,
 'Precision': 0.06345475910693302,
 'Recall': 0.09183673469387756,
 'Accuracy': 0.9070595628796871,
 'FOR': 0.03964365256124722}

### XGBoost

In [36]:
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

xgb = XGBClassifier()
xgb.fit(X_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bynode=1, colsample_bytree=1, gamma=0, learning_rate=0.1,
       max_delta_step=0, max_depth=3, min_child_weight=1, missing=None,
       n_estimators=100, n_jobs=1, nthread=None,
       objective='binary:logistic', random_state=0, reg_alpha=0,
       reg_lambda=1, scale_pos_weight=1, seed=None, silent=None,
       subsample=1, verbosity=1)

In [37]:
# make predictions for test data
xgb_y_pred = xgb.predict(X_test)
predictions = [round(value) for value in xgb_y_pred]

In [38]:
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Accuracy: 51.18%


In [39]:
print("Metrics for XGBClassifier:\n")
calculate_measures(y_test, xgb_y_pred)

Metrics for XGBClassifier:

prediction (col)   0.0   1.0
actual (row)                
0.0               6948  6785
1.0                206   382

Recall:     0.650
Precision:  0.053
Accuracy:   0.512
False omission rate: 0.029



{'TP': 382,
 'FP': 6785,
 'FN': 206,
 'TN': 6948,
 'Precision': 0.05329984651876657,
 'Recall': 0.6496598639455783,
 'Accuracy': 0.5118357656588227,
 'FOR': 0.0287950796757059}

### Radial Basis Function Support Vector Classifier (RBF SVC)
Instantiate an SVC with the RBF kernel, which is scikit-learn's default kernel for `SVC`.

In [40]:
from sklearn.svm import SVC
rbfsvc = SVC(gamma=2, C=1)
rbfsvc.fit(X_train, y_train)

SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=2, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [41]:
# make predictions for test data
rbf_y_pred = rbfsvc.predict(X_test)


In [42]:
print("Metrics for RBF SVC:\n")
calculate_measures(y_test, rbf_y_pred)

Metrics for RBF SVC:

prediction (col)   0.0   1.0
actual (row)                
0.0               6763  6970
1.0                201   387

Recall:     0.658
Precision:  0.053
Accuracy:   0.499
False omission rate: 0.029



{'TP': 387,
 'FP': 6970,
 'FN': 201,
 'TN': 6763,
 'Precision': 0.052602963164333286,
 'Recall': 0.6581632653061225,
 'Accuracy': 0.4992668109768871,
 'FOR': 0.028862722573233773}

## Non-linear model conclusions
Of the three non-linear models built, the XGBoost Classifier and the Radial Basis Function Support Vector Classifier beat the KNeighborsClassifier, as well as the benchmark linear model, on false omission rate (0.029 versus 0.040 and 0.032). Between the two, the XGBoost Classifier has the higher accuracy, 51.2% against 49.9%, and has correctly identified more than half of both the sepsis and non-sepsis cases.
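
To put these false omission rates in context, the sketch below recomputes them from the confusion-matrix counts printed above and compares them with a trivial reference: a classifier that labels every patient as non-septic, whose FOR equals the share of sepsis cases in the test set (588 of 14321). This reference is our own derived arithmetic, not a model trained in this notebook, and the variable names are illustrative.

```python
# Confusion-matrix counts copied from the evaluation outputs of the non-linear models
nonlinear_results = {
    "KNeighborsClassifier": {"TP": 54,  "FP": 797,  "FN": 534, "TN": 12936},
    "XGBClassifier":        {"TP": 382, "FP": 6785, "FN": 206, "TN": 6948},
    "RBF SVC":              {"TP": 387, "FP": 6970, "FN": 201, "TN": 6763},
}

# FOR = FN / (FN + TN) for each model
nonlinear_for = {
    name: c["FN"] / (c["FN"] + c["TN"]) for name, c in nonlinear_results.items()
}

# Reference: predicting "non-sepsis" for everyone gives FN = all sepsis cases,
# TN = all non-sepsis cases, so FOR equals the sepsis prevalence in the test set
sepsis_cases, test_size = 588, 14321
all_negative_for = sepsis_cases / test_size

for name, value in sorted(nonlinear_for.items(), key=lambda item: item[1]):
    print("{:<22} FOR {:.4f}".format(name, value))
print("{:<22} FOR {:.4f}".format("All-negative reference", all_negative_for))

best_nonlinear = min(nonlinear_for, key=nonlinear_for.get)
```

The gap between XGBoost and the RBF SVC is very small on this metric, and all three models sit only modestly below the all-negative reference, which supports the call for further tuning and feature work.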

---
## Conclusions

For future iterations of this project, these are the areas to improve:
* Further adjust hyperparameters to improve model performance.
* Revisit the dataset and find more features to improve the model training.
