# Comparing Machine Learning Models
Here we take two machine learning models, deployed as live SageMaker endpoints, and compare their performance.

From a research perspective, we are interested in generating ROC curves. An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, so it shows how each of the two models trades false positives for true positives.
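ROC curves need a continuous score per example, not just a hard 0/1 label. One example is the `score` field that linear learner returns for binary classifiers. The sketch below uses scikit-learn's `roc_curve` and `roc_auc_score` on small hypothetical arrays (`y_true`, `model_scores`) in place of the real test labels and endpoint outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical 0/1 test labels and positive-class scores from two models;
# in this notebook they would come from the test set and the two endpoints.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
model_scores = {
    "model_a": np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.70, 0.30]),
    "model_b": np.array([0.20, 0.60, 0.50, 0.70, 0.10, 0.60, 0.40, 0.50]),
}

curves, aucs = {}, {}
for name, scores in model_scores.items():
    # roc_curve sweeps the decision threshold over the scores and returns the
    # false positive rate and true positive rate at each threshold
    fpr, tpr, _ = roc_curve(y_true, scores)
    curves[name] = (fpr, tpr)
    # The area under the ROC curve summarizes the whole curve in one number
    aucs[name] = roc_auc_score(y_true, scores)
    print(f"{name}: AUC = {aucs[name]:.4f}")
```

Plotting each `(fpr, tpr)` pair on the same axes gives the side-by-side comparison. A curve that hugs the top-left corner, and therefore has a higher AUC, separates the classes better.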

From a practical perspective, we want to understand how each model handles different kinds of prediction scenarios, and we will use that performance to improve the models over time.
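One way to see how the models handle different scenarios is to break a metric down by a categorical column. The sketch below computes per-scenario accuracy with pandas. The `scenario`, `label`, `pred_linear` and `pred_balanced` columns and their values are hypothetical stand-ins for a real grouping column and the two endpoints' predictions.

```python
import pandas as pd

# Hypothetical evaluation frame: one row per test example with the true label,
# each model's predicted label, and a "scenario" column. The column names are
# assumptions; any categorical column from the source data could play this role.
results = pd.DataFrame({
    "scenario": ["A", "A", "A", "B", "B", "B"],
    "label": [1, 0, 1, 0, 0, 1],
    "pred_linear": [1, 0, 1, 0, 1, 1],
    "pred_balanced": [1, 1, 1, 0, 0, 1],
})

# Accuracy per scenario: the fraction of correct predictions within each group
breakdown = pd.DataFrame({
    col: (results["label"] == results[col]).groupby(results["scenario"]).mean()
    for col in ["pred_linear", "pred_balanced"]
})
print(breakdown)
```

In this toy data each model wins in a different scenario. That kind of pattern is invisible in a single overall score.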

In the best case, the research and practical perspectives work together to improve the models and their predictions in the real world, leading to the best overall outcomes.

In [2]:
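import sagemaker as sm
from sagemaker.amazon.linear_learner import LinearLearnerPredictor

sess = sm.Session()

# LinearLearnerPredictor attaches a RecordIO-protobuf serializer and deserializer,
# so NumPy batches can be passed to predict() directly
linear_endpoint = LinearLearnerPredictor("linear-learner-2018-10-23-15-29-37-619", sagemaker_session=sess)
balanced_endpoint = LinearLearnerPredictor("linear-learner-2018-10-23-16-47-29-308", sagemaker_session=sess)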
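A predictor created without a serializer hands its payload to the endpoint unchanged. The underlying request `Body` must be `bytes`, a `bytearray`, or a file-like object, so a raw NumPy array is rejected. Linear learner endpoints also accept `text/csv` input. The minimal sketch below shows what a CSV serializer has to do with a NumPy batch. The helper `to_csv_bytes` is hypothetical and not part of the SageMaker SDK.

```python
import io

import numpy as np


def to_csv_bytes(batch):
    """Serialize a 2-D float batch to UTF-8 CSV bytes, one row per line.

    Hypothetical helper illustrating what a CSV serializer does; "%.9g"
    keeps enough significant digits to round-trip float32 values exactly.
    """
    buffer = io.StringIO()
    np.savetxt(buffer, batch, delimiter=",", fmt="%.9g")
    return buffer.getvalue().encode("utf-8")


batch = np.array([[81.44444, 0.0, 0.0], [41.333332, 0.0, 0.0]], dtype="float32")
payload = to_csv_bytes(batch)
print(type(payload))  # <class 'bytes'>, a valid request Body type
```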

In [4]:
import pandas as pd

data = pd.read_csv("../Data/fewer_labeled_rows_by_block.csv")

In [5]:
import numpy as np
from sklearn.model_selection import train_test_split

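def create_training_sets(data):
    # Labels: shift "Target" down by 1 so the classes start at 0
    # (binary linear learner expects labels of 0 and 1)
    ys = np.array(data["Target"]).astype("float32")
    ys -= 1

    # Features: every column except the label and the columns not used as features
    drop_list = ["Target", "Date", "Primary Type"]
    xs = np.array(data.drop(drop_list, axis=1)).astype("float32")

    # Seed NumPy's global RNG, which train_test_split uses when random_state is None,
    # so the splits are reproducible
    np.random.seed(0)

    # 80% train / 20% held out
    train_features, test_features, train_labels, test_labels = train_test_split(
        xs, ys, test_size=0.2)

    # Split the held-out 20% evenly into validation and test sets (10% each)
    val_features, test_features, val_labels, test_labels = train_test_split(
        test_features, test_labels, test_size=0.5)

    return train_features, test_features, train_labels, test_labels, val_features, val_labels
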
train_features, test_features, train_labels, test_labels, val_features, val_labels = create_training_sets(data)
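`create_training_sets` performs two chained splits. The first holds out 20% of the rows, and the second halves that holdout, which gives an 80/10/10 train/validation/test split. The sketch below checks those proportions on a synthetic 500-row dataset (`xs` and `ys` here are made up). It passes `random_state` explicitly instead of seeding the global RNG.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and 0/1 labels
xs = np.arange(1000, dtype="float32").reshape(500, 2)
ys = np.random.RandomState(0).randint(0, 2, size=500).astype("float32")

# First split: hold out 20% of the rows
train_x, holdout_x, train_y, holdout_y = train_test_split(xs, ys, test_size=0.2, random_state=0)
# Second split: divide the holdout evenly into validation and test sets
val_x, test_x, val_y, test_y = train_test_split(holdout_x, holdout_y, test_size=0.5, random_state=0)

print(len(train_x), len(val_x), len(test_x))
```

Note that `create_training_sets` returns the test arrays before the validation arrays, so the unpacking order at the call site matters.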

In [7]:
from sklearn.metrics import classification_report, roc_curve, roc_auc_score, precision_recall_fscore_support, accuracy_score

def get_predictions(predictor, test_features):
    # Split the test set into 100 batches and send each batch to the prediction endpoint
    prediction_batches = [predictor.predict(batch) for batch in np.array_split(test_features, 100)]

    # Parse the RecordIO-protobuf records returned by the predictor's deserializer
    # to extract the predicted label for each example
    def extract_label(record):
        return record.label["predicted_label"].float32_tensor.values

    test_preds = np.concatenate([np.array([extract_label(x) for x in batch]) for batch in prediction_batches])
    return test_preds.reshape((-1,))

linear_predictions = get_predictions(linear_endpoint, test_features)
balanced_predictions = get_predictions(balanced_endpoint, test_features)

Without a serializer configured on the predictor (for example, a bare `sm.predictor.RealTimePredictor(endpoint_name, sess)`), this cell fails with `ParamValidationError: Parameter validation failed: Invalid type for parameter Body ... type: <class 'numpy.ndarray'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object`. The predictor hands the NumPy batch to the endpoint unserialized, but the request `Body` must be bytes, a bytearray, or a file-like object.
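Once both endpoints return predicted labels, the metrics imported above can compare the two models side by side at their default threshold. The sketch below uses `accuracy_score` and `precision_recall_fscore_support` on hypothetical label arrays that stand in for `test_labels` and the two prediction vectors.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical 0/1 test labels and predicted labels from the two endpoints
test_labels = np.array([0, 1, 1, 0, 1, 0, 1, 1])
preds = {
    "linear": np.array([0, 1, 0, 0, 1, 0, 0, 1]),
    "balanced": np.array([0, 1, 1, 1, 1, 0, 1, 1]),
}

summary = {}
for name, y_pred in preds.items():
    # average="binary" reports precision/recall/F1 for the positive class (label 1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, y_pred, average="binary"
    )
    summary[name] = {
        "accuracy": accuracy_score(test_labels, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    print(name, summary[name])
```

In this toy example the balanced model gains recall at the cost of one false positive. Trade-offs like this one are what the ROC curves make visible across every threshold.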