## Summary: Compare model results and final model selection

This notebook uses the dataset from the [Titanic Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we will do the following:
1. Evaluate all of our saved models on the validation set
2. Select the best model based on performance on the validation set
3. Evaluate that model on the holdout test set

### Read Data for Validation and Test

In [17]:
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from time import time

val_features = pd.read_csv('../dataset/val_features.csv')
val_labels = pd.read_csv('../dataset/val_labels.csv', header=None)

te_features = pd.read_csv('../dataset/test_features.csv')
te_labels = pd.read_csv('../dataset/test_labels.csv', header=None)

print("Import Completed")

Import Completed
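
The label files are read with `header=None`, which suggests they were saved without a header row. The sketch below uses a small made-up CSV string in place of the real files (the data is hypothetical). It shows what that flag changes and how `DataFrame.squeeze("columns")` turns the one-column result into a 1-D Series, the shape scikit-learn metrics expect for `y_true`.

```python
import io

import pandas as pd

# Hypothetical stand-in for '../dataset/val_labels.csv': one label per line, no header row.
csv_text = "0\n1\n1\n0\n"

# Default header='infer': the first label "0" is consumed as a column name and lost.
no_header_arg = pd.read_csv(io.StringIO(csv_text))
print(no_header_arg.shape)  # (3, 1)

# header=None: every line is data, and the single column is named 0.
labels_df = pd.read_csv(io.StringIO(csv_text), header=None)
print(labels_df.shape)  # (4, 1), still a 2-D DataFrame

# Collapse the one-column DataFrame into a 1-D Series.
labels = labels_df.squeeze("columns")
print(type(labels).__name__, labels.tolist())  # Series [0, 1, 1, 0]
```

scikit-learn also accepts a single-column DataFrame for binary targets, as the outputs below show, so squeezing is optional here; it mainly avoids shape surprises when labels are combined with predictions elsewhere.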


### Read Models

In [19]:
models = {}

#for mdl in ['LR', 'SVM', 'MLP', 'RF', 'GB']:
#    models[mdl] = joblib.load('../../../{}_model.pkl'.format(mdl))

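for mdl in ['SVM', 'RF', 'GB']:
    try:
        models[mdl] = joblib.load('../models/{}_model.pkl'.format(mdl))
    except FileNotFoundError:
        # Skip models whose .pkl file is missing (e.g. GB) instead of aborting the whole cell
        print('Skipping {}: ../models/{}_model.pkl not found'.format(mdl, mdl))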

In [12]:
models

{'SVM': SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
   decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
   max_iter=-1, probability=False, random_state=None, shrinking=True,
   tol=0.001, verbose=False),
 'RF': RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
             max_depth=4, max_features='auto', max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=5, n_jobs=1,
             oob_score=False, random_state=None, verbose=0,
             warm_start=False)}
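
The training notebooks that produced these `.pkl` files are not shown here. As a minimal sketch, assuming the models were saved with `joblib.dump` under the same `{}_model.pkl` naming scheme, the save/load round trip looks like this. The toy training data and temporary directory are stand-ins, and a missing file (GB here) is skipped rather than raising `FileNotFoundError`.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier

# Tiny hypothetical training set; the real models were trained on Titanic features.
X = [[0, 1], [1, 0], [1, 1], [0, 0]]
y = [0, 1, 1, 0]

with tempfile.TemporaryDirectory() as model_dir:
    # Save one model using the notebook's '{}_model.pkl' naming scheme.
    rf = RandomForestClassifier(n_estimators=5, max_depth=4, random_state=0).fit(X, y)
    joblib.dump(rf, Path(model_dir) / 'RF_model.pkl')

    # Load whatever exists; report anything that was never saved.
    loaded = {}
    for name in ['RF', 'GB']:
        path = Path(model_dir) / '{}_model.pkl'.format(name)
        if path.exists():
            loaded[name] = joblib.load(path)
        else:
            print('Skipping {}: {} not found'.format(name, path.name))

print(sorted(loaded))  # ['RF']
# The reloaded model predicts exactly like the one that was saved.
print(loaded['RF'].predict(X).tolist() == rf.predict(X).tolist())  # True
```

`joblib.load` unpickles arbitrary objects, so only load model files you created or trust. Pickled scikit-learn models are also tied to the library version that saved them; parameters such as `min_impurity_split` in the output above come from an older scikit-learn release.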

### Evaluate models on the validation set

- **Accuracy** = # predicted correctly / total # of examples
- **Precision** = # predicted as surviving that actually survived / total # predicted to survive
- **Recall** = # predicted as surviving that actually survived / total # that actually survived
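
To make these three definitions concrete, this sketch counts true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) by hand. It treats survived (label 1) as the positive class, which is what `precision_score` and `recall_score` do by default. The labels are hypothetical, and a zero denominator returns 0.0, mirroring scikit-learn's default behavior (which also emits a warning).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` (survived) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)                # predicted correctly / all examples
    precision = tp / (tp + fp) if tp + fp else 0.0    # true survivors / predicted survivors
    recall = tp / (tp + fn) if tp + fn else 0.0       # true survivors / actual survivors
    return accuracy, precision, recall


# Hypothetical labels: 1 = survived, 0 = did not survive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 5)
print(metrics(y_true, y_pred))           # (0.7, 0.6666666666666666, 0.5)
```

As with the models below, precision exceeds recall here: the classifier rarely predicts survival wrongly, but it misses half of the actual survivors.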

In [13]:
def evaluate_model(name, model, features, labels):
    # Time how long the model takes to predict the whole feature set
    start = time()
    pred = model.predict(features)
    end = time()
    # Score the predictions; label 1 (survived) is the positive class for precision/recall
    accuracy = round(accuracy_score(labels, pred), 3)
    precision = round(precision_score(labels, pred), 3)
    recall = round(recall_score(labels, pred), 3)
    latency_ms = round((end - start) * 1000, 1)
    print('{} -- Accuracy: {} / Precision: {} / Recall: {} / Latency: {}ms'.format(
        name, accuracy, precision, recall, latency_ms))
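
`evaluate_model` reports the time to predict the *whole* feature set using `time.time()`, whose resolution can be coarse on some systems; that is one plausible reason the SVM shows `0.0ms`. A minimal sketch of a finer-grained measurement, assuming any object with a scikit-learn-style `predict` method: it uses `time.perf_counter()`, keeps the best of several runs to reduce noise, and also reports per-example latency. `time_predictions` and `ThresholdModel` are hypothetical names for illustration.

```python
from time import perf_counter


def time_predictions(model, features, repeats=5):
    """Return (total_ms, per_example_ms) for model.predict, using the fastest of `repeats` runs."""
    best = float('inf')
    for _ in range(repeats):
        start = perf_counter()
        model.predict(features)
        best = min(best, perf_counter() - start)
    total_ms = best * 1000
    return total_ms, total_ms / len(features)


class ThresholdModel:
    """Hypothetical stand-in with a scikit-learn-style predict method."""

    def predict(self, features):
        return [1 if row[0] > 0.5 else 0 for row in features]


rows = [[i / 1000] for i in range(1000)]
total_ms, per_example_ms = time_predictions(ThresholdModel(), rows)
print('total: {:.3f}ms / per example: {:.5f}ms'.format(total_ms, per_example_ms))
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs mostly reflect interference from other processes rather than the model itself.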
    

In [15]:
for name, mdl in models.items():
    evaluate_model(name, mdl, val_features, val_labels)

SVM -- Accuracy: 0.781 / Precision: 0.732 / Recall: 0.631 / Latency: 0.0ms
RF -- Accuracy: 0.787 / Precision: 0.765 / Recall: 0.6 / Latency: 2.0ms
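
Step 2 of the plan selects a model, but the notebook does so by inspection. This sketch makes the criterion explicit using the validation numbers printed above. Ranking on accuracy with precision as the tie-breaker is an assumption, not something the notebook states, and `select_best` is a hypothetical helper.

```python
# Validation metrics copied from the output above.
val_results = {
    'SVM': {'accuracy': 0.781, 'precision': 0.732, 'recall': 0.631},
    'RF': {'accuracy': 0.787, 'precision': 0.765, 'recall': 0.6},
}


def select_best(results, primary='accuracy', tiebreak='precision'):
    """Return the model name with the highest `primary` metric, breaking ties on `tiebreak`."""
    return max(results, key=lambda name: (results[name][primary], results[name][tiebreak]))


print(select_best(val_results))                    # RF
print(select_best(val_results, primary='recall'))  # SVM
```

Ranking on accuracy picks RF; if missing an actual survivor were the costlier mistake, ranking on recall would pick SVM instead. With differences this small (0.787 vs. 0.781 accuracy), the choice could plausibly flip on a different validation split.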


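### Evaluate best model on test set

On the validation set, Random Forest has slightly higher accuracy (0.787 vs. 0.781) and precision (0.765 vs. 0.732) than SVM, but lower recall (0.6 vs. 0.631). We select Random Forest as the final model and evaluate it once on the holdout test set.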

In [16]:
evaluate_model('Random Forest', models['RF'], te_features, te_labels)

Random Forest -- Accuracy: 0.81 / Precision: 0.85 / Recall: 0.671 / Latency: 2.0ms
