## Pipeline: Evaluate results on validation set

This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we fit the best few models from the last section on the full training set and then evaluate each of them on the validation set.

### Read in data



In [None]:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive

Mounted at /gdrive
/gdrive


In [None]:
# Mounting google drive to access data
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Folder on the mounted drive that holds the train/validation/test splits
data_dir = '/content/drive/MyDrive/Summer_Learning/Ex_Files_Applied_Machine_Learning/Exercise_Files/'

# Label files are read with header=None so their first row is treated as data
tr_features = pd.read_csv(data_dir + 'train_features.csv')
tr_labels = pd.read_csv(data_dir + 'train_labels.csv', header=None)

val_features = pd.read_csv(data_dir + 'val_features.csv')
val_labels = pd.read_csv(data_dir + 'val_labels.csv', header=None)

te_features = pd.read_csv(data_dir + 'test_features.csv')
te_labels = pd.read_csv(data_dir + 'test_labels.csv', header=None)

### Fit best models on full training set

Results from the last section (arrows mark the three best-scoring configurations, which are the ones refit in this section):
```
0.76 (+/-0.116) for {'max_depth': 2, 'n_estimators': 5}
0.796 (+/-0.119) for {'max_depth': 2, 'n_estimators': 50}
0.803 (+/-0.117) for {'max_depth': 2, 'n_estimators': 100}
--> 0.828 (+/-0.074) for {'max_depth': 10, 'n_estimators': 5}
0.816 (+/-0.028) for {'max_depth': 10, 'n_estimators': 50}
--> 0.826 (+/-0.046) for {'max_depth': 10, 'n_estimators': 100}
0.785 (+/-0.106) for {'max_depth': 20, 'n_estimators': 5}
0.813 (+/-0.027) for {'max_depth': 20, 'n_estimators': 50}
0.809 (+/-0.029) for {'max_depth': 20, 'n_estimators': 100}
0.794 (+/-0.04) for {'max_depth': None, 'n_estimators': 5}
0.809 (+/-0.037) for {'max_depth': None, 'n_estimators': 50}
--> 0.818 (+/-0.035) for {'max_depth': None, 'n_estimators': 100}
```
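The arrows above mark the three highest mean cross-validation scores. As a minimal sketch, that selection could be reproduced by sorting the scores. The `cv_results` list below is transcribed by hand from the printout, and `top_k` is a hypothetical helper that is not part of the original notebook.

```python
# Mean CV accuracy per hyperparameter setting, copied by hand from the printout above.
cv_results = [
    (0.76,  {'max_depth': 2,    'n_estimators': 5}),
    (0.796, {'max_depth': 2,    'n_estimators': 50}),
    (0.803, {'max_depth': 2,    'n_estimators': 100}),
    (0.828, {'max_depth': 10,   'n_estimators': 5}),
    (0.816, {'max_depth': 10,   'n_estimators': 50}),
    (0.826, {'max_depth': 10,   'n_estimators': 100}),
    (0.785, {'max_depth': 20,   'n_estimators': 5}),
    (0.813, {'max_depth': 20,   'n_estimators': 50}),
    (0.809, {'max_depth': 20,   'n_estimators': 100}),
    (0.794, {'max_depth': None, 'n_estimators': 5}),
    (0.809, {'max_depth': None, 'n_estimators': 50}),
    (0.818, {'max_depth': None, 'n_estimators': 100}),
]


def top_k(results, k=3):
    """Return the k (score, params) pairs with the highest mean score (hypothetical helper)."""
    return sorted(results, key=lambda r: r[0], reverse=True)[:k]


best = top_k(cv_results)
for score, params in best:
    print(score, params)
```

Ranking by mean score alone ignores the spread: the fourth-best setting, `{'max_depth': 10, 'n_estimators': 50}`, scores 0.816 with a much narrower interval (+/-0.028) than the top setting (+/-0.074), which is one reason to carry several candidates forward to the validation set rather than only the single best.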

In [None]:
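# Refit the three arrowed configurations on the full training set.
# .values.ravel() flattens the single-column label DataFrame into a 1-D array.
rf1 = RandomForestClassifier(n_estimators=5, max_depth=10)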
rf1.fit(tr_features, tr_labels.values.ravel())

rf2 = RandomForestClassifier(n_estimators=100, max_depth=10)
rf2.fit(tr_features, tr_labels.values.ravel())

rf3 = RandomForestClassifier(n_estimators=100, max_depth=None)
rf3.fit(tr_features, tr_labels.values.ravel())

### Evaluate models on validation set



In [None]:
# Score each candidate on the held-out validation set
for mdl in [rf1, rf2, rf3]:
    y_pred = mdl.predict(val_features)
    accuracy = round(accuracy_score(val_labels, y_pred), 3)
    precision = round(precision_score(val_labels, y_pred), 3)
    recall = round(recall_score(val_labels, y_pred), 3)
    print('MAX DEPTH: {} / # OF EST: {} -- A: {} / P: {} / R: {}'.format(mdl.max_depth,
                                                                         mdl.n_estimators,
                                                                         accuracy,
                                                                         precision,
                                                                         recall))
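
To make the three reported metrics concrete, here is a minimal sketch that computes them by hand from the four confusion counts. It assumes binary 0/1 labels with 1 as the positive class; `classification_metrics` is a hypothetical helper and the toy labels are made up for illustration, so none of these numbers come from the Titanic data.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when there are no predicted or no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(round(accuracy, 3), round(precision, 3), round(recall, 3))  # 0.625 0.667 0.5
```

Precision answers "of the passengers predicted positive, how many really were?", while recall answers "of the truly positive passengers, how many did the model find?". A model can score well on one and poorly on the other, which is why the loop above reports both alongside accuracy.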

### Evaluate the best model on the test set


In [None]:
y_pred = rf2.predict(te_features)
accuracy = round(accuracy_score(te_labels, y_pred), 3)
precision = round(precision_score(te_labels, y_pred), 3)
recall = round(recall_score(te_labels, y_pred), 3)
print('MAX DEPTH: {} / # OF EST: {} -- A: {} / P: {} / R: {}'.format(rf2.max_depth,
                                                                     rf2.n_estimators,
                                                                     accuracy,
                                                                     precision,
                                                                     recall))

MAX DEPTH: 10 / # OF EST: 100 -- A: 0.798 / P: 0.764 / R: 0.646
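
The test-set precision and recall differ noticeably, so a single combined figure can help when comparing against other models. As a small sketch that goes beyond the original notebook, the F1 score (the harmonic mean of precision and recall) can be derived from the two reported values; the inputs below are copied from the output line above, already rounded to three decimals, so the result is approximate.

```python
# Precision and recall copied from the test-set output above (rounded values).
precision = 0.764
recall = 0.646

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # roughly 0.70
```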
