# Model Tuning

This notebook includes a hyperparameter tuning and feature selection exercise for the top-performing MVP model. The objective is to narrow down which TML features and hyperparameters will be used in the next phase of scaling to jurisdictional-scale maps.

In [2]:
import matplotlib.pyplot as plt
import sys
sys.path.append('../src/')
import prepare_data as pp
import run_preds as rp
import score_classifier as score
import numpy as np
import pandas as pd
import pickle
from catboost import CatBoostClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import roc_curve, roc_auc_score, precision_recall_curve, f1_score, precision_score, recall_score, confusion_matrix, ConfusionMatrixDisplay

%load_ext autoreload
%autoreload 2



## Feature Selection
Evaluate feature importance for the top-performing MVP model (CatBoost). Dropping uninformative features can reduce overfitting and training time, and may improve accuracy by removing noisy or misleading inputs. The goal is to narrow the 78 features down to the top 10-15.

  
The 78 feature columns are indexed as follows:

- Index 0: slope
- Indices 1-2: s1
- Indices 3-12: s2
- Indices 13-77: TML features (index 77 is the TML tree probability)
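To read the importance rankings below in terms of data sources, a small lookup from column index to feature group can help. This is a minimal sketch assuming the layout 0 = slope, 1-2 = s1, 3-12 = s2, 13-76 = TML features, 77 = TML tree probability; `feature_group` is a hypothetical helper, not part of `prepare_data`.

```python
def feature_group(idx: int) -> str:
    """Map a column index of the 78-feature matrix to its source group.

    Assumed layout: 0 = slope, 1-2 = s1, 3-12 = s2,
    13-76 = TML features, 77 = TML tree probability.
    """
    if not 0 <= idx <= 77:
        raise IndexError(f"feature index {idx} is outside 0-77")
    if idx == 0:
        return "slope"
    if idx <= 2:
        return "s1"
    if idx <= 12:
        return "s2"
    if idx == 77:
        return "TML tree probability"
    return "TML"


# Label the top-15 indices reported for cat_model_v10
top15 = [7, 67, 9, 12, 5, 11, 3, 0, 15, 10, 34, 18, 6, 61, 4]
labels = [(i, feature_group(i)) for i in top15]
print(labels)
```

Under this layout, the v10 top 15 consists of nine s2 bands, five TML features, and slope, with no s1 bands.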

In [32]:
# identify top performing model
df = pd.read_csv('../models/mvp_scores.csv')
df.sort_values(by=['test_score','roc_auc'], ascending=False)[:5]

| Unnamed: 0 | model | cv | train_score | test_score | roc_auc | precision | recall | f1 |
|---|---|---|---|---|---|---|---|---|
| 29 | cat_model_v10 | 0.8884 | 0.9972 | 0.8523 | 0.9291 | 0.9239 | 0.7847 | 0.8487 |
| 16 | cat_model_v10_np | 0.8908 | 0.9969 | 0.8518 | 0.9302 | 0.921 | 0.7865 | 0.8485 |
| 27 | lgbm_model_v10 | 0.8846 | 0.9892 | 0.85 | 0.9297 | 0.9232 | 0.7806 | 0.8459 |
| 28 | xgb_model_v10 | 0.8858 | 0.9999 | 0.849 | 0.9259 | 0.9168 | 0.785 | 0.8458 |
| 25 | rfc_model_v10 | 0.8815 | 1.0 | 0.8487 | 0.9216 | 0.9322 | 0.7693 | 0.8429 |


In [35]:
# load best model
model_name, v_train_data = 'cat', 'v10'
filename = f'../models/{model_name}_model_{v_train_data}.pkl'
with open(filename, 'rb') as file:
    model = pickle.load(file)

# get initial read of feature importance
feats = model.get_feature_importance()

# get indices of most important features (sorted by importance)
feats_ordered = np.argsort(feats)[::-1]
feats_ordered

top15 = feats_ordered[:15]
top15

array([ 7, 67,  9, 12,  5, 11,  3,  0, 15, 10, 34, 18,  6, 61,  4])
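`np.argsort(feats)[::-1]` sorts ascending and then reverses, giving indices from most to least important. The sketch below also reports what share of the total importance the top k features carry, which may help when choosing between k = 15 and k = 30. `top_k_with_share` is a hypothetical helper, and the importances are made-up toy values, not this model's.

```python
import numpy as np


def top_k_with_share(importances, k):
    """Return the k most important feature indices and their share of total importance."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1]   # descending by importance; tie order is not guaranteed
    top = order[:k]
    share = importances[top].sum() / importances.sum()
    return top, share


# Toy importances for 5 features (illustrative only)
toy = [1.0, 6.0, 0.5, 2.0, 0.5]
idx, share = top_k_with_share(toy, 2)
print(idx, round(share, 2))  # → [1 3] 0.8
```

In the notebook this would be called as `top_k_with_share(model.get_feature_importance(), 15)`.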

In [4]:
# quick test to see if we can get better performance using only top 15 feats
X, y = pp.create_xy((14, 14), ['v3', 'v4', 'v10'], drop_prob=False, drop_feats=False, verbose=False)
X_train_ss, X_test_ss, y_train, y_test = pp.reshape_and_scale(X, y)

Training data includes 265 plot ids.
Baseline: 0.491


In [5]:
# filter to selected features
X_train_selected = X_train_ss[:, top15]
X_test_selected = X_test_ss[:, top15]
X_train_selected.shape, X_test_selected.shape, y_train.shape, y_test.shape
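Indexing with an integer array, as in `X_train_ss[:, top15]`, returns the columns in the order given (here, importance order), so the same index array has to be applied to both the train and test matrices, and later to any data the model predicts on. A minimal sketch that keeps this in one place; `select_columns` is a hypothetical helper.

```python
import numpy as np


def select_columns(X_train, X_test, cols):
    """Apply the same column subset, in the same order, to train and test matrices."""
    cols = np.asarray(cols)
    if X_train.shape[1] != X_test.shape[1]:
        raise ValueError("train and test must have the same number of features")
    return X_train[:, cols], X_test[:, cols]


X_tr = np.arange(12).reshape(3, 4)   # 3 samples x 4 features
X_te = np.arange(8).reshape(2, 4)    # 2 samples x 4 features
tr_sel, te_sel = select_columns(X_tr, X_te, [3, 1])
print(tr_sel)  # columns 3 then 1 of each row
```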

In [10]:
cat_15feats = CatBoostClassifier(verbose=False, random_state=22)
cat_15feats.fit(X_train_selected, y_train)

# score new model
score.print_scores(cat_15feats, X_train_selected, X_test_selected, y_train, y_test)

cv: 0.8926
train: 0.9916
test: 0.8354
roc_auc: 0.9234
precision: 0.9135
recall: 0.76
f1: 0.8297


In [38]:
# better performance using top 30 feats?
X, y = pp.create_xy((14, 14), ['v3', 'v4', 'v10'], drop_prob=False, drop_feats=False, verbose=False)
X_train_ss, X_test_ss, y_train, y_test = pp.reshape_and_scale(X, y)

# filter to top 30 features
top30 = feats_ordered[:30]
X_train_selected = X_train_ss[:, top30]
X_test_selected = X_test_ss[:, top30]

cat_30feats = CatBoostClassifier(verbose=False, random_state=22)
cat_30feats.fit(X_train_selected, y_train)
score.print_scores(cat_30feats, X_train_selected, X_test_selected, y_train, y_test)

Training data includes 265 plot ids.
Baseline: 0.491
cv: 0.8936
train: 0.996
test: 0.8526
roc_auc: 0.9299
precision: 0.9229
recall: 0.7863
f1: 0.8491
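Rather than testing k = 15 and k = 30 in separate cells, the comparison can be written as a loop over candidate k values. This is a minimal sketch assuming a caller-supplied `evaluate(cols)` that fits a model on the given columns and returns a score. `sweep_top_k` and `evaluate` are hypothetical names, and the toy evaluator below only stands in for a real model. Scoring each k with cross-validation on the training data, rather than on the test set, keeps the test set out of the selection step.

```python
import numpy as np


def sweep_top_k(importances, ks, evaluate):
    """Score the top-k features for each k, using a caller-supplied evaluate(cols) -> float."""
    order = np.argsort(np.asarray(importances, dtype=float))[::-1]
    results = {}
    for k in ks:
        cols = order[:k]
        results[k] = evaluate(cols)
    best_k = max(results, key=results.get)
    return best_k, results


# A real evaluate might look like this (assumes catboost and scikit-learn are installed):
# def evaluate(cols):
#     clf = CatBoostClassifier(verbose=False, random_state=22)
#     return cross_val_score(clf, X_train_ss[:, cols], y_train, cv=3).mean()


# Toy stand-in: rewards keeping features 0 and 1, with a small penalty per extra feature
def toy_evaluate(cols):
    hits = len({0, 1} & set(cols.tolist()))
    return hits - 0.01 * len(cols)


toy_importances = [0.9, 0.8, 0.1, 0.05, 0.02]
best_k, results = sweep_top_k(toy_importances, [1, 2, 3, 5], toy_evaluate)
print(best_k, results)
```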


In [52]:
df.sort_values(by=['test_score','roc_auc'], ascending=False)[:1]

| Unnamed: 0 | model | cv | train_score | test_score | roc_auc | precision | recall | f1 |
|---|---|---|---|---|---|---|---|---|
| 29 | cat_model_v10 | 0.8884 | 0.9972 | 0.8523 | 0.9291 | 0.9239 | 0.7847 | 0.8487 |


### Check consistency across regions

In [33]:
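# are the same features important for this model in West Africa (v8)?
model_name, v_train_data = 'cat', 'v8'
filename = f'../models/{model_name}_model_{v_train_data}.pkl'   # rebuild the path so the v8 model is loaded
with open(filename, 'rb') as file:
    model = pickle.load(file)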

# get initial read of feature importance
feats = model.get_feature_importance()
feats_ordered = np.argsort(feats)[::-1]
feats_ordered

array([ 7,  3,  5, 67,  9,  4, 11, 12, 18,  8, 10, 44, 37,  6, 63, 32, 66,
       20, 61, 40, 51, 26, 55,  0, 43, 33, 13, 31, 21, 39, 23, 19, 38, 46,
       34, 17, 16,  2, 73, 41, 45, 59, 47, 29, 35, 64, 42, 54, 14, 30, 24,
       65, 22, 27, 57, 36, 60, 25, 49, 58, 53, 69, 56, 15, 28, 77, 68, 50,
       76, 72, 70, 48, 62, 71, 75, 74, 52,  1])
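`feats_ordered` lists features by rank. To answer the reverse question (where does a given feature rank?), the permutation can be inverted. A minimal sketch using the v8 ordering printed above; `rank_of` is a hypothetical helper.

```python
import numpy as np

# Feature indices ordered by importance, as printed for the v8 model above
v8_order = np.array([
    7, 3, 5, 67, 9, 4, 11, 12, 18, 8, 10, 44, 37, 6, 63, 32, 66,
    20, 61, 40, 51, 26, 55, 0, 43, 33, 13, 31, 21, 39, 23, 19, 38, 46,
    34, 17, 16, 2, 73, 41, 45, 59, 47, 29, 35, 64, 42, 54, 14, 30, 24,
    65, 22, 27, 57, 36, 60, 25, 49, 58, 53, 69, 56, 15, 28, 77, 68, 50,
    76, 72, 70, 48, 62, 71, 75, 74, 52, 1,
])


def rank_of(order):
    """Invert an importance ordering: result[feature] = 1-based rank of that feature."""
    order = np.asarray(order)
    if sorted(order.tolist()) != list(range(len(order))):
        raise ValueError("order must be a permutation of 0..n-1")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks


ranks = rank_of(v8_order)
print(ranks[77], ranks[7], ranks[67])  # → 66 1 4
```

For v8, the TML tree probability (index 77) ranks 66th of 78.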

In [34]:
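# how about South America? (v9)
model_name, v_train_data = 'cat', 'v9'
filename = f'../models/{model_name}_model_{v_train_data}.pkl'   # rebuild the path so the v9 model is loaded
with open(filename, 'rb') as file:
    model = pickle.load(file)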

# get initial read of feature importance
feats = model.get_feature_importance()
feats_ordered = np.argsort(feats)[::-1]
feats_ordered

array([ 5, 12, 11,  4, 10,  3,  0,  7, 67, 73, 61, 63, 18,  6, 69,  8, 51,
       46, 55, 66, 44, 20, 64,  9, 19, 36, 13, 37, 40, 39, 26, 38, 65, 56,
       59, 35, 43, 34, 54, 17, 60, 45, 31, 21, 57, 42, 32, 33, 24, 30, 23,
       29, 28, 14, 25, 27, 41, 49, 77, 58, 15, 53, 16, 72, 22, 47, 48, 50,
       70, 71, 52, 62, 68, 76, 75, 74,  2,  1])
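Two simple ways to quantify how well the rankings agree across regions are the set of features shared by every region's top k, and the Spearman rank correlation between two full orderings. The sketch below uses the top-10 lists printed above (for v10, the first 10 entries of `top15`). `spearman_from_orders` is a hypothetical helper that computes Spearman's rho directly from two permutations of the same indices. This is valid because an ordering has no ties.

```python
import numpy as np

# Top-10 feature indices per model, taken from the orderings printed above
top10 = {
    "v10": [7, 67, 9, 12, 5, 11, 3, 0, 15, 10],
    "v8 (West Africa)": [7, 3, 5, 67, 9, 4, 11, 12, 18, 8],
    "v9 (South America)": [5, 12, 11, 4, 10, 3, 0, 7, 67, 73],
}

# Features that appear in the top 10 for every model
shared = set.intersection(*(set(v) for v in top10.values()))
print(sorted(shared))


def spearman_from_orders(order_a, order_b):
    """Spearman's rho between two importance orderings (both permutations of 0..n-1)."""
    order_a, order_b = np.asarray(order_a), np.asarray(order_b)
    n = len(order_a)
    rank_a = np.empty(n, dtype=int)
    rank_b = np.empty(n, dtype=int)
    rank_a[order_a] = np.arange(n)
    rank_b[order_b] = np.arange(n)
    d = rank_a - rank_b
    # With no ties: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))


print(spearman_from_orders([0, 1, 2, 3], [3, 2, 1, 0]))  # fully reversed ordering → -1.0
```

On the printed lists, the shared set is [3, 5, 7, 11, 12, 67]. Besides indices 7 and 67, four other s2 bands appear in the top 10 for all three models. Applied to the full v8 and v9 orderings, `spearman_from_orders` would summarize their overall agreement in a single number.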

## Hyperparameter Tuning

In [24]:
# use Central America training data
X, y = pp.create_xy((14, 14), ['v3', 'v4', 'v10'], drop_prob=False, drop_feats=False, verbose=False)
X_train_ss, X_test_ss, y_train, y_test = pp.reshape_and_scale(X, y)

Training data includes 265 plot ids.
Baseline: 0.491


In [25]:
iterations = [int(x) for x in np.linspace(200, 1100, 10)]            # equiv to n_estimators
depth = [int(x) for x in np.linspace(4, 10, 4)]                     # equiv to max_depth (must be <= 16)
l2_reg = [int(x) for x in np.linspace(2, 30, 4)]
learning_rate = [.01, .02, .03]                                      # decrease learning rate if overfitting 

param_dist = {'iterations': iterations,
               'depth': depth,
               'l2_leaf_reg': l2_reg,
               'learning_rate': learning_rate}
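The lists above define a discrete grid. Because every entry in `param_dist` is a list, scikit-learn's `RandomizedSearchCV` samples settings without replacement from that grid, so `n_iter=30` evaluates only a small fraction of it. The sketch below rebuilds the same grid and counts the combinations. The seeded draw at the end only illustrates the idea and is not scikit-learn's internal sampler. Passing `random_state` to `RandomizedSearchCV` would make the sampled settings reproducible across reruns.

```python
import itertools
import random

import numpy as np

# Same grid construction as the cell above
iterations = [int(x) for x in np.linspace(200, 1100, 10)]
depth = [int(x) for x in np.linspace(4, 10, 4)]
l2_reg = [int(x) for x in np.linspace(2, 30, 4)]   # int() truncates 11.33 -> 11 and 20.67 -> 20
learning_rate = [.01, .02, .03]

grid = list(itertools.product(iterations, depth, l2_reg, learning_rate))
print(depth, l2_reg)
print(len(grid))   # 10 * 4 * 4 * 3 combinations

n_iter = 30
print(f"{n_iter / len(grid):.1%} of the grid is evaluated")

# Illustration only: a seeded draw of 30 distinct settings (not sklearn's sampler)
sampled = random.Random(42).sample(grid, n_iter)
```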

In [26]:
cat = CatBoostClassifier(random_seed=42, verbose=False)

rds = RandomizedSearchCV(estimator=cat,
                        param_distributions=param_dist, 
                        n_iter=30,
                        cv=3)
# Achieves 0.892
rds.fit(X_train_ss, y_train)
rds_best = rds.best_params_
print(f"The best parameters are {rds.best_params_} with a score of {rds.best_score_}")

The best parameters are {'learning_rate': 0.03, 'l2_leaf_reg': 2, 'iterations': 1000, 'depth': 6} with a score of 0.891560013836043


In [30]:
# now fit classifier with best params and get all scores
cat_best_params = CatBoostClassifier(random_seed=42,
                                     learning_rate=0.03,
                                     l2_leaf_reg=2,
                                     iterations=1000,
                                     depth=6,
                                     verbose=False)

cat_best_params.fit(X_train_ss, y_train)   
score.print_scores(cat_best_params, X_train_ss, X_test_ss, y_train, y_test)

cv: 0.8916
train: 0.9918
test: 0.8524
roc_auc: 0.9305
precision: 0.9236
recall: 0.7852
f1: 0.8488


In [3]:
scores = pd.read_csv('../models/mvp_scores.csv')
scores[scores.model == 'cat_model_v10']

| Unnamed: 0 | model | cv | train_score | test_score | roc_auc | precision | recall | f1 |
|---|---|---|---|---|---|---|---|---|
| 29 | cat_model_v10 | 0.8884 | 0.9972 | 0.8523 | 0.9291 | 0.9239 | 0.7847 | 0.8487 |
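Setting the tuned model's scores beside the saved `cat_model_v10` row makes the size of the change explicit. A minimal sketch; the dictionaries simply transcribe the outputs printed above.

```python
baseline = {"cv": 0.8884, "train": 0.9972, "test": 0.8523, "roc_auc": 0.9291,
            "precision": 0.9239, "recall": 0.7847, "f1": 0.8487}
tuned = {"cv": 0.8916, "train": 0.9918, "test": 0.8524, "roc_auc": 0.9305,
         "precision": 0.9236, "recall": 0.7852, "f1": 0.8488}

# Difference per metric (positive = tuned model is higher)
deltas = {m: round(tuned[m] - baseline[m], 4) for m in baseline}
print(deltas)

# Train-test gap as a rough overfitting indicator
gaps = {name: round(s["train"] - s["test"], 4) for name, s in [("baseline", baseline), ("tuned", tuned)]}
print(gaps)
```

The tuned model's train-test gap is slightly smaller (0.1394 vs. 0.1449), which is consistent with a little less overfitting. The test-score change, however, is in the fourth decimal place and may be comparable to the variation from changing the random seed.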


**Conclusions**  
- The feature selection exercise revealed that different features are important for different regions.
- Index 7 (s2) and index 67 (TML feature) ranked in the top 10 for all three models (ranks 1 and 2 for v10, 1 and 4 for v8, 8 and 9 for v9).
- Index 77 (TML tree probability) had surprisingly low importance, ranking 66th of 78 for v8 and 59th of 78 for v9.
- Fitting the CatBoostClassifier on only the top 15 features hurt performance (test score 0.8354 vs. 0.8523 for the full-feature `cat_model_v10`). Fitting on the top 30 features brought performance back to roughly the full-feature level (test score 0.8526).
- Hyperparameter tuning was informed by the [CatBoost documentation](https://catboost.ai/en/docs/concepts/parameter-tuning#iterations).
- CatBoost's default parameters generally provide a strong result. Tuning produced only minor improvements (cv 0.8916 vs. 0.8884, test score 0.8524 vs. 0.8523, ROC AUC 0.9305 vs. 0.9291).