# Hyperparameter Optimization - WS

Each machine learning model has a set of parameters that are set by the user before the learning process begins and that affect how the model's learning algorithm behaves. These are called hyperparameters, and they can be tuned and optimized to further improve a model's performance.

## Load in the data

Let's begin by importing the featurized, split, scaled, and normalized data sets from notebook 3-modeling classic models.

In [2]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from time import time

from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

%matplotlib inline
%config InlineBackend.figure_format='retina'

from collections import OrderedDict
from pprint import pprint

# Set a random seed to ensure reproducibility across runs
RNG_SEED = 42
np.random.seed(RNG_SEED)

In [3]:
PATH = os.getcwd()

# X df
train_path = os.path.join(PATH, '../data/cp_train_processed.csv')
val_path = os.path.join(PATH, '../data/cp_val_processed.csv')
test_path = os.path.join(PATH, '../data/cp_test_processed.csv')

df_train = pd.read_csv(train_path)
df_val = pd.read_csv(val_path)
df_test = pd.read_csv(test_path)

# y df
train_path_y = os.path.join(PATH, '../data/cp_train_target.csv')
val_path_y = os.path.join(PATH, '../data/cp_val_target.csv')
test_path_y = os.path.join(PATH, '../data/cp_test_target.csv')

df_train_target = pd.read_csv(train_path_y)
df_val_target = pd.read_csv(val_path_y)
df_test_target = pd.read_csv(test_path_y)


print(f'df_train DataFrame shape: {df_train.shape}')
print(f'df_val DataFrame shape: {df_val.shape}')
print(f'df_test DataFrame shape: {df_test.shape}''\n')

print(f'df_train_target DataFrame shape: {df_train_target.shape}')
print(f'df_val_target DataFrame shape: {df_val_target.shape}')
print(f'df_test_target DataFrame shape: {df_test_target.shape}')

df_train DataFrame shape: (2000, 177)
df_val DataFrame shape: (200, 177)
df_test DataFrame shape: (200, 177)

df_train_target DataFrame shape: (2000, 1)
df_val_target DataFrame shape: (200, 1)
df_test_target DataFrame shape: (200, 1)


Let's quickly see what the train data frame looks like, just for fun:

In [4]:
df_train

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,167,168,169,170,171,172,173,174,175,176
0,-0.049496,-0.049293,-0.038091,-0.033664,-0.036180,-0.033839,-0.018897,-0.013881,-0.045918,-0.049305,...,0.169828,0.087697,-0.030553,-0.020227,0.325807,-0.016071,-0.012299,0.145706,0.136940,-0.055059
1,-0.046658,-0.047568,-0.022087,-0.017007,0.083449,-0.101719,0.111022,-0.011695,0.081329,-0.041540,...,-0.049002,-0.057925,-0.028256,0.212334,-0.024335,-0.047894,-0.034548,0.013040,0.053715,-0.057497
2,-0.032543,-0.032056,0.037335,0.068315,0.073449,0.157559,0.063540,-0.015627,0.065435,0.006373,...,-0.026901,-0.015855,-0.035402,-0.013003,-0.031106,-0.036161,0.024058,-0.020971,-0.024492,-0.093283
3,-0.064185,-0.060748,-0.070468,-0.057946,-0.062284,-0.032347,-0.054070,-0.013269,-0.067776,-0.073402,...,-0.053875,-0.056931,-0.031278,-0.019205,-0.027656,-0.050643,0.004904,-0.046145,-0.047536,0.142243
4,0.016841,0.026828,-0.043320,-0.048758,-0.059175,-0.023704,-0.066010,-0.009723,-0.053324,-0.053789,...,-0.041007,-0.048359,-0.023493,-0.014270,-0.020410,-0.039843,-0.028725,-0.034276,-0.035182,0.013361
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,0.262353,0.267779,0.156791,0.058086,0.062452,-0.032392,0.054026,-0.013287,0.050997,0.058035,...,0.006468,0.072964,-0.030027,-0.019496,-0.024961,0.079500,-0.038153,0.129422,0.120138,-0.079493
1996,-0.051198,-0.053064,-0.033001,-0.076308,-0.075941,0.066787,-0.071204,-0.017473,-0.033392,-0.096661,...,-0.057940,-0.070259,-0.041836,-0.025442,0.044250,-0.034717,-0.051432,0.023407,0.018281,-0.104535
1997,-0.050158,-0.048081,-0.073605,-0.048087,-0.066054,-0.040275,-0.067324,-0.016521,-0.059534,-0.025973,...,-0.026125,-0.007800,0.043091,-0.023667,-0.027401,-0.021094,0.395452,-0.051350,-0.053786,-0.087589
1998,-0.026411,-0.028134,-0.024171,-0.068100,-0.060983,0.089427,-0.095342,-0.023396,-0.066082,0.055865,...,0.056896,-0.001830,0.019166,-0.033358,-0.031977,-0.008546,-0.034781,-0.075362,-0.077117,0.032149


In [5]:
df_train_target

Unnamed: 0,target
0,66.392
1,109.956
2,135.520
3,71.128
4,37.183
...,...
1995,127.800
1996,67.864
1997,46.806
1998,101.504


In [6]:
# we'll need the target values in a 1d array for the model fit function
df_train_target = df_train_target.values.ravel()
df_train_target

array([ 66.392, 109.956, 135.52 , ...,  46.806, 101.504,  74.475])

Now let's set up an empty data frame that we will use to store model results.

In [7]:
df_classics = pd.DataFrame(columns=['model_name',
                                    'model_name_pretty',
                                    'model_params',
                                    'fit_time',
                                    'r2_train',
                                    'mae_train',
                                    'rmse_train',
                                    'r2_val',
                                    'mae_val',
                                    'rmse_val'])



Also, to be able to feed our data into the learning models, let's rename them here.

In [13]:
X_train = df_train
y_train = df_train_target

# Hyperparameter Optimization

In this notebook, I will use both the Grid Search and Random Search optimization techniques for various models. We can then compare the models' performance following hyperparameter tuning.

## Grid Search

Grid Search is a method whereby every combination of the specified hyperparameter values is stepped through sequentially and exhaustively, and the best-performing combination is then identified. This method is computationally heavy and thus can be very time consuming.

## Random Search

Random Search does essentially the same thing as Grid Search, but saves computational power by randomly sampling hyperparameter configurations and testing only a fixed number (n) of them, which we define. The downside is that you may not discover the absolute best combination of hyperparameters for your model; the upside is a much lower computational cost.
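
To make the contrast concrete, here is a minimal sketch in plain Python (no scikit-learn involved). The grid `toy_grid` and the names `all_configs` and `sampled_configs` are made up for illustration and are not part of this notebook. Grid search would evaluate every configuration in `all_configs`, whereas random search would evaluate only the `n_iter` configurations drawn from it.

```python
import itertools
import random

# Hypothetical toy grid, for illustration only
toy_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, None],
    'bootstrap': [True, False],
}

names = list(toy_grid)

# Grid search: every combination of values (3 * 3 * 2 = 18 configurations)
all_configs = [dict(zip(names, values))
               for values in itertools.product(*toy_grid.values())]

# Random search: draw only n_iter distinct configurations from the same space
n_iter = 5
rng = random.Random(42)
sampled_configs = rng.sample(all_configs, n_iter)

print(f'Grid search would evaluate {len(all_configs)} configurations')
print(f'Random search would evaluate {len(sampled_configs)} configurations')
```

Each configuration still has to be fitted and scored, so the saving from random search scales directly with how much smaller `n_iter` is than the full grid.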

In [8]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

## 1 - Random Forest Model

The first thing to do for a specific model is to identify the names and preset (default) values of all its parameters.

### Get Hyperparameters

In [9]:
from sklearn.ensemble import RandomForestRegressor

# Create the base model, then display hyperparameters
rfr = RandomForestRegressor(random_state = 42)
pprint(rfr.get_params())
                                  

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'criterion': 'mse',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 42,
 'verbose': 0,
 'warm_start': False}


The scikit-learn docs tell us that the most important of the 18 available hyperparameters are:
- n_estimators : the # of trees in the forest
- max_features : the max # of features to consider when splitting a node
- max_depth : max # of levels in each decision tree
- min_samples_split : min # of data points a node must contain before it can be split
- min_samples_leaf : min # of data points allowed in a leaf node
- bootstrap : whether each tree is trained on a bootstrap sample (drawn with replacement) or on the whole data set



### Build the Grid

Next we'll create a list of values to vary each hyperparameter across. In the case of a grid search, each combination of hyperparameter values will be applied and tested. Altogether that amounts to 4,320 unique sets of hyperparameters based on the lists below.

In [10]:
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid

param_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

pprint(param_grid)

{'bootstrap': [True, False],
 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [2, 5, 10],
 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}
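
As a quick check on the 4,320 figure, the size of the full grid is the product of the number of values per hyperparameter. The sketch below copies the values from the printed grid above into a separate dictionary named `grid_values` (a name chosen here for illustration) so that it runs on its own.

```python
from math import prod

# Values copied from the printed param_grid above
grid_values = {
    'bootstrap': [True, False],
    'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
    'max_features': ['auto', 'sqrt'],
    'min_samples_leaf': [1, 2, 4],
    'min_samples_split': [2, 5, 10],
    'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
}

# 2 * 12 * 2 * 3 * 3 * 10 combinations
n_combinations = prod(len(values) for values in grid_values.values())
print(n_combinations)  # 4320
```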


Let's start with a random search, then later compare it to a grid search to see how the model improves vs the time it takes to compute. We may find that a random search saves time and finds nearly the best set of hyperparameters - let's find out.

### Random Search - Function Inputs

Let's tune using the following inputs:
- `n_iter` = 10 | # of random hyperparameter configurations to sample and test
- `cv` = 3 | use k-fold cross validation with k = 3 splits of the training data - a higher k gives a more reliable score for each configuration, but needs more fits
- `verbose` = 2 | controls how many log messages are displayed during the search - higher values print more
- `n_jobs` = -1 | # of concurrent processing jobs - at n = -1 all cores will be utilized
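
To get a feel for what these settings cost, the sketch below works out how the 2,000 training rows would be divided across 3 folds and how many model fits the search performs. It assumes the folds are made as equal in size as possible, with any leftover rows going to the first folds; `kfold_sizes` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
def kfold_sizes(n_samples, k):
    """Return (fit_size, score_size) for each of k near-equal folds."""
    base, extra = divmod(n_samples, k)
    # The first `extra` folds hold one additional sample
    score_sizes = [base + 1 if i < extra else base for i in range(k)]
    return [(n_samples - size, size) for size in score_sizes]

n_samples = 2000  # rows in df_train
n_iter = 10       # sampled configurations
cv = 3            # folds per configuration

folds = kfold_sizes(n_samples, cv)
for i, (n_fit, n_score) in enumerate(folds, start=1):
    print(f'fold {i}: fit on {n_fit} rows, score on {n_score} rows')

# Every sampled configuration is fitted once per fold
total_fits = n_iter * cv
print(f'total fits: {total_fits}')
```

With `n_iter` = 10 and `cv` = 3 this gives 30 fits in total.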

In [10]:
rfr_random = RandomizedSearchCV(estimator = rfr, param_distributions = param_grid, n_iter = 10, cv = 3, verbose = 2, n_jobs = -1)

### Fit the Model

In [11]:
rfr_random.fit(X_train, y_train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed: 24.9min finished


RandomizedSearchCV(cv=3, error_score=nan,
                   estimator=RandomForestRegressor(bootstrap=True,
                                                   ccp_alpha=0.0,
                                                   criterion='mse',
                                                   max_depth=None,
                                                   max_features='auto',
                                                   max_leaf_nodes=None,
                                                   max_samples=None,
                                                   min_impurity_decrease=0.0,
                                                   min_impurity_split=None,
                                                   min_samples_leaf=1,
                                                   min_samples_split=2,
                                                   min_weight_fraction_leaf=0.0,
                                                   n_estimators=100,
                              

### Retrieve Best Parameters from Search & Evaluate

The fitted search object exposes the best parameters found by the random search through its `best_params_` attribute:

In [12]:
rfr_random.best_params_

{'n_estimators': 2000,
 'min_samples_split': 5,
 'min_samples_leaf': 2,
 'max_features': 'auto',
 'max_depth': 50,
 'bootstrap': True}

Now we can use an evaluation function to compare a baseline model (here, a forest with `n_estimators = 10` and otherwise preset hyperparameters) vs the Random Search optimized model.

In [11]:
def evaluate(model, X, y_act):
    """Print and return error metrics for `model` on (X, y_act)."""
    y_pred = model.predict(X)
    r2 = r2_score(y_act, y_pred)
    mae = mean_absolute_error(y_act, y_pred)
    # RMSE computed as the square root of the mean squared error
    rmse = np.sqrt(mean_squared_error(y_act, y_pred))
    errors = abs(y_pred - y_act)
    # Mean absolute percentage error; assumes no target value is zero
    mape = 100 * np.mean(errors / y_act)

    # "Accuracy" here is defined as 100 minus the MAPE
    accuracy = 100 - mape
    print('Model Performance')
    print('Average Error: {:0.4f} degrees'.format(np.mean(errors)))
    print('Accuracy = {:0.2f}%'.format(accuracy))
    print('r2: {:0.3f}'.format(r2))
    print('\n')

    return accuracy, r2, mae, rmse
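
The "accuracy" reported by this function is not a classification accuracy; it is 100 minus the mean absolute percentage error (MAPE). A small worked example in plain Python, using three made-up target values, shows the arithmetic:

```python
# Made-up values, for illustration only
y_act = [100.0, 50.0, 200.0]
y_pred = [110.0, 45.0, 200.0]

# Absolute error of each prediction: [10.0, 5.0, 0.0]
errors = [abs(p - a) for p, a in zip(y_pred, y_act)]

# Mean absolute percentage error: mean of [0.10, 0.10, 0.00], times 100
mape = 100 * sum(e / a for e, a in zip(errors, y_act)) / len(y_act)

accuracy = 100 - mape
print(f'MAPE = {mape:0.2f}%, accuracy = {accuracy:0.2f}%')
```

Because each error is divided by the actual value, this metric is undefined when a target value is zero and becomes unstable when targets are close to zero.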

In [22]:
base_model = RandomForestRegressor(n_estimators = 10, random_state = 42)
base_model.fit(X_train, y_train)
base_accuracy = evaluate(base_model, X_train, y_train)

best_random = rfr_random.best_estimator_
random_accuracy = evaluate(best_random, X_train, y_train)

print('Improvement of {:0.2f}%.'.format( 100 * (random_accuracy[0] - base_accuracy[0]) / base_accuracy[0]))

Model Performance
Average Error: 1.5114 degrees
Accuracy = 98.52%
r2: 0.997


Model Performance
Average Error: 1.5662 degrees
Accuracy = 98.48%
r2: 0.997


Improvement of -0.04%.


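In this case, the baseline model scores slightly better on the training data than the best of the 10 configurations tried (30 fits in total, 3 folds each). Note that the baseline here uses `n_estimators = 10` rather than the preset of 100, and that both models are scored on the data they were trained on.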

### Repeat w/ More Iterations

Let's double the number of iterations and see if we can find a configuration of hyperparameters that outperforms the baseline.

In [14]:
rfr_random = RandomizedSearchCV(estimator = rfr, param_distributions = param_grid, n_iter = 20, cv = 3, verbose = 2, n_jobs = -1)

rfr_random.fit(X_train, y_train)

Fitting 3 folds for each of 20 candidates, totalling 60 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed: 19.0min
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed: 34.6min finished


RandomizedSearchCV(cv=3, error_score=nan,
                   estimator=RandomForestRegressor(bootstrap=True,
                                                   ccp_alpha=0.0,
                                                   criterion='mse',
                                                   max_depth=None,
                                                   max_features='auto',
                                                   max_leaf_nodes=None,
                                                   max_samples=None,
                                                   min_impurity_decrease=0.0,
                                                   min_impurity_split=None,
                                                   min_samples_leaf=1,
                                                   min_samples_split=2,
                                                   min_weight_fraction_leaf=0.0,
                                                   n_estimators=100,
                              

In [15]:
rfr_random.best_params_

{'n_estimators': 600,
 'min_samples_split': 2,
 'min_samples_leaf': 1,
 'max_features': 'auto',
 'max_depth': 70,
 'bootstrap': True}

In [16]:
base_model = RandomForestRegressor(n_estimators = 10, random_state = 42)
base_model.fit(X_train, y_train)
base_accuracy = evaluate(base_model, X_train, y_train)

best_random = rfr_random.best_estimator_
random_accuracy = evaluate(best_random, X_train, y_train)

print('Improvement of {:0.2f}%.'.format( 100 * (random_accuracy[0] - base_accuracy[0]) / base_accuracy[0]))

Model Performance
Average Error: 1.5114 degrees
Accuracy = 98.52%
r2: 0.997


Model Performance
Average Error: 1.1706 degrees
Accuracy = 98.86%
r2: 0.998


Improvement of 0.35%.
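
The improvement figure is the relative change in the accuracy metric, not a difference in percentage points. As a rough check, the sketch below recomputes both results from the rounded accuracy values printed above; `relative_improvement` is a helper name introduced here for illustration, and using rounded inputs is an approximation of what the notebook computed from unrounded values.

```python
def relative_improvement(new, base):
    """Percentage change of `new` relative to `base`."""
    return 100 * (new - base) / base

# Rounded accuracy values taken from the printed output above
base_acc = 98.52
first_search_acc = 98.48   # best of n_iter = 10
second_search_acc = 98.86  # best of n_iter = 20

first_change = relative_improvement(first_search_acc, base_acc)
second_change = relative_improvement(second_search_acc, base_acc)

print('Improvement of {:0.2f}%.'.format(first_change))
print('Improvement of {:0.2f}%.'.format(second_change))
```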


### Test models against validation data split

To see the effects of the tuning process, we will take the two models, one with the preset and one with the optimized hyperparameters, and test each one against the validation data set. We would hope to see a performance improvement similar to the one reported above by the `evaluate` function. The difference is that the comparison above scored the models only on the training split, whereas this one uses the validation split, which neither model saw during fitting.
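
Why the split matters can be seen in a toy sketch that is deliberately unrelated to the random forest above. It assumes synthetic data (y = 3x plus noise) and swaps in a one-nearest-neighbour "model" that simply memorizes the training points; all names prefixed with `toy_` are made up for this illustration. Scored on its own training split the model looks perfect, while the held-out split reveals its real error.

```python
import random

rng = random.Random(42)

def make_split(n):
    """Hypothetical noisy data: y = 3x + Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

toy_x_train, toy_y_train = make_split(200)
toy_x_val, toy_y_val = make_split(50)

def predict_1nn(x):
    """Return the target of the closest training point (pure memorization)."""
    nearest = min(range(len(toy_x_train)),
                  key=lambda i: abs(toy_x_train[i] - x))
    return toy_y_train[nearest]

def toy_mae(xs, ys):
    """Mean absolute error of the memorizing model on (xs, ys)."""
    return sum(abs(predict_1nn(x) - y) for x, y in zip(xs, ys)) / len(xs)

mae_train = toy_mae(toy_x_train, toy_y_train)
mae_val = toy_mae(toy_x_val, toy_y_val)

print(f'MAE on training split:   {mae_train:0.3f}')
print(f'MAE on validation split: {mae_val:0.3f}')
```

The same reasoning applies to the forests in this notebook: scores on the training split tend to be optimistic, so the validation split gives the fairer comparison between the baseline and tuned models.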