# Grid Search for Optimal Hyperparameter Values

To find the optimal hyperparameter values for the *learning rate*, *minimum amount of data in each leaf*, and *maximum number of leaves on each node*, we compare the RMSE of every experiment (averaged over 10 replications for each configuration). We do this for four model variants: MVN NGBoost trained with or without SST gradients, in each case using velocities in either polar or Cartesian form.
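As a rough illustration of the score being compared, the sketch below computes an RMSE per replication and averages it for one configuration. It is a minimal sketch under stated assumptions: the helper names `rmse` and `mean_rmse_over_replications` are hypothetical, the numbers are made up, and errors are treated as scalars because the source does not state how the two velocity components are combined into a single RMSE.

```python
import math
from statistics import mean


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(mean(squared_errors))


def mean_rmse_over_replications(replications):
    """Average the per-replication RMSE values for one configuration."""
    return mean(rmse(pred, obs) for pred, obs in replications)


# Two made-up replications (the experiments use 10); values in cm/s.
toy_replications = [
    ([3.0, 4.0], [0.0, 0.0]),  # RMSE = sqrt((9 + 16) / 2)
    ([1.0, 1.0], [0.0, 0.0]),  # RMSE = 1.0
]
config_score = mean_rmse_over_replications(toy_replications)
print(f"{config_score:.2f} cm/s")
```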

The hyperparameter options for the grid search are:

* Learning rate, $\eta$: $\{0.01, 0.1, 1.0\}$
* Minimum data per leaf: $\{1, 16, 32\}$
* Maximum leaves per node: $\{8, 16, 32, 64\}$

Along with the fixed model hyperparameters:

* Maximum boosting iterations, $B = 1000$
* Early stopping rounds: $50$
* Maximum tree depth: $15$
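The size of the search space follows from the lists above. The sketch below enumerates it with `itertools.product`. It is only an illustration: the variable names are hypothetical, the middle values are written as 16 because the printed configurations in this notebook report 16, and the enumeration order is not claimed to match how `model_config` assigns config IDs.

```python
from itertools import product

# Searched hyperparameter values (see the lists above).
learning_rates = [0.01, 0.1, 1.0]
min_data_per_leaf = [1, 16, 32]
max_leaves = [8, 16, 32, 64]

# Each hyperparameter combination is run for every (sst_flag, polar_flag) pair.
flags = [False, True]

hyperparameter_grid = list(product(learning_rates, min_data_per_leaf, max_leaves))
full_grid = list(product(hyperparameter_grid, flags, flags))

n_hyper = len(hyperparameter_grid)  # 3 * 3 * 4 combinations
n_total = len(full_grid)            # times 4 flag combinations
print(n_hyper, n_total)
```

Under these assumptions the grid has 36 hyperparameter combinations and 144 configurations in total, which would be consistent with config IDs as high as 142 appearing in the outputs.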


In [1]:
# load modules and the RMSE values for each configuration

import pickle
import os
import sys
import numpy as np
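# make the parent directory importable so that `experiments` can be found
# (os.path.dirname('hpc_scripts') is '', so the original expression resolved to '..')
sys.path.append(os.path.abspath('..'))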
from experiments.hpc_scripts.config_search import model_config

with open(r'full_experiment_grid_search_early_stopping_100.p', 'rb') as pickle_file:
    grid = pickle.load(pickle_file)

# model configurations path
model_filepath = '../model_configuration_ids.p'

In [2]:
# add sst_flag and polar_flag to df
# (elements 3 and 4 of the value returned by model_config are used as the two flags)

vals = grid.index.map(lambda config_id: model_config(config_id, model_filepath=model_filepath)[3:5]).values
grid[['sst_flag','polar_flag']] = np.array(list(vals))

In [3]:
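# grid is assumed to be sorted by ascending RMSE, so the last rows are the worst
# (pandas sorts NaN last by default, which would explain the nan values printed below)
print('\nWorst performing hyperparameter configurations')
ids = grid.tail(5).index.values

for config_id in ids:
    model_config(config_id, model_filepath, verbose=True)
    print('\n RMSE \n--------------------------------')
    # double quotes outside so the f-string also parses on Python < 3.12
    print(f"{grid.loc[config_id, 'RMSE']:.2f} cm/s")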


Worst performing hyperparameter configurations

config ID: 139

 Hyperparameter Values
---------------------------
learning rate: 0.01
min. data per leaf: 32
max. leaves per node: 32

 Data Information
---------------------------
Uses SST gradient: True
Uses velocities in polar form: True

 RMSE 
--------------------------------
nan cm/s

config ID: 140

 Hyperparameter Values
---------------------------
learning rate: 0.01
min. data per leaf: 32
max. leaves per node: 64

 Data Information
---------------------------
Uses SST gradient: False
Uses velocities in polar form: False

 RMSE 
--------------------------------
nan cm/s

config ID: 141

 Hyperparameter Values
---------------------------
learning rate: 0.01
min. data per leaf: 32
max. leaves per node: 64

 Data Information
---------------------------
Uses SST gradient: False
Uses velocities in polar form: True

 RMSE 
--------------------------------
nan cm/s

config ID: 142

 Hyperparameter Values
---------------------------
le

In [4]:
# identify the top parameters for each combination of (SST_flag, polar_flag)
# sst flag: False, polar_flag: False
grid_sst_false_polar_false = grid.query('not sst_flag and not polar_flag')[['RMSE']]
# sst flag: False, polar_flag: True
grid_sst_false_polar_true = grid.query('not sst_flag and polar_flag')[['RMSE']]
# sst flag: True, polar_flag: False
grid_sst_true_polar_false = grid.query('sst_flag and not polar_flag')[['RMSE']]
# sst flag: True, polar_flag: True
grid_sst_true_polar_true = grid.query('sst_flag and polar_flag')[['RMSE']]
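The four subsets above are later reduced to their first row, which gives the best configuration only if `grid` is already sorted by ascending RMSE. As a hedged alternative that does not depend on row order, the sketch below picks the lowest-RMSE config per flag combination with `groupby` and `idxmin`, after dropping runs whose RMSE is NaN. The frame `toy_grid` and its values are made up for illustration; only the column names mirror the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `grid`: the index plays the role of the config ID.
toy_grid = pd.DataFrame(
    {
        "RMSE": [14.9, 14.7, 15.2, 14.6, np.nan, 15.0, 14.5, 14.8],
        "sst_flag": [False, False, False, False, True, True, True, True],
        "polar_flag": [False, False, True, True, False, False, True, True],
    },
    index=pd.Index([0, 1, 2, 3, 4, 5, 6, 7], name="config_id"),
)

# Drop failed runs (NaN RMSE), then take the index label of the
# minimum RMSE within each (sst_flag, polar_flag) group.
valid = toy_grid.dropna(subset=["RMSE"])
best_ids = valid.groupby(["sst_flag", "polar_flag"])["RMSE"].idxmin()

# Look the winning rows up again to show their scores.
best = toy_grid.loc[best_ids, ["sst_flag", "polar_flag", "RMSE"]]
print(best)
```

Applied to the real `grid`, the same two lines (`dropna` followed by `groupby(...).idxmin()`) would return one config ID per flag combination, provided every combination has at least one run with a finite RMSE.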


In [5]:
# report the first config ID in each subset; this is the lowest-RMSE one
# only if grid is sorted by ascending RMSE
print('\nOptimal Hyperparameter Configurations')

print(f'\nFor sst_flag: False, polar_flag: False \n Config ID: {grid_sst_false_polar_false.head(1).index.values[0]}')
print(f'\nFor sst_flag: False, polar_flag: True \n Config ID: {grid_sst_false_polar_true.head(1).index.values[0]}')
print(f'\nFor sst_flag: True, polar_flag: False \n Config ID: {grid_sst_true_polar_false.head(1).index.values[0]}')
print(f'\nFor sst_flag: True, polar_flag: True \n Config ID: {grid_sst_true_polar_true.head(1).index.values[0]}')


Optimal Hyperparameter Configurations

For sst_flag: False, polar_flag: False 
 Config ID: 8

For sst_flag: False, polar_flag: True 
 Config ID: 45

For sst_flag: True, polar_flag: False 
 Config ID: 38

For sst_flag: True, polar_flag: True 
 Config ID: 39


In [7]:
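# print the settings and RMSE of the best configuration for each flag combination
for config_id in [8,45,38,39]:
    model_config(config_id, model_filepath, verbose=True)

    print('\n RMSE \n --------------------------------')
    # double quotes outside so the f-string also parses on Python < 3.12
    print(f"{grid.loc[config_id, 'RMSE']:.2f} cm/s")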


config ID: 8

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 16
max. leaves per node: 32

 Data Information
---------------------------
Uses SST gradient: False
Uses velocities in polar form: False

 RMSE 
 --------------------------------
14.65 cm/s

config ID: 45

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 32
max. leaves per node: 64

 Data Information
---------------------------
Uses SST gradient: False
Uses velocities in polar form: True

 RMSE 
 --------------------------------
14.58 cm/s

config ID: 38

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 32
max. leaves per node: 16

 Data Information
---------------------------
Uses SST gradient: True
Uses velocities in polar form: False

 RMSE 
 --------------------------------
14.44 cm/s

config ID: 39

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 32
max. l