# Grid Search for Optimal Hyperparameter Values

To find the optimal values of the *learning rate*, the *minimum amount of data in each leaf*, and the *maximum number of leaves on each node*, we'll look through the RMSE of each experiment, averaged over the 10 replications of each configuration. We do this separately for four types of model, one for each combination of two data choices: MVN NGBoost trained with or without SST gradients, and with velocities in polar or Cartesian form.
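As a minimal sketch of the averaging step, assume the per-run results sit in a long-format table with hypothetical `config_id`, `replicate` and `RMSE` columns (the layout of the pickled results used below may differ). The mean RMSE per configuration is then a pandas `groupby`, sorted so the best configuration comes first:

```python
import numpy as np
import pandas as pd

# Hypothetical per-run results: 3 configurations x 10 replications each.
# The RMSE values are random placeholders in cm/s, not real results.
rng = np.random.default_rng(0)
runs = pd.DataFrame({
    "config_id": np.repeat([0, 1, 2], 10),
    "replicate": np.tile(np.arange(10), 3),
    "RMSE": 14.5 + rng.random(30),
})

# Average over the replications of each configuration, lowest RMSE first
mean_rmse = runs.groupby("config_id")[["RMSE"]].mean().sort_values("RMSE")
print(mean_rmse)
```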

The hyperparameter options for the grid search are:

* Learning rate, $\eta$: $\{0.01, 0.1, 1.0\}$
* Minimum data per leaf: $\{1, 16, 32\}$
* Maximum leaves per node: $\{8, 16, 32, 64\}$

Along with the fixed model hyperparameters:

* Maximum boosting iterations, $B = 1000$
* Early stopping rounds: $50$
* Maximum tree depth: $15$
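The grid can be enumerated with `itertools.product`. This sketch uses the values that appear in the saved configurations (minimum data per leaf 1, 16, 32; maximum leaves 8, 16, 32, 64) and writes each configuration in the same `[learning rate, min. data per leaf, max. leaves, sst_flag, polar_flag]` format as `model_configuration_ids.p`. It gives 3 × 3 × 4 × 4 = 144 configurations, i.e. 1440 runs at 10 replications each. The ordering here is not the one used in the stored file (there, IDs 0 to 15 use a learning rate of 0.1), so list positions in this sketch should not be read as config IDs.

```python
from itertools import product

learning_rates = [0.01, 0.1, 1.0]
min_data_per_leaf = [1, 16, 32]
max_leaves = [8, 16, 32, 64]
# The four model types: (uses SST gradient, uses polar velocities)
data_flags = [(False, False), (False, True), (True, False), (True, True)]

# One [lr, min. data per leaf, max. leaves, sst_flag, polar_flag] list per configuration
configs = [
    [lr, min_leaf, leaves, sst, polar]
    for lr, min_leaf, leaves, (sst, polar) in product(
        learning_rates, min_data_per_leaf, max_leaves, data_flags
    )
]
print(len(configs))  # 144
```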


In [None]:
# load modules and the RMSE values for each configuration

import pickle
import os
import sys
import numpy as np
# make the seaducks package (in the parent directory) importable
sys.path.append(os.path.abspath('..'))
from seaducks.model_selection import get_info_from_config

with open(r'experiment_date_2025-02-03_early_stopping_100_grid_search.p', 'rb') as pickle_file:
    grid = pickle.load(pickle_file)

# model configurations path
model_filepath = r'../model_configuration_ids.p'

In [5]:
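# load the configurations into their own variable so the RMSE results in `grid` are not overwritten
with open(model_filepath, 'rb') as pickle_file:
    configs = pickle.load(pickle_file)

In [6]:
configs  # TODO: keep only configurations that have all 10 replications present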

{0: [0.1, 16, 8, False, False],
 1: [0.1, 16, 8, False, True],
 2: [0.1, 16, 8, True, False],
 3: [0.1, 16, 8, True, True],
 4: [0.1, 16, 16, False, False],
 5: [0.1, 16, 16, False, True],
 6: [0.1, 16, 16, True, False],
 7: [0.1, 16, 16, True, True],
 8: [0.1, 16, 32, False, False],
 9: [0.1, 16, 32, False, True],
 10: [0.1, 16, 32, True, False],
 11: [0.1, 16, 32, True, True],
 12: [0.1, 16, 64, False, False],
 13: [0.1, 16, 64, False, True],
 14: [0.1, 16, 64, True, False],
 15: [0.1, 16, 64, True, True],
 16: [0.1, 1, 8, False, False],
 17: [0.1, 1, 8, False, True],
 18: [0.1, 1, 8, True, False],
 19: [0.1, 1, 8, True, True],
 20: [0.1, 1, 16, False, False],
 21: [0.1, 1, 16, False, True],
 22: [0.1, 1, 16, True, False],
 23: [0.1, 1, 16, True, True],
 24: [0.1, 1, 32, False, False],
 25: [0.1, 1, 32, False, True],
 26: [0.1, 1, 32, True, False],
 27: [0.1, 1, 32, True, True],
 28: [0.1, 1, 64, False, False],
 29: [0.1, 1, 64, False, True],
 30: [0.1, 1, 64, True, False],
 31: [0.1
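One way to act on the TODO above (keeping only configurations with all 10 replications) is to count the non-missing RMSE values per configuration. The sketch below assumes a hypothetical long-format table of per-run results in which an unfinished replication is recorded as NaN. Note that pandas' `mean` skips NaN, so without this check a partially complete configuration would still receive an average.

```python
import numpy as np
import pandas as pd

N_REPLICATIONS = 10

# Hypothetical per-run results; NaN marks a replication that did not finish.
# Config 0 is complete, config 1 is missing two runs, config 2 has no results.
rmse = np.full(30, 14.8)
rmse[[12, 17]] = np.nan
rmse[20:] = np.nan
runs = pd.DataFrame({"config_id": np.repeat([0, 1, 2], N_REPLICATIONS), "RMSE": rmse})

# "count" only counts non-NaN values, so it gives the number of finished runs
summary = runs.groupby("config_id")["RMSE"].agg(["mean", "count"])
complete = summary[summary["count"] == N_REPLICATIONS]
print(summary)
print(complete.index.tolist())  # [0]
```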

In [None]:
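# add sst_flag and polar_flag columns to the results DataFrame
# (items 3 and 4 of each configuration list are the SST and polar flags)
vals = grid.index.map(lambda config_id: get_info_from_config(config_id, model_filepath=model_filepath)[3:5]).values
grid[['sst_flag', 'polar_flag']] = np.array([val for val in vals])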
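Instead of calling `get_info_from_config` once per row, the configuration dictionary can be turned into a table once and joined on the config ID. This is a sketch assuming the dictionary has the format shown in the output above (config ID mapped to `[learning rate, min. data per leaf, max. leaves, sst_flag, polar_flag]`); the column names and RMSE values are hypothetical placeholders.

```python
import pandas as pd

# A few entries in the format of model_configuration_ids.p
configs = {
    0: [0.1, 16, 8, False, False],
    1: [0.1, 16, 8, False, True],
    2: [0.1, 16, 8, True, False],
    3: [0.1, 16, 8, True, True],
}
config_table = pd.DataFrame.from_dict(
    configs,
    orient="index",
    columns=["learning_rate", "min_data_per_leaf", "max_leaves", "sst_flag", "polar_flag"],
)

# Placeholder mean RMSE per config ID (not real results), joined on the index
results = pd.DataFrame({"RMSE": [14.9, 15.0, 14.7, 14.8]}, index=[0, 1, 2, 3])
results = results.join(config_table[["sst_flag", "polar_flag"]])
print(results)
```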

In [None]:
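print('\nWorst performing hyperparameter configurations')
# grid is sorted by ascending RMSE, so its last rows hold the worst configurations;
# rows with a NaN RMSE (missing results) also end up here, as the output below shows
ids = grid.tail(5).index.values

for config_id in ids:
    get_info_from_config(config_id, model_filepath, verbose=True)
    print('\n RMSE \n--------------------------------')
    print(f"{grid.loc[config_id, 'RMSE']:.2f} cm/s")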


Worst performing hyperparameter configurations

config ID: 129

 Hyperparameter Values
---------------------------
learning rate: 0.01
min. data per leaf: 32
max. leaves per node: 8

 Data Information
---------------------------
Uses SST gradient: False
Uses velocities in polar form: True

 RMSE 
--------------------------------
nan cm/s

config ID: 130

 Hyperparameter Values
---------------------------
learning rate: 0.01
min. data per leaf: 32
max. leaves per node: 8

 Data Information
---------------------------
Uses SST gradient: True
Uses velocities in polar form: False

 RMSE 
--------------------------------
nan cm/s

config ID: 131

 Hyperparameter Values
---------------------------
learning rate: 0.01
min. data per leaf: 32
max. leaves per node: 8

 Data Information
---------------------------
Uses SST gradient: True
Uses velocities in polar form: True

 RMSE 
--------------------------------
nan cm/s

config ID: 134

 Hyperparameter Values
---------------------------
learni
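The configurations listed above have a NaN RMSE, which points to missing results rather than poor fits. A small sketch with placeholder values shows why: `sort_values` places NaN last by default (`na_position='last'`), so NaN rows fill the `tail`. Dropping them first gives the worst configurations that actually have results.

```python
import numpy as np
import pandas as pd

# Placeholder mean RMSEs (cm/s); NaN marks a configuration without results
demo = pd.DataFrame({"RMSE": [14.74, 15.20, np.nan, 14.90, np.nan]}, index=[136, 12, 129, 4, 130])
demo = demo.sort_values("RMSE")  # NaN rows are placed last by default

print(demo.tail(2).index.tolist())           # [129, 130]: missing results, not poor fits
print(demo.dropna().tail(2).index.tolist())  # [4, 12]: worst configurations with results
```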

In [6]:
# identify the top parameters for each combination of (SST_flag, polar_flag)
# sst flag: False, polar_flag: False
grid_sst_false_polar_false = grid.query('not sst_flag and not polar_flag')[['RMSE']]
# sst flag: False, polar_flag: True
grid_sst_false_polar_true = grid.query('not sst_flag and polar_flag')[['RMSE']]
# sst flag: True, polar_flag: False
grid_sst_true_polar_false = grid.query('sst_flag and not polar_flag')[['RMSE']]
# sst flag: True, polar_flag: True
grid_sst_true_polar_true = grid.query('sst_flag and polar_flag')[['RMSE']]


In [7]:
grid_sst_false_polar_false

| config ID | RMSE (cm/s) |
|----------:|------------:|
| 136 | 14.740642 |
| 36 | 14.80333 |
| 132 | 14.81425 |
| 140 | 14.821918 |
| 40 | 14.822587 |
| 108 | 14.836618 |
| 104 | 14.839591 |
| 0 | 14.840364 |
| 4 | 14.841547 |
| 100 | 14.85225 |


In [8]:
print('\nOptimal Hyperparameter Configurations')

print(f'\nFor sst_flag: False, polar_flag: False \n Config ID: {grid_sst_false_polar_false.head(1).index.values[0]}')
print(f'\nFor sst_flag: False, polar_flag: True \n Config ID: {grid_sst_false_polar_true.head(1).index.values[0]}')
print(f'\nFor sst_flag: True, polar_flag: False \n Config ID: {grid_sst_true_polar_false.head(1).index.values[0]}')
print(f'\nFor sst_flag: True, polar_flag: True \n Config ID: {grid_sst_true_polar_true.head(1).index.values[0]}')


Optimal Hyperparameter Configurations

For sst_flag: False, polar_flag: False 
 Config ID: 136

For sst_flag: False, polar_flag: True 
 Config ID: 133

For sst_flag: True, polar_flag: False 
 Config ID: 106

For sst_flag: True, polar_flag: True 
 Config ID: 103
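The four `query` calls and `head(1)` lookups can also be collapsed into a single `groupby(...).idxmin()`, which returns the index label (here the config ID) of the lowest RMSE in each group. A sketch on placeholder data; on the real results the same call would be applied to `grid`, which carries the `RMSE`, `sst_flag` and `polar_flag` columns built earlier.

```python
import pandas as pd

# Placeholder mean RMSEs (cm/s) and data flags for eight hypothetical config IDs
demo = pd.DataFrame(
    {
        "RMSE": [15.4, 15.3, 15.2, 15.1, 15.0, 15.5, 14.9, 15.6],
        "sst_flag": [False, False, True, True, False, False, True, True],
        "polar_flag": [False, True, False, True, False, True, False, True],
    },
    index=pd.Index([0, 1, 2, 3, 4, 5, 6, 7], name="config ID"),
)

# Config ID with the lowest RMSE in each (sst_flag, polar_flag) group
best = demo.groupby(["sst_flag", "polar_flag"])["RMSE"].idxmin()
print(best)
```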


In [3]:
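# compare the four data variants (SST gradient on/off x polar/Cartesian velocities)
# of one hyperparameter setting: learning rate 0.1, min. data per leaf 32, max. leaves 16
for config_id in [36, 37, 38, 39]:
    get_info_from_config(config_id, model_filepath, verbose=True)

    print('\n RMSE \n --------------------------------')
    print(f"{grid.loc[config_id, 'RMSE']:.2f} cm/s")

with open(model_filepath, 'rb') as filehandler:
    configs = pickle.load(filehandler)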


config ID: 36

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 32
max. leaves per node: 16

 Data Information
---------------------------
Uses SST gradient: False
Uses velocities in polar form: False

 RMSE 
 --------------------------------
14.80 cm/s

config ID: 37

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 32
max. leaves per node: 16

 Data Information
---------------------------
Uses SST gradient: False
Uses velocities in polar form: True

 RMSE 
 --------------------------------
14.79 cm/s

config ID: 38

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 32
max. leaves per node: 16

 Data Information
---------------------------
Uses SST gradient: True
Uses velocities in polar form: False

 RMSE 
 --------------------------------
14.58 cm/s

config ID: 39

 Hyperparameter Values
---------------------------
learning rate: 0.1
min. data per leaf: 32
max. 
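The cell above compares the four data variants of a single hyperparameter setting. A `pivot_table` sketch extends this to every setting at once, with one row per hyperparameter setting and one column per `(sst_flag, polar_flag)` variant. Column names such as `learning_rate` are hypothetical, and the RMSE values are synthetic placeholders.

```python
import pandas as pd

# Synthetic results: two hyperparameter settings x four data variants (placeholder RMSEs in cm/s)
rows = []
for lr, min_leaf, leaves, base_rmse in [(0.1, 32, 16, 15.0), (0.01, 16, 8, 15.4)]:
    for sst in (False, True):
        for polar in (False, True):
            rows.append({
                "learning_rate": lr,
                "min_data_per_leaf": min_leaf,
                "max_leaves": leaves,
                "sst_flag": sst,
                "polar_flag": polar,
                "RMSE": base_rmse - 0.2 * sst - 0.05 * polar,
            })
results = pd.DataFrame(rows)

# One row per hyperparameter setting, one column per (sst_flag, polar_flag) variant
comparison = results.pivot_table(
    index=["learning_rate", "min_data_per_leaf", "max_leaves"],
    columns=["sst_flag", "polar_flag"],
    values="RMSE",
)
print(comparison.round(2))
```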

In [32]:
# map the last run index to its config ID (1440 runs = 144 configurations x 10 replications)
config_id = (1440 - 1) // 10

In [33]:
config_id

143
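The floor division above implies that runs are numbered consecutively from zero, with 10 replications per configuration. Assuming that numbering, a small pair of hypothetical helpers maps in both directions using integer arithmetic only, which avoids the float round trip through `np.floor`:

```python
N_REPLICATIONS = 10


def run_to_config(run_index: int, n_replications: int = N_REPLICATIONS) -> tuple[int, int]:
    """Return (config ID, replication number) for a zero-based run index."""
    return divmod(run_index, n_replications)


def config_to_runs(config_id: int, n_replications: int = N_REPLICATIONS) -> range:
    """Return the zero-based run indices that belong to one configuration."""
    start = config_id * n_replications
    return range(start, start + n_replications)


print(run_to_config(1440 - 1))  # (143, 9)
print(config_to_runs(143))      # range(1430, 1440)
```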