<h1 style="text-align:center;">Combining hyperparameters</h1>

Let's imagine tuning a radio. You're trying to get the clearest sound from your favorite station. Each button or knob you adjust is like a hyperparameter in machine learning. 

**RandomizedSearchCV**: Imagine spinning all the knobs at once, randomly, and hoping to land on the perfect setting. It's quick but might not be precise.

**One Hyperparameter at a Time**: Instead, you could carefully adjust one knob (say, volume) to its best, then move onto the next (say, bass), and so on. It's systematic and you'll know the impact of each knob. 

For instance, even if a low volume (like `n_estimators = 2`) sounds best initially, you'd still want to try the entire volume range to ensure clarity in all conditions. 

In essence, test all variations and keep refining based on feedback, just like slowly refining your radio's sound.
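The one-knob-at-a-time idea can be sketched in a few lines of plain Python. This is only an illustration: `tune_one_at_a_time` and `toy_score` are hypothetical names, and the toy scoring function stands in for the cross-validated accuracy used in the rest of the notebook.

```python
def tune_one_at_a_time(score_fn, search_space, defaults):
    """Tune each hyperparameter in turn, keeping the best value found so far."""
    best = dict(defaults)
    best_score = score_fn(best)
    for name, candidates in search_space.items():
        for value in candidates:
            # Change a single knob; every other knob keeps its current best value.
            trial = {**best, name: value}
            trial_score = score_fn(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best, best_score


def toy_score(params):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    return (1.0
            - abs(params['learning_rate'] - 0.2)
            - 0.01 * abs(params['max_depth'] - 1))


space = {'max_depth': [1, 2, 3], 'learning_rate': [0.1, 0.2, 0.3]}
start = {'max_depth': 3, 'learning_rate': 0.3}
best, score = tune_one_at_a_time(toy_score, space, start)
print(best, score)
```

The catch with this strategy is that it never revisits an earlier knob, so it can miss combinations that only work well together.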


In [1]:
import os
import warnings

os.environ['PYTHONWARNINGS'] = 'ignore::FutureWarning' 

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import (train_test_split, cross_val_score, 
                        StratifiedKFold, GridSearchCV, RandomizedSearchCV)

from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

from helper_file import *

warnings.filterwarnings("ignore", category=FutureWarning) 
# Shell equivalent of the os.environ line above:
# export PYTHONWARNINGS="ignore::FutureWarning"

In [2]:
import xgboost
print(xgboost.__version__)

1.7.6


In [3]:
data_path = "data/heart_disease.csv"
df = pd.read_csv(data_path)
df.sample(n=5, random_state=43)

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
242,64,1,0,145,212,0,0,132,0,2.0,1,2,1,0
130,54,0,2,160,201,0,1,163,0,0.0,2,1,2,1
208,49,1,2,120,188,0,1,139,0,2.0,1,3,3,0
160,56,1,1,120,240,0,1,169,0,0.0,0,0,2,1
124,39,0,2,94,199,0,1,179,0,0.0,2,0,2,1


### 1. `n_estimators`

In [4]:
grid_search(df, 'target', params={'n_estimators':[2, 25, 50, 75, 100]})

Best params: {'n_estimators': 100}
Best score: 0.80224
search completed!


### 2. `max_depth`

In [8]:
grid_search(df, 'target', 
            params={'max_depth':[1, 2, 3, 4, 5, 6, 7, 8], 
                    'n_estimators':[50]})

Best params: {'max_depth': 1, 'n_estimators': 50}
Best score: 0.83825
search completed!


Imagine we're planting a garden. With the simplest kind of plant (a "decision tree stump", `max_depth=1`), a plot of 50 trees scores 0.83825, up from 0.80224 in the first search.

But what if we're missing out on even better harvests? Planting as few as two trees or as many as a hundred could work wonders when paired with the right fertilizer (our `max_depth`). Time to experiment and see which combination gives us the most bountiful garden.


In [9]:
grid_search(df, 'target', 
            params={'max_depth':[1, 2, 3, 4, 6, 7, 8],
                    'n_estimators':[2, 50, 100]})

Best params: {'max_depth': 1, 'n_estimators': 50}
Best score: 0.83825
search completed!


`n_estimators=50` and `max_depth=1` still give the best results, so we will use them going forward, returning to our early stopping analysis later.

### 3. `learning_rate`

Since `n_estimators` is reasonably low, adjusting `learning_rate` may improve results. A standard range is provided below:

In [10]:
grid_search(df, 'target', 
            params={'learning_rate':[0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5], 
                    'max_depth':[1], 'n_estimators':[50]})

Best params: {'learning_rate': 0.2, 'max_depth': 1, 'n_estimators': 50}
Best score: 0.84164
search completed!
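For context, `learning_rate` (often written $\eta$) shrinks each new tree's contribution before it is added to the ensemble. In standard gradient-boosting notation (the symbols are not from this notebook), the prediction after $m$ trees is

$$
\hat{y}^{(m)}(x) = \hat{y}^{(m-1)}(x) + \eta \, f_m(x), \qquad m = 1, \dots, M
$$

where $f_m$ is the $m$-th tree and $M$ is `n_estimators`. With only $M = 50$ trees, a very small $\eta$ can leave the model underfit, so the best rate depends on the tree count.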


Our scores are going up. Let us continue to the next hyperparameter.

### 4. `min_child_weight`


In [13]:
grid_search(df, 'target', 
            params={'min_child_weight':[1, 2, 3, 4, 5], 
                    'max_depth':[1], 'learning_rate': [0.2],
                    'n_estimators':[50]})

Best params: {'learning_rate': 0.2, 'max_depth': 1, 'min_child_weight': 2, 'n_estimators': 50}
Best score: 0.84486
search completed!


We are doing better: the score rises again, to 0.84486.

### 5. `subsample`

In [14]:
grid_search(df, 'target', 
            params={'subsample':[0.5, 0.6, 0.7, 0.8, 0.9, 1],
                    'min_child_weight':[2], 
                    'max_depth':[1], 'learning_rate': [0.2],
                    'n_estimators':[50]})

Best params: {'learning_rate': 0.2, 'max_depth': 1, 'min_child_weight': 2, 'n_estimators': 50, 'subsample': 0.5}
Best score: 0.85153
search completed!


This is great so far. Let us go back and see whether a more comprehensive grid search, with `n_estimators` fixed at 2, gives different values and a better score.

In [15]:
grid_search(df, 'target', 
            params={'subsample':[0.5, 0.6, 0.7, 0.8, 0.9, 1],
                    'min_child_weight':[1, 2, 3, 4, 5],
                    'learning_rate':[0.1, 0.2, 0.3, 0.4, 0.5],
                    'max_depth':[1, 2, 3, 4, 5],
                    'n_estimators':[2]})

Best params: {'learning_rate': 0.5, 'max_depth': 4, 'min_child_weight': 5, 'n_estimators': 2, 'subsample': 1}
Best score: 0.83519
search completed!


Our classifier with only two trees performs worse: 0.83519 against the 0.85153 obtained above. With so few boosting rounds, the other hyperparameters seemingly do not have enough iterations to make a significant difference.

Think of choosing a winning lottery ticket from a massive drum filled with options. If we narrow down the numbers based on past wins (our previous knowledge) and only then randomly pick a ticket (using RandomizedSearchCV), we're more likely to hit the jackpot.

Now, if there were 4,500 possible tickets (combinations of hyperparameters from those given below), it'd take ages to try each one (like a grid search). Instead, we smartly select a subset, improving our chances without waiting a lifetime. We will set `random=True` in our function to choose `RandomizedSearchCV`.
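The 4,500 figure is simply the product of the list lengths in the grid passed to the next cell, which a quick check confirms. The `n_folds` value is an assumption, since the helper's cross-validation setup is not shown.

```python
from math import prod

# Same candidate lists as the randomized search in the next cell.
param_grid = {
    'subsample': [0.5, 0.6, 0.7, 0.8, 0.9, 1],
    'min_child_weight': [1, 2, 3, 4, 5],
    'learning_rate': [0.1, 0.2, 0.3, 0.4, 0.5],
    'max_depth': [1, 2, 3, 4, 5, None],
    'n_estimators': [2, 25, 50, 75, 100],
}

# A full grid search fits one model per combination per fold.
n_combinations = prod(len(values) for values in param_grid.values())
n_folds = 5  # assumption: the helper's fold count is not shown
print(n_combinations)            # 6 * 5 * 5 * 6 * 5 = 4500
print(n_combinations * n_folds)  # model fits needed for an exhaustive search
```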

In [16]:
grid_search(df, 'target', 
            params={'subsample':[0.5, 0.6, 0.7, 0.8, 0.9, 1],
                    'min_child_weight':[1, 2, 3, 4, 5],
                    'learning_rate':[0.1, 0.2, 0.3, 0.4, 0.5],
                    'max_depth':[1, 2, 3, 4, 5, None],
                    'n_estimators':[2, 25, 50, 75, 100]},
            random=True)

Best params: {'subsample': 0.5, 'n_estimators': 50, 'min_child_weight': 5, 'max_depth': 1, 'learning_rate': 0.1}
Best score: 0.85481
search completed!


This is interesting. Our score is up again, and a different set of values produced it.


1. `learning_rate`: In the first set (from tuning one hyperparameter at a time), the learning rate is `0.2`, while in the second (from `RandomizedSearchCV`), it is reduced to `0.1`.
2. `min_child_weight`: The first set has a `min_child_weight` of `2`, whereas the second set increases it to `5`.

The other parameters (`max_depth`, `n_estimators`, and `subsample`) remain the same across both sets.
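The comparison above can be checked mechanically by diffing the two result dictionaries copied from the outputs. The variable names `manual` and `randomized` are just labels chosen for this sketch.

```python
# Best params from the one-at-a-time search (subsample step).
manual = {'learning_rate': 0.2, 'max_depth': 1, 'min_child_weight': 2,
          'n_estimators': 50, 'subsample': 0.5}
# Best params from the randomized search.
randomized = {'subsample': 0.5, 'n_estimators': 50, 'min_child_weight': 5,
              'max_depth': 1, 'learning_rate': 0.1}

# Map each differing key to its (manual, randomized) pair of values.
changed = {key: (manual[key], randomized[key])
           for key in manual if manual[key] != randomized[key]}
unchanged = sorted(key for key in manual if manual[key] == randomized[key])

print("changed:", changed)
print("unchanged:", unchanged)
```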

Going forward, we will use the hyperparameters that produced the best score so far. 

Let's tune more hyperparameters.

### 6. `colsample`
We will try `colsample_bytree`, `colsample_bylevel`, and `colsample_bynode`, in that order.

In [18]:
grid_search(df, 'target', 
           params={'subsample': [0.5], 'n_estimators': [50], 
                   'min_child_weight': [5], 'max_depth': [1], 
                   'learning_rate': [0.1], 'colsample_bytree':[0.5, 0.6, 0.7, 0.8, 0.9,1]}
)

Best params: {'colsample_bytree': 0.9, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 5, 'n_estimators': 50, 'subsample': 0.5}
Best score: 0.85486
search completed!


In [19]:
grid_search(df, 'target', 
           params={'subsample': [0.5], 'n_estimators': [50], 
                   'min_child_weight': [5], 'max_depth': [1], 
                   'learning_rate': [0.1], 'colsample_bytree':[0.9],
                  'colsample_bylevel':[0.5, 0.6, 0.7, 0.8, 0.9, 1]}
)

Best params: {'colsample_bylevel': 1, 'colsample_bytree': 0.9, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 5, 'n_estimators': 50, 'subsample': 0.5}
Best score: 0.85486
search completed!


Our score seems to have peaked here: `colsample_bylevel` on its own changes nothing. Rather than fixing the best value of each parameter in turn, we will search `colsample_bytree`, `colsample_bylevel`, and `colsample_bynode` together to see whether the combination changes anything.

In [21]:
grid_search(df, 'target', 
           params={'subsample': [0.5], 'n_estimators': [50], 
                   'min_child_weight': [5], 'max_depth': [1], 
                   'learning_rate': [0.1], 
                  'colsample_bylevel':[0.5, 0.6, 0.7, 0.8, 0.9, 1],
                   'colsample_bynode':[0.5, 0.6, 0.7, 0.8, 0.9, 1],
                   'colsample_bytree':[0.5, 0.6, 0.7, 0.8, 0.9,1]
                  }
)

Best params: {'colsample_bylevel': 0.8, 'colsample_bynode': 0.5, 'colsample_bytree': 0.9, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 5, 'n_estimators': 50, 'subsample': 0.5}
Best score: 0.85809
search completed!


This is outstanding! Working together, the three `colsample` parameters have provided the best score yet.
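One reason the three settings interact is that, per the XGBoost documentation, the `colsample_by*` parameters apply cumulatively: each one subsamples from whatever the previous stage kept. A rough back-of-the-envelope calculation follows. It assumes the 13 clinical columns are the features, and it ignores however XGBoost rounds the counts internally.

```python
# Best column-sampling values from the joint search above.
colsample = {
    'colsample_bytree': 0.9,
    'colsample_bylevel': 0.8,
    'colsample_bynode': 0.5,
}
n_features = 13  # assumption: age ... thal, excluding the target column

# The fractions multiply: tree -> level -> node.
fraction = 1.0
for value in colsample.values():
    fraction *= value

approx_features_per_split = fraction * n_features
print(f"{fraction:.2f} of the features, "
      f"about {approx_features_per_split:.1f} per split")
```

So each stump would choose its single split from roughly a third of the columns, which adds a fair amount of randomness on top of `subsample=0.5`.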

### 7. `gamma`

This is going to be the last hyperparameter that we fine-tune.

In [23]:
grid_search(df, 'target',
           params={'subsample': [0.5], 'n_estimators': [50], 
                   'min_child_weight': [5], 'max_depth': [1], 
                   'learning_rate': [0.1], 
                  'colsample_bylevel':[0.8],
                   'colsample_bynode':[0.5],
                   'colsample_bytree':[0.9],
                   'gamma':[0, 0.01, 0.05, 0.1, 0.5, 1, 2, 3],
                  }
)

Best params: {'colsample_bylevel': 0.8, 'colsample_bynode': 0.5, 'colsample_bytree': 0.9, 'gamma': 1, 'learning_rate': 0.1, 'max_depth': 1, 'min_child_weight': 5, 'n_estimators': 50, 'subsample': 0.5}
Best score: 0.86137
search completed!


Our best score, 0.86137, is well above the 0.80224 we started with, which is no small feat with XGBoost. We will stop here.
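To put the whole run in perspective, the best score from each step can be tabulated with its gain over the previous step. The numbers are copied from the outputs above; the two-tree full grid search (0.83519) is left out because it was a detour rather than a step forward.

```python
# (step, best cross-validated score) pairs copied from the cell outputs.
steps = [
    ('n_estimators', 0.80224),
    ('max_depth', 0.83825),
    ('learning_rate', 0.84164),
    ('min_child_weight', 0.84486),
    ('subsample', 0.85153),
    ('randomized search', 0.85481),
    ('colsample_bytree', 0.85486),
    ('all colsample parameters', 0.85809),
    ('gamma', 0.86137),
]

previous = None
for name, score in steps:
    # Gain relative to the previous step (zero for the first row).
    gain = 0.0 if previous is None else score - previous
    print(f"{name:<26} {score:.5f}  {gain:+.5f}")
    previous = score

total_gain = steps[-1][1] - steps[0][1]
print(f"total gain: {total_gain:.5f}")
```

Most of the improvement comes from the `max_depth` step; the later steps each add a few tenths of a percentage point.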

#### **XGBoost is all you need**