- Generally, the XGBoost hyperparameters have been divided into 4 categories. They are as follows -

  - 1. General parameters
  - 2. Booster parameters
  - 3. Learning task parameters
  - 4. Command line parameters

- Before running a XGBoost model, we must set three types of parameters - **general parameters**, **booster parameters** and **task parameters**.

- The fourth type of parameters are **command line parameters**. They are only used in the console version of XGBoost. So, we will skip these parameters and limit our discussion to the first three type of parameters.

- Link : [Params for XGboost](https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tree-booster)
- General Parameters:
    - booster (gbtree/gblinear/tree)
    - verbosity
    - nthread

- Booster Parameters
    - Tree booster
        - eta (learning rate)
        - gamma
        - max_depth
        - min_child_weight
        - max_delta_step (try in imbalanced data especially with logistic regression)
        - subsample
        - colsample_(bytree, bylevel, bynode) (e.g {'colsample_bytree': 0.5, 'colsample_bylevel': 0.5, 'colsample_bynode': 0.5 } for 64 will reauslt in 8 features per split)
        - lambda (l2 regularization)
        - alpha (l1 regularization)
        - tree_method (auto/ exact(greedy)/ approx/ hist/ gpu_hist)
        - scale_pos_weight
        - max_leaves

- Learning Task Parameters
    - objective (loss function tro be minimized)
    - eval_metric (rmse/mae/acc/auc..,)
    - seed

- k-fold CV in XGBoost
    - nfolds
    - num_boost_round
    - metrics
    - early_stopping_rounds (stop training if hold out metric does not improve for given number of rounds)
    - seed


In [2]:
# imports

# read data - Wholesale customer data.csv [https://archive.ics.uci.edu/ml/datasets/wholesale+customers]

# split feature and target variable

# run basic eda (summary statistics, missing value checks, shape, categprocal feature handling)

# train test split

# make baseline model and get prediction and evaluation metric

# k-fold cross validation

# use xgboost/rf for feature importance


## Bayesian Optimization

- finding the best parameter for any ML/DL model
- constantly learns from previous optimizations
    - surrogate model
    - acquisition function

## Hyperopt

- Search for hyperparameters in a search space optimally
- Used in model tuning

## Parts of the optimiztaion process:

- Initialize domain space
- Define objective function
- Optimization algorithm
- Result evaluation

In [6]:
import os
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score
from hyperopt import tpe, hp, fmin, Trials, STATUS_OK

In [7]:
os.getcwd()

'C:\\Users\\Devansh\\Desktop\\Devansh\\NMIMS\\Machine Learning\\Notebooks'

In [10]:
df = pd.read_csv('C:\\Users\\Devansh\\Desktop\\Devansh\\NMIMS\\Machine Learning\\Kaggle Datasets\\Wholesale customers data.csv')

In [11]:
df.head()

Unnamed: 0,Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicassen
0,2,3,12669,9656,7561,214,2674,1338
1,2,3,7057,9810,9568,1762,3293,1776
2,2,3,6353,8808,7684,2405,3516,7844
3,1,3,13265,1196,4221,6404,507,1788
4,2,3,22615,5410,7198,3915,1777,5185


In [12]:
X = df.drop('Channel', axis = 1)
y = df['Channel']

In [13]:
y.value_counts()

1    298
2    142
Name: Channel, dtype: int64

In [14]:
y[y==2] = 0
y[y==1] = 1

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.3)

The available hyperopt optimization algorithms are -

- **hp.choice(label, options)** — Returns one of the options, which should be a list or tuple.

- **hp.randint(label, upper)** — Returns a random integer between the range [0, upper).

- **hp.uniform(label, low, high)** — Returns a value uniformly between low and high.

- **hp.quniform(label, low, high, q)** — Returns a value round(uniform(low, high) / q) * q, i.e it rounds the decimal values and returns an integer.

- **hp.normal(label, mean, std)** — Returns a real value that’s normally-distributed with mean and standard deviation sigma.

In [16]:
space={'max_depth': hp.quniform("max_depth", 3, 18, 1),
       'gamma': hp.uniform ('gamma', 1,9),
       'reg_alpha' : hp.quniform('reg_alpha', 40,180,1),
       'reg_lambda' : hp.uniform('reg_lambda', 0,1),
       'colsample_bytree' : hp.uniform('colsample_bytree', 0.5,1),
       'min_child_weight' : hp.quniform('min_child_weight', 0, 10, 1),
       'n_estimators': 180,
       'seed': 0
       }

In [22]:
def objective(space):
       clf = xgb.XGBClassifier(
              n_estimators = space['n_estimators'],
              gamma = int(space['gamma']),
              reg_alpha = int(space['reg_alpha']),
              reg_lambda = int(space['reg_lambda']),
              colsample_bytree = int(space['colsample_bytree']),
              min_child_weight  = int(space['min_child_weight']),
              max_depth = int(space['max_depth'])
       )
       evaluation = [(X_train, y_train), (X_test, y_test)]
       clf.fit(X_train, y_train,
               eval_set=evaluation, eval_metric = 'auc',
               early_stopping_rounds = 10, verbose = False)

       pred = clf.predict(X_test)
       accuracy = accuracy_score(y_test, pred>0.5)
       print('SCORE:', accuracy)
       return {'loss': -accuracy, 'status': STATUS_OK}

In [26]:
trials = Trials()
best_hyperparameters = fmin(fn = objective, space = space, algo = tpe.suggest, max_evals = 100, trials = trials)

SCORE:                                                 
0.3560606060606061                                     
SCORE:                                                 
0.3560606060606061                                                                
SCORE:                                                                            
0.3560606060606061                                                                
SCORE:                                                                            
0.8939393939393939                                                                
SCORE:                                                                            
0.3560606060606061                                                                
  4%|▍         | 4/100 [00:00<00:03, 29.40trial/s, best loss: -0.8939393939393939]









SCORE:                                                                            
0.6439393939393939                                                                
SCORE:                                                                            
0.6439393939393939                                                                
SCORE:                                                                            
0.3560606060606061                                                                
SCORE:                                                                            
0.3560606060606061                                                                
SCORE:                                                                            
0.3560606060606061                                                                
  9%|▉         | 9/100 [00:00<00:03, 23.53trial/s, best loss: -0.8939393939393939]








SCORE:                                                                            
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 15%|█▌        | 15/100 [00:00<00:03, 24.42trial/s, best loss: -0.8939393939393939]








SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
 19%|█▉        | 19/100 [00:00<00:03, 24.48trial/s, best loss: -0.8939393939393939]








SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 24%|██▍       | 24/100 [00:01<00:03, 20.38trial/s, best loss: -0.8939393939393939]







SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 27%|██▋       | 27/100 [00:01<00:03, 18.68trial/s, best loss: -0.8939393939393939]







SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
 31%|███       | 31/100 [00:01<00:03, 17.30trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 35%|███▌      | 35/100 [00:01<00:04, 16.00trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 37%|███▋      | 37/100 [00:01<00:04, 15.55trial/s, best loss: -0.8939393939393939]





SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
 39%|███▉      | 39/100 [00:02<00:04, 12.93trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 43%|████▎     | 43/100 [00:02<00:03, 14.28trial/s, best loss: -0.8939393939393939]







SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
                                                                                   






SCORE:
0.8939393939393939                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 49%|████▉     | 49/100 [00:02<00:03, 14.24trial/s, best loss: -0.8939393939393939]







SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 53%|█████▎    | 53/100 [00:03<00:03, 14.08trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 57%|█████▋    | 57/100 [00:03<00:03, 14.26trial/s, best loss: -0.8939393939393939]







SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 61%|██████    | 61/100 [00:03<00:02, 14.54trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 63%|██████▎   | 63/100 [00:03<00:02, 14.70trial/s, best loss: -0.8939393939393939]







SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 67%|██████▋   | 67/100 [00:04<00:02, 14.33trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 71%|███████   | 71/100 [00:04<00:02, 14.23trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 73%|███████▎  | 73/100 [00:04<00:01, 13.89trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 77%|███████▋  | 77/100 [00:04<00:01, 13.73trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 79%|███████▉  | 79/100 [00:04<00:01, 13.89trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
 83%|████████▎ | 83/100 [00:05<00:01, 13.68trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
 85%|████████▌ | 85/100 [00:05<00:01, 13.80trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 89%|████████▉ | 89/100 [00:05<00:00, 13.84trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.8939393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
 91%|█████████ | 91/100 [00:05<00:00, 13.80trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.8939393939393939                                                                 
 95%|█████████▌| 95/100 [00:06<00:00, 13.53trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.6439393939393939                                                                 
 97%|█████████▋| 97/100 [00:06<00:00, 13.10trial/s, best loss: -0.8939393939393939]






SCORE:                                                                             
0.6439393939393939                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
SCORE:                                                                             
0.3560606060606061                                                                 
100%|██████████| 100/100 [00:06<00:00, 15.40trial/s, best loss: -0.8939393939393939]






In [25]:
print(best_hyperparameters)

{'colsample_bytree': 0.7328418660823692, 'gamma': 5.29200890859719, 'max_depth': 10.0, 'min_child_weight': 2.0, 'reg_alpha': 70.0, 'reg_lambda': 0.1370711487274967}
