<a href="https://colab.research.google.com/github/ustcsteve/XGBoost-Use-Guide/blob/main/XGBoost_011022.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Step1: Install XGBoost

We can use

In [4]:
%pip install xgboost 



In [5]:
# check xgboost version
import xgboost as xgb
print(xgb.__version__)

0.90


Current version is 0.90. 

Step2. Import dataset

Data Set Information:


Import pandas to read csv into dataframe; import numpy for mathmatical operations of arrays

In [6]:
import pandas as pd
import numpy as np

Read csv file into dataframe

In [7]:
df = pd.read_csv('bankmarketing_train.csv', na_values=('unknown', 'NA', 'NaN', 'None', '', ' '))

In [8]:
df.head() #Get an brief overview of the dataset

Unnamed: 0,age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
0,57,technician,married,high.school,no,no,yes,cellular,may,mon,371,1,999,1,failure,-1.8,92.893,-46.2,1.299,5099.1,no
1,55,,married,,,yes,no,telephone,may,thu,285,2,999,0,nonexistent,1.1,93.994,-36.4,4.86,5191.0,no
2,33,blue-collar,married,basic.9y,no,no,no,cellular,may,fri,52,1,999,1,failure,-1.8,92.893,-46.2,1.313,5099.1,no
3,36,admin.,married,high.school,no,no,no,telephone,jun,fri,355,4,999,0,nonexistent,1.4,94.465,-41.8,4.967,5228.1,no
4,27,housemaid,married,high.school,no,yes,no,cellular,jul,fri,189,2,999,0,nonexistent,1.4,93.918,-42.7,4.963,5228.1,no


In [9]:
df.isnull().mean() # Find the percentage of missing values

age               0.000000
job               0.008346
marital           0.002003
education         0.042367
default           0.208892
housing           0.023733
loan              0.023733
contact           0.000000
month             0.000000
day_of_week       0.000000
duration          0.000000
campaign          0.000000
pdays             0.000000
previous          0.000000
poutcome          0.000000
emp.var.rate      0.000000
cons.price.idx    0.000000
cons.conf.idx     0.000000
euribor3m         0.000000
nr.employed       0.000000
y                 0.000000
dtype: float64

There are about maximum 20.9% missing values in the default column. The missing values will be treated as another category in the one-hot encoding.

In [10]:
df.shape #Find the dimension of data

(32950, 21)

In [11]:
df.dtypes #Find the datatypes to see if we need to convert any to appropriate type

age                 int64
job                object
marital            object
education          object
default            object
housing            object
loan               object
contact            object
month              object
day_of_week        object
duration            int64
campaign            int64
pdays               int64
previous            int64
poutcome           object
emp.var.rate      float64
cons.price.idx    float64
cons.conf.idx     float64
euribor3m         float64
nr.employed       float64
y                  object
dtype: object

Convert categorical data type from object type

In [12]:
df['job']=df['job'].astype("category")
df['marital']=df['marital'].astype("category")
df['education']=df['education'].astype("category")
df['default']=df['default'].astype("category")
df['housing']=df['housing'].astype("category")
df['loan']=df['loan'].astype("category")
df['contact']=df['contact'].astype("category")
df['month']=df['month'].astype("category")
df['day_of_week']=df['day_of_week'].astype("category")
df['poutcome']=df['poutcome'].astype("category")

In [13]:
df.columns #Find all column names

Index(['age', 'job', 'marital', 'education', 'default', 'housing', 'loan',
       'contact', 'month', 'day_of_week', 'duration', 'campaign', 'pdays',
       'previous', 'poutcome', 'emp.var.rate', 'cons.price.idx',
       'cons.conf.idx', 'euribor3m', 'nr.employed', 'y'],
      dtype='object')

In [14]:
df['y'].unique() #Find the classes of the label for mapping

array(['no', 'yes'], dtype=object)

Convert yes and no to int

In [15]:
class_mapper = {'yes':1,'no':0}
df['y']=df['y'].replace(class_mapper)

In [16]:
df['y'].unique()

array([0, 1])

In [17]:
df.dtypes#Check conversion results

age                  int64
job               category
marital           category
education         category
default           category
housing           category
loan              category
contact           category
month             category
day_of_week       category
duration             int64
campaign             int64
pdays                int64
previous             int64
poutcome          category
emp.var.rate       float64
cons.price.idx     float64
cons.conf.idx      float64
euribor3m          float64
nr.employed        float64
y                    int64
dtype: object

In [18]:
from sklearn.preprocessing import OneHotEncoder

One-hot encoding of categorical data

In [19]:
onehot_columns = ['job', 'marital', 'education', 'default', 'housing', 'loan',
       'contact', 'month', 'day_of_week', 'poutcome']
onehot_df = df[onehot_columns]
onehot_df = pd.get_dummies(onehot_df, columns = onehot_columns)
df_onehot_drop = df.drop(onehot_columns, axis = 1)
df_onehot_final = pd.concat([df_onehot_drop, onehot_df], axis = 1)

In [20]:
df_onehot_final.head()#Get a brief overview of the data encoding

Unnamed: 0,age,duration,campaign,pdays,previous,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y,job_admin.,job_blue-collar,job_entrepreneur,job_housemaid,job_management,job_retired,job_self-employed,job_services,job_student,job_technician,job_unemployed,marital_divorced,marital_married,marital_single,education_basic.4y,education_basic.6y,education_basic.9y,education_high.school,education_illiterate,education_professional.course,education_university.degree,default_no,default_yes,housing_no,housing_yes,loan_no,loan_yes,contact_cellular,contact_telephone,month_apr,month_aug,month_dec,month_jul,month_jun,month_mar,month_may,month_nov,month_oct,month_sep,day_of_week_fri,day_of_week_mon,day_of_week_thu,day_of_week_tue,day_of_week_wed,poutcome_failure,poutcome_nonexistent,poutcome_success
0,57,371,1,999,1,-1.8,92.893,-46.2,1.299,5099.1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0
1,55,285,2,999,0,1.1,93.994,-36.4,4.86,5191.0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0
2,33,52,1,999,1,-1.8,92.893,-46.2,1.313,5099.1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0
3,36,355,4,999,0,1.4,94.465,-41.8,4.967,5228.1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0
4,27,189,2,999,0,1.4,93.918,-42.7,4.963,5228.1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0


In [21]:
df_onehot_final.describe()

Unnamed: 0,age,duration,campaign,pdays,previous,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y,job_admin.,job_blue-collar,job_entrepreneur,job_housemaid,job_management,job_retired,job_self-employed,job_services,job_student,job_technician,job_unemployed,marital_divorced,marital_married,marital_single,education_basic.4y,education_basic.6y,education_basic.9y,education_high.school,education_illiterate,education_professional.course,education_university.degree,default_no,default_yes,housing_no,housing_yes,loan_no,loan_yes,contact_cellular,contact_telephone,month_apr,month_aug,month_dec,month_jul,month_jun,month_mar,month_may,month_nov,month_oct,month_sep,day_of_week_fri,day_of_week_mon,day_of_week_thu,day_of_week_tue,day_of_week_wed,poutcome_failure,poutcome_nonexistent,poutcome_success
count,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0,32950.0
mean,40.040212,257.335205,2.56173,962.17478,0.17478,0.076228,93.574243,-40.51868,3.615654,5166.859608,0.112049,0.25308,0.223247,0.036206,0.025918,0.071108,0.042124,0.034598,0.095933,0.020698,0.164674,0.024067,0.111563,0.605948,0.280486,0.101153,0.056055,0.147496,0.229226,0.000455,0.128346,0.294901,0.791017,9.1e-05,0.453293,0.522974,0.824461,0.151806,0.63569,0.36431,0.063976,0.148164,0.00437,0.174598,0.129014,0.012777,0.336055,0.10003,0.017269,0.013748,0.189954,0.206434,0.210531,0.196783,0.196297,0.104734,0.86173,0.033536
std,10.432313,257.3317,2.763646,187.646785,0.496503,1.572242,0.578636,4.623004,1.735748,72.208448,0.315431,0.434783,0.416429,0.186806,0.158893,0.257009,0.200876,0.182762,0.294504,0.142374,0.370891,0.153259,0.314833,0.488653,0.449243,0.301536,0.230031,0.354605,0.420341,0.021332,0.33448,0.456005,0.406589,0.009542,0.497821,0.499479,0.380433,0.358838,0.481243,0.481243,0.244713,0.355268,0.065964,0.379629,0.33522,0.112312,0.472365,0.300045,0.130272,0.116445,0.392271,0.404752,0.407692,0.397573,0.397202,0.306216,0.345188,0.180033
min,17.0,0.0,1.0,0.0,0.0,-3.4,92.201,-50.8,0.634,4963.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,32.0,102.0,1.0,999.0,0.0,-1.8,93.075,-42.7,1.344,5099.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
50%,38.0,179.0,2.0,999.0,0.0,1.1,93.749,-41.8,4.857,5191.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
75%,47.0,318.0,3.0,999.0,0.0,1.4,93.994,-36.4,4.961,5228.1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
max,98.0,4918.0,56.0,999.0,7.0,1.4,94.767,-26.9,5.045,5228.1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


Exploratory Data Analysis by Pandas_Profiling

In [33]:
%pip install pandas-profiling --upgrade

Collecting pandas-profiling
  Downloading pandas_profiling-3.1.0-py2.py3-none-any.whl (261 kB)
[K     |████████████████████████████████| 261 kB 8.7 MB/s 
[?25hCollecting joblib~=1.0.1
  Downloading joblib-1.0.1-py3-none-any.whl (303 kB)
[K     |████████████████████████████████| 303 kB 67.8 MB/s 
Collecting phik>=0.11.1
  Downloading phik-0.12.0-cp37-cp37m-manylinux2010_x86_64.whl (675 kB)
[K     |████████████████████████████████| 675 kB 67.8 MB/s 
[?25hCollecting pydantic>=1.8.1
  Downloading pydantic-1.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.9 MB)
[K     |████████████████████████████████| 10.9 MB 68.5 MB/s 
[?25hCollecting PyYAML>=5.0.0
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
[K     |████████████████████████████████| 596 kB 79.6 MB/s 
[?25hCollecting htmlmin>=0.1.12
  Downloading htmlmin-0.1.12.tar.gz (19 kB)
Collecting visions[type_image_path]==0.7.4
  Downloa

In [22]:
from pandas_profiling import ProfileReport
prof = ProfileReport(df_onehot_final)
prof.to_file(output_file='output.html')

Summarize dataset:   0%|          | 0/5 [00:00<?, ?it/s]

Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]

Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]

Export report to file:   0%|          | 0/1 [00:00<?, ?it/s]

Get the data dimensions

In [20]:
df_onehot_final.shape

(32950, 58)

Get the names of all columns 

In [21]:
df_onehot_final.columns

Index(['age', 'duration', 'campaign', 'pdays', 'previous', 'emp.var.rate',
       'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed', 'y',
       'job_admin.', 'job_blue-collar', 'job_entrepreneur', 'job_housemaid',
       'job_management', 'job_retired', 'job_self-employed', 'job_services',
       'job_student', 'job_technician', 'job_unemployed', 'marital_divorced',
       'marital_married', 'marital_single', 'education_basic.4y',
       'education_basic.6y', 'education_basic.9y', 'education_high.school',
       'education_illiterate', 'education_professional.course',
       'education_university.degree', 'default_no', 'default_yes',
       'housing_no', 'housing_yes', 'loan_no', 'loan_yes', 'contact_cellular',
       'contact_telephone', 'month_apr', 'month_aug', 'month_dec', 'month_jul',
       'month_jun', 'month_mar', 'month_may', 'month_nov', 'month_oct',
       'month_sep', 'day_of_week_fri', 'day_of_week_mon', 'day_of_week_thu',
       'day_of_week_tue', 'day_of_wee

Step 3. Isolate features from the label for classification

This example is going to use cross validation to show how to use XGBoost in classification analysis using accuracy as the metric.

In [22]:
train_columns=['age', 'duration', 'campaign', 'pdays', 'previous', 'emp.var.rate',
       'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed', 'y',
       'job_admin.', 'job_blue-collar', 'job_entrepreneur', 'job_housemaid',
       'job_management', 'job_retired', 'job_self-employed', 'job_services',
       'job_student', 'job_technician', 'job_unemployed', 'marital_divorced',
       'marital_married', 'marital_single', 'education_basic.4y',
       'education_basic.6y', 'education_basic.9y', 'education_high.school',
       'education_illiterate', 'education_professional.course',
       'education_university.degree', 'default_no', 'default_yes',
       'housing_no', 'housing_yes', 'loan_no', 'loan_yes', 'contact_cellular',
       'contact_telephone', 'month_apr', 'month_aug', 'month_dec', 'month_jul',
       'month_jun', 'month_mar', 'month_may', 'month_nov', 'month_oct',
       'month_sep', 'day_of_week_fri', 'day_of_week_mon', 'day_of_week_thu',
       'day_of_week_tue', 'day_of_week_wed', 'poutcome_failure',
       'poutcome_nonexistent', 'poutcome_success']

Assign the training and testing datasets

In [23]:
X, y = df_onehot_final[train_columns],df_onehot_final['y']

Convert both training and testing dataset to Dmatrix structure that provides best memory efficiency and training spped.

In [24]:
df_dmatrix = xgb.DMatrix(data=X,label=y) 

In [30]:
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold

In [31]:
# define the model
model = xgb.XGBClassifier() #Use XGBClassifier for classification study
# evaluate the model
cv = RepeatedStratifiedKFold(n_splits=10, #number of folds
                             n_repeats=3, #number of times cross-validator needs to be repeated
                             random_state=1 #controls the generation of the random states for each repetition
                             )
n_scores = cross_val_score(model, #XGB classifier
                           X, #training dataset
                           y, #testing dataset
                           scoring='neg_log_loss',#metrics
                           cv=cv, #using cross-validation generator splitting information
                           n_jobs=-1#use all processors to parallelly train the model
                           )
print(n_scores)

[-1.00382481e-04 -9.93990386e-05 -1.00073864e-04 -9.86905928e-05
 -9.90573518e-05 -9.90329883e-05 -9.89251613e-05 -1.00362367e-04
 -9.97381468e-05 -9.93463573e-05 -9.94924401e-05 -9.93770584e-05
 -9.97668469e-05 -9.96284902e-05 -9.90384856e-05 -9.95095667e-05
 -9.95575715e-05 -1.00437642e-04 -9.94130412e-05 -9.89680082e-05
 -9.84672759e-05 -1.00195921e-04 -1.00240263e-04 -9.90077693e-05
 -9.90675899e-05 -9.92643278e-05 -1.00026531e-04 -9.93953646e-05
 -9.90233022e-05 -1.00094786e-04]


In [32]:
cv = RepeatedStratifiedKFold(n_splits=10, #number of folds
                             n_repeats=1, #number of times cross-validator needs to be repeated
                             random_state=1 #controls the generation of the random states for each repetition
                             )
n_scores = cross_val_score(model, X, y, scoring='neg_log_loss', cv=cv, n_jobs=-1)
print(n_scores)

[-1.00382481e-04 -9.93990386e-05 -1.00073864e-04 -9.86905928e-05
 -9.90573518e-05 -9.90329883e-05 -9.89251613e-05 -1.00362367e-04
 -9.97381468e-05 -9.93463573e-05]


In [29]:
xgb.cv({'max_depth': 5, #tree depth
        'eta': 0.3, #learning rate
        'objective': 'binary:logitraw'#logistic binary
        },
       df_dmatrix, #training data
       stratified=y, #perform stratified sampling
       nfold=5, #number of folds for cv
       metrics=["auc", "logloss", "error"]#metrics: ROC_AUC, log loss, binary error
       )

Unnamed: 0,train-auc-mean,train-auc-std,train-error-mean,train-error-std,train-logloss-mean,train-logloss-std,test-auc-mean,test-auc-std,test-error-mean,test-error-std,test-logloss-mean,test-logloss-std
0,1.0,0.0,0.0,0.0,0.057391,1e-05,1.0,0.0,0.0,0.0,0.057389,3.8e-05
1,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
5,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
6,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
7,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
8,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
9,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


Step 4. Tune XGBoost Hyperparameters

XGBoost is an ensemble of weak tree models. To optimize its performance, its number of trees, tree depth, learning rate, number of samples, number of features may need to adjust.

a. Tuning parameters using GridSearchCV

In [27]:
%%time
from sklearn.model_selection import GridSearchCV

params = { 'max_depth': [3,6,10], # Xgboost tree Depth
           'learning_rate': [0.01, 0.05, 0.1], # Xgboost learning rate
           'n_estimators': [100, 500, 1000], # Xgboost number of trees effect on performance
           'subsample': np.arange(0.5, 1.0, 0.1), # Subsample ratio of the training instances
           'colsample_bytree': [0.25, 0.5, 0.75]} # Subsample ratio of columns when constructing each tree
xgbc = xgb.XGBClassifier(seed = 20)
clf = GridSearchCV(estimator=xgbc, 
                   param_grid=params,
                   scoring='neg_log_loss', 
                   verbose=1)
clf.fit(X, y)
print("GridSearchCV")
print("Best parameters:", clf.best_params_)
print("Lowest log_loss: ", -clf.best_score_)

Fitting 5 folds for each of 54 candidates, totalling 270 fits
GridSearchCV
Best parameters: {'colsample_bytree': 0.7, 'learning_rate': 0.05, 'max_depth': 3, 'n_estimators': 500}
Lowest log_loss:  7.583694924156665e-05


b. Tuning parameters using RandomizedSearchCV

In [28]:
from sklearn.model_selection import RandomizedSearchCV

params = { 'max_depth': [3, 5, 6, 10, 15, 20], # Xgboost tree Depth
           'learning_rate': [0.01, 0.1, 0.2, 0.3], # Xgboost learning rate
           'subsample': np.arange(0.5, 1.0, 0.1), # Subsample ratio of the training instances
           'colsample_bytree': np.arange(0.4, 1.0, 0.1), #Subsample ratio of columns when constructing each tree
           'colsample_bylevel': np.arange(0.4, 1.0, 0.1),#Subsample ratio of columns for each level
           'n_estimators': [100, 500, 1000]} # Xgboost number of trees effect on performance
xgbc = xgb.XGBClassifier(seed = 20)
clf = RandomizedSearchCV(estimator=xgbc, 
                         param_distributions=params,
                         scoring='neg_log_loss',
                         n_iter=25,
                         verbose=1)
clf.fit(X, y)
print("RandomizedSearchCV")
print("Best parameters:", clf.best_params_)
print("Lowest log_loss: ", -clf.best_score_)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
RandomizedSearchCV
Best parameters: {'subsample': 0.8999999999999999, 'n_estimators': 1000, 'max_depth': 20, 'learning_rate': 0.3, 'colsample_bytree': 0.8999999999999999, 'colsample_bylevel': 0.7999999999999999}
Lowest log_loss:  7.706141963914056e-05
