# HackerEarth ML 1: Predict the Road Sign
### XGBoost solution: 98%

Download the dataset from the following link.

[Dataset Download](https://he-s3.s3.amazonaws.com/media/hackathon/hackerearth/predict-the-road-sign/4b699168-4-here_dataset.zip)


In [1]:
import os
import sys
import pandas as pd
import numpy as np
from datetime import datetime
from xgboost import XGBClassifier as XGBC
import joblib  # sklearn.externals.joblib was removed from recent scikit-learn releases
from sklearn.model_selection import GridSearchCV

Defining folder paths and reading train and test data.

In [2]:
model_name = "XGBC"
obj_dir = "./obj"
data_dir = "./data"
output_dir = "./output"

# Create the output folders up front so that later saves do not fail.
os.makedirs(obj_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

train_file = os.path.join(data_dir, "train.csv")
test_file = os.path.join(data_dir, "test.csv")
op_file = os.path.join(output_dir, model_name+"_"+datetime.now().strftime('%Y%m%d%H%M%S')+".csv")
model_file = os.path.join(obj_dir, model_name+"_"+datetime.now().strftime('%Y%m%d')+".pkl")

print("Output file: {}\nModel file: {}".format(op_file, model_file))

Output file: ./output/XGBC_20180901193012.csv
Model file: ./obj/XGBC_20180901.pkl
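
The file names above embed a timestamp so that repeated runs do not overwrite earlier outputs. Here is a minimal sketch of the same naming scheme. The helper `stamped_path` is hypothetical, and the fixed `datetime` is an assumption chosen to match the output shown; the notebook itself calls `datetime.now()`.

```python
import os
from datetime import datetime


def stamped_path(folder, model_name, when, fmt, ext):
    """Build '<folder>/<model_name>_<timestamp><ext>' (hypothetical helper)."""
    return os.path.join(folder, model_name + "_" + when.strftime(fmt) + ext)


# Fixed timestamp for illustration only; the notebook uses datetime.now().
when = datetime(2018, 9, 1, 19, 30, 12)

# Second-level resolution for submissions, day-level resolution for the model.
op_example = stamped_path("./output", "XGBC", when, "%Y%m%d%H%M%S", ".csv")
model_example = stamped_path("./obj", "XGBC", when, "%Y%m%d", ".pkl")
print(op_example)
print(model_example)
```

With the day-level format, a model saved twice on the same day reuses one file name, while every submission gets its own file.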


In [3]:
train = pd.read_csv(train_file)
# DataFrame.info() prints its report and returns None, so call it on its own line.
print("Train data info:")
train.info()

test = pd.read_csv(test_file)
print("Test data info:")
test.info()

Train data info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38485 entries, 0 to 38484
Data columns (total 7 columns):
Id                     38485 non-null object
DetectedCamera         38485 non-null object
AngleOfSign            38485 non-null int64
SignAspectRatio        38485 non-null float64
SignWidth              38485 non-null int64
SignHeight             38485 non-null int64
SignFacing (Target)    38485 non-null object
dtypes: float64(1), int64(3), object(3)
memory usage: 2.1+ MB
Test data info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31485 entries, 0 to 31484
Data columns (total 6 columns):
Id                 31485 non-null object
DetectedCamera     31485 non-null object
AngleOfSign        31485 non-null int64
SignAspectRatio    31485 non-null float64
SignWidth          31485 non-null int64
SignHeight         31485 non-null int64
dtypes: float64(1), int64(3), object(2)
memory usage: 1.4+ MB


Create a common mapping dictionary for the camera and target columns and apply it. Then drop the Aspect Ratio feature, along with the Id and target columns, to build the feature matrices.

In [4]:
mapping = {'Front':0, 'Right':1, 'Left':2, 'Rear':3}

train = train.replace({'DetectedCamera':mapping})
test = test.replace({'DetectedCamera':mapping})
train = train.replace({'SignFacing (Target)':mapping})

y_train = train['SignFacing (Target)']
test_id = test['Id']

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way.
df = pd.concat([train, test], sort=False)
print(df.shape, train.shape, test.shape)

drop_cols = ['SignAspectRatio', 'SignFacing (Target)', 'Id']
df.drop(columns=drop_cols, inplace=True)
X_train = df.iloc[:len(train), :]
print(X_train.shape)
X_test = df.iloc[len(train):, :]
print(X_test.shape)

(69970, 7) (38485, 7) (31485, 6)
(38485, 4)
(31485, 4)
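
Since one dictionary is applied to both `DetectedCamera` and the target, its inverse can also turn integer codes back into direction names. Here is a small pure-Python sketch of that idea. The sample values and the `inverse` dictionary are illustrative assumptions, not taken from the dataset.

```python
# Same mapping as in the notebook cell above.
mapping = {'Front': 0, 'Right': 1, 'Left': 2, 'Rear': 3}

# Illustrative values only; the real columns come from train.csv.
cameras = ['Rear', 'Front', 'Left']
targets = ['Rear', 'Front', 'Right']

# One dictionary encodes both columns.
cameras_encoded = [mapping[c] for c in cameras]
targets_encoded = [mapping[t] for t in targets]

# The inverse dictionary turns integer codes back into names.
inverse = {code: name for name, code in mapping.items()}
targets_decoded = [inverse[code] for code in targets_encoded]

print(cameras_encoded)   # [3, 0, 2]
print(targets_encoded)   # [3, 0, 1]
print(targets_decoded)   # ['Rear', 'Front', 'Right']
```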


Building the XGBoost classifier model and tuning its hyperparameters with a 5-fold grid search. The base model file will be saved to the obj directory.

In [5]:
# Base model; the grid search below varies max_depth, learning_rate and n_estimators.
xgbc = XGBC(max_depth=3, learning_rate=0.01, n_estimators=300, verbosity=0,
            objective='multi:softprob', booster='gbtree', n_jobs=3,
            random_state=10, eval_metric='mlogloss')
xgbc.fit(X_train, y_train)
joblib.dump(xgbc, os.path.join(obj_dir, "xgbc_base.pkl"))

# 3 x 4 x 3 = 36 candidates, each evaluated with 5-fold cross-validation.
param_grid = {
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.02, 0.03, 0.05],
    'n_estimators': [300, 350, 400]
}
grid_search = GridSearchCV(estimator=xgbc, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train);

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV] learning_rate=0.01, max_depth=3, n_estimators=300 ...............
[CV]  learning_rate=0.01, max_depth=3, n_estimators=300, total=  11.8s
...
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:   31.9s
...
[Parallel(n_jobs=-1)]: Done 138 tasks      | elapsed:  3.7min
...
[CV]  learning_rate=0.05, max_depth=5, n_estimators=400, total=  10.1s
[Parallel(n_jobs=-1)]: Done 180 out of 180 | elapsed:  4.5min finished
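
The `Fitting 5 folds for each of 36 candidates, totalling 180 fits` line follows directly from the grid. There are 3 depths, 4 learning rates and 3 tree counts, which gives 36 candidates, and 5-fold cross-validation fits each one 5 times. Here is a quick check using only the standard library. It mirrors the enumeration conceptually and is not scikit-learn's implementation.

```python
from itertools import product

# Same grid as in the notebook cell above.
param_grid = {
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.02, 0.03, 0.05],
    'n_estimators': [300, 350, 400],
}
cv_folds = 5

# Every combination of one value per parameter is one candidate.
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]

n_candidates = len(candidates)
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)  # 36 180
```

With the scikit-learn default `refit=True`, the search also refits the best candidate once on the full training set after these 180 fits.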


Predicting the target probabilities for the test dataset.

In [6]:
print(grid_search.best_params_)
best_grid = grid_search.best_estimator_
# Persist the best estimator found by the grid search.
joblib.dump(best_grid, model_file)
y_test = best_grid.predict_proba(X_test)

{'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300}


Saving the submission file.

In [7]:
# predict_proba columns follow the encoded class order (0..3), so label them via the mapping.
submission = pd.DataFrame(data=y_test, columns=sorted(mapping, key=mapping.get))
submission['Id'] = test_id
submission = submission[['Id','Front','Left','Rear','Right']]
submission.to_csv(op_file, index=False)
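
scikit-learn style classifiers return `predict_proba` columns in the order of their sorted class labels. With the integer mapping used here, the columns would therefore correspond to Front, Right, Left, Rear (codes 0 to 3). Here is a minimal sketch of labelling probabilities by code order and then writing them in the submission order. It uses made-up probabilities, hypothetical ids, and the standard `csv` module in place of pandas.

```python
import csv
import io

mapping = {'Front': 0, 'Right': 1, 'Left': 2, 'Rear': 3}

# Class names ordered by integer code, i.e. the assumed predict_proba order.
proba_order = sorted(mapping, key=mapping.get)

# Made-up probabilities for two rows, one value per class code 0..3.
ids = ['id_a', 'id_b']
proba = [[0.7, 0.1, 0.1, 0.1],
         [0.05, 0.05, 0.1, 0.8]]

# Column order used by the submission file in the notebook.
submission_order = ['Front', 'Left', 'Rear', 'Right']

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['Id'] + submission_order)
for row_id, row in zip(ids, proba):
    by_name = dict(zip(proba_order, row))  # label each value by class name
    writer.writerow([row_id] + [by_name[name] for name in submission_order])

csv_text = buffer.getvalue()
print(csv_text)
```

Labelling by name before reordering keeps each probability attached to its class, whatever column order the submission format asks for.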