# SimpliLearn Machine Learning Assessment

## Project 1: Mercedes-Benz Greener Manufacturing

**Author:** Yaseen Moolla

**Date:** 2021-02-10

# Instructions

DESCRIPTION

Reduce the time a Mercedes-Benz spends on the test bench.

Problem Statement Scenario:
Since the first automobile, the Benz Patent Motor Car in 1886, Mercedes-Benz has stood for important automotive innovations. These include the passenger safety cell with a crumple zone, the airbag, and intelligent assistance systems. Mercedes-Benz applies for nearly 2000 patents per year, making the brand the European leader among premium carmakers. Mercedes-Benz is the leader in the premium car industry. With a huge selection of features and options, customers can choose the customized Mercedes-Benz of their dreams.

To ensure the safety and reliability of every unique car configuration before it hits the road, the company’s engineers have developed a robust testing system. Mercedes-Benz is one of the world’s biggest manufacturers of premium cars, and safety and efficiency are paramount on its production lines. However, optimizing the speed of the testing system for the many possible feature combinations is complex and time-consuming without a powerful algorithmic approach.

You are required to reduce the time that cars spend on the test bench. You will work with a dataset representing different permutations of features in a Mercedes-Benz car to predict the time it takes to pass testing. Optimal algorithms will contribute to faster testing, resulting in lower carbon dioxide emissions without reducing Mercedes-Benz’s standards.

The following actions should be performed:

- If the variance of any column is equal to zero, remove that column.
- Check for null and unique values in the train and test sets.
- Apply a label encoder.
- Perform dimensionality reduction.
- Predict your test_df values using XGBoost.
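As a rough illustration of the first four steps listed above, the sketch below runs them on a tiny made-up DataFrame. The frames `train` and `test`, their values, and `n_components=2` are assumptions for illustration only and do not come from the real data. The XGBoost step is left out so that the sketch depends only on pandas and scikit-learn.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder

# Tiny synthetic stand-ins for the real train/test frames (hypothetical values).
train = pd.DataFrame({
    "X0": ["k", "az", "k", "t"],          # categorical feature
    "X10": [0, 1, 0, 1],                   # binary feature
    "X11": [0, 0, 0, 0],                   # zero-variance feature
    "y": [130.81, 88.53, 76.26, 80.62],    # target
})
test = pd.DataFrame({"X0": ["az", "w"], "X10": [1, 0], "X11": [0, 0]})

# 1. Drop numeric columns whose variance in the training set is zero.
numeric = train.drop(columns="y").select_dtypes("number")
zero_var = [c for c in numeric.columns if numeric[c].var() == 0]
train = train.drop(columns=zero_var)
test = test.drop(columns=zero_var)

# 2. Check for null values and count unique values per column.
null_counts = train.isnull().sum()
unique_counts = train.nunique()

# 3. Label-encode the non-numeric columns. The encoder is fitted on train and
#    test together so that labels seen only in the test set can be encoded.
for col in train.select_dtypes(exclude="number").columns:
    encoder = LabelEncoder().fit(pd.concat([train[col], test[col]]))
    train[col] = encoder.transform(train[col])
    test[col] = encoder.transform(test[col])

# 4. Reduce dimensionality with PCA, fitted on the training features only.
features = [c for c in train.columns if c != "y"]
pca = PCA(n_components=2).fit(train[features])
train_pca = pca.transform(train[features])
test_pca = pca.transform(test[features])
```

Fitting the encoder on both frames is one design choice among several; it avoids errors on labels that occur only in the test set, at the cost of letting the test set influence the encoding.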


# Import packages

In [2]:
import numpy as np
import pandas as pd
import math
import time

# load packages for variance tests
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

#training libraries
import sklearn
import xgboost
from xgboost import XGBRegressor

#import libraries for dimensionality reduction
from sklearn.decomposition import PCA

# import libraries for model

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

# import libraries for metrics and reporting
from sklearn import metrics
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    auc,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    r2_score,
    recall_score,
    roc_curve,
)

# Load and Analyse Dataset

In [4]:
dfTrain = pd.read_csv('train.csv')  # import train set
dfTest = pd.read_csv('test.csv')  # import test set
dfTrain.head()  # view a sample of the train set

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,...,0,0,1,0,0,0,0,0,0,0
1,6,88.53,k,t,av,e,d,y,l,o,...,1,0,0,0,0,0,0,0,0,0
2,7,76.26,az,w,n,c,d,x,j,x,...,0,0,0,0,0,0,1,0,0,0
3,9,80.62,az,t,n,f,d,x,l,e,...,0,0,0,0,0,0,0,0,0,0
4,13,78.02,az,v,n,f,d,h,d,n,...,0,0,0,0,0,0,0,0,0,0


In [5]:
dfTest.head() #view a sampling of test set

Unnamed: 0,ID,X0,X1,X2,X3,X4,X5,X6,X8,X10,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,1,az,v,n,f,d,t,a,w,0,...,0,0,0,1,0,0,0,0,0,0
1,2,t,b,ai,a,d,b,g,y,0,...,0,0,1,0,0,0,0,0,0,0
2,3,az,v,as,f,d,a,j,j,0,...,0,0,0,1,0,0,0,0,0,0
3,4,az,l,n,f,d,z,l,n,0,...,0,0,0,1,0,0,0,0,0,0
4,5,w,s,as,c,d,y,i,m,0,...,1,0,0,0,0,0,0,0,0,0
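The two previews suggest that the frames share every column except the target `y`. A quick check can confirm this, and count missing values, before any modelling. This is a minimal sketch on small stand-in frames; the helper `column_differences` is hypothetical and not part of the notebook.

```python
import pandas as pd


def column_differences(train, test):
    """Return the columns found only in train and only in test, in original order."""
    only_in_train = [c for c in train.columns if c not in test.columns]
    only_in_test = [c for c in test.columns if c not in train.columns]
    return only_in_train, only_in_test


# Made-up rows that mimic the layout of the previews above.
train = pd.DataFrame({"ID": [0, 6], "y": [130.81, 88.53], "X0": ["k", "k"], "X10": [0, 0]})
test = pd.DataFrame({"ID": [1, 2], "X0": ["az", "t"], "X10": [0, 0]})

only_in_train, only_in_test = column_differences(train, test)

# Total number of missing values in each frame.
train_nulls = int(train.isnull().sum().sum())
test_nulls = int(test.isnull().sum().sum())
```

On the real data the same call would be `column_differences(dfTrain, dfTest)`.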


In [6]:
# to view all columns in train set
pd.set_option('display.max_columns', 379)
dfTrain.head()

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,X20,X21,X22,X23,X24,X26,X27,X28,X29,X30,X31,X32,X33,X34,X35,X36,X37,X38,X39,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X120,X122,X123,X124,X125,X126,X127,X128,X129,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X140,X141,X142,X143,X144,X145,X146,X147,X148,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X180,X181,X182,X183,X184,X185,X186,X187,X189,X190,X191,X192,X194,X195,X196,X197,X198,X199,X200,X201,X202,X203,X204,X205,X206,X207,X208,X209,X210,X211,X212,X213,X214,X215,X216,X217,X218,X219,X220,X221,X222,X223,X224,X225,X226,X227,X228,X229,X230,X231,X232,X233,X234,X235,X236,X237,X238,X239,X240,X241,X242,X243,X244,X245,X246,X247,X248,X249,X250,X251,X252,X253,X254,X255,X256,X257,X258,X259,X260,X261,X262,X263,X264,X265,X266,X267,X268,X269,X270,X271,X272,X273,X274,X275,X276,X277,X278,X279,X280,X281,X282,X283,X284,X285,X286,X287,X288,X289,X290,X291,X292,X293,X294,X295,X296,X297,X298,X299,X300,X301,X302,X304,X305,X306,X307,X308,X309,X310,X311,X312,X313,X314,X315,X316,X317,X318,X319,X320,X321,X322,X323,X324,X325,X326,X327,X328,X329,X330,X331,X332,X333,X334,X335,X336,X337,X338,X339,X340,X341,X342,X343,X344,X345,X346,X347,X348,X349,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,1,1,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,6,88.53,k,t,av,e,d,y,l,o,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,7,76.26,az,w,n,c,d,x,j,x,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,1,1,1,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,9,80.62,az,t,n,f,d,x,l,e,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
4,13,78.02,az,v,n,f,d,h,d,n,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
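`pd.set_option` changes the display setting for the rest of the session. When the wide view is needed only once, `pd.option_context` limits the change to a `with` block. A minimal sketch, assuming a made-up 30-column frame in place of `dfTrain`:

```python
import pandas as pd

# Hypothetical wide frame standing in for dfTrain.
df = pd.DataFrame({f"X{i}": [0, 1] for i in range(30)})

before = pd.get_option("display.max_columns")

# Inside the block no column is hidden; the previous setting is restored on exit.
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")
    wide_text = repr(df.head())

after = pd.get_option("display.max_columns")
```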


In [7]:
# to view all data types in train set
pd.set_option('display.max_rows', 379)
dfTrain.dtypes

ID        int64
y       float64
X0       object
X1       object
X2       object
X3       object
X4       object
X5       object
X6       object
X8       object
X10       int64
X11       int64
X12       int64
X13       int64
X14       int64
X15       int64
X16       int64
X17       int64
X18       int64
X19       int64
X20       int64
X21       int64
X22       int64
X23       int64
X24       int64
X26       int64
X27       int64
X28       int64
X29       int64
X30       int64
X31       int64
X32       int64
X33       int64
X34       int64
X35       int64
X36       int64
X37       int64
X38       int64
X39       int64
X40       int64
X41       int64
X42       int64
X43       int64
X44       int64
X45       int64
X46       int64
X47       int64
X48       int64
X49       int64
X50       int64
X51       int64
X52       int64
X53       int64
X54       int64
X55       int64
X56       int64
X57       int64
X58       int64
X59       int64
X60       int64
X61       int64
X62       int64
X63     
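With several hundred columns the full `dtypes` listing is long. Counting columns per dtype and listing only the non-numeric columns gives the same information at a glance. A minimal sketch on a made-up frame (column names borrowed from the data set, values invented):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": pd.Series([0, 6, 7], dtype="int64"),
    "y": pd.Series([130.81, 88.53, 76.26], dtype="float64"),
    "X0": ["k", "k", "az"],
    "X10": pd.Series([0, 0, 0], dtype="int64"),
})

# Number of columns per dtype.
dtype_counts = df.dtypes.astype(str).value_counts()

# Names of the non-numeric (categorical) columns.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
```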

In [8]:
# view information on all columns in train set
dfTrain.describe(include="all")

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,X20,X21,X22,X23,X24,X26,X27,X28,X29,X30,X31,X32,X33,X34,X35,X36,X37,X38,X39,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X120,X122,X123,X124,X125,X126,X127,X128,X129,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X140,X141,X142,X143,X144,X145,X146,X147,X148,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X180,X181,X182,X183,X184,X185,X186,X187,X189,X190,X191,X192,X194,X195,X196,X197,X198,X199,X200,X201,X202,X203,X204,X205,X206,X207,X208,X209,X210,X211,X212,X213,X214,X215,X216,X217,X218,X219,X220,X221,X222,X223,X224,X225,X226,X227,X228,X229,X230,X231,X232,X233,X234,X235,X236,X237,X238,X239,X240,X241,X242,X243,X244,X245,X246,X247,X248,X249,X250,X251,X252,X253,X254,X255,X256,X257,X258,X259,X260,X261,X262,X263,X264,X265,X266,X267,X268,X269,X270,X271,X272,X273,X274,X275,X276,X277,X278,X279,X280,X281,X282,X283,X284,X285,X286,X287,X288,X289,X290,X291,X292,X293,X294,X295,X296,X297,X298,X299,X300,X301,X302,X304,X305,X306,X307,X308,X309,X310,X311,X312,X313,X314,X315,X316,X317,X318,X319,X320,X321,X322,X323,X324,X325,X326,X327,X328,X329,X330,X331,X332,X333,X334,X335,X336,X337,X338,X339,X340,X341,X342,X343,X344,X345,X346,X347,X348,X349,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
count,4209.0,4209.0,4209,4209,4209,4209,4209,4209,4209,4209,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4
209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
unique,,,47,27,44,7,4,29,12,25,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
top,,,z,aa,as,c,d,v,g,j,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
freq,,,360,833,1659,1942,4205,231,1042,277,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
mean,4205.960798,100.669318,,,,,,,,,0.013305,0.0,0.075077,0.057971,0.42813,0.000475,0.002613,0.007603,0.00784,0.099549,0.142789,0.002613,0.086957,0.02067,0.001901,0.004989,0.682585,0.032549,0.043003,0.004514,0.232359,0.011167,0.000238,0.005464,0.232359,0.004514,0.232359,0.033262,0.000238,0.000713,0.011404,0.000238,0.072226,0.011404,0.253267,0.597292,0.01283,0.022333,0.122119,0.214065,0.721787,0.04229,0.00689,0.043478,0.005227,0.021145,0.013305,0.574958,0.000713,0.001426,0.953908,0.00594,0.011404,0.375148,0.002138,0.027085,0.001901,0.073414,0.029936,0.919933,0.103588,0.019957,0.999287,0.036113,0.043478,0.012592,0.005702,0.025184,0.947018,0.229033,0.017106,0.001188,0.103588,0.408173,0.001426,0.00095,0.007128,0.000713,0.007365,0.001663,0.00095,0.0,0.007365,0.000238,0.758137,0.004277,0.942504,0.008553,0.690188,0.935614,0.00689,0.784509,0.001901,0.002376,0.013067,0.0,0.01473,0.04039,0.00095,0.974816,0.002851,0.022333,0.146115,0.285579,0.196721,0.04918,0.622238,0.622238,0.95771,0.007128,0.002613,0.000475,0.003089,0.038964,0.495129,0.958422,0.122594,0.041578,0.02661,0.688525,0.124258,0.022333,0.027085,0.956522,0.581848,0.040865,0.09052,0.04039,0.014255,0.770254,0.038251,0.80803,0.001426,0.040865,0.022333,0.044904,0.79235,0.085531,0.032312,0.000713,0.208838,0.076503,0.717273,0.282727,0.229746,0.013542,0.001188,0.197672,0.040865,0.303397,0.062485,0.004514,0.033262,0.00095,0.270848,0.006652,0.024234,0.657401,0.00594,0.009741,0.017344,0.022333,0.017106,0.050131,0.557377,0.047992,0.157995,0.093847,0.106201,0.004039,0.001426,0.018769,0.535994,0.420527,0.915419,0.000238,0.470896,0.002376,0.464006,0.011642,0.010216,0.032312,0.023046,0.002851,0.006652,0.177714,0.241388,0.016869,0.000238,0.999762,0.019244,0.000238,0.06296,0.898551,0.000238,0.014968,0.005464,0.001901,0.00689,0.098598,0.00594,0.007365,0.312426,0.067474,0.561178,0.008078,0.022333,0.555239,0.317415,0.096935,0.032312,0.003089,0.038964,0.960086,0.005227,0.016156,0.043003,0.0,0.201711,0.0,0.000475,0.006652,0.916132,0.00689,0.002851,0.097173,0.007365,0.007128,0.103588,0.000713,0.409361,0.241388,0.001426,0.007603,0.552863,0.394155,0.000713,0.001426,0.005227,0.019482,0.073177,0.000238,0.002376,0.000238,0.000238,0.419577,0.001426,0.956997,0.039439,0.905441,0.001426,0.009028,0.0,0.000475,0.000238,0.002138,0.037539,0.720124,0.009979,0.726776,0.038489,0.001426,0.000475,0.043003,0.000238,0.002613,0.004039,0.140651,0.041102,0.20575,0.054645,0.015918,0.000238,0.0,0.0,0.010454,0.009028,0.0,0.12497,0.000238,0.000238,0.0,0.004514,0.004514,0.206938,0.046804,0.011404,0.92421,0.013305,0.043716,0.002138,0.009503,0.007128,0.002613,0.598479,0.004277,0.301022,0.431694,0.028748,0.195533,0.007603,0.000713,0.000475,0.007128,0.238774,0.021858,0.009266,0.574958,0.005702,0.032312,0.128297,0.040152,0.435258,0.0,0.05607,0.000713,0.023996,0.46258,0.003564,0.127346,0.516512,0.00689,0.000238,0.022333,0.008078,0.022333,0.078403,0.008553,0.022333,0.047517,0.0,0.947256,0.044904,0.338798,0.29722,0.05417,0.002138,0.202899,0.380375,0.179853,0.001188,0.426942,0.031837,0.076503,0.966025,0.520314,0.753861,0.002851,0.002851,0.001188,0.051794,0.062723,0.000475,0.006652,0.014255,0.000475,0.019244,0.22737,0.318841,0.057258,0.314802,0.02067,0.009503,0.008078,0.007603,0.001663,0.000475,0.001426
std,2437.608688,12.679381,,,,,,,,,0.11459,0.0,0.263547,0.233716,0.494867,0.021796,0.051061,0.086872,0.088208,0.299433,0.349899,0.051061,0.281805,0.142294,0.043561,0.070467,0.465526,0.177475,0.202888,0.067043,0.422387,0.105093,0.015414,0.073729,0.422387,0.067043,0.422387,0.179341,0.015414,0.026691,0.106192,0.015414,0.258893,0.106192,0.434934,0.490501,0.112552,0.147782,0.327462,0.410221,0.448172,0.201275,0.082729,0.203955,0.072117,0.143885,0.11459,0.494408,0.026691,0.037734,0.209709,0.076849,0.106192,0.484219,0.046198,0.16235,0.043561,0.260846,0.170431,0.271428,0.304761,0.13987,0.026691,0.186594,0.203955,0.111519,0.075305,0.156703,0.224024,0.42026,0.129683,0.03445,0.304761,0.491554,0.037734,0.030817,0.084134,0.026691,0.085514,0.040752,0.030817,0.0,0.085514,0.015414,0.428262,0.065263,0.232815,0.092098,0.462471,0.245468,0.082729,0.411211,0.043561,0.048691,0.113576,0.0,0.120486,0.196895,0.030817,0.156703,0.053325,0.147782,0.353264,0.451743,0.397567,0.21627,0.484885,0.484885,0.201275,0.084134,0.051061,0.021796,0.055496,0.193532,0.500036,0.199646,0.32801,0.199646,0.160959,0.463152,0.329914,0.147782,0.16235,0.203955,0.493314,0.198,0.28696,0.196895,0.118555,0.420719,0.191825,0.393896,0.037734,0.198,0.147782,0.207117,0.405673,0.279703,0.176848,0.026691,0.406527,0.265832,0.450379,0.450379,0.420719,0.115595,0.03445,0.39829,0.198,0.45978,0.242063,0.067043,0.179341,0.030817,0.444451,0.0813,0.153792,0.474635,0.076849,0.098226,0.130564,0.147782,0.129683,0.21824,0.496756,0.213776,0.364779,0.29165,0.308131,0.063432,0.037734,0.135725,0.498762,0.493702,0.27829,0.015414,0.499212,0.048691,0.498762,0.10728,0.10057,0.176848,0.150067,0.053325,0.0813,0.382318,0.427976,0.128794,0.015414,0.015414,0.137399,0.015414,0.24292,0.301959,0.015414,0.121439,0.073729,0.043561,0.082729,0.298157,0.076849,0.085514,0.463537,0.250872,0.496302,0.089524,0.147782,0.496998,0.465526,0.295905,0.176848,0.055496,0.193532,0.195782,0.072117,0.12609,0.202888,0.0,0.401325,0.0,0.021796,0.0813,0.277223,0.082729,0.053325,0.296228,0.085514,0.084134,0.304761,0.026691,0.491774,0.427976,0.037734,0.086872,0.497257,0.488727,0.026691,0.037734,0.072117,0.138228,0.260457,0.015414,0.048691,0.015414,0.015414,0.493548,0.037734,0.202888,0.194661,0.29264,0.037734,0.094599,0.0,0.021796,0.015414,0.046198,0.1901,0.448992,0.099405,0.445668,0.192396,0.037734,0.021796,0.202888,0.015414,0.051061,0.063432,0.347702,0.198551,0.404296,0.227313,0.125174,0.015414,0.0,0.0,0.10172,0.094599,0.0,0.330725,0.015414,0.015414,0.0,0.067043,0.067043,0.405158,0.211245,0.106192,0.264693,0.11459,0.204486,0.046198,0.097033,0.084134,0.051061,0.490264,0.065263,0.458757,0.495371,0.167117,0.396658,0.086872,0.026691,0.021796,0.084134,0.426385,0.146237,0.095824,0.494408,0.075305,0.176848,0.334459,0.196339,0.49585,0.0,0.230085,0.026691,0.153055,0.498657,0.059598,0.3334,0.499787,0.082729,0.015414,0.147782,0.089524,0.147782,0.268837,0.092098,0.147782,0.212768,0.0,0.223549,0.207117,0.473357,0.457089,0.226379,0.046198,0.402205,0.485537,0.38411,0.03445,0.494693,0.175586,0.265832,0.181186,0.499647,0.430812,0.053325,0.053325,0.03445,0.221637,0.242492,0.021796,0.0813,0.118555,0.021796,0.137399,0.419183,0.466082,0.232363,0.464492,0.142294,0.097033,0.089524,0.086872,0.040752,0.021796,0.037734
min,0.0,72.11,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2095.0,90.82,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,4220.0,99.15,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,6314.0,109.01,,,,,,,,,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
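`describe(include="all")` mixes numeric and categorical statistics in one table, which leaves many cells empty. Summarising the two groups separately gives two compact tables, and the numeric one can be used to list the columns whose standard deviation is zero. A minimal sketch on a made-up frame (values invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "X0": ["k", "k", "az", "az"],   # categorical feature
    "X10": [0, 1, 0, 0],            # binary feature
    "X11": [0, 0, 0, 0],            # constant feature
})

# count, mean, std, min, quartiles, and max for numeric columns only.
numeric_summary = df.describe(include=[np.number])

# count, unique, top, and freq for the remaining (categorical) columns.
categorical_summary = df.describe(exclude=[np.number])

# Numeric columns whose standard deviation is exactly zero.
constant_cols = numeric_summary.columns[numeric_summary.loc["std"] == 0].tolist()
```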


In [9]:
#view information on all columns in test set
dfTest.describe(include="all")

Unnamed: 0,ID,X0,X1,X2,X3,X4,X5,X6,X8,X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,X20,X21,X22,X23,X24,X26,X27,X28,X29,X30,X31,X32,X33,X34,X35,X36,X37,X38,X39,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X120,X122,X123,X124,X125,X126,X127,X128,X129,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X140,X141,X142,X143,X144,X145,X146,X147,X148,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X180,X181,X182,X183,X184,X185,X186,X187,X189,X190,X191,X192,X194,X195,X196,X197,X198,X199,X200,X201,X202,X203,X204,X205,X206,X207,X208,X209,X210,X211,X212,X213,X214,X215,X216,X217,X218,X219,X220,X221,X222,X223,X224,X225,X226,X227,X228,X229,X230,X231,X232,X233,X234,X235,X236,X237,X238,X239,X240,X241,X242,X243,X244,X245,X246,X247,X248,X249,X250,X251,X252,X253,X254,X255,X256,X257,X258,X259,X260,X261,X262,X263,X264,X265,X266,X267,X268,X269,X270,X271,X272,X273,X274,X275,X276,X277,X278,X279,X280,X281,X282,X283,X284,X285,X286,X287,X288,X289,X290,X291,X292,X293,X294,X295,X296,X297,X298,X299,X300,X301,X302,X304,X305,X306,X307,X308,X309,X310,X311,X312,X313,X314,X315,X316,X317,X318,X319,X320,X321,X322,X323,X324,X325,X326,X327,X328,X329,X330,X331,X332,X333,X334,X335,X336,X337,X338,X339,X340,X341,X342,X343,X344,X345,X346,X347,X348,X349,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
count,4209.0,4209,4209,4209,4209,4209,4209,4209,4209,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4
209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
unique,,49,27,45,7,4,32,12,25,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
top,,ak,aa,as,c,d,v,g,e,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
freq,,432,826,1658,1900,4203,246,1073,274,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
mean,4211.039202,,,,,,,,,0.019007,0.000238,0.074364,0.06106,0.427893,0.000713,0.002613,0.008791,0.010216,0.111665,0.139463,0.001663,0.082442,0.020195,0.002376,0.009028,0.681159,0.026847,0.038964,0.005464,0.237349,0.010454,0.000713,0.003564,0.237349,0.002851,0.237349,0.031124,0.000713,0.001901,0.011404,0.001426,0.076978,0.00689,0.252079,0.593728,0.011167,0.023759,0.133523,0.224044,0.714659,0.042528,0.00594,0.039914,0.006415,0.024947,0.015681,0.584462,0.001188,0.001663,0.953433,0.00594,0.013067,0.360181,0.001901,0.027085,0.002138,0.079829,0.030649,0.920409,0.107864,0.019007,0.997862,0.038727,0.039914,0.011167,0.007365,0.033262,0.949632,0.21953,0.02067,0.001188,0.107864,0.420527,0.001426,0.00095,0.008316,0.001901,0.008316,0.002613,0.002376,0.000475,0.00784,0.000713,0.747684,0.001426,0.942742,0.009979,0.683298,0.936802,0.00594,0.788311,0.002613,0.000238,0.012354,0.00095,0.011642,0.039439,0.001426,0.975053,0.001188,0.023759,0.144928,0.293894,0.196246,0.044429,0.622238,0.622238,0.957234,0.008316,0.003326,0.000238,0.004752,0.039914,0.486101,0.961511,0.133761,0.038489,0.027322,0.687812,0.123307,0.023759,0.026372,0.960323,0.590402,0.039439,0.095747,0.039439,0.016393,0.76574,0.039677,0.801378,0.000713,0.039439,0.023759,0.04039,0.788073,0.087432,0.031361,0.001188,0.213352,0.079591,0.709907,0.290093,0.234498,0.01473,0.00095,0.195058,0.035638,0.314326,0.067712,0.003564,0.027085,0.001188,0.263721,0.007365,0.026847,0.666429,0.00594,0.010216,0.019482,0.022808,0.014493,0.051794,0.560466,0.045141,0.15847,0.092659,0.097648,0.002138,0.000713,0.018057,0.531955,0.428368,0.921834,0.000238,0.476123,0.001663,0.468282,0.012117,0.012354,0.029461,0.024947,0.001188,0.006652,0.176289,0.239724,0.011879,0.000713,0.999287,0.018769,0.001426,0.061772,0.897125,0.000238,0.017106,0.007128,0.002138,0.00594,0.109765,0.00594,0.009028,0.321216,0.067237,0.557377,0.008316,0.023759,0.556189,0.313376,0.086244,0.031361,0.004752,0.046804,0.95082,0.003801,0.017106,0.039677,0.000238,0.206225,0.000238,0.000238,0.005
464,0.920884,0.00594,0.003564,0.1043,0.008316,0.008316,0.107864,0.00095,0.416964,0.239487,0.002138,0.009503,0.555952,0.393918,0.000238,0.001663,0.003801,0.014255,0.076503,0.0,0.0,0.000238,0.000713,0.437871,0.000713,0.960323,0.040865,0.912093,0.00095,0.008078,0.000238,0.000713,0.000713,0.001426,0.035163,0.712283,0.011167,0.725113,0.035876,0.001426,0.000238,0.039677,0.000238,0.003564,0.002138,0.144452,0.042053,0.210501,0.046804,0.015443,0.000238,0.000475,0.000238,0.010691,0.009266,0.000238,0.122119,0.0,0.0,0.000238,0.003089,0.003089,0.21264,0.05512,0.00689,0.929437,0.008316,0.045141,0.003326,0.007128,0.006652,0.004277,0.597529,0.003326,0.292231,0.448325,0.025422,0.194583,0.005702,0.00095,0.000475,0.00784,0.242813,0.018294,0.008078,0.584462,0.005464,0.031361,0.130435,0.036351,0.438109,0.000238,0.051794,0.000713,0.019482,0.451651,0.002138,0.127584,0.526966,0.00784,0.000238,0.020432,0.009979,0.020195,0.069613,0.009503,0.024709,0.044904,0.000475,0.950107,0.041815,0.341174,0.287242,0.050131,0.002613,0.203849,0.377762,0.184129,0.002376,0.433595,0.036826,0.079591,0.962461,0.517938,0.745308,0.003564,0.003326,0.001663,0.04823,0.063198,0.0,0.005702,0.011404,0.000238,0.014493,0.235923,0.325968,0.049656,0.311951,0.019244,0.011879,0.008078,0.008791,0.000475,0.000713,0.001663
std,2423.078926,,,,,,,,,0.136565,0.015414,0.262394,0.239468,0.494832,0.026691,0.051061,0.093357,0.10057,0.314992,0.34647,0.040752,0.27507,0.140683,0.048691,0.094599,0.466082,0.161656,0.193532,0.073729,0.425508,0.10172,0.026691,0.059598,0.425508,0.053325,0.425508,0.173673,0.026691,0.043561,0.106192,0.037734,0.266588,0.082729,0.434258,0.491195,0.105093,0.152314,0.34018,0.417001,0.45163,0.201814,0.076849,0.195782,0.079845,0.155981,0.124252,0.492873,0.03445,0.040752,0.210734,0.076849,0.113576,0.48011,0.043561,0.16235,0.046198,0.27106,0.172384,0.270692,0.310246,0.136565,0.046198,0.192965,0.195782,0.105093,0.085514,0.179341,0.21873,0.413977,0.142294,0.03445,0.310246,0.493702,0.037734,0.030817,0.09082,0.043561,0.09082,0.051061,0.048691,0.021796,0.088208,0.026691,0.434393,0.037734,0.232363,0.099405,0.465246,0.243347,0.076849,0.408554,0.051061,0.015414,0.110475,0.030817,0.10728,0.194661,0.037734,0.155981,0.03445,0.152314,0.35207,0.455598,0.397204,0.20607,0.484885,0.484885,0.202352,0.09082,0.057584,0.015414,0.068777,0.195782,0.499866,0.192396,0.340436,0.192396,0.163041,0.463441,0.328829,0.152314,0.160258,0.195222,0.491818,0.194661,0.294279,0.194661,0.126998,0.423586,0.195222,0.39901,0.026691,0.194661,0.152314,0.196895,0.408722,0.2825,0.174313,0.03445,0.409723,0.270692,0.453859,0.453859,0.423735,0.120486,0.030817,0.396293,0.185408,0.464302,0.251281,0.059598,0.16235,0.03445,0.440702,0.085514,0.161656,0.471544,0.076849,0.10057,0.138228,0.14931,0.119525,0.221637,0.496389,0.207639,0.365224,0.289988,0.296873,0.046198,0.026691,0.133172,0.499037,0.494901,0.268464,0.015414,0.499489,0.040752,0.499052,0.109421,0.110475,0.169114,0.155981,0.03445,0.0813,0.381111,0.426966,0.108356,0.026691,0.026691,0.135725,0.037734,0.24077,0.303831,0.015414,0.129683,0.084134,0.046198,0.076849,0.312633,0.076849,0.094599,0.466999,0.250462,0.496756,0.09082,0.152314,0.496892,0.463921,0.280757,0.174313,0.068777,0.211245,0.21627,0.061545,0.129683,0.195222,0.015414,0.404642,0.015414,0.015414,0.073729,0.269952,0.
076849,0.059598,0.305686,0.09082,0.09082,0.310246,0.030817,0.493115,0.426821,0.046198,0.097033,0.496919,0.488675,0.015414,0.040752,0.061545,0.118555,0.265832,0.0,0.0,0.015414,0.026691,0.496184,0.026691,0.195222,0.198,0.283193,0.030817,0.089524,0.015414,0.026691,0.026691,0.037734,0.184213,0.452752,0.105093,0.44651,0.186002,0.037734,0.015414,0.195222,0.015414,0.059598,0.046198,0.35159,0.200733,0.407713,0.211245,0.123322,0.015414,0.021796,0.015414,0.102857,0.095824,0.015414,0.327462,0.0,0.0,0.015414,0.055496,0.055496,0.409223,0.228241,0.082729,0.256124,0.09082,0.207639,0.057584,0.084134,0.0813,0.065263,0.490454,0.057584,0.454842,0.497382,0.157421,0.395926,0.075305,0.030817,0.021796,0.088208,0.428834,0.134029,0.089524,0.492873,0.073729,0.174313,0.336821,0.187183,0.496214,0.015414,0.221637,0.026691,0.138228,0.497716,0.046198,0.333665,0.499332,0.088208,0.015414,0.141491,0.099405,0.140683,0.254523,0.097033,0.155255,0.207117,0.021796,0.21775,0.20019,0.47416,0.452529,0.21824,0.051061,0.402906,0.484885,0.387636,0.048691,0.49563,0.188356,0.270692,0.1901,0.499738,0.43574,0.059598,0.057584,0.040752,0.214277,0.243347,0.0,0.075305,0.106192,0.015414,0.119525,0.424625,0.468791,0.217258,0.463345,0.137399,0.108356,0.089524,0.093357,0.021796,0.026691,0.040752
min,1.0,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2115.0,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,4202.0,,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,6310.0,,,,,,,,,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Interpretation

ID: Unique identifier

y: continuous numerical values (float64). Therefore, this is a regression problem, not a classification problem.

Categorical columns: X0, X1, X2, X3, X4, X5, X6, X8 (X7 and X9 do not exist)

Numerical columns: X10 to X385 (int64), all binary 0/1. X11 is all 0; other constant columns are found later by selecting the columns with standard deviation = 0.

The test set has no y values (a blind test), so it cannot be used to assess training. Use k-fold cross-validation on the train set instead.
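As a rough illustration of the k-fold idea mentioned above, the sketch below splits the 4209 training rows into shuffled folds using NumPy only. The helper name `kfold_indices`, the choice of 5 folds, and the seed are assumptions made for this example; scikit-learn's `KFold` offers the same functionality and may be what is used later in the notebook.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)        # shuffle the row positions once
    folds = np.array_split(order, n_splits)   # near-equal sized folds
    for i in range(n_splits):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, valid_idx

# 4209 is the number of rows in the train set (see the describe() output).
splits = list(kfold_indices(4209, n_splits=5))
for train_idx, valid_idx in splits:
    print(len(train_idx), len(valid_idx))
```

Each row is used for validation exactly once, so the averaged validation score stands in for the missing test-set labels.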

# Check Variance

Instructions: If for any column(s), the variance is equal to zero, then you need to remove those variable(s).

**For categorical columns**

If we assume categorical-nominal/continuous, the options are: Kendall Tau, T-test 

If we assume categorical-ordinal, the options are: Spearman, chi2, Tukey, OLS

Tukey and OLS were chosen.


**For numerical columns**

Options are: Pearson's correlation, OLS, Tukey test

Tukey was chosen.

In [11]:
#Tukey Test on one chosen categorical column
m_comp = pairwise_tukeyhsd(endog=dfTrain['y'], groups=dfTrain['X4'], alpha=0.05)
print(m_comp)

 Multiple Comparison of Means - Tukey HSD, FWER=0.05 
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
     a      b     11.7  0.862 -28.1874 51.5874  False
     a      c    32.94 0.1461  -6.9474 72.8274  False
     a      d  -0.4417    0.9 -23.4762 22.5927  False
     b      c    21.24 0.6199  -24.818  67.298  False
     b      d -12.1417 0.7474 -44.7135 20.4301  False
     c      d -33.3817 0.0421 -65.9535 -0.8099   True
-----------------------------------------------------


In [12]:
#Tukey Test - for all categorical columns

# perform multiple pairwise comparisons (Tukey HSD) between the levels of each column
# and count, per column:
# how many level pairs reject the null hypothesis of equal mean y (True), and
# how many level pairs do not reject it (False)

#columnList = ['X0', 'X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X8']
columnListCategorical = dfTrain.select_dtypes(include=['object']) 
columnListCategoricalTest = dfTest.select_dtypes(include=['object'])
for column in columnListCategorical:
    m_comp = pairwise_tukeyhsd(endog=dfTrain['y'], groups=dfTrain[column], alpha=0.05)
    #print(m_comp)
    unique, counts = np.unique(m_comp.reject, return_counts=True)
    print(column)
    print(dict(zip(unique, counts)))
    

X0
{False: 553, True: 528}
X1
{False: 312, True: 39}
X2
{False: 781, True: 165}
X3
{False: 13, True: 8}
X4
{False: 5, True: 1}
X5
{False: 406}
X6
{False: 62, True: 4}
X8
{False: 275, True: 25}


#### Interpretation - Tukey Test - categorical

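X4: 1 of 6 pairwise comparisons rejects the null hypothesis.

X5: 0 of 406 pairwise comparisons reject the null hypothesis.

X6: 4 of 66 pairwise comparisons reject the null hypothesis.

Tukey HSD compares the mean of y between pairs of levels, so in these columns almost no pair of levels differs significantly in mean y. They explain little of the variance in y and should be considered for removal.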
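The totals quoted in this interpretation (6, 406, 66) are the numbers of unordered level pairs, k(k-1)/2 for a column with k levels. The short check below recomputes them and the share of rejected pairs; the level counts and rejection counts are copied from the `trainUniques` and Tukey outputs in this notebook, and the dictionary names are chosen only for this illustration.

```python
from math import comb

# Distinct levels per categorical column in the train set (from the trainUniques output).
levels = {"X0": 47, "X1": 27, "X2": 44, "X3": 7, "X4": 4, "X5": 29, "X6": 12, "X8": 25}

# Pairs for which Tukey HSD rejected the null hypothesis (from the In [12] output).
rejected = {"X0": 528, "X1": 39, "X2": 165, "X3": 8, "X4": 1, "X5": 0, "X6": 4, "X8": 25}

# Tukey HSD compares every unordered pair of levels: k * (k - 1) / 2 comparisons.
pair_counts = {col: comb(k, 2) for col, k in levels.items()}

for col in levels:
    share = rejected[col] / pair_counts[col]
    print(f"{col}: {rejected[col]} of {pair_counts[col]} pairs rejected ({share:.1%})")
```

Looking at the share rather than the raw count makes columns with many levels (X0, X2) comparable with columns that have few (X3, X4).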

In [13]:
#OLS test on categorical data

for column in columnListCategorical:
    modelOLS = ols('y ~ '+column, dfTrain).fit()
    print(f"{column} F({modelOLS.df_model: .0f},{modelOLS.df_resid: .0f}) = {modelOLS.fvalue: .3f}, p = {modelOLS.f_pvalue: .4f}")


X0 F( 46, 4162) =  122.314, p =  0.0000
X1 F( 26, 4182) =  6.988, p =  0.0000
X2 F( 43, 4165) =  28.257, p =  0.0000
X3 F( 6, 4202) =  30.992, p =  0.0000
X4 F( 3, 4205) =  2.619, p =  0.0492
X5 F( 28, 4180) =  2.153, p =  0.0004
X6 F( 11, 4197) =  4.175, p =  0.0000
X8 F( 24, 4184) =  5.031, p =  0.0000


#### Interpretation - OLS Test - categorical

The null hypothesis is not rejected when p > 0.05, i.e. the column explains insufficient variance in y.

X4 is very close to the threshold (p = 0.0492). It should be considered for removal.

X5 (p = 0.0004) and X6 (p = 0.0000) are more likely to have sufficient variance.

We will test dropping these columns when fitting the model.

In [14]:
#### Variance tests on numerical data: OLS
columnListNumerical = dfTrain.loc[:, dfTrain.columns != 'ID']
columnListNumerical = columnListNumerical.select_dtypes(exclude=['object'])
#print(columnListNumerical.head())
count = 0
forRemovalOLSList = []
for column in columnListNumerical:
    modelOLS = ols('y ~ '+column, dfTrain).fit()    
    if (modelOLS.f_pvalue > 0.05 or math.isnan(modelOLS.f_pvalue)):
    #if math.isnan(modelOLS.f_pvalue):
        count+=1
        print(f"{column} F({modelOLS.df_model: .0f},{modelOLS.df_resid: .0f}) = {modelOLS.fvalue: .3f}, p = {modelOLS.f_pvalue: .4f}")
        forRemovalOLSList.append(column)
print (f"Number of columns with low variance: {count}")
print(forRemovalOLSList)



X10 F( 1, 4207) =  3.066, p =  0.0800
X11 F( 0, 4208) =  nan, p =  nan
X15 F( 1, 4207) =  2.249, p =  0.1338
X18 F( 1, 4207) =  0.013, p =  0.9077
X24 F( 1, 4207) =  0.052, p =  0.8191

X26 F( 1, 4207) =  1.887, p =  0.1696
X32 F( 1, 4207) =  0.013, p =  0.9096
X33 F( 1, 4207) =  0.905, p =  0.3416
X36 F( 1, 4207) =  1.216, p =  0.2702
X38 F( 1, 4207) =  0.669, p =  0.4133
X39 F( 1, 4207) =  0.905, p =  0.3416
X40 F( 1, 4207) =  0.004, p =  0.9525
X41 F( 1, 4207) =  0.704, p =  0.4016
X42 F( 1, 4207) =  0.211, p =  0.6463
X49 F( 1, 4207) =  0.345, p =  0.5572
X57 F( 1, 4207) =  2.223, p =  0.1360
X58 F( 1, 4207) =  2.148, p =  0.1429
X59 F( 1, 4207) =  1.954, p =  0.1623
X60 F( 1, 4207) =  2.089, p =  0.1484
X63 F( 1, 4207) =  3.364, p =  0.0667
X65 F( 1, 4207) =  3.408, p =  0.0650
X67 F( 1, 4207) =  3.722, p =  0.0538
X70 F( 1, 4207) =  1.895, p =  0.1687
X73 F( 1, 4207) =  2.574, p =  0.1087
X74 F( 1, 4207) =  0.855, p =  0.3551
X83 F( 1, 4207) =  0.441, p =  0.5067
X86 F( 1, 4207) =  0.521, p =  0.4706
X87 F( 1, 4207) =  0.836, p =  0.3606
X89 F( 1, 4207) =  0.404, p =  0.5253
X92 F( 1, 4207) =  0.046, p =  0.8299
X93 F( 0, 4208) =  nan, p =  nan
X95 F( 1, 4207) 

#### Interpretation - OLS Test - Numerical

126 columns have low variance: their OLS F-test against y gives p > 0.05 or NaN.

['X10', 'X11', 'X15', 'X18', 'X24', 'X26', 'X32', 'X33', 'X36', 'X38', 'X39', 'X40', 'X41', 'X42', 'X49', 'X57', 'X58', 'X59', 'X60', 'X63', 'X65', 'X67', 'X70', 'X73', 'X74', 'X83', 'X86', 'X87', 'X89', 'X92', 'X93', 'X95', 'X103', 'X104', 'X105', 'X107', 'X114', 'X117', 'X123', 'X124', 'X129', 'X133', 'X138', 'X139', 'X140', 'X141', 'X143', 'X145', 'X146', 'X152', 'X153', 'X160', 'X161', 'X164', 'X168', 'X173', 'X175', 'X181', 'X182', 'X184', 'X186', 'X190', 'X192', 'X194', 'X195', 'X196', 'X200', 'X203', 'X206', 'X207', 'X210', 'X213', 'X220', 'X226', 'X230', 'X233', 'X235', 'X240', 'X245', 'X246', 'X248', 'X253', 'X254', 'X257', 'X258', 'X259', 'X260', 'X262', 'X266', 'X268', 'X280', 'X288', 'X289', 'X290', 'X292', 'X293', 'X294', 'X295', 'X296', 'X297', 'X307', 'X318', 'X319', 'X323', 'X324', 'X326', 'X330', 'X332', 'X338', 'X340', 'X345', 'X347', 'X353', 'X356', 'X357', 'X358', 'X359', 'X361', 'X364', 'X365', 'X366', 'X369', 'X374', 'X375', 'X384', 'X385']

We will test dropping these columns when fitting the model.
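The NaN rows in the OLS output (for example X11 and X93, reported as F(0, 4208)) appear when a column is constant: it is collinear with the intercept, the model has zero degrees of freedom, and the F-statistic is undefined. The sketch below recomputes the F-statistic of `y ~ 1 + x` from sums of squares with NumPy to show this. It is an illustration under that assumption, not the statsmodels implementation; `f_statistic` and the toy arrays are hypothetical names and data.

```python
import numpy as np

def f_statistic(x, y):
    """Return (F, df_model, df_resid) for the simple regression y ~ 1 + x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])          # intercept + feature
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    df_model = int(rank) - 1                           # 0 when x is constant
    df_resid = len(y) - int(rank)
    ss_model = np.sum((fitted - y.mean()) ** 2)        # explained sum of squares
    ss_resid = np.sum((y - fitted) ** 2)               # residual sum of squares
    if df_model == 0:
        return float("nan"), df_model, df_resid        # F is undefined
    return (ss_model / df_model) / (ss_resid / df_resid), df_model, df_resid

y_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
informative = f_statistic([0, 0, 0, 1, 1, 1], y_toy)   # binary column that separates y
constant = f_statistic([0, 0, 0, 0, 0, 0], y_toy)      # constant column, like X11
print(informative)
print(constant)
```

A constant column can therefore be detected either by the NaN p-value or, more directly, by checking that its standard deviation is 0.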

# Check for null and uniques

Instructions: Check for null and unique values for test and train sets.

## Check for unique values in numerical columns

When standard deviation = 0, then all values in this column are the same.

In [16]:
# find the numerical columns where there's complete uniformity / no variance in the data. All values are 1 or all are 0.
# these are the values that return NaN in the OLS variance test.
numericalDesc = columnListNumerical.std()
numericalNoVariance = numericalDesc[numericalDesc == 0]
#print(numericalNoVariance.columns)
numericalNoVariance_df = numericalNoVariance.to_frame()
#numericalNoVariance_df
numericalNoVariance.index

Index(['X11', 'X93', 'X107', 'X233', 'X235', 'X268', 'X289', 'X290', 'X293',
       'X297', 'X330', 'X347'],
      dtype='object')

#### Interpretation:
    
These columns have standard deviation = 0.

['X11', 'X93', 'X107', 'X233', 'X235', 'X268', 'X289', 'X290', 'X293',
       'X297', 'X330', 'X347']

We will test dropping these columns when fitting the model.
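One possible way to carry out the removal is sketched below: columns that take a single value in the train set are dropped from both frames so that they keep the same feature set. The helper `drop_constant_columns` and the toy frames are hypothetical and only illustrate the step; they are not the notebook's own code.

```python
import pandas as pd

def drop_constant_columns(train, test):
    """Drop columns that take a single value in `train` from both frames."""
    constant = [c for c in train.columns if train[c].nunique(dropna=False) <= 1]
    train_kept = train.drop(columns=constant)
    # errors="ignore" covers columns that exist only in the train frame.
    test_kept = test.drop(columns=constant, errors="ignore")
    return train_kept, test_kept, constant

# Toy data: X11 is constant in train, mirroring the real X11 column.
toy_train = pd.DataFrame({"X10": [0, 1, 0], "X11": [0, 0, 0], "y": [88.5, 76.2, 101.3]})
toy_test = pd.DataFrame({"X10": [1, 1, 0], "X11": [0, 1, 0]})

train_kept, test_kept, dropped = drop_constant_columns(toy_train, toy_test)
print(dropped)
print(list(train_kept.columns), list(test_kept.columns))
```

The decision is based on the train set only; a column that is constant in train carries no information for fitting even if it varies in test.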

## Check for null values

In [17]:
dfTrain.isnull().sum().sum()

0

#### Interpretation

No null values in Train set.

In [18]:
dfTest.isnull().sum().sum()

0

#### Interpretation

No null values in Test set.

## Check for unique categorical values between train set and test set

In [19]:
trainUniques = columnListCategorical.nunique() # the number of unique values in categorical data for the train set
testUniques = dfTest.select_dtypes(include=['object']).nunique() # the number of unique values in categorical data for the test set
print("trainUniques")
print(trainUniques) 
print("testUniques")
print(testUniques)

uniqueMismatch = trainUniques[trainUniques!=testUniques] # columns where there are a different number of unique values
print("uniqueMismatch")
print(uniqueMismatch)

uniqueMismatch_df = uniqueMismatch.to_frame()
uniqueMismatch_df.index



trainUniques
X0    47
X1    27
X2    44
X3     7
X4     4
X5    29
X6    12
X8    25
dtype: int64
testUniques
X0    49
X1    27
X2    45
X3     7
X4     4
X5    32
X6    12
X8    25
dtype: int64
uniqueMismatch
X0    47
X2    44
X5    29
dtype: int64


Index(['X0', 'X2', 'X5'], dtype='object')

#### Interpretation
    
X0, X2 and X5 have a different number of unique values in the train and test sets. These columns should have an extra dummy column during one-hot encoding.

(Note that during the one-hot-encoding process and the joining of train and test columns below, it was discovered that a larger number of columns were produced than was expected by the interpretation above. This suggests that simply counting the number of unique values is not sufficient. The sets of unique values should be compared to find disparities.)
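The set comparison suggested in the note above could look like the sketch below. It lists, per column, the values seen only in train and only in test. The helper name `category_mismatches` and the toy frames are assumptions for illustration; with the real data the call would presumably take `dfTrain`, `dfTest`, and the eight categorical column names.

```python
import pandas as pd

def category_mismatches(train, test, columns):
    """Map each column to the values found only in train and only in test."""
    result = {}
    for col in columns:
        train_values = set(train[col].unique())
        test_values = set(test[col].unique())
        result[col] = {
            "train_only": sorted(train_values - test_values),
            "test_only": sorted(test_values - train_values),
        }
    return result

# Toy data: X0 has 3 unique values in both frames, yet the sets differ.
toy_train = pd.DataFrame({"X0": ["a", "b", "c"], "X3": ["x", "y", "x"]})
toy_test = pd.DataFrame({"X0": ["a", "b", "d"], "X3": ["x", "y", "y"]})

mismatches = category_mismatches(toy_train, toy_test, ["X0", "X3"])
print(mismatches)
```

In the toy example X0 has the same number of unique values in both frames, so a count comparison would report no mismatch even though each frame contains a value the other lacks.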

# Label encoding

Instructions: Apply label encoder.
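The cell that follows uses one-hot encoding (`pd.get_dummies`) rather than a label encoder. For comparison, a minimal sketch of label encoding is given here. It uses pandas categorical codes as a stand-in for scikit-learn's `LabelEncoder`, and builds the categories from the union of train and test values so that both frames share one mapping. The helper `label_encode` and the toy frames are hypothetical.

```python
import pandas as pd

def label_encode(train, test, columns):
    """Replace each listed column with integer codes shared by both frames."""
    train_enc, test_enc = train.copy(), test.copy()
    for col in columns:
        # Union of values so that a value unseen in train still gets a code.
        categories = sorted(set(train[col]) | set(test[col]))
        dtype = pd.CategoricalDtype(categories=categories)
        train_enc[col] = train[col].astype(dtype).cat.codes
        test_enc[col] = test[col].astype(dtype).cat.codes
    return train_enc, test_enc

toy_train = pd.DataFrame({"X0": ["a", "c", "b"]})
toy_test = pd.DataFrame({"X0": ["d", "a", "a"]})

train_enc, test_enc = label_encode(toy_train, toy_test, ["X0"])
print(train_enc["X0"].tolist(), test_enc["X0"].tolist())
```

Label codes impose an arbitrary order on nominal values. Tree-based models such as XGBoost can often cope with that, while one-hot encoding avoids the implied order at the cost of many more columns.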

In [20]:
# Note: DataFrame.info() prints its summary and returns None, so each line printed
# by the f-strings below ends in "None"; use .shape to print the dimensions instead.
print(f"dfTrain size: {dfTrain.info()}\n")
oheTrain = pd.get_dummies(dfTrain)  # one-hot encoding for categorical columns in train set
print(f"oheTrain size: {oheTrain.info()}\n")
oheTest = pd.get_dummies(dfTest)  # one-hot encoding for categorical columns in test set
print(f"oheTest size: {oheTest.info()}\n")
oheTrainFull, oheTestFull = oheTrain.align(oheTest, join='outer', fill_value=0, axis=1)
# join='outer' adds the missing columns to both the train and test sets.
# fill_value=0 prevents filling with NaN, which is the default.
# Filling with NaN would turn int columns into float columns, which would cause trouble when running the model against the test set to produce predictions.


print(f"final_oheTrain size: {oheTrainFull.info()}\n")
print(f"final_oheTest size: {oheTestFull.info()}\n") #sanity check to see that train and test sets have the same size and types

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4209 entries, 0 to 4208
Columns: 378 entries, ID to X385
dtypes: float64(1), int64(369), object(8)
memory usage: 12.1+ MB
dfTrain size: None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4209 entries, 0 to 4208
Columns: 565 entries, ID to X8_y
dtypes: float64(1), int64(369), uint8(195)
memory usage: 12.7 MB
oheTrain size: None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4209 entries, 0 to 4208
Columns: 570 entries, ID to X8_y
dtypes: int64(369), uint8(201)
memory usage: 12.7 MB
oheTest size: None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4209 entries, 0 to 4208
Columns: 581 entries, ID to y
dtypes: float64(1), int64(385), uint8(195)
memory usage: 13.2 MB
final_oheTrain size: None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4209 entries, 0 to 4208
Columns: 581 entries, ID to y
dtypes: int64(380), uint8(201)
memory usage: 13.0 MB
final_oheTest size: None
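The output above shows both aligned frames with 581 columns running from ID to y, which means the outer join also added a zero-filled y column to the test set. The sketch below shows one way to avoid that, under the assumption that the target is set aside before encoding. The toy frames and variable names are made up for illustration.

```python
import pandas as pd

toy_train = pd.DataFrame({"X0": ["a", "b"], "X10": [0, 1], "y": [90.0, 110.0]})
toy_test = pd.DataFrame({"X0": ["a", "c"], "X10": [1, 1]})

# Set the target aside so that align() does not copy it into the test frame.
target = toy_train["y"]
train_features = pd.get_dummies(toy_train.drop(columns="y"))
test_features = pd.get_dummies(toy_test)

# Outer join on columns: each frame gains the dummy columns it lacks, filled with 0.
train_aligned, test_aligned = train_features.align(
    test_features, join="outer", axis=1, fill_value=0
)

print(train_aligned.shape, test_aligned.shape)
print(list(train_aligned.columns))
```

Depending on the pandas version, `get_dummies` returns `uint8` (as in the output above) or `bool` dummy columns; passing `dtype=int` makes the result explicit.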



In [21]:
pd.set_option('display.max_rows', 581) # view details of all the columns
oheTrainFull.describe(include="all")

Unnamed: 0,ID,X0_a,X0_aa,X0_ab,X0_ac,X0_ad,X0_ae,X0_af,X0_ag,X0_ai,X0_aj,X0_ak,X0_al,X0_am,X0_an,X0_ao,X0_ap,X0_aq,X0_as,X0_at,X0_au,X0_av,X0_aw,X0_ax,X0_ay,X0_az,X0_b,X0_ba,X0_bb,X0_bc,X0_c,X0_d,X0_e,X0_f,X0_g,X0_h,X0_i,X0_j,X0_k,X0_l,X0_m,X0_n,X0_o,X0_p,X0_q,X0_r,X0_s,X0_t,X0_u,X0_v,X0_w,X0_x,X0_y,X0_z,X10,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X11,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X12,X120,X122,X123,X124,X125,X126,X127,X128,X129,X13,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X14,X140,X141,X142,X143,X144,X145,X146,X147,X148,X15,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X16,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X17,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X18,X180,X181,X182,X183,X184,X185,X186,X187,X189,X19,X190,X191,X192,X194,X195,X196,X197,X198,X199,X1_a,X1_aa,X1_ab,X1_b,X1_c,X1_d,X1_e,X1_f,X1_g,X1_h,X1_i,X1_j,X1_k,X1_l,X1_m,X1_n,X1_o,X1_p,X1_q,X1_r,X1_s,X1_t,X1_u,X1_v,X1_w,X1_y,X1_z,X20,X200,...,X342,X343,X344,X345,X346,X347,X348,X349,X35,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X36,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X37,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X38,X380,X382,X383,X384,X385,X39,X3_a,X3_b,X3_c,X3_d,X3_e,X3_f,X3_g,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X4_a,X4_b,X4_c,X4_d,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X5_a,X5_aa,X5_ab,X5_ac,X5_ad,X5_ae,X5_af,X5_ag,X5_ah,X5_b,X5_c,X5_d,X5_f,X5_g,X5_h,X5_i,X5_j,X5_k,X5_l,X5_m,X5_n,X5_o,X5_p,X5_q,X5_r,X5_s,X5_t,X5_u,X5_v,X5_w,X5_x,X5_y,X5_z,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X6_a,X6_b,X6_c,X6_d,X6_e,X6_f,X6_g,X6_h,X6_i,X6_j,X6_k,X6_l,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X8_a,X8_b,X8_c,X8_d,X8_e,X8_f,X8_g,X8_h,X8_i,X8_j,X8_k,X8_l,X8_m,X8_n,X8_o,X8_p,X8_q,X8_r,X8_s,X8_t,X8_u,X8_v,X8_w,X8_x,X8_y,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99,y
count,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,...,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,42
09.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
mean,4205.960798,0.004989,0.000475,0.000238,0.000238,0.003326,0.0,0.008316,0.0,0.008078,0.035876,0.082918,0.015918,0.004277,0.0,0.00095,0.024471,0.004277,0.002376,0.00594,0.002613,0.0,0.003801,0.004514,0.074364,0.041578,0.002613,0.006415,0.0,0.001426,0.000713,0.017344,0.007603,0.053932,0.000238,0.017819,0.004277,0.043003,0.002613,0.003801,0.008078,0.046329,0.063911,0.0,0.000475,0.002376,0.025184,0.072701,0.004039,0.008553,0.043241,0.071276,0.076978,0.085531,0.013305,0.690188,0.935614,0.00689,0.784509,0.001901,0.002376,0.013067,0.0,0.01473,0.04039,0.0,0.00095,0.974816,0.002851,0.022333,0.146115,0.285579,0.196721,0.04918,0.622238,0.622238,0.075077,0.95771,0.007128,0.002613,0.000475,0.003089,0.038964,0.495129,0.958422,0.122594,0.057971,0.041578,0.02661,0.688525,0.124258,0.022333,0.027085,0.956522,0.581848,0.040865,0.09052,0.42813,0.04039,0.014255,0.770254,0.038251,0.80803,0.001426,0.040865,0.022333,0.044904,0.000475,0.79235,0.085531,0.032312,0.000713,0.208838,0.076503,0.717273,0.282727,0.229746,0.013542,0.002613,0.001188,0.197672,0.040865,0.303397,0.062485,0.004514,0.033262,0.00095,0.270848,0.006652,0.007603,0.024234,0.657401,0.00594,0.009741,0.017344,0.022333,0.017106,0.050131,0.557377,0.047992,0.00784,0.157995,0.093847,0.106201,0.004039,0.001426,0.018769,0.535994,0.420527,0.915419,0.099549,0.000238,0.470896,0.002376,0.464006,0.011642,0.010216,0.032312,0.023046,0.002851,0.033975,0.197909,0.000713,0.140651,0.028748,0.000713,0.00784,0.005464,0.001426,0.00689,0.04823,0.005227,0.004039,0.140176,0.007603,0.004514,0.019482,0.002138,0.000713,0.059634,0.142077,0.007365,0.008791,0.096935,0.012354,0.005464,0.010929,0.142789,0.006652,...,0.022333,0.078403,0.008553,0.022333,0.047517,0.0,0.947256,0.044904,0.232359,0.338798,0.29722,0.05417,0.002138,0.202899,0.380375,0.179853,0.001188,0.426942,0.031837,0.004514,0.076503,0.966025,0.520314,0.753861,0.002851,0.002851,0.001188,0.051794,0.062723,0.000475,0.232359,0.006652,0.014255,0.000475,0.019244,0.22737,0.318841,0.057258,0.314802,0.02
067,0.009503,0.033262,0.008078,0.007603,0.001663,0.000475,0.001426,0.000238,0.104538,0.013542,0.461392,0.0689,0.038727,0.255643,0.057258,0.000713,0.011404,0.000238,0.072226,0.011404,0.253267,0.597292,0.01283,0.022333,0.122119,0.000475,0.000238,0.000238,0.99905,0.214065,0.721787,0.04229,0.00689,0.043478,0.005227,0.021145,0.013305,0.574958,0.000713,0.0,0.02661,0.046804,0.047517,0.043953,0.048705,0.044666,0.048468,0.023046,0.0,0.031124,0.050843,0.001663,0.000238,0.000238,0.04918,0.029698,0.042053,0.046329,0.049418,0.050368,0.004752,0.049418,0.052269,0.051081,0.050843,0.0,0.000238,0.054882,0.054882,0.000475,0.000238,0.0,0.001426,0.953908,0.00594,0.011404,0.375148,0.002138,0.027085,0.001901,0.073414,0.029936,0.048943,0.006652,0.009028,0.148491,0.002851,0.004752,0.247565,0.045141,0.115942,0.246852,0.010216,0.113566,0.919933,0.103588,0.019957,0.999287,0.036113,0.043478,0.012592,0.005702,0.025184,0.947018,0.229033,0.017106,0.001188,0.103588,0.408173,0.001426,0.00095,0.007128,0.000713,0.049893,0.045141,0.023759,0.024471,0.053457,0.057733,0.030886,0.027798,0.056308,0.065811,0.041815,0.023996,0.036826,0.057496,0.038727,0.023759,0.027798,0.052031,0.060584,0.028273,0.028273,0.046092,0.046567,0.024947,0.02756,0.007365,0.001663,0.00095,0.0,0.007365,0.000238,0.758137,0.004277,0.942504,0.008553,100.669318
std,2437.608688,0.070467,0.021796,0.015414,0.015414,0.057584,0.0,0.09082,0.0,0.089524,0.186002,0.27579,0.125174,0.065263,0.0,0.030817,0.154526,0.065263,0.048691,0.076849,0.051061,0.0,0.061545,0.067043,0.262394,0.199646,0.051061,0.079845,0.0,0.037734,0.026691,0.130564,0.086872,0.22591,0.015414,0.132309,0.065263,0.202888,0.051061,0.061545,0.089524,0.210222,0.244623,0.0,0.021796,0.048691,0.156703,0.259677,0.063432,0.092098,0.203423,0.257316,0.266588,0.279703,0.11459,0.462471,0.245468,0.082729,0.411211,0.043561,0.048691,0.113576,0.0,0.120486,0.196895,0.0,0.030817,0.156703,0.053325,0.147782,0.353264,0.451743,0.397567,0.21627,0.484885,0.484885,0.263547,0.201275,0.084134,0.051061,0.021796,0.055496,0.193532,0.500036,0.199646,0.32801,0.233716,0.199646,0.160959,0.463152,0.329914,0.147782,0.16235,0.203955,0.493314,0.198,0.28696,0.494867,0.196895,0.118555,0.420719,0.191825,0.393896,0.037734,0.198,0.147782,0.207117,0.021796,0.405673,0.279703,0.176848,0.026691,0.406527,0.265832,0.450379,0.450379,0.420719,0.115595,0.051061,0.03445,0.39829,0.198,0.45978,0.242063,0.067043,0.179341,0.030817,0.444451,0.0813,0.086872,0.153792,0.474635,0.076849,0.098226,0.130564,0.147782,0.129683,0.21824,0.496756,0.213776,0.088208,0.364779,0.29165,0.308131,0.063432,0.037734,0.135725,0.498762,0.493702,0.27829,0.299433,0.015414,0.499212,0.048691,0.498762,0.10728,0.10057,0.176848,0.150067,0.053325,0.181186,0.398471,0.026691,0.347702,0.167117,0.026691,0.088208,0.073729,0.037734,0.082729,0.214277,0.072117,0.063432,0.347211,0.086872,0.067043,0.138228,0.046198,0.026691,0.236836,0.34917,0.085514,0.093357,0.295905,0.110475,0.073729,0.103981,0.349899,0.0813,...,0.147782,0.268837,0.092098,0.147782,0.212768,0.0,0.223549,0.207117,0.422387,0.473357,0.457089,0.226379,0.046198,0.402205,0.485537,0.38411,0.03445,0.494693,0.175586,0.067043,0.265832,0.181186,0.499647,0.430812,0.053325,0.053325,0.03445,0.221637,0.242492,0.021796,0.422387,0.0813,0.118555,0.021796,0.137399,0.419183,0.466082,0.232363,0.464492,0.142294,0.097033
,0.179341,0.089524,0.086872,0.040752,0.021796,0.037734,0.015414,0.305993,0.115595,0.498566,0.253314,0.192965,0.436274,0.232363,0.026691,0.106192,0.015414,0.258893,0.106192,0.434934,0.490501,0.112552,0.147782,0.327462,0.021796,0.015414,0.015414,0.030817,0.410221,0.448172,0.201275,0.082729,0.203955,0.072117,0.143885,0.11459,0.494408,0.026691,0.0,0.160959,0.211245,0.212768,0.205016,0.215277,0.206594,0.214778,0.150067,0.0,0.173673,0.219704,0.040752,0.015414,0.015414,0.21627,0.169774,0.200733,0.210222,0.216765,0.21873,0.068777,0.216765,0.222595,0.220189,0.219704,0.0,0.015414,0.227778,0.227778,0.021796,0.015414,0.0,0.037734,0.209709,0.076849,0.106192,0.484219,0.046198,0.16235,0.043561,0.260846,0.170431,0.215774,0.0813,0.094599,0.355629,0.053325,0.068777,0.431649,0.207639,0.320193,0.431231,0.10057,0.317321,0.271428,0.304761,0.13987,0.026691,0.186594,0.203955,0.111519,0.075305,0.156703,0.224024,0.42026,0.129683,0.03445,0.304761,0.491554,0.037734,0.030817,0.084134,0.026691,0.21775,0.207639,0.152314,0.154526,0.224969,0.233266,0.17303,0.164412,0.230543,0.247982,0.20019,0.153055,0.188356,0.232815,0.192965,0.152314,0.164412,0.222117,0.238595,0.165771,0.165771,0.209709,0.210734,0.155981,0.163728,0.085514,0.040752,0.030817,0.0,0.085514,0.015414,0.428262,0.065263,0.232815,0.092098,12.679381
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,72.11
25%,2095.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,90.82
50%,4220.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,99.15
75%,6314.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,109.01
max,8417.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,265.32


In [22]:
oheTestFull.describe(include="all")

Unnamed: 0,ID,X0_a,X0_aa,X0_ab,X0_ac,X0_ad,X0_ae,X0_af,X0_ag,X0_ai,X0_aj,X0_ak,X0_al,X0_am,X0_an,X0_ao,X0_ap,X0_aq,X0_as,X0_at,X0_au,X0_av,X0_aw,X0_ax,X0_ay,X0_az,X0_b,X0_ba,X0_bb,X0_bc,X0_c,X0_d,X0_e,X0_f,X0_g,X0_h,X0_i,X0_j,X0_k,X0_l,X0_m,X0_n,X0_o,X0_p,X0_q,X0_r,X0_s,X0_t,X0_u,X0_v,X0_w,X0_x,X0_y,X0_z,X10,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X11,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X12,X120,X122,X123,X124,X125,X126,X127,X128,X129,X13,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X14,X140,X141,X142,X143,X144,X145,X146,X147,X148,X15,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X16,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X17,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X18,X180,X181,X182,X183,X184,X185,X186,X187,X189,X19,X190,X191,X192,X194,X195,X196,X197,X198,X199,X1_a,X1_aa,X1_ab,X1_b,X1_c,X1_d,X1_e,X1_f,X1_g,X1_h,X1_i,X1_j,X1_k,X1_l,X1_m,X1_n,X1_o,X1_p,X1_q,X1_r,X1_s,X1_t,X1_u,X1_v,X1_w,X1_y,X1_z,X20,X200,...,X342,X343,X344,X345,X346,X347,X348,X349,X35,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X36,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X37,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X38,X380,X382,X383,X384,X385,X39,X3_a,X3_b,X3_c,X3_d,X3_e,X3_f,X3_g,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X4_a,X4_b,X4_c,X4_d,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X5_a,X5_aa,X5_ab,X5_ac,X5_ad,X5_ae,X5_af,X5_ag,X5_ah,X5_b,X5_c,X5_d,X5_f,X5_g,X5_h,X5_i,X5_j,X5_k,X5_l,X5_m,X5_n,X5_o,X5_p,X5_q,X5_r,X5_s,X5_t,X5_u,X5_v,X5_w,X5_x,X5_y,X5_z,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X6_a,X6_b,X6_c,X6_d,X6_e,X6_f,X6_g,X6_h,X6_i,X6_j,X6_k,X6_l,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X8_a,X8_b,X8_c,X8_d,X8_e,X8_f,X8_g,X8_h,X8_i,X8_j,X8_k,X8_l,X8_m,X8_n,X8_o,X8_p,X8_q,X8_r,X8_s,X8_t,X8_u,X8_v,X8_w,X8_x,X8_y,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99,y
count,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,...,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,42
09.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
mean,4211.039202,0.004277,0.0,0.0,0.0,0.002851,0.000238,0.008078,0.000238,0.009028,0.038489,0.102637,0.020908,0.006652,0.000238,0.001188,0.025659,0.002613,0.001426,0.004989,0.001188,0.000238,0.002613,0.001901,0.071038,0.038251,0.003089,0.004514,0.000238,0.001426,0.001426,0.014493,0.011404,0.050606,0.000713,0.015206,0.00594,0.040627,0.002851,0.001426,0.008078,0.039677,0.058446,0.000238,0.0,0.002376,0.02756,0.069613,0.004752,0.009503,0.047042,0.071751,0.08268,0.079591,0.019007,0.683298,0.936802,0.00594,0.788311,0.002613,0.000238,0.012354,0.00095,0.011642,0.039439,0.000238,0.001426,0.975053,0.001188,0.023759,0.144928,0.293894,0.196246,0.044429,0.622238,0.622238,0.074364,0.957234,0.008316,0.003326,0.000238,0.004752,0.039914,0.486101,0.961511,0.133761,0.06106,0.038489,0.027322,0.687812,0.123307,0.023759,0.026372,0.960323,0.590402,0.039439,0.095747,0.427893,0.039439,0.016393,0.76574,0.039677,0.801378,0.000713,0.039439,0.023759,0.04039,0.000713,0.788073,0.087432,0.031361,0.001188,0.213352,0.079591,0.709907,0.290093,0.234498,0.01473,0.002613,0.00095,0.195058,0.035638,0.314326,0.067712,0.003564,0.027085,0.001188,0.263721,0.007365,0.008791,0.026847,0.666429,0.00594,0.010216,0.019482,0.022808,0.014493,0.051794,0.560466,0.045141,0.010216,0.15847,0.092659,0.097648,0.002138,0.000713,0.018057,0.531955,0.428368,0.921834,0.111665,0.000238,0.476123,0.001663,0.468282,0.012117,0.012354,0.029461,0.024947,0.001188,0.036351,0.196246,0.001188,0.141601,0.033737,0.000238,0.00689,0.002851,0.002138,0.006415,0.044904,0.005227,0.002851,0.142314,0.006415,0.003801,0.019244,0.002376,0.000713,0.059872,0.143027,0.004277,0.009503,0.103588,0.011879,0.004989,0.007365,0.139463,0.006652,...,0.020195,0.069613,0.009503,0.024709,0.044904,0.000475,0.950107,0.041815,0.237349,0.341174,0.287242,0.050131,0.002613,0.203849,0.377762,0.184129,0.002376,0.433595,0.036826,0.002851,0.079591,0.962461,0.517938,0.745308,0.003564,0.003326,0.001663,0.04823,0.063198,0.0,0.237349,0.005702,0.011404,0.000238,0.014493,0.235923,0.
325968,0.049656,0.311951,0.019244,0.011879,0.031124,0.008078,0.008791,0.000475,0.000713,0.001663,0.000713,0.113091,0.010929,0.451414,0.065099,0.037539,0.257306,0.064623,0.001901,0.011404,0.001426,0.076978,0.00689,0.252079,0.593728,0.011167,0.023759,0.133523,0.000238,0.00095,0.000238,0.998574,0.224044,0.714659,0.042528,0.00594,0.039914,0.006415,0.024947,0.015681,0.584462,0.001188,0.000238,0.024947,0.042528,0.050368,0.050606,0.046567,0.051556,0.047755,0.019007,0.000238,0.028748,0.045617,0.001426,0.001901,0.000475,0.042766,0.032549,0.045854,0.048943,0.046804,0.049656,0.003801,0.053932,0.046804,0.056783,0.048705,0.000238,0.0,0.058446,0.051794,0.000475,0.000238,0.000238,0.001663,0.953433,0.00594,0.013067,0.360181,0.001901,0.027085,0.002138,0.079829,0.030649,0.046567,0.004514,0.009503,0.139938,0.004039,0.00594,0.25493,0.051794,0.116417,0.238061,0.015918,0.112378,0.920409,0.107864,0.019007,0.997862,0.038727,0.039914,0.011167,0.007365,0.033262,0.949632,0.21953,0.02067,0.001188,0.107864,0.420527,0.001426,0.00095,0.008316,0.001901,0.047992,0.040865,0.023996,0.025659,0.065099,0.057258,0.032549,0.030411,0.055595,0.060822,0.036588,0.023759,0.038489,0.05607,0.040152,0.025184,0.027085,0.05417,0.057971,0.027798,0.034212,0.04134,0.045617,0.026134,0.025184,0.008316,0.002613,0.002376,0.000475,0.00784,0.000713,0.747684,0.001426,0.942742,0.009979,0.0
std,2423.078926,0.065263,0.0,0.0,0.0,0.053325,0.015414,0.089524,0.015414,0.094599,0.192396,0.303521,0.143092,0.0813,0.015414,0.03445,0.158136,0.051061,0.037734,0.070467,0.03445,0.015414,0.051061,0.043561,0.256919,0.191825,0.055496,0.067043,0.015414,0.037734,0.037734,0.119525,0.106192,0.219217,0.026691,0.122384,0.076849,0.197449,0.053325,0.037734,0.089524,0.195222,0.234613,0.015414,0.0,0.048691,0.163728,0.254523,0.068777,0.097033,0.211754,0.258106,0.275431,0.270692,0.136565,0.465246,0.243347,0.076849,0.408554,0.051061,0.015414,0.110475,0.030817,0.10728,0.194661,0.015414,0.037734,0.155981,0.03445,0.152314,0.35207,0.455598,0.397204,0.20607,0.484885,0.484885,0.262394,0.202352,0.09082,0.057584,0.015414,0.068777,0.195782,0.499866,0.192396,0.340436,0.239468,0.192396,0.163041,0.463441,0.328829,0.152314,0.160258,0.195222,0.491818,0.194661,0.294279,0.494832,0.194661,0.126998,0.423586,0.195222,0.39901,0.026691,0.194661,0.152314,0.196895,0.026691,0.408722,0.2825,0.174313,0.03445,0.409723,0.270692,0.453859,0.453859,0.423735,0.120486,0.051061,0.030817,0.396293,0.185408,0.464302,0.251281,0.059598,0.16235,0.03445,0.440702,0.085514,0.093357,0.161656,0.471544,0.076849,0.10057,0.138228,0.14931,0.119525,0.221637,0.496389,0.207639,0.10057,0.365224,0.289988,0.296873,0.046198,0.026691,0.133172,0.499037,0.494901,0.268464,0.314992,0.015414,0.499489,0.040752,0.499052,0.109421,0.110475,0.169114,0.155981,0.03445,0.187183,0.397204,0.03445,0.348682,0.180573,0.015414,0.082729,0.053325,0.046198,0.079845,0.207117,0.072117,0.053325,0.349414,0.079845,0.061545,0.137399,0.048691,0.026691,0.237277,0.350142,0.065263,0.097033,0.304761,0.108356,0.070467,0.085514,0.34647,0.0813,...,0.140683,0.254523,0.097033,0.155255,0.207117,0.021796,0.21775,0.20019,0.425508,0.47416,0.452529,0.21824,0.051061,0.402906,0.484885,0.387636,0.048691,0.49563,0.188356,0.053325,0.270692,0.1901,0.499738,0.43574,0.059598,0.057584,0.040752,0.214277,0.243347,0.0,0.425508,0.075305,0.106192,0.015414,0.119525,0.424625,0.468791,0.217258,0.
463345,0.137399,0.108356,0.173673,0.089524,0.093357,0.021796,0.026691,0.040752,0.026691,0.316742,0.103981,0.497693,0.246729,0.1901,0.437201,0.245889,0.043561,0.106192,0.037734,0.266588,0.082729,0.434258,0.491195,0.105093,0.152314,0.34018,0.015414,0.030817,0.015414,0.037734,0.417001,0.45163,0.201814,0.076849,0.195782,0.079845,0.155981,0.124252,0.492873,0.03445,0.015414,0.155981,0.201814,0.21873,0.219217,0.210734,0.221156,0.213272,0.136565,0.015414,0.167117,0.208677,0.037734,0.043561,0.021796,0.202352,0.177475,0.209193,0.215774,0.211245,0.217258,0.061545,0.22591,0.211245,0.231455,0.215277,0.015414,0.0,0.234613,0.221637,0.021796,0.015414,0.015414,0.040752,0.210734,0.076849,0.113576,0.48011,0.043561,0.16235,0.046198,0.27106,0.172384,0.210734,0.067043,0.097033,0.346964,0.063432,0.076849,0.435874,0.221637,0.320763,0.425947,0.125174,0.315869,0.270692,0.310246,0.136565,0.046198,0.192965,0.195782,0.105093,0.085514,0.179341,0.21873,0.413977,0.142294,0.03445,0.310246,0.493702,0.037734,0.030817,0.09082,0.043561,0.213776,0.198,0.153055,0.158136,0.246729,0.232363,0.177475,0.171736,0.229165,0.239032,0.187771,0.152314,0.192396,0.230085,0.196339,0.156703,0.16235,0.226379,0.233716,0.164412,0.181796,0.199099,0.208677,0.159554,0.156703,0.09082,0.051061,0.048691,0.021796,0.088208,0.026691,0.434393,0.037734,0.232363,0.099405,0.0
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2115.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
50%,4202.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
75%,6310.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
max,8416.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


#### Interpretation
    
Some category values occur in the train set but not in the test set, and vice versa.

When the two sets are encoded separately, pandas.get_dummies() therefore produces a different set of columns for each of them.

DataFrame.align() can be used to give the train and test sets the same columns.

Note that when align() adds a column that did not exist in one of the sets, it fills it with NaN by default (fill_value=NaN), which turns the affected uint8 columns into float64. To rectify this, and to prevent a crash when running the model on the test set, pass fill_value=0 to DataFrame.align().
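The alignment step described above can be sketched on toy data. The frames, values and variable names below are hypothetical stand-ins for the real train and test sets, and the `join="outer"` choice is an assumption based on the describe() output, where each set contains columns that are all zero.

```python
import pandas as pd

# Toy stand-ins for the train and test sets (hypothetical values).
train = pd.DataFrame({"ID": [0, 1, 2], "X0": ["a", "b", "c"], "y": [90.1, 101.5, 88.0]})
test = pd.DataFrame({"ID": [3, 4, 5], "X0": ["a", "c", "d"]})

# Encode each set separately. The dtype is set explicitly because the
# default dummy dtype differs between pandas versions.
ohe_train = pd.get_dummies(train, columns=["X0"], dtype="uint8")
ohe_test = pd.get_dummies(test, columns=["X0"], dtype="uint8")

# Outer join on the column axis: each frame gains the columns it lacked,
# filled with 0 instead of the default NaN.
ohe_train, ohe_test = ohe_train.align(ohe_test, join="outer", axis=1, fill_value=0)

print(list(ohe_train.columns))
print(list(ohe_test.columns))
```

With an outer join the test frame also gains a zero-filled y column, so y has to be dropped from the test features before prediction.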

Note: when dropping an encoded categorical feature, e.g. X4 or X5, make sure that every dummy column derived from it (such as X4_a and X4_b) is dropped as well.
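One way to drop all dummy columns of a single feature is to select them by prefix. This is a minimal sketch on a hypothetical frame; the point is that the prefix should include the underscore, so that "X4_" does not also match an unrelated binary column such as X40.

```python
import pandas as pd

# Hypothetical one-hot encoded frame with a few columns only.
ohe = pd.DataFrame({
    "ID": [0, 1],
    "X4_a": [1, 0],
    "X4_d": [0, 1],
    "X40": [0, 1],    # binary feature, not a dummy of X4
    "X5_aa": [1, 0],
})

# Collect every dummy column that was derived from X4.
x4_cols = [c for c in ohe.columns if c.startswith("X4_")]

# Drop them together; all other columns are left untouched.
reduced = ohe.drop(columns=x4_cols)
print(list(reduced.columns))
```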

# Dimensionality Reduction

Instructions: Perform dimensionality reduction.

In [26]:
#split X and y for dimensionality reduction, training and testing
X_oheTrainFull = oheTrainFull.drop(['ID','y'], axis=1) # features without the ID and target (y) columns
y_oheTrainFull = oheTrainFull['y'] # target column: time on the test bench

In [29]:
#dimensionality reduction
#Use PCA, which is unsupervised and can therefore be used before regression. (LDA requires class labels, so it is for classification only.)
pca = PCA(0.95) #keep the fewest principal components that explain 95% of the variance
pca.fit(X_oheTrainFull)
print("Explained variance ratios")
print(pca.explained_variance_ratio_)
print("singular values")
print(pca.singular_values_)
pca95 = pca #PCA retaining 95% of the variance

pca99 = PCA(0.99) #keep the fewest principal components that explain 99% of the variance
pca99.fit(X_oheTrainFull)
print(f"PCA95 number of components: {pca95.n_components_}")
print(f"PCA99 number of components: {pca99.n_components_}")

Explained variance ratios
[0.11327864 0.07799109 0.07358181 0.05848106 0.04943089 0.04191889
 0.03310021 0.0282729  0.02515469 0.02153505 0.02077602 0.01725079
 0.01505285 0.01435206 0.01385206 0.01296764 0.01205455 0.01092876
 0.00984218 0.00913215 0.00883422 0.0084378  0.00823221 0.00772747
 0.00743423 0.00697398 0.00693436 0.00657264 0.00638708 0.00629629
 0.00576382 0.00554613 0.0052066  0.0048151  0.004741   0.00442392
 0.00436865 0.00419933 0.00410023 0.00404169 0.00378884 0.00377674
 0.00371885 0.0035104  0.00347097 0.00334591 0.003219   0.00313439
 0.00301544 0.00291734 0.00275398 0.00271791 0.00267481 0.0025818
 0.00252452 0.00244265 0.00240757 0.00238739 0.00228922 0.00225378
 0.00221898 0.00217386 0.00213455 0.0020884  0.00205367 0.00204302
 0.00201987 0.00198637 0.00196944 0.00195864 0.00194545 0.0019259
 0.00189743 0.00188001 0.00185116 0.00182295 0.00181173 0.0017733
 0.00176402 0.00175546 0.00173673 0.00171772 0.00171186 0.00169201
 0.00168048 0.00165293 0.00163654 0.001

#### Interpretation

PCA is used for dimensionality reduction.

Keeping enough principal components to explain 95% of the variance results in a dataset with 136 columns.

Keeping enough principal components to explain 99% of the variance results in a dataset with 220 columns.

These reduced-dimensionality datasets will be tested with XGBoost.

Later, we will also try removing some features with low variance and then performing dimensionality reduction and training the model.
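To make the variance threshold concrete, the sketch below reproduces the idea behind `PCA(0.95)` with plain NumPy: compute the explained variance ratio of each component from the singular values of the centred data, then keep the smallest number of components whose cumulative ratio reaches the threshold. The synthetic binary matrix and the helper `components_needed` are assumptions for illustration; scikit-learn's own selection rule may differ in edge cases such as exact ties.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic binary features, standing in for the one-hot encoded matrix.
X_toy = (rng.random((300, 20)) < 0.3).astype(np.float64)

# PCA works on centred data; singular values give the variance per component.
X_centered = X_toy - X_toy.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_variance_ratio)

def components_needed(threshold):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    return int(np.searchsorted(cumulative, threshold) + 1)

k95 = components_needed(0.95)
k99 = components_needed(0.99)
print(f"components for 95%: {k95}, for 99%: {k99}")
```

Raising the threshold can only keep the same number of components or more, which is consistent with the larger component count reported for 99% than for 95%.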

# Training and Predictions

Instructions: Predict your test_df values using XGBoost.

In [None]:
model = XGBRegressor() # create model

## A first run for baseline, with 80/20 train/test split

Get a baseline score with default parameters for future comparisons.

In [83]:
#run XGBoost on full data set
#label encoding: Yes
#dimensionality reduction: no
#removal by variance: no
#grid search: no - default parameters
#split: simple 80/20 train/validation split

X_train, X_test, y_train, y_test = train_test_split(X_oheTrainFull, 
                                                    y_oheTrainFull, 
                                                    test_size=0.2,
                                                    random_state=1)

model.fit(X_train, y_train)
# parameters used in a trained model
print(f"model: {model}")
# make predictions for test data
y_pred = model.predict(X_test)
print(f"y_pred: {y_pred}")

model: XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=8, num_parallel_tree=1,
             objective='reg:squarederror', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None)
y_pred: [ 74.718506 110.313576 109.3314   101.77607   76.53174   92.10126
  93.25013   94.35698   91.66725   92.9058    77.08907   91.475975
 109.99924   90.20202   97.01984   97.62739  110.913086 105.03322
  98.50395  113.67814  111.19591   97.97922  106.89985   93.50572
 103.56139  114.67858   89.98076  121.781746  93.27406   94.00175
 113.18389  115.007416  97.56092  108

In [87]:
# evaluate predictions

r2 = sklearn.metrics.r2_score(y_test, y_pred)
print("R2: %.5f" % r2) # R2 is a unitless score, not a percentage

R2: 0.49147


#### Interpretation

The R2 score (coefficient of determination) was the evaluation metric of the original Kaggle competition. We use the same metric to estimate how our results would compare with the real-world rankings.

For the basic model with an 80/20 train/validation split, R2 = 0.49147

Estimated Kaggle rank would be: 3537/3824

Default parameters used: base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=8, num_parallel_tree=1,
             objective='reg:squarederror', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None
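For reference, R2 is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below computes it by hand on a small made-up example; the function `r2_manual` and the toy values are assumptions for illustration and are unrelated to the Mercedes-Benz data.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

y_true_toy = [3.0, -0.5, 2.0, 7.0]
y_pred_toy = [2.5, 0.0, 2.0, 8.0]

r2_toy = r2_manual(y_true_toy, y_pred_toy)
# Always predicting the mean of y_true gives R2 = 0 by construction.
r2_mean_baseline = r2_manual(y_true_toy, [np.mean(y_true_toy)] * 4)
print(f"R2: {r2_toy:.5f}, mean-only baseline: {r2_mean_baseline:.5f}")
```

R2 is at most 1 and is not a percentage: a score of 0.49147 means the model explains roughly half of the variation in y on the validation set, while a model that always predicts the mean scores 0.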

## Introduce 5-fold cross validation, default parameters

In [93]:
#run XGBoost on full data set
#label encoding: Yes
#dimensionality reduction: no
#removal by variance: no
#grid search: no - default parameters
#split: 5-fold cross validation split
X_oheTrainFull = oheTrainFull.drop(['ID','y'], axis=1)
y_oheTrainFull = oheTrainFull['y']
model = XGBRegressor()
scores = cross_val_score(model, X_oheTrainFull, y_oheTrainFull, cv=5, scoring="r2")
# parameters used in a trained model
print(f"model: {model}")

model: XGBRegressor(base_score=None, booster=None, colsample_bylevel=None,
             colsample_bynode=None, colsample_bytree=None, gamma=None,
             gpu_id=None, importance_type='gain', interaction_constraints=None,
             learning_rate=None, max_delta_step=None, max_depth=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=100, n_jobs=None, num_parallel_tree=None,
             objective='reg:squarederror', random_state=None, reg_alpha=None,
             reg_lambda=None, scale_pos_weight=None, subsample=None,
             tree_method=None, validate_parameters=None, verbosity=None)


In [94]:
print("Mean R2 of %0.5f with a standard deviation of %0.5f" % (scores.mean(), scores.std()))

Mean R2 of 0.50370 with a standard deviation of 0.06147


#### Interpretation

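With 5-fold cross validation instead of a simple train/test split, the estimated R2 is 0.50370 (mean over the folds, standard deviation 0.06147), compared with 0.49147 on the single split. Cross validation does not change the model itself; it gives a more reliable estimate of its performance, and the difference between the two figures is well within one standard deviation.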

Parameters: default

Estimated Kaggle rank rose: 3501/3824
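The mechanics of k-fold cross validation can be sketched by hand as follows: split the rows into k folds, train on k-1 of them, score on the held-out fold, and report the mean and standard deviation of the k scores. This is only an illustration on synthetic data with a least-squares linear model standing in for XGBRegressor; all names and data in it are assumptions. The folds here are contiguous and unshuffled, which is also how `cross_val_score` splits the data for a regressor when given `cv=5`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic regression data (hypothetical), not the Mercedes-Benz set.
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

n_folds = 5
folds = np.array_split(np.arange(len(y_toy)), n_folds)  # contiguous folds

fold_scores = []
for val_idx in folds:
    train_mask = np.ones(len(y_toy), dtype=bool)
    train_mask[val_idx] = False  # hold this fold out of training

    # Stand-in model: ordinary least squares with an intercept column.
    A_train = np.column_stack([X_toy[train_mask], np.ones(train_mask.sum())])
    coef, *_ = np.linalg.lstsq(A_train, y_toy[train_mask], rcond=None)

    A_val = np.column_stack([X_toy[val_idx], np.ones(len(val_idx))])
    y_val_pred = A_val @ coef

    # R2 on the held-out fold only.
    ss_res = np.sum((y_toy[val_idx] - y_val_pred) ** 2)
    ss_tot = np.sum((y_toy[val_idx] - y_toy[val_idx].mean()) ** 2)
    fold_scores.append(1.0 - ss_res / ss_tot)

fold_scores = np.array(fold_scores)
print("Mean R2 of %0.5f with a standard deviation of %0.5f"
      % (fold_scores.mean(), fold_scores.std()))
```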

## Introduce grid search for hyperparameter tuning

In [108]:
#run XGBoost on full data set
#label encoding: Yes
#dimensionality reduction: no
#removal by variance: no
#grid search: yes
#split: 5-fold cross validation split

#testing auto tree method
X_oheTrainFull = oheTrainFull.drop(['ID','y'], axis=1)
y_oheTrainFull = oheTrainFull['y']
model = XGBRegressor()

parameters = {'gamma': [0],
              'objective':['reg:squarederror'], #reg:linear deprecated. reg:squarederror is default.
              'learning_rate': [0.01, 0.03, 0.05], #so called `eta` value
              'max_depth': [4, 5, 6],
              'min_child_weight': [1, 2, 4],
              'subsample': [0.5, 1],
              'sampling_method': ['uniform'],
              'lambda': [1, 10],
              'alpha': [0, 1],
              'tree_method': ['auto'],
              'colsample_bytree': [0.7]}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheTrainFull,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 216 candidates, totalling 1080 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done  22 tasks      | elapsed:   22.8s
[Parallel(n_jobs=5)]: Done 118 tasks      | elapsed:  2.2min
[Parallel(n_jobs=5)]: Done 278 tasks      | elapsed:  5.9min
[Parallel(n_jobs=5)]: Done 502 tasks      | elapsed: 10.0min
[Parallel(n_jobs=5)]: Done 790 tasks      | elapsed: 16.4min
[Parallel(n_jobs=5)]: Done 1080 out of 1080 | elapsed: 22.1min finished


elapsed: 1325.4275529384613


In [109]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.5687215919632157
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

Through hyperparameter tuning, the mean cross-validated R2 score rose from 0.50370 to 0.56872.

Estimated Kaggle rank: 731/3824

Optimised parameters: {'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}

Note that this only demonstrates the concept of hyperparameter tuning. There is a wider range of parameters and values to tune; for example, the best values of learning_rate, max_depth and lambda all lie at the edge of their search ranges, so extending those ranges may improve the score further. However, due to time constraints and the lengthy processing time (about 22 minutes for 216 candidates), an exhaustive search has not been performed.
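The figures in the grid search log, 216 candidates and 1080 fits, follow directly from the parameter grid: the number of candidates is the product of the number of values listed for each parameter, and each candidate is fitted once per fold. A small check using only the standard library:

```python
from math import prod

# Same grid as used in the search above.
parameters = {'gamma': [0],
              'objective': ['reg:squarederror'],
              'learning_rate': [0.01, 0.03, 0.05],
              'max_depth': [4, 5, 6],
              'min_child_weight': [1, 2, 4],
              'subsample': [0.5, 1],
              'sampling_method': ['uniform'],
              'lambda': [1, 10],
              'alpha': [0, 1],
              'tree_method': ['auto'],
              'colsample_bytree': [0.7]}

n_folds = 5
n_candidates = prod(len(values) for values in parameters.values())
n_fits = n_candidates * n_folds
print(f"{n_candidates} candidates, {n_fits} fits")
```

Because the count is multiplicative, adding even one more value to a few parameters quickly increases the run time, which is why the grid was kept small.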

## Introduce dimensionality reduction

In [35]:
#Assess PCA, preserving 95% variance

#run XGBoost on full data set
#label encoding: Yes
#dimensionality reduction: PCA 95%
#removal by variance: no
#grid search: No - using parameters from previous grid search
#split: 5-fold cross validation split

#use the optimised parameters from above
parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

X_pca_oheTrainFull95 = pca95.transform(X_oheTrainFull)

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_pca_oheTrainFull95,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:   14.6s remaining:   22.0s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:   14.8s finished


elapsed: 17.874645948410034


In [36]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.49051852462623274
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

PCA preserving 95% of variance reduces the R2 score, from 0.56872 to 0.49052. The 136 retained components appear to discard information that is useful for predicting y.

In [38]:
#Assess PCA, preserving 99% variance

#run XGBoost on full data set
#label encoding: Yes
#dimensionality reduction: PCA 99%
#removal by variance: no
#grid search: No - using parameters from previous grid search
#split: 5-fold cross validation split


parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

X_pca_oheTrainFull99 = pca99.transform(X_oheTrainFull)

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_pca_oheTrainFull99,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:   17.5s remaining:   26.3s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:   17.6s finished


elapsed: 22.295575618743896


In [39]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.4904597334852073
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation 

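PCA preserving 99% of variance also worsens the R2 score (0.49046, against 0.56872 without PCA), and it is no better than the 95% setting (0.49052). Keeping more components therefore does not recover the lost accuracy. A plausible explanation is that tree-based models split less effectively on the dense linear combinations produced by PCA than on the original binary columns.
Also note that PCA converts the uint8 columns to float64, which may increase processing time even with fewer dimensions.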
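In the two PCA runs above, PCA was fitted on the full training set before cross validation, so each validation fold also influenced the components. One possible alternative, sketched below under stated assumptions, is to put PCA and the regressor in a scikit-learn pipeline so that PCA is re-fitted on the training part of every fold. The data is synthetic, and `DecisionTreeRegressor` is only a stand-in for `XGBRegressor` to keep the sketch free of extra dependencies; it is not the model used in this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic binary uint8 features (hypothetical), mimicking one-hot columns.
X_toy = rng.integers(0, 2, size=(200, 30)).astype(np.uint8)
y_toy = 100 + 10.0 * X_toy[:, 0] - 5.0 * X_toy[:, 1] + rng.normal(0, 1, size=200)

# PCA is fitted inside each fold, on that fold's training rows only.
pipeline = make_pipeline(
    PCA(n_components=0.95),
    DecisionTreeRegressor(max_depth=4, random_state=0),  # stand-in model
)
scores = cross_val_score(pipeline, X_toy, y_toy, cv=5, scoring="r2")
print("Mean R2 of %0.5f with a standard deviation of %0.5f"
      % (scores.mean(), scores.std()))

# The PCA output is floating point even though the input is uint8.
X_reduced = PCA(n_components=0.95).fit_transform(X_toy)
print(X_toy.dtype, "->", X_reduced.dtype)
```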

## Removal of numerical columns with low variance

In [48]:
#run XGBoost on full data set
#label encoding: Yes
#dimensionality reduction: no
#removal by variance: Yes - numerical columns based on OLS
#grid search: No - using parameters from previous grid search
#split: 5-fold cross validation split
forRemovalOLSList
print(f"feature set size before removing low variance numericals: {len(X_oheTrainFull.columns)}")
X_oheDropNumericTrain = X_oheTrainFull.drop(columns = forRemovalOLSList)
print(f"feature set size after removing low variance numericals (126 columns): {len(X_oheDropNumericTrain.columns)}")
numColsDropped = len(X_oheTrainFull.columns) - len(X_oheDropNumericTrain.columns)
print(f"Sanity check: num columns dropped: {numColsDropped}")

feature set size before removing low variance numericals: 579
feature set size after removing low variance numericals (126 columns): 453
Sanity check: num columns dropped: 126


In [49]:
parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheDropNumericTrain,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:    6.5s remaining:    9.8s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    6.7s finished


elapsed: 8.509615659713745


In [50]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.5679536477669422
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

R2 dropped marginally, from 0.56872 to 0.56795, after removing 126 columns. The difference is negligible, and processing time is faster with fewer columns.
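The list `forRemovalOLSList` used above was prepared earlier in the notebook. As a separate, hedged illustration of variance-based filtering in general, the sketch below computes per-column variances with pandas and drops the columns below a cut-off. The toy frame and the threshold of 0.15 are assumptions chosen for the example and do not describe how `forRemovalOLSList` was built.

```python
import pandas as pd

# Hypothetical toy frame of binary features.
toy = pd.DataFrame({
    "A": [0] * 10,       # constant column: variance 0
    "B": [1] + [0] * 9,  # rare flag: sample variance 0.1
    "C": [1, 0] * 5,     # balanced flag: sample variance about 0.278
})

variances = toy.var()  # per-column sample variance (ddof=1)

# Zero-variance columns carry no information and can always be removed.
zero_variance_cols = variances[variances == 0].index.tolist()

threshold = 0.15  # hypothetical cut-off, chosen only for this example
low_variance_cols = variances[variances < threshold].index.tolist()
toy_reduced = toy.drop(columns=low_variance_cols)

print(f"zero variance: {zero_variance_cols}")
print(f"dropped: {low_variance_cols}, kept: {list(toy_reduced.columns)}")
```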

## Removal of categorical columns

(Retain numerical columns, since their removal did not improve the R2 score.)

In [57]:
#Removal of Categorical Columns (Part 1/3)
#remove X4, according to OLS analysis

colList = list(X_oheTrainFull)
columnName = 'X4_' #prefix shared by all one-hot columns derived from X4
colsToDrop = [s for s in colList if s.startswith(columnName)] #match on the prefix only
X_oheDropCatTrain = X_oheTrainFull.drop(columns = colsToDrop) #drop() returns a new DataFrame; the original is unchanged
X_oheDropCatTrain.describe() #visual check that columns with X4_ were dropped.

Unnamed: 0,X0_a,X0_aa,X0_ab,X0_ac,X0_ad,X0_ae,X0_af,X0_ag,X0_ai,X0_aj,X0_ak,X0_al,X0_am,X0_an,X0_ao,X0_ap,X0_aq,X0_as,X0_at,X0_au,X0_av,X0_aw,X0_ax,X0_ay,X0_az,X0_b,X0_ba,X0_bb,X0_bc,X0_c,X0_d,X0_e,X0_f,X0_g,X0_h,X0_i,X0_j,X0_k,X0_l,X0_m,X0_n,X0_o,X0_p,X0_q,X0_r,X0_s,X0_t,X0_u,X0_v,X0_w,X0_x,X0_y,X0_z,X10,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X11,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X12,X120,X122,X123,X124,X125,X126,X127,X128,X129,X13,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X14,X140,X141,X142,X143,X144,X145,X146,X147,X148,X15,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X16,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X17,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X18,X180,X181,X182,X183,X184,X185,X186,X187,X189,X19,X190,X191,X192,X194,X195,X196,X197,X198,X199,X1_a,X1_aa,X1_ab,X1_b,X1_c,X1_d,X1_e,X1_f,X1_g,X1_h,X1_i,X1_j,X1_k,X1_l,X1_m,X1_n,X1_o,X1_p,X1_q,X1_r,X1_s,X1_t,X1_u,X1_v,X1_w,X1_y,X1_z,X20,X200,X201,...,X338,X339,X34,X340,X341,X342,X343,X344,X345,X346,X347,X348,X349,X35,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X36,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X37,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X38,X380,X382,X383,X384,X385,X39,X3_a,X3_b,X3_c,X3_d,X3_e,X3_f,X3_g,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X5_a,X5_aa,X5_ab,X5_ac,X5_ad,X5_ae,X5_af,X5_ag,X5_ah,X5_b,X5_c,X5_d,X5_f,X5_g,X5_h,X5_i,X5_j,X5_k,X5_l,X5_m,X5_n,X5_o,X5_p,X5_q,X5_r,X5_s,X5_t,X5_u,X5_v,X5_w,X5_x,X5_y,X5_z,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X6_a,X6_b,X6_c,X6_d,X6_e,X6_f,X6_g,X6_h,X6_i,X6_j,X6_k,X6_l,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X8_a,X8_b,X8_c,X8_d,X8_e,X8_f,X8_g,X8_h,X8_i,X8_j,X8_k,X8_l,X8_m,X8_n,X8_o,X8_p,X8_q,X8_r,X8_s,X8_t,X8_u,X8_v,X8_w,X8_x,X8_y,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99
count,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,...,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,42
09.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
mean,0.004989,0.000475,0.000238,0.000238,0.003326,0.0,0.008316,0.0,0.008078,0.035876,0.082918,0.015918,0.004277,0.0,0.00095,0.024471,0.004277,0.002376,0.00594,0.002613,0.0,0.003801,0.004514,0.074364,0.041578,0.002613,0.006415,0.0,0.001426,0.000713,0.017344,0.007603,0.053932,0.000238,0.017819,0.004277,0.043003,0.002613,0.003801,0.008078,0.046329,0.063911,0.0,0.000475,0.002376,0.025184,0.072701,0.004039,0.008553,0.043241,0.071276,0.076978,0.085531,0.013305,0.690188,0.935614,0.00689,0.784509,0.001901,0.002376,0.013067,0.0,0.01473,0.04039,0.0,0.00095,0.974816,0.002851,0.022333,0.146115,0.285579,0.196721,0.04918,0.622238,0.622238,0.075077,0.95771,0.007128,0.002613,0.000475,0.003089,0.038964,0.495129,0.958422,0.122594,0.057971,0.041578,0.02661,0.688525,0.124258,0.022333,0.027085,0.956522,0.581848,0.040865,0.09052,0.42813,0.04039,0.014255,0.770254,0.038251,0.80803,0.001426,0.040865,0.022333,0.044904,0.000475,0.79235,0.085531,0.032312,0.000713,0.208838,0.076503,0.717273,0.282727,0.229746,0.013542,0.002613,0.001188,0.197672,0.040865,0.303397,0.062485,0.004514,0.033262,0.00095,0.270848,0.006652,0.007603,0.024234,0.657401,0.00594,0.009741,0.017344,0.022333,0.017106,0.050131,0.557377,0.047992,0.00784,0.157995,0.093847,0.106201,0.004039,0.001426,0.018769,0.535994,0.420527,0.915419,0.099549,0.000238,0.470896,0.002376,0.464006,0.011642,0.010216,0.032312,0.023046,0.002851,0.033975,0.197909,0.000713,0.140651,0.028748,0.000713,0.00784,0.005464,0.001426,0.00689,0.04823,0.005227,0.004039,0.140176,0.007603,0.004514,0.019482,0.002138,0.000713,0.059634,0.142077,0.007365,0.008791,0.096935,0.012354,0.005464,0.010929,0.142789,0.006652,0.177714,...,0.00689,0.000238,0.005464,0.022333,0.008078,0.022333,0.078403,0.008553,0.022333,0.047517,0.0,0.947256,0.044904,0.232359,0.338798,0.29722,0.05417,0.002138,0.202899,0.380375,0.179853,0.001188,0.426942,0.031837,0.004514,0.076503,0.966025,0.520314,0.753861,0.002851,0.002851,0.001188,0.051794,0.062723,0.000475,0.232359,0.006652,0.014255,0.000475,0.01924
4,0.22737,0.318841,0.057258,0.314802,0.02067,0.009503,0.033262,0.008078,0.007603,0.001663,0.000475,0.001426,0.000238,0.104538,0.013542,0.461392,0.0689,0.038727,0.255643,0.057258,0.000713,0.011404,0.000238,0.072226,0.011404,0.253267,0.597292,0.01283,0.022333,0.122119,0.214065,0.721787,0.04229,0.00689,0.043478,0.005227,0.021145,0.013305,0.574958,0.000713,0.0,0.02661,0.046804,0.047517,0.043953,0.048705,0.044666,0.048468,0.023046,0.0,0.031124,0.050843,0.001663,0.000238,0.000238,0.04918,0.029698,0.042053,0.046329,0.049418,0.050368,0.004752,0.049418,0.052269,0.051081,0.050843,0.0,0.000238,0.054882,0.054882,0.000475,0.000238,0.0,0.001426,0.953908,0.00594,0.011404,0.375148,0.002138,0.027085,0.001901,0.073414,0.029936,0.048943,0.006652,0.009028,0.148491,0.002851,0.004752,0.247565,0.045141,0.115942,0.246852,0.010216,0.113566,0.919933,0.103588,0.019957,0.999287,0.036113,0.043478,0.012592,0.005702,0.025184,0.947018,0.229033,0.017106,0.001188,0.103588,0.408173,0.001426,0.00095,0.007128,0.000713,0.049893,0.045141,0.023759,0.024471,0.053457,0.057733,0.030886,0.027798,0.056308,0.065811,0.041815,0.023996,0.036826,0.057496,0.038727,0.023759,0.027798,0.052031,0.060584,0.028273,0.028273,0.046092,0.046567,0.024947,0.02756,0.007365,0.001663,0.00095,0.0,0.007365,0.000238,0.758137,0.004277,0.942504,0.008553
std,0.070467,0.021796,0.015414,0.015414,0.057584,0.0,0.09082,0.0,0.089524,0.186002,0.27579,0.125174,0.065263,0.0,0.030817,0.154526,0.065263,0.048691,0.076849,0.051061,0.0,0.061545,0.067043,0.262394,0.199646,0.051061,0.079845,0.0,0.037734,0.026691,0.130564,0.086872,0.22591,0.015414,0.132309,0.065263,0.202888,0.051061,0.061545,0.089524,0.210222,0.244623,0.0,0.021796,0.048691,0.156703,0.259677,0.063432,0.092098,0.203423,0.257316,0.266588,0.279703,0.11459,0.462471,0.245468,0.082729,0.411211,0.043561,0.048691,0.113576,0.0,0.120486,0.196895,0.0,0.030817,0.156703,0.053325,0.147782,0.353264,0.451743,0.397567,0.21627,0.484885,0.484885,0.263547,0.201275,0.084134,0.051061,0.021796,0.055496,0.193532,0.500036,0.199646,0.32801,0.233716,0.199646,0.160959,0.463152,0.329914,0.147782,0.16235,0.203955,0.493314,0.198,0.28696,0.494867,0.196895,0.118555,0.420719,0.191825,0.393896,0.037734,0.198,0.147782,0.207117,0.021796,0.405673,0.279703,0.176848,0.026691,0.406527,0.265832,0.450379,0.450379,0.420719,0.115595,0.051061,0.03445,0.39829,0.198,0.45978,0.242063,0.067043,0.179341,0.030817,0.444451,0.0813,0.086872,0.153792,0.474635,0.076849,0.098226,0.130564,0.147782,0.129683,0.21824,0.496756,0.213776,0.088208,0.364779,0.29165,0.308131,0.063432,0.037734,0.135725,0.498762,0.493702,0.27829,0.299433,0.015414,0.499212,0.048691,0.498762,0.10728,0.10057,0.176848,0.150067,0.053325,0.181186,0.398471,0.026691,0.347702,0.167117,0.026691,0.088208,0.073729,0.037734,0.082729,0.214277,0.072117,0.063432,0.347211,0.086872,0.067043,0.138228,0.046198,0.026691,0.236836,0.34917,0.085514,0.093357,0.295905,0.110475,0.073729,0.103981,0.349899,0.0813,0.382318,...,0.082729,0.015414,0.073729,0.147782,0.089524,0.147782,0.268837,0.092098,0.147782,0.212768,0.0,0.223549,0.207117,0.422387,0.473357,0.457089,0.226379,0.046198,0.402205,0.485537,0.38411,0.03445,0.494693,0.175586,0.067043,0.265832,0.181186,0.499647,0.430812,0.053325,0.053325,0.03445,0.221637,0.242492,0.021796,0.422387,0.0813,0.118555,0.021796,0.137399,0.419183,0.
466082,0.232363,0.464492,0.142294,0.097033,0.179341,0.089524,0.086872,0.040752,0.021796,0.037734,0.015414,0.305993,0.115595,0.498566,0.253314,0.192965,0.436274,0.232363,0.026691,0.106192,0.015414,0.258893,0.106192,0.434934,0.490501,0.112552,0.147782,0.327462,0.410221,0.448172,0.201275,0.082729,0.203955,0.072117,0.143885,0.11459,0.494408,0.026691,0.0,0.160959,0.211245,0.212768,0.205016,0.215277,0.206594,0.214778,0.150067,0.0,0.173673,0.219704,0.040752,0.015414,0.015414,0.21627,0.169774,0.200733,0.210222,0.216765,0.21873,0.068777,0.216765,0.222595,0.220189,0.219704,0.0,0.015414,0.227778,0.227778,0.021796,0.015414,0.0,0.037734,0.209709,0.076849,0.106192,0.484219,0.046198,0.16235,0.043561,0.260846,0.170431,0.215774,0.0813,0.094599,0.355629,0.053325,0.068777,0.431649,0.207639,0.320193,0.431231,0.10057,0.317321,0.271428,0.304761,0.13987,0.026691,0.186594,0.203955,0.111519,0.075305,0.156703,0.224024,0.42026,0.129683,0.03445,0.304761,0.491554,0.037734,0.030817,0.084134,0.026691,0.21775,0.207639,0.152314,0.154526,0.224969,0.233266,0.17303,0.164412,0.230543,0.247982,0.20019,0.153055,0.188356,0.232815,0.192965,0.152314,0.164412,0.222117,0.238595,0.165771,0.165771,0.209709,0.210734,0.155981,0.163728,0.085514,0.040752,0.030817,0.0,0.085514,0.015414,0.428262,0.065263,0.232815,0.092098
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
max,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0


In [58]:
parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheDropCatTrain,
             y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:    7.9s remaining:   11.9s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    8.1s finished


elapsed: 10.459959506988525


In [59]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.5689134528447758
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

Removing X4 improves the R^2 marginally, from 0.56872 to 0.56891.

Estimated Kaggle rank: 659/3824
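
A change of about 0.0002 in mean R^2 is small, so it can help to compare it with the fold-to-fold spread before reading much into it. The sketch below is only a rough heuristic, not a formal significance test. It assumes the two fitted `GridSearchCV` objects were kept under separate names (the cells here reuse `xgb_grid`, so the earlier one would have to be saved first). The helper names are hypothetical, and the standard deviations in the demo are placeholders, since the notebook does not print them; only `best_index_` and the `cv_results_` keys `mean_test_score` and `std_test_score` are real scikit-learn attributes.

```python
from types import SimpleNamespace


def best_score_with_spread(search):
    """Return (mean, std) of the per-fold test score for the best candidate."""
    i = search.best_index_
    mean = float(search.cv_results_["mean_test_score"][i])
    std = float(search.cv_results_["std_test_score"][i])
    return mean, std


def within_fold_noise(search_a, search_b):
    """True if the gap between the two mean scores is no larger than
    the bigger of the two fold-level standard deviations."""
    mean_a, std_a = best_score_with_spread(search_a)
    mean_b, std_b = best_score_with_spread(search_b)
    return abs(mean_a - mean_b) <= max(std_a, std_b)


# Stand-ins for two fitted GridSearchCV objects. The means are the R^2 values
# reported above; the std values (0.05) are PLACEHOLDERS, not measured results.
with_x4 = SimpleNamespace(
    best_index_=0,
    cv_results_={"mean_test_score": [0.56872], "std_test_score": [0.05]},
)
without_x4 = SimpleNamespace(
    best_index_=0,
    cv_results_={"mean_test_score": [0.56891], "std_test_score": [0.05]},
)

print(within_fold_noise(with_x4, without_x4))
```

With real fitted objects, the same two functions can be called directly on them, for example `within_fold_noise(xgb_grid_full, xgb_grid)` if the earlier search had been stored as `xgb_grid_full` (a hypothetical name).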

In [60]:
# Removal of categorical columns, keeping numerical columns (Part 2/3).
# Remove the one-hot columns of X5, following the Tukey test (X4 was already removed above).

colList = list(X_oheDropCatTrain)
columnName = 'X5_'
# Collect every one-hot column derived from X5 and drop them in a single call.
dropCols = [s for s in colList if columnName in s]
X_oheDropCatTrain = X_oheDropCatTrain.drop(columns = dropCols)
X_oheDropCatTrain.describe() # check that columns with X4_ and X5_ were dropped

Unnamed: 0,X0_a,X0_aa,X0_ab,X0_ac,X0_ad,X0_ae,X0_af,X0_ag,X0_ai,X0_aj,X0_ak,X0_al,X0_am,X0_an,X0_ao,X0_ap,X0_aq,X0_as,X0_at,X0_au,X0_av,X0_aw,X0_ax,X0_ay,X0_az,X0_b,X0_ba,X0_bb,X0_bc,X0_c,X0_d,X0_e,X0_f,X0_g,X0_h,X0_i,X0_j,X0_k,X0_l,X0_m,X0_n,X0_o,X0_p,X0_q,X0_r,X0_s,X0_t,X0_u,X0_v,X0_w,X0_x,X0_y,X0_z,X10,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X11,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X12,X120,X122,X123,X124,X125,X126,X127,X128,X129,X13,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X14,X140,X141,X142,X143,X144,X145,X146,X147,X148,X15,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X16,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X17,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X18,X180,X181,X182,X183,X184,X185,X186,X187,X189,X19,X190,X191,X192,X194,X195,X196,X197,X198,X199,X1_a,X1_aa,X1_ab,X1_b,X1_c,X1_d,X1_e,X1_f,X1_g,X1_h,X1_i,X1_j,X1_k,X1_l,X1_m,X1_n,X1_o,X1_p,X1_q,X1_r,X1_s,X1_t,X1_u,X1_v,X1_w,X1_y,X1_z,X20,X200,X201,...,X308,X309,X31,X310,X311,X312,X313,X314,X315,X316,X317,X318,X319,X32,X320,X321,X322,X323,X324,X325,X326,X327,X328,X329,X33,X330,X331,X332,X333,X334,X335,X336,X337,X338,X339,X34,X340,X341,X342,X343,X344,X345,X346,X347,X348,X349,X35,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X36,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X37,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X38,X380,X382,X383,X384,X385,X39,X3_a,X3_b,X3_c,X3_d,X3_e,X3_f,X3_g,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X6_a,X6_b,X6_c,X6_d,X6_e,X6_f,X6_g,X6_h,X6_i,X6_j,X6_k,X6_l,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X8_a,X8_b,X8_c,X8_d,X8_e,X8_f,X8_g,X8_h,X8_i,X8_j,X8_k,X8_l,X8_m,X8_n,X8_o,X8_p,X8_q,X8_r,X8_s,X8_t,X8_u,X8_v,X8_w,X8_x,X8_y,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99
count,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,...,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,42
09.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
mean,0.004989,0.000475,0.000238,0.000238,0.003326,0.0,0.008316,0.0,0.008078,0.035876,0.082918,0.015918,0.004277,0.0,0.00095,0.024471,0.004277,0.002376,0.00594,0.002613,0.0,0.003801,0.004514,0.074364,0.041578,0.002613,0.006415,0.0,0.001426,0.000713,0.017344,0.007603,0.053932,0.000238,0.017819,0.004277,0.043003,0.002613,0.003801,0.008078,0.046329,0.063911,0.0,0.000475,0.002376,0.025184,0.072701,0.004039,0.008553,0.043241,0.071276,0.076978,0.085531,0.013305,0.690188,0.935614,0.00689,0.784509,0.001901,0.002376,0.013067,0.0,0.01473,0.04039,0.0,0.00095,0.974816,0.002851,0.022333,0.146115,0.285579,0.196721,0.04918,0.622238,0.622238,0.075077,0.95771,0.007128,0.002613,0.000475,0.003089,0.038964,0.495129,0.958422,0.122594,0.057971,0.041578,0.02661,0.688525,0.124258,0.022333,0.027085,0.956522,0.581848,0.040865,0.09052,0.42813,0.04039,0.014255,0.770254,0.038251,0.80803,0.001426,0.040865,0.022333,0.044904,0.000475,0.79235,0.085531,0.032312,0.000713,0.208838,0.076503,0.717273,0.282727,0.229746,0.013542,0.002613,0.001188,0.197672,0.040865,0.303397,0.062485,0.004514,0.033262,0.00095,0.270848,0.006652,0.007603,0.024234,0.657401,0.00594,0.009741,0.017344,0.022333,0.017106,0.050131,0.557377,0.047992,0.00784,0.157995,0.093847,0.106201,0.004039,0.001426,0.018769,0.535994,0.420527,0.915419,0.099549,0.000238,0.470896,0.002376,0.464006,0.011642,0.010216,0.032312,0.023046,0.002851,0.033975,0.197909,0.000713,0.140651,0.028748,0.000713,0.00784,0.005464,0.001426,0.00689,0.04823,0.005227,0.004039,0.140176,0.007603,0.004514,0.019482,0.002138,0.000713,0.059634,0.142077,0.007365,0.008791,0.096935,0.012354,0.005464,0.010929,0.142789,0.006652,0.177714,...,0.009503,0.007128,0.232359,0.002613,0.598479,0.004277,0.301022,0.431694,0.028748,0.195533,0.007603,0.000713,0.000475,0.011167,0.007128,0.238774,0.021858,0.009266,0.574958,0.005702,0.032312,0.128297,0.040152,0.435258,0.000238,0.0,0.05607,0.000713,0.023996,0.46258,0.003564,0.127346,0.516512,0.00689,0.000238,0.005464,0.022333,0.008078,0.022333,0.07840
3,0.008553,0.022333,0.047517,0.0,0.947256,0.044904,0.232359,0.338798,0.29722,0.05417,0.002138,0.202899,0.380375,0.179853,0.001188,0.426942,0.031837,0.004514,0.076503,0.966025,0.520314,0.753861,0.002851,0.002851,0.001188,0.051794,0.062723,0.000475,0.232359,0.006652,0.014255,0.000475,0.019244,0.22737,0.318841,0.057258,0.314802,0.02067,0.009503,0.033262,0.008078,0.007603,0.001663,0.000475,0.001426,0.000238,0.104538,0.013542,0.461392,0.0689,0.038727,0.255643,0.057258,0.000713,0.011404,0.000238,0.072226,0.011404,0.253267,0.597292,0.01283,0.022333,0.122119,0.214065,0.721787,0.04229,0.00689,0.043478,0.005227,0.021145,0.013305,0.574958,0.000713,0.001426,0.953908,0.00594,0.011404,0.375148,0.002138,0.027085,0.001901,0.073414,0.029936,0.048943,0.006652,0.009028,0.148491,0.002851,0.004752,0.247565,0.045141,0.115942,0.246852,0.010216,0.113566,0.919933,0.103588,0.019957,0.999287,0.036113,0.043478,0.012592,0.005702,0.025184,0.947018,0.229033,0.017106,0.001188,0.103588,0.408173,0.001426,0.00095,0.007128,0.000713,0.049893,0.045141,0.023759,0.024471,0.053457,0.057733,0.030886,0.027798,0.056308,0.065811,0.041815,0.023996,0.036826,0.057496,0.038727,0.023759,0.027798,0.052031,0.060584,0.028273,0.028273,0.046092,0.046567,0.024947,0.02756,0.007365,0.001663,0.00095,0.0,0.007365,0.000238,0.758137,0.004277,0.942504,0.008553
std,0.070467,0.021796,0.015414,0.015414,0.057584,0.0,0.09082,0.0,0.089524,0.186002,0.27579,0.125174,0.065263,0.0,0.030817,0.154526,0.065263,0.048691,0.076849,0.051061,0.0,0.061545,0.067043,0.262394,0.199646,0.051061,0.079845,0.0,0.037734,0.026691,0.130564,0.086872,0.22591,0.015414,0.132309,0.065263,0.202888,0.051061,0.061545,0.089524,0.210222,0.244623,0.0,0.021796,0.048691,0.156703,0.259677,0.063432,0.092098,0.203423,0.257316,0.266588,0.279703,0.11459,0.462471,0.245468,0.082729,0.411211,0.043561,0.048691,0.113576,0.0,0.120486,0.196895,0.0,0.030817,0.156703,0.053325,0.147782,0.353264,0.451743,0.397567,0.21627,0.484885,0.484885,0.263547,0.201275,0.084134,0.051061,0.021796,0.055496,0.193532,0.500036,0.199646,0.32801,0.233716,0.199646,0.160959,0.463152,0.329914,0.147782,0.16235,0.203955,0.493314,0.198,0.28696,0.494867,0.196895,0.118555,0.420719,0.191825,0.393896,0.037734,0.198,0.147782,0.207117,0.021796,0.405673,0.279703,0.176848,0.026691,0.406527,0.265832,0.450379,0.450379,0.420719,0.115595,0.051061,0.03445,0.39829,0.198,0.45978,0.242063,0.067043,0.179341,0.030817,0.444451,0.0813,0.086872,0.153792,0.474635,0.076849,0.098226,0.130564,0.147782,0.129683,0.21824,0.496756,0.213776,0.088208,0.364779,0.29165,0.308131,0.063432,0.037734,0.135725,0.498762,0.493702,0.27829,0.299433,0.015414,0.499212,0.048691,0.498762,0.10728,0.10057,0.176848,0.150067,0.053325,0.181186,0.398471,0.026691,0.347702,0.167117,0.026691,0.088208,0.073729,0.037734,0.082729,0.214277,0.072117,0.063432,0.347211,0.086872,0.067043,0.138228,0.046198,0.026691,0.236836,0.34917,0.085514,0.093357,0.295905,0.110475,0.073729,0.103981,0.349899,0.0813,0.382318,...,0.097033,0.084134,0.422387,0.051061,0.490264,0.065263,0.458757,0.495371,0.167117,0.396658,0.086872,0.026691,0.021796,0.105093,0.084134,0.426385,0.146237,0.095824,0.494408,0.075305,0.176848,0.334459,0.196339,0.49585,0.015414,0.0,0.230085,0.026691,0.153055,0.498657,0.059598,0.3334,0.499787,0.082729,0.015414,0.073729,0.147782,0.089524,0.147782,0.268837,0.092098,
0.147782,0.212768,0.0,0.223549,0.207117,0.422387,0.473357,0.457089,0.226379,0.046198,0.402205,0.485537,0.38411,0.03445,0.494693,0.175586,0.067043,0.265832,0.181186,0.499647,0.430812,0.053325,0.053325,0.03445,0.221637,0.242492,0.021796,0.422387,0.0813,0.118555,0.021796,0.137399,0.419183,0.466082,0.232363,0.464492,0.142294,0.097033,0.179341,0.089524,0.086872,0.040752,0.021796,0.037734,0.015414,0.305993,0.115595,0.498566,0.253314,0.192965,0.436274,0.232363,0.026691,0.106192,0.015414,0.258893,0.106192,0.434934,0.490501,0.112552,0.147782,0.327462,0.410221,0.448172,0.201275,0.082729,0.203955,0.072117,0.143885,0.11459,0.494408,0.026691,0.037734,0.209709,0.076849,0.106192,0.484219,0.046198,0.16235,0.043561,0.260846,0.170431,0.215774,0.0813,0.094599,0.355629,0.053325,0.068777,0.431649,0.207639,0.320193,0.431231,0.10057,0.317321,0.271428,0.304761,0.13987,0.026691,0.186594,0.203955,0.111519,0.075305,0.156703,0.224024,0.42026,0.129683,0.03445,0.304761,0.491554,0.037734,0.030817,0.084134,0.026691,0.21775,0.207639,0.152314,0.154526,0.224969,0.233266,0.17303,0.164412,0.230543,0.247982,0.20019,0.153055,0.188356,0.232815,0.192965,0.152314,0.164412,0.222117,0.238595,0.165771,0.165771,0.209709,0.210734,0.155981,0.163728,0.085514,0.040752,0.030817,0.0,0.085514,0.015414,0.428262,0.065263,0.232815,0.092098
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
max,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0


In [61]:
parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheDropCatTrain,
             y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:    7.2s remaining:   10.8s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    7.5s finished


elapsed: 9.703987836837769


In [62]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.5681380383360413
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

Removing X5 worsens the R^2 from 0.56891 to 0.56814, so X5 is kept in the next step.
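
The prefix-based column removal used in these cells can also be written as one reusable helper. This is a minimal sketch, assuming the one-hot columns are named `<feature>_<level>` as in the `describe()` output; the function name `drop_onehot_groups` and the small demo frame are hypothetical and only stand in for `X_oheTrainFull`.

```python
import pandas as pd


def drop_onehot_groups(df, prefixes):
    """Return a copy of df without columns whose names start with any prefix."""
    to_drop = [c for c in df.columns if c.startswith(tuple(prefixes))]
    return df.drop(columns=to_drop)


# Tiny stand-in for X_oheTrainFull (hypothetical values).
demo = pd.DataFrame({
    "X4_a": [1, 0],
    "X5_b": [0, 1],
    "X6_c": [1, 1],
    "X10": [0, 1],
    "X60": [1, 0],
})

reduced = drop_onehot_groups(demo, ["X4_", "X6_"])
print(list(reduced.columns))  # ['X5_b', 'X10', 'X60']
```

Matching on the trailing underscore keeps numeric columns such as `X60` untouched, and because `drop` returns a new frame, the original data stays available for the next comparison.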

In [63]:
# Removal of categorical columns, keeping numerical columns (Part 3/3).
# Remove X4 and X6, following the OLS and Tukey results.

colList = list(X_oheTrainFull)
X_oheDropCatTrain = X_oheTrainFull.copy(deep = True) # start again from the full one-hot set, so X5 is restored
for columnName in ['X4_', 'X6_']:
    for s in colList:
        if columnName in s:
            X_oheDropCatTrain = X_oheDropCatTrain.drop(columns = s)
X_oheDropCatTrain.describe() # check that columns with X4_ and X6_ were dropped

Unnamed: 0,X0_a,X0_aa,X0_ab,X0_ac,X0_ad,X0_ae,X0_af,X0_ag,X0_ai,X0_aj,X0_ak,X0_al,X0_am,X0_an,X0_ao,X0_ap,X0_aq,X0_as,X0_at,X0_au,X0_av,X0_aw,X0_ax,X0_ay,X0_az,X0_b,X0_ba,X0_bb,X0_bc,X0_c,X0_d,X0_e,X0_f,X0_g,X0_h,X0_i,X0_j,X0_k,X0_l,X0_m,X0_n,X0_o,X0_p,X0_q,X0_r,X0_s,X0_t,X0_u,X0_v,X0_w,X0_x,X0_y,X0_z,X10,X100,X101,X102,X103,X104,X105,X106,X107,X108,X109,X11,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X12,X120,X122,X123,X124,X125,X126,X127,X128,X129,X13,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X14,X140,X141,X142,X143,X144,X145,X146,X147,X148,X15,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X16,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X17,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X18,X180,X181,X182,X183,X184,X185,X186,X187,X189,X19,X190,X191,X192,X194,X195,X196,X197,X198,X199,X1_a,X1_aa,X1_ab,X1_b,X1_c,X1_d,X1_e,X1_f,X1_g,X1_h,X1_i,X1_j,X1_k,X1_l,X1_m,X1_n,X1_o,X1_p,X1_q,X1_r,X1_s,X1_t,X1_u,X1_v,X1_w,X1_y,X1_z,X20,X200,X201,...,X327,X328,X329,X33,X330,X331,X332,X333,X334,X335,X336,X337,X338,X339,X34,X340,X341,X342,X343,X344,X345,X346,X347,X348,X349,X35,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X36,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X37,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X38,X380,X382,X383,X384,X385,X39,X3_a,X3_b,X3_c,X3_d,X3_e,X3_f,X3_g,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X5_a,X5_aa,X5_ab,X5_ac,X5_ad,X5_ae,X5_af,X5_ag,X5_ah,X5_b,X5_c,X5_d,X5_f,X5_g,X5_h,X5_i,X5_j,X5_k,X5_l,X5_m,X5_n,X5_o,X5_p,X5_q,X5_r,X5_s,X5_t,X5_u,X5_v,X5_w,X5_x,X5_y,X5_z,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X8_a,X8_b,X8_c,X8_d,X8_e,X8_f,X8_g,X8_h,X8_i,X8_j,X8_k,X8_l,X8_m,X8_n,X8_o,X8_p,X8_q,X8_r,X8_s,X8_t,X8_u,X8_v,X8_w,X8_x,X8_y,X90,X91,X92,X93,X94,X95,X96,X97,X98,X99
count,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,...,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,42
09.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0,4209.0
mean,0.004989,0.000475,0.000238,0.000238,0.003326,0.0,0.008316,0.0,0.008078,0.035876,0.082918,0.015918,0.004277,0.0,0.00095,0.024471,0.004277,0.002376,0.00594,0.002613,0.0,0.003801,0.004514,0.074364,0.041578,0.002613,0.006415,0.0,0.001426,0.000713,0.017344,0.007603,0.053932,0.000238,0.017819,0.004277,0.043003,0.002613,0.003801,0.008078,0.046329,0.063911,0.0,0.000475,0.002376,0.025184,0.072701,0.004039,0.008553,0.043241,0.071276,0.076978,0.085531,0.013305,0.690188,0.935614,0.00689,0.784509,0.001901,0.002376,0.013067,0.0,0.01473,0.04039,0.0,0.00095,0.974816,0.002851,0.022333,0.146115,0.285579,0.196721,0.04918,0.622238,0.622238,0.075077,0.95771,0.007128,0.002613,0.000475,0.003089,0.038964,0.495129,0.958422,0.122594,0.057971,0.041578,0.02661,0.688525,0.124258,0.022333,0.027085,0.956522,0.581848,0.040865,0.09052,0.42813,0.04039,0.014255,0.770254,0.038251,0.80803,0.001426,0.040865,0.022333,0.044904,0.000475,0.79235,0.085531,0.032312,0.000713,0.208838,0.076503,0.717273,0.282727,0.229746,0.013542,0.002613,0.001188,0.197672,0.040865,0.303397,0.062485,0.004514,0.033262,0.00095,0.270848,0.006652,0.007603,0.024234,0.657401,0.00594,0.009741,0.017344,0.022333,0.017106,0.050131,0.557377,0.047992,0.00784,0.157995,0.093847,0.106201,0.004039,0.001426,0.018769,0.535994,0.420527,0.915419,0.099549,0.000238,0.470896,0.002376,0.464006,0.011642,0.010216,0.032312,0.023046,0.002851,0.033975,0.197909,0.000713,0.140651,0.028748,0.000713,0.00784,0.005464,0.001426,0.00689,0.04823,0.005227,0.004039,0.140176,0.007603,0.004514,0.019482,0.002138,0.000713,0.059634,0.142077,0.007365,0.008791,0.096935,0.012354,0.005464,0.010929,0.142789,0.006652,0.177714,...,0.128297,0.040152,0.435258,0.000238,0.0,0.05607,0.000713,0.023996,0.46258,0.003564,0.127346,0.516512,0.00689,0.000238,0.005464,0.022333,0.008078,0.022333,0.078403,0.008553,0.022333,0.047517,0.0,0.947256,0.044904,0.232359,0.338798,0.29722,0.05417,0.002138,0.202899,0.380375,0.179853,0.001188,0.426942,0.031837,0.004514,0.076503,0.966025,0.520314,0.753
861,0.002851,0.002851,0.001188,0.051794,0.062723,0.000475,0.232359,0.006652,0.014255,0.000475,0.019244,0.22737,0.318841,0.057258,0.314802,0.02067,0.009503,0.033262,0.008078,0.007603,0.001663,0.000475,0.001426,0.000238,0.104538,0.013542,0.461392,0.0689,0.038727,0.255643,0.057258,0.000713,0.011404,0.000238,0.072226,0.011404,0.253267,0.597292,0.01283,0.022333,0.122119,0.214065,0.721787,0.04229,0.00689,0.043478,0.005227,0.021145,0.013305,0.574958,0.000713,0.0,0.02661,0.046804,0.047517,0.043953,0.048705,0.044666,0.048468,0.023046,0.0,0.031124,0.050843,0.001663,0.000238,0.000238,0.04918,0.029698,0.042053,0.046329,0.049418,0.050368,0.004752,0.049418,0.052269,0.051081,0.050843,0.0,0.000238,0.054882,0.054882,0.000475,0.000238,0.0,0.001426,0.953908,0.00594,0.011404,0.375148,0.002138,0.027085,0.001901,0.073414,0.029936,0.919933,0.103588,0.019957,0.999287,0.036113,0.043478,0.012592,0.005702,0.025184,0.947018,0.229033,0.017106,0.001188,0.103588,0.408173,0.001426,0.00095,0.007128,0.000713,0.049893,0.045141,0.023759,0.024471,0.053457,0.057733,0.030886,0.027798,0.056308,0.065811,0.041815,0.023996,0.036826,0.057496,0.038727,0.023759,0.027798,0.052031,0.060584,0.028273,0.028273,0.046092,0.046567,0.024947,0.02756,0.007365,0.001663,0.00095,0.0,0.007365,0.000238,0.758137,0.004277,0.942504,0.008553
std,0.070467,0.021796,0.015414,0.015414,0.057584,0.0,0.09082,0.0,0.089524,0.186002,0.27579,0.125174,0.065263,0.0,0.030817,0.154526,0.065263,0.048691,0.076849,0.051061,0.0,0.061545,0.067043,0.262394,0.199646,0.051061,0.079845,0.0,0.037734,0.026691,0.130564,0.086872,0.22591,0.015414,0.132309,0.065263,0.202888,0.051061,0.061545,0.089524,0.210222,0.244623,0.0,0.021796,0.048691,0.156703,0.259677,0.063432,0.092098,0.203423,0.257316,0.266588,0.279703,0.11459,0.462471,0.245468,0.082729,0.411211,0.043561,0.048691,0.113576,0.0,0.120486,0.196895,0.0,0.030817,0.156703,0.053325,0.147782,0.353264,0.451743,0.397567,0.21627,0.484885,0.484885,0.263547,0.201275,0.084134,0.051061,0.021796,0.055496,0.193532,0.500036,0.199646,0.32801,0.233716,0.199646,0.160959,0.463152,0.329914,0.147782,0.16235,0.203955,0.493314,0.198,0.28696,0.494867,0.196895,0.118555,0.420719,0.191825,0.393896,0.037734,0.198,0.147782,0.207117,0.021796,0.405673,0.279703,0.176848,0.026691,0.406527,0.265832,0.450379,0.450379,0.420719,0.115595,0.051061,0.03445,0.39829,0.198,0.45978,0.242063,0.067043,0.179341,0.030817,0.444451,0.0813,0.086872,0.153792,0.474635,0.076849,0.098226,0.130564,0.147782,0.129683,0.21824,0.496756,0.213776,0.088208,0.364779,0.29165,0.308131,0.063432,0.037734,0.135725,0.498762,0.493702,0.27829,0.299433,0.015414,0.499212,0.048691,0.498762,0.10728,0.10057,0.176848,0.150067,0.053325,0.181186,0.398471,0.026691,0.347702,0.167117,0.026691,0.088208,0.073729,0.037734,0.082729,0.214277,0.072117,0.063432,0.347211,0.086872,0.067043,0.138228,0.046198,0.026691,0.236836,0.34917,0.085514,0.093357,0.295905,0.110475,0.073729,0.103981,0.349899,0.0813,0.382318,...,0.334459,0.196339,0.49585,0.015414,0.0,0.230085,0.026691,0.153055,0.498657,0.059598,0.3334,0.499787,0.082729,0.015414,0.073729,0.147782,0.089524,0.147782,0.268837,0.092098,0.147782,0.212768,0.0,0.223549,0.207117,0.422387,0.473357,0.457089,0.226379,0.046198,0.402205,0.485537,0.38411,0.03445,0.494693,0.175586,0.067043,0.265832,0.181186,0.499647,0.430812,0.05332
5,0.053325,0.03445,0.221637,0.242492,0.021796,0.422387,0.0813,0.118555,0.021796,0.137399,0.419183,0.466082,0.232363,0.464492,0.142294,0.097033,0.179341,0.089524,0.086872,0.040752,0.021796,0.037734,0.015414,0.305993,0.115595,0.498566,0.253314,0.192965,0.436274,0.232363,0.026691,0.106192,0.015414,0.258893,0.106192,0.434934,0.490501,0.112552,0.147782,0.327462,0.410221,0.448172,0.201275,0.082729,0.203955,0.072117,0.143885,0.11459,0.494408,0.026691,0.0,0.160959,0.211245,0.212768,0.205016,0.215277,0.206594,0.214778,0.150067,0.0,0.173673,0.219704,0.040752,0.015414,0.015414,0.21627,0.169774,0.200733,0.210222,0.216765,0.21873,0.068777,0.216765,0.222595,0.220189,0.219704,0.0,0.015414,0.227778,0.227778,0.021796,0.015414,0.0,0.037734,0.209709,0.076849,0.106192,0.484219,0.046198,0.16235,0.043561,0.260846,0.170431,0.271428,0.304761,0.13987,0.026691,0.186594,0.203955,0.111519,0.075305,0.156703,0.224024,0.42026,0.129683,0.03445,0.304761,0.491554,0.037734,0.030817,0.084134,0.026691,0.21775,0.207639,0.152314,0.154526,0.224969,0.233266,0.17303,0.164412,0.230543,0.247982,0.20019,0.153055,0.188356,0.232815,0.192965,0.152314,0.164412,0.222117,0.238595,0.165771,0.165771,0.209709,0.210734,0.155981,0.163728,0.085514,0.040752,0.030817,0.0,0.085514,0.015414,0.428262,0.065263,0.232815,0.092098
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
max,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0


In [64]:
parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheDropCatTrain,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:    7.0s remaining:   10.6s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    7.5s finished


elapsed: 9.823047637939453


In [65]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.5690118775745561
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

Removing X4 and X6 improves the cross-validated R^2 marginally, from 0.56872 to 0.56901.

Estimated Kaggle rank: 642/3824

In [75]:
# remove numerical columns with standard deviation = 0
X_oheDropCatNumTrain = X_oheDropCatTrain.drop(columns = numericalNoVariance.index)

X_oheDropCatNumTrain.head() #verify that columns with no variance were removed.

Unnamed: 0,X0_a,X0_aa,X0_ab,X0_ac,X0_ad,X0_ae,X0_af,X0_ag,X0_ai,X0_aj,X0_ak,X0_al,X0_am,X0_an,X0_ao,X0_ap,X0_aq,X0_as,X0_at,X0_au,X0_av,X0_aw,X0_ax,X0_ay,X0_az,X0_b,X0_ba,X0_bb,X0_bc,X0_c,X0_d,X0_e,X0_f,X0_g,X0_h,X0_i,X0_j,X0_k,X0_l,X0_m,X0_n,X0_o,X0_p,X0_q,X0_r,X0_s,X0_t,X0_u,X0_v,X0_w,X0_x,X0_y,X0_z,X10,X100,X101,X102,X103,X104,X105,X106,X108,X109,X110,X111,X112,X113,X114,X115,X116,X117,X118,X119,X12,X120,X122,X123,X124,X125,X126,X127,X128,X129,X13,X130,X131,X132,X133,X134,X135,X136,X137,X138,X139,X14,X140,X141,X142,X143,X144,X145,X146,X147,X148,X15,X150,X151,X152,X153,X154,X155,X156,X157,X158,X159,X16,X160,X161,X162,X163,X164,X165,X166,X167,X168,X169,X17,X170,X171,X172,X173,X174,X175,X176,X177,X178,X179,X18,X180,X181,X182,X183,X184,X185,X186,X187,X189,X19,X190,X191,X192,X194,X195,X196,X197,X198,X199,X1_a,X1_aa,X1_ab,X1_b,X1_c,X1_d,X1_e,X1_f,X1_g,X1_h,X1_i,X1_j,X1_k,X1_l,X1_m,X1_n,X1_o,X1_p,X1_q,X1_r,X1_s,X1_t,X1_u,X1_v,X1_w,X1_y,X1_z,X20,X200,X201,X202,X203,...,X324,X325,X326,X327,X328,X329,X33,X331,X332,X333,X334,X335,X336,X337,X338,X339,X34,X340,X341,X342,X343,X344,X345,X346,X348,X349,X35,X350,X351,X352,X353,X354,X355,X356,X357,X358,X359,X36,X360,X361,X362,X363,X364,X365,X366,X367,X368,X369,X37,X370,X371,X372,X373,X374,X375,X376,X377,X378,X379,X38,X380,X382,X383,X384,X385,X39,X3_a,X3_b,X3_c,X3_d,X3_e,X3_f,X3_g,X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,X5_a,X5_aa,X5_ab,X5_ac,X5_ad,X5_ae,X5_af,X5_ag,X5_ah,X5_b,X5_c,X5_d,X5_f,X5_g,X5_h,X5_i,X5_j,X5_k,X5_l,X5_m,X5_n,X5_o,X5_p,X5_q,X5_r,X5_s,X5_t,X5_u,X5_v,X5_w,X5_x,X5_y,X5_z,X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,X70,X71,X73,X74,X75,X76,X77,X78,X79,X80,X81,X82,X83,X84,X85,X86,X87,X88,X89,X8_a,X8_b,X8_c,X8_d,X8_e,X8_f,X8_g,X8_h,X8_i,X8_j,X8_k,X8_l,X8_m,X8_n,X8_o,X8_p,X8_q,X8_r,X8_s,X8_t,X8_u,X8_v,X8_w,X8_x,X8_y,X90,X91,X92,X94,X95,X96,X97,X98,X99
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,1,1,0,1,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,...,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0


In [76]:
parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheDropCatNumTrain,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:    6.5s remaining:    9.9s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    6.7s finished


elapsed: 7.912122488021851


In [77]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.568111120703948
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

Dropping the columns with no variance worsened the R2 score marginally, from 0.56901 to 0.56811.

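It's unclear why this effect occurs. Constant columns can never be chosen for a split, so one plausible cause is that `colsample_bytree` = 0.7 samples a random subset of columns for each tree, and changing the number of columns changes which columns each tree sees.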
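As a point of reference, the zero-variance filter used above can be sketched on a toy frame. The frame and its column names are illustrative assumptions, not the project's data, and `numericalNoVariance` in the notebook is presumed to have been built in a similar way.

```python
import pandas as pd

# Toy frame with hypothetical columns (not the Mercedes-Benz data).
toy = pd.DataFrame({
    "X10": [0, 1, 0, 1],
    "X11": [0, 0, 0, 0],   # constant: standard deviation is 0
    "X12": [1, 1, 0, 1],
    "X93": [0, 0, 0, 0],   # constant: standard deviation is 0
})

# Columns whose standard deviation is exactly zero carry no information.
std = toy.std()
no_variance = std[std == 0]

# Drop them by name, mirroring the drop(columns=...) call in the notebook.
reduced = toy.drop(columns=no_variance.index)
print(list(no_variance.index))  # ['X11', 'X93']
print(list(reduced.columns))    # ['X10', 'X12']
```

The same index of column names can be reused to drop the matching columns from the test set, so that both frames keep an identical layout.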

## Dimensionality reduction after removing columns with low variance

In [67]:
# perform PCA on the dataset where X4 and X6 are removed
pca99afterDrop = PCA(0.99)  # keep enough components to explain 99% of the variance
pca99afterDrop.fit(X_oheDropCatTrain)

parameters = {'alpha': [0], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10], 
              'learning_rate': [0.05], 
              'max_depth': [4], 
              'min_child_weight': [2], 
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1], 
              'tree_method': ['auto']}

X_pca_oheDropCatTrain = pca99afterDrop.transform(X_oheDropCatTrain)

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_pca_oheDropCatTrain,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 1 candidates, totalling 5 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:   16.4s remaining:   24.7s
[Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:   16.6s finished


elapsed: 20.916150331497192


In [68]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.4933094379712989
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

Applying PCA to the dataset after dropping columns with low variance still results in a worse R2 score, from 0.56901 to 0.49331. 

Dimensionality reduction does not help in this case. 

Further, training time increases from 9.8 seconds to 20.9 seconds after dimensionality reduction. A likely cause is that the PCA output consists of dense floating-point columns, whereas the original columns are binary integers.
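To make the `PCA(0.99)` setting more concrete, the sketch below reproduces the selection rule with plain NumPy: keep the smallest number of components whose cumulative explained-variance ratio reaches the threshold. The helper name `n_components_for_variance` and the random binary matrix are assumptions for illustration only. The sketch also shows that the projected data is dense `float64` even when the input is binary.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.99):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values are proportional to per-component variance.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))

rng = np.random.default_rng(0)
# Hypothetical binary matrix standing in for the one-hot encoded features.
X_binary = rng.integers(0, 2, size=(200, 30))
k = n_components_for_variance(X_binary, 0.99)

# Project onto the first k components: the result is dense float64.
centered = X_binary - X_binary.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
X_reduced = centered @ vt[:k].T
print(k, X_reduced.shape, X_reduced.dtype)
```

Because every component mixes all original columns, the sparse 0/1 structure that tree splits handle cheaply is lost after the projection.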

## Hyperparameter tuning on best set of columns

In [78]:
parameters = {'alpha': [0, 1], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10, 100],
              'learning_rate': [0.01, 0.03, 0.05], #so called `eta` value
              'max_depth': [4, 5, 6],
              'min_child_weight': [1, 2, 4],
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [0.7, 1],
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheDropCatTrain,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 216 candidates, totalling 1080 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done  22 tasks      | elapsed:   17.0s
[Parallel(n_jobs=5)]: Done 118 tasks      | elapsed:  1.5min
[Parallel(n_jobs=5)]: Done 278 tasks      | elapsed:  4.5min
[Parallel(n_jobs=5)]: Done 502 tasks      | elapsed:  7.7min
[Parallel(n_jobs=5)]: Done 790 tasks      | elapsed: 13.0min
[Parallel(n_jobs=5)]: Done 1080 out of 1080 | elapsed: 17.4min finished


elapsed: 1046.2065477371216


In [80]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.5692304085570111
{'alpha': 1, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


In [107]:
parameters = {'alpha': [0, 1], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10, 100],
              'learning_rate': [0.01, 0.03, 0.05], #so called `eta` value
              'max_depth': [4, 6, 8, 10],
              'min_child_weight': [1, 2, 4, 6],
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [0.7, 1],
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model,
                        parameters,
                        cv = 5,
                        n_jobs = 5,
                        verbose=3,
                        scoring="r2")
start = time.time()
xgb_grid.fit(X_oheDropCatTrain,
         y_oheTrainFull)
end = time.time()
elapsedTime = end - start
print(f"elapsed: {elapsedTime}")

Fitting 5 folds for each of 384 candidates, totalling 1920 fits


[Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
[Parallel(n_jobs=5)]: Done  22 tasks      | elapsed:   25.7s
[Parallel(n_jobs=5)]: Done 118 tasks      | elapsed:  2.1min
[Parallel(n_jobs=5)]: Done 278 tasks      | elapsed:  6.1min
[Parallel(n_jobs=5)]: Done 502 tasks      | elapsed: 12.3min
[Parallel(n_jobs=5)]: Done 790 tasks      | elapsed: 16.1min
[Parallel(n_jobs=5)]: Done 1142 tasks      | elapsed: 22.7min
[Parallel(n_jobs=5)]: Done 1558 tasks      | elapsed: 31.7min
[Parallel(n_jobs=5)]: Done 1920 out of 1920 | elapsed: 39.2min finished


elapsed: 2357.072856426239


In [108]:
print(xgb_grid.best_score_)
print(xgb_grid.best_params_)

0.5693947552091031
{'alpha': 0, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 6, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}


#### Interpretation

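Hyperparameter optimisation improves the R^2 marginally, from 0.56901 to 0.56923 in the first search and to 0.56939 in the wider second search.

Estimated Kaggle rank: 599/3824

Best parameters from the first search, which are used in the cells below: {'alpha': 1, 'colsample_bytree': 0.7, 'gamma': 0, 'lambda': 10, 'learning_rate': 0.05, 'max_depth': 4, 'min_child_weight': 2, 'objective': 'reg:squarederror', 'sampling_method': 'uniform', 'subsample': 1, 'tree_method': 'auto'}

The second search scored slightly higher with 'alpha': 0 and 'min_child_weight': 6, with all other parameters unchanged.

A more extensive search of parameters could possibly produce better scores. Time constraints currently prevent a wider search.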
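The cost of a wider search can be estimated before running it, since a full grid evaluates every combination of values once per fold. The sketch below is a minimal illustration using only the standard library. The helper name `grid_cost` is an assumption, and the dictionaries list only the parameters with more than one value in the two searches above.

```python
import math

def grid_cost(parameters, cv):
    """Return (number of candidates, total number of fits) for a full grid."""
    candidates = math.prod(len(values) for values in parameters.values())
    return candidates, candidates * cv

# Multi-valued parameters of the first search (single-valued ones multiply by 1).
first_grid = {
    "alpha": [0, 1],
    "lambda": [10, 100],
    "learning_rate": [0.01, 0.03, 0.05],
    "max_depth": [4, 5, 6],
    "min_child_weight": [1, 2, 4],
    "subsample": [0.7, 1],
}

# Multi-valued parameters of the second, wider search.
second_grid = {
    "alpha": [0, 1],
    "lambda": [10, 100],
    "learning_rate": [0.01, 0.03, 0.05],
    "max_depth": [4, 6, 8, 10],
    "min_child_weight": [1, 2, 4, 6],
    "subsample": [0.7, 1],
}

print(grid_cost(first_grid, cv=5))   # (216, 1080)
print(grid_cost(second_grid, cv=5))  # (384, 1920)
```

These counts match the `Fitting 5 folds for each of ...` lines in the output. Dividing the measured wall-clock times by the number of fits gives roughly 1 second per fit for the first search and about 1.2 seconds for the second on 5 workers, which gives a rough budget for any further grid.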

In [114]:
# Using the set of best parameters, we then use grid search to compute other scoring metrics.
# (An alternative would have been to do a train/test split, use the train split with GridSearchCV, then evaluate the metrics on the test split.)
parameters = {'alpha': [1], 
              'colsample_bytree': [0.7], 
              'gamma': [0], 
              'lambda': [10],
              'learning_rate': [0.05], #so called `eta` value
              'max_depth': [4],
              'min_child_weight': [2],
              'objective': ['reg:squarederror'], 
              'sampling_method': ['uniform'], 
              'subsample': [1],
              'tree_method': ['auto']}

xgb_grid = GridSearchCV(model, parameters, cv = 5, scoring="neg_mean_absolute_error")
xgb_grid.fit(X_oheDropCatTrain, y_oheTrainFull)
print(f"Mean Abs Error\t\tMAE: {-xgb_grid.best_score_}")
xgb_grid = GridSearchCV(model, parameters, cv = 5, scoring="neg_mean_squared_error")
xgb_grid.fit(X_oheDropCatTrain, y_oheTrainFull)
print(f"Mean Squared Error \tMSE: {-xgb_grid.best_score_}")
xgb_grid = GridSearchCV(model, parameters, cv = 5, scoring="neg_root_mean_squared_error")
xgb_grid.fit(X_oheDropCatTrain, y_oheTrainFull)
print(f"Root Mean Squared Error\tRMSE: {-xgb_grid.best_score_}")
xgb_grid = GridSearchCV(model, parameters, cv = 5, scoring="r2")
xgb_grid.fit(X_oheDropCatTrain, y_oheTrainFull)
print(f"R2-score \t\t\t: {xgb_grid.best_score_}")

Mean Abs Error		MAE: 5.17094996613067
Mean Squared Error 	MSE: 70.65814899443838
Root Mean Squared Error	RMSE: 8.29879420320797
R2-score 			: 0.5692304085570111


#### Interpretation

With the best parameters, the other scoring metrics give the following cross-validated results:

- Mean Absolute Error (MAE): 5.1709

- Mean Squared Error (MSE): 70.658

- Root Mean Squared Error (RMSE): 8.2988

(The best parameters were determined using the R2 score.)
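Note that the reported RMSE (8.2988) is not the square root of the reported MSE (the square root of 70.658 is about 8.406). This is expected, because `best_score_` is the mean of the per-fold scores, and the mean of per-fold square roots is never larger than the square root of the mean. The fold values in the sketch below are made up for illustration and are not the notebook's actual fold scores.

```python
import math

# Hypothetical per-fold MSE values (illustrative only).
fold_mse = [40.0, 55.0, 60.0, 70.0, 130.0]

mean_mse = sum(fold_mse) / len(fold_mse)

# What a per-fold RMSE scorer averages: sqrt first, then mean.
mean_of_rmse = sum(math.sqrt(m) for m in fold_mse) / len(fold_mse)

# What one might naively expect: mean first, then sqrt.
rmse_of_mean = math.sqrt(mean_mse)

print(f"mean of per-fold RMSE: {mean_of_rmse:.4f}")
print(f"sqrt of mean MSE:      {rmse_of_mean:.4f}")
```

The gap between the two numbers grows with the spread of the fold errors, so a noticeable difference suggests that some folds are considerably harder than others.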

## Produce test set predictions

In [112]:

# make predictions for test data
testID = oheTestFull['ID']
X_oheDropCatTrainAlign, oheDropCatTestAlign = X_oheDropCatTrain.align(oheTestFull, join='left', fill_value=0, axis=1) 

y_pred_test = xgb_grid.predict(oheDropCatTestAlign)
#print(f"testID: {testID}")
#print(f"y_pred: {y_pred_test}")
testOutput = testID.to_frame()
testOutput['y']=y_pred_test
#testOutput = pd.concat([testID,pd.Series(y_pred_test)], axis=1)
testOutput.to_csv(path_or_buf=r'.\testOutput.csv', header=True, index=False)
testOutput.head()

Unnamed: 0,ID,y
0,1,78.710793
1,2,93.601509
2,3,78.790527
3,4,78.522705
4,5,111.498039


#### Interpretation

After removing the low-variance categorical columns (X4 and X6) and tuning hyperparameters with 5-fold cross-validation, the validation score on the training data, using the R2 scoring metric, is 0.56939. Dropping the zero-variance numerical columns and applying PCA were also evaluated, but both lowered the score and were not used for the final model.

The test file was processed and predictions produced. These predictions were submitted to the original Kaggle competition from which the dataset was obtained. The test file predictions produced an R2 score of 0.54941, which ranks 1238/3824.
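The column alignment used when processing the test file can be illustrated on two small frames. The frames and column names below are hypothetical. The point is that `join='left'` keeps exactly the training columns in training order, drops test-only columns such as `ID`, and fills columns missing from the test set with 0.

```python
import pandas as pd

# Hypothetical one-hot frames: the test set lacks 'X0_b'
# and carries two extra columns, 'ID' and 'X0_zz'.
train = pd.DataFrame({"X0_a": [1, 0], "X0_b": [0, 1], "X10": [0, 1]})
test = pd.DataFrame({"ID": [1, 2], "X0_a": [0, 1], "X0_zz": [1, 0], "X10": [1, 1]})

# Align on columns (axis=1), using the training frame as the reference.
train_aligned, test_aligned = train.align(test, join="left", fill_value=0, axis=1)

print(list(test_aligned.columns))  # ['X0_a', 'X0_b', 'X10']
print(test_aligned)
```

This matters for one-hot encoded data, since a category level that appears in only one of the two files would otherwise give the model a different column layout at prediction time.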

Further steps which could improve the score:

- Checking the training set for outliers and removing them. This can be done using 3 * standard deviation from the mean as a threshold.

- Performing univariate feature selection, such as the F-test for regression (sklearn.feature_selection.f_regression), and dropping those features which show no significant relationship with the target.

- Performing alternate dimensionality reduction, such as t-distributed stochastic neighbor embedding (t-SNE).

- Performing a more extensive grid search on XGBoost regression.

- Testing other regression machine learning techniques, such as multi-layer perceptrons.

Due to time constraints, these alternatives have not been implemented and tested yet.
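As a starting point for the first suggestion, a 3 * standard deviation filter on the target could look like the sketch below. This is an untested idea for this dataset rather than part of the submitted solution. The helper name `three_sigma_mask` and the sample values are assumptions for illustration.

```python
import numpy as np

def three_sigma_mask(y, n_sigma=3.0):
    """Boolean mask that is True for values within n_sigma standard
    deviations of the mean."""
    y = np.asarray(y, dtype=float)
    return np.abs(y - y.mean()) <= n_sigma * y.std()

# Hypothetical test-bench times in seconds, with one extreme value.
y = np.array([78.7, 93.6, 78.8, 78.5, 111.5, 88.0, 95.2, 101.3, 90.4, 84.9,
              92.1, 99.0, 87.3, 96.8, 265.0])

mask = three_sigma_mask(y)
print(y[~mask])    # values flagged as outliers
print(mask.sum())  # number of rows kept
```

In practice the same mask would be applied to both the feature matrix and the target before fitting, and only to the training data, so that the rows stay aligned and the test set is left untouched.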