***
# Final Case Study
MSDS 7333 Quantifying the World  
*Allison Roderick, Jenna Ford, and Will Arnost* 
***

## Table of Contents

<a href='#Section_1'> 1. Introduction </a>  
<a href='#Section_2'> 2. Question </a>  
<a href='#Section_3'> 3. Methods </a>  
<a href='#Section_3_a'> &nbsp;&nbsp;&nbsp; a. Dataset </a>  
<a href='#Section_3_b'> &nbsp;&nbsp;&nbsp; b. Neural Network Structure </a>  
<a href='#Section_3_c'> &nbsp;&nbsp;&nbsp; c. Other Considerations </a>  
<a href='#Section_4'> 4. Modeling </a>  
<a href='#Section_5'> 5. Results </a>  
<a href='#Section_6'> 6. Conclusion </a>  
<a href='#Section_7'> 7. References </a>  
<a href='#Section_8'> 8. Code </a>  

In [4]:
%%html
<style>
  table {margin-left: 0 !important;}
</style>

<a id = 'Section_1'></a>

## 1. Introduction

This week's case study involves replicating results from the paper "Searching for Exotic Particles in High-Energy Physics with Deep Learning" by Baldi, Sadowski, and Whiteson [1]. The 2014 paper aims to distinguish particle collisions that produce exotic particles from those that do not, and investigates whether deep neural networks can improve classification accuracy over other methods.

We will attempt to replicate that paper's neural network architecture and performance. Because the packages used in the paper are outdated, we will build our network with TensorFlow. Our goal is to come as close as possible to the paper's reported AUC of 0.885.
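AUC is the area under the ROC curve. Equivalently, it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below uses made-up labels and scores (not data from this project) to compute AUC both by counting correctly ranked pairs and with scikit-learn's `roc_auc_score`. We are assuming that `roc_auc_score` will be used to score our network.

```python
from itertools import product

from sklearn.metrics import roc_auc_score

# Hypothetical labels (1 = signal, 0 = background) and model scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


print(pairwise_auc(y_true, y_score))   # 0.75: 3 of the 4 pairs are ordered correctly
print(roc_auc_score(y_true, y_score))  # 0.75
```

An AUC of 0.885 therefore means that about 88.5% of signal/background pairs are ranked in the correct order.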

<a id = 'Section_2'></a>

## 2. Question

<a id = 'Section_3'></a>

## 3. Methods

This section gives an overview of what we know about the data and how we prepared the dataset for modeling.

<a id = 'Section_3_a'></a>

### 3a. Dataset

<a id = 'Section_3_b'></a>

### 3b. Neural Network Structure

<a id = 'Section_3_c'></a>

### 3c. Other Considerations

<a id = 'Section_4'></a>

## 4. Modeling

<a id = 'Section_5'></a>

## 5. Results

<a id = 'Section_6'></a>

## 6. Conclusion

<a id = 'Section_7'></a>

## 7. References

<a id = 'Section_8'></a>

## 8. Code

### Load Packages

In [1]:
import numpy as np
import pandas as pd
import pandas_profiling

### Read in the Data

In [53]:
df = pd.read_csv("../final_project.csv")

In [54]:
# there are missing values
df.isnull().values.any()

True

In [55]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 160000 entries, 0 to 159999
Data columns (total 51 columns):
x0     159974 non-null float64
x1     159975 non-null float64
x2     159962 non-null float64
x3     159963 non-null float64
x4     159974 non-null float64
x5     159963 non-null float64
x6     159974 non-null float64
x7     159973 non-null float64
x8     159979 non-null float64
x9     159970 non-null float64
x10    159957 non-null float64
x11    159970 non-null float64
x12    159964 non-null float64
x13    159969 non-null float64
x14    159966 non-null float64
x15    159965 non-null float64
x16    159974 non-null float64
x17    159973 non-null float64
x18    159960 non-null float64
x19    159965 non-null float64
x20    159962 non-null float64
x21    159971 non-null float64
x22    159973 non-null float64
x23    159953 non-null float64
x24    159972 non-null object
x25    159978 non-null float64
x26    159964 non-null float64
x27    159970 non-null float64
x28    159965 non-null

In [56]:
df.describe()

Unnamed: 0,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x25,x26,x27,x28,x31,x33,x34,x35,x36,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,y
count,159974.0,159975.0,159962.0,159963.0,159974.0,159963.0,159974.0,159973.0,159979.0,159970.0,159957.0,159970.0,159964.0,159969.0,159966.0,159965.0,159974.0,159973.0,159960.0,159965.0,159962.0,159971.0,159973.0,159953.0,159978.0,159964.0,159970.0,159965.0,159961.0,159959.0,159959.0,159970.0,159973.0,159969.0,159977.0,159964.0,159960.0,159974.0,159963.0,159960.0,159971.0,159969.0,159963.0,159968.0,159968.0,160000.0
mean,-0.001028,0.001358,-1.150145,-0.024637,-0.000549,0.013582,-1.67067,-7.692795,-0.03054,0.005462,0.002253,0.030232,-1.334402,0.007669,0.008104,0.001215,0.006223,0.01204,0.012694,0.024555,0.299074,-0.029137,0.0084,0.722028,-0.000806,-0.001066,-0.004159,0.031543,-0.005945,-0.006567,-0.000426,0.000936,0.006453,6.05913,0.004253,-2.316526,6.701076,-1.83382,-0.002091,-0.00625,0.000885,-12.755395,0.028622,-0.000224,-0.674224,0.401231
std,0.371137,6.340632,13.27348,8.065032,6.382293,7.670076,19.298665,30.542264,8.901185,6.35504,7.871429,8.769633,14.75099,8.953837,6.964097,3.271779,4.984065,7.569351,4.540714,7.595316,5.806203,9.409635,5.41201,14.909127,1.263656,0.843258,6.774047,14.439534,2.767508,1.747762,8.01418,2.379558,1.593183,16.891603,5.134322,17.043549,18.680196,5.110705,1.534952,4.164595,0.396621,36.608641,4.788157,1.935501,15.036738,0.490149
min,-1.592635,-26.278302,-59.394048,-35.476594,-28.467536,-33.822988,-86.354483,-181.506976,-37.691045,-27.980659,-36.306571,-38.092869,-64.197967,-38.723514,-30.905214,-17.002359,-26.042983,-34.395898,-20.198686,-35.633396,-26.677396,-43.501854,-23.644193,-66.640341,-6.364653,-3.857484,-32.003555,-72.896705,-12.289364,-7.451454,-36.116606,-10.008149,-6.866024,-74.297559,-22.101647,-74.059196,-82.167224,-27.93375,-6.876234,-17.983487,-1.753221,-201.826828,-21.086333,-8.490155,-65.791191,0.0
25%,-0.251641,-4.260973,-10.166536,-5.454438,-4.313118,-5.14813,-14.780146,-27.324771,-6.031058,-4.260619,-5.288196,-5.903274,-11.379492,-6.029945,-4.696755,-2.207774,-3.344027,-5.07147,-3.056131,-5.101553,-3.607789,-6.361115,-3.649766,-9.268532,-0.852784,-0.567293,-4.597919,-9.702464,-1.874206,-1.183681,-5.401084,-1.610337,-1.068337,-5.249882,-3.458716,-13.953629,-5.80408,-5.162869,-1.039677,-2.812055,-0.266518,-36.428329,-3.216016,-1.3208,-10.931753,0.0
50%,-0.002047,0.004813,-1.340932,-0.031408,0.000857,0.014118,-1.948594,-6.956789,-0.01684,0.006045,-0.018176,0.010941,-1.624439,-0.003473,0.002467,0.003535,0.012754,0.024541,0.015904,0.044703,0.433055,-0.026385,0.011144,1.029609,-0.003723,-0.001501,0.037138,0.24421,0.002013,-0.006079,-0.013089,-0.002399,0.003645,6.18441,0.019068,-2.701867,6.84011,-1.923754,-0.004385,-0.010484,0.001645,-12.982497,0.035865,-0.011993,-0.57441,0.0
75%,0.248532,4.28422,7.871676,5.445179,4.30666,5.190749,11.446931,12.217071,5.972349,4.305734,5.331573,5.935032,8.374524,6.041959,4.701299,2.21166,3.366853,5.101962,3.073002,5.164732,4.306566,6.316457,3.672678,11.028035,0.851765,0.567406,4.649773,9.936995,1.856369,1.17946,5.411667,1.603089,1.079895,17.420148,3.463308,8.981616,19.266367,1.453507,1.033275,2.783274,0.269049,11.445443,3.268028,1.317703,9.651072,1.0
max,1.600849,27.988178,63.545653,38.906025,26.247812,35.55011,92.390605,149.150634,39.049831,27.377842,37.945583,36.360443,73.279354,42.392177,32.54634,13.782559,21.961123,37.057048,19.652986,33.51555,27.81456,46.237503,24.863012,58.4905,5.314169,3.951652,28.645074,67.753845,12.279356,7.78712,34.841428,9.892426,6.999544,90.467981,21.545591,88.824477,100.050432,22.668041,6.680922,19.069759,1.669205,150.859415,20.836854,8.226552,66.877604,1.0


In [None]:
# Generate an exploratory profiling report (uses pandas_profiling, imported above)
profile = df.profile_report(title="Final Jeopardy",pool_size=4)
profile.to_file("final_jeopardy.html")

### Data Cleanup

In [58]:
# Convert numeric columns that were imported as strings
df['x32'] = df['x32'].replace(r'[%,]', '', regex=True).astype(float)/100  # percent -> fraction
df['x37'] = df['x37'].replace(r'[\$,]', '', regex=True).astype(float)     # strip '$' and ','

# Standardize inconsistent month spellings in x29
df['x29'] = df['x29'].replace('Dev', 'Dec', regex=True)
df['x29'] = df['x29'].replace('July', 'Jul', regex=True)
df['x29'] = df['x29'].replace('January', 'Jan', regex=True)
df['x29'] = df['x29'].replace(r'sept\.', 'Sep', regex=True)  # escape '.' so it only matches a literal dot
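To illustrate what these string fixes do, here is a sketch on a small hypothetical frame. The values are invented to resemble the formats we saw in `x32`, `x37`, and `x29`; they are not rows from the dataset. Writing the patterns as raw strings avoids Python's invalid-escape warning for `\%` and `\$`. Escaping the dot in `sept\.` keeps the pattern from matching "sept" followed by an arbitrary character.

```python
import pandas as pd

# Hypothetical toy values mimicking the string-formatted columns (not real data)
toy = pd.DataFrame({
    "x32": ["0.01%", "-0.02%", "0.0%"],
    "x37": ["$1,313.96", "$-620.66", "$430.47"],
    "x29": ["Dev", "July", "sept."],
})

# Strip '%' and ',' then convert percentages to fractions
toy["x32"] = toy["x32"].replace(r"[%,]", "", regex=True).astype(float) / 100
# Strip '$' and thousands separators, then convert to float
toy["x37"] = toy["x37"].replace(r"[\$,]", "", regex=True).astype(float)

# Apply each month-spelling fix in turn, as the cell above does
month_fixes = {"Dev": "Dec", "July": "Jul", "January": "Jan", r"sept\.": "Sep"}
for pattern, replacement in month_fixes.items():
    toy["x29"] = toy["x29"].replace(pattern, replacement, regex=True)

print(toy)
```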

In [59]:
pd.set_option("display.max_rows", 500, "display.max_columns", None)
df.head()

Unnamed: 0,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,y
0,-0.166563,-3.961588,4.621113,2.481908,-1.800135,0.804684,6.718751,-14.789997,-1.040673,-4.20495,6.187465,13.251523,25.665413,-5.017267,10.503714,-2.517678,2.11791,5.865923,-6.666158,1.791497,-1.909114,-1.73794,-2.516715,3.553013,euorpe,-0.80134,1.14295,1.005131,-18.473784,Jul,tuesday,-3.851669,0.0,-1.940031,-5.492063,0.627121,-0.873824,1313.96,-1.353729,-5.186148,-10.6122,-1.497117,5.414063,-2.325655,1.674827,-0.264332,60.781427,-7.689696,0.151589,-8.040166,0
1,-0.149894,-0.585676,27.839856,4.152333,6.426802,-2.426943,40.477058,-6.725709,0.896421,0.330165,-11.708859,-2.352809,-25.014934,9.799608,-10.960705,1.504,-2.397836,-9.301839,-1.999413,5.045258,-5.809984,10.814319,-0.478112,10.590601,asia,0.818792,-0.642987,0.751086,3.749377,Aug,wednesday,1.391594,-0.0002,2.211462,-4.460591,1.035461,0.22827,1962.78,32.816804,-5.150012,2.147427,36.29279,4.490915,0.762561,6.526662,1.007927,15.805696,-4.896678,-0.320283,16.719974,0
2,-0.321707,-1.429819,12.251561,6.586874,-5.304647,-11.31109,17.81285,11.060572,5.32588,-2.632984,1.572647,-4.170771,12.078602,-5.158498,7.30278,-2.192431,-4.065428,-7.675055,4.041629,-6.633628,1.700321,-2.419221,2.467521,-5.270615,asia,-0.718315,-0.566757,4.171088,11.522448,Jul,wednesday,-3.262082,-0.0001,0.419607,-3.804056,-0.763357,-1.612561,430.47,-0.333199,8.728585,-0.863137,-0.368491,9.088864,-0.689886,-2.731118,0.7542,30.856417,-7.428573,-2.090804,-7.869421,0
3,-0.245594,5.076677,-24.149632,3.637307,6.505811,2.290224,-35.111751,-18.913592,-0.337041,-5.568076,-2.000255,-19.286668,10.99533,-5.914378,2.5114,1.292362,-2.496882,-15.722954,-2.735382,1.117536,1.92367,-14.179167,1.470625,-11.484431,asia,-0.05243,-0.558582,9.215569,30.595226,Jul,wednesday,-2.285241,0.0001,-3.442715,4.42016,1.164532,3.033455,-2366.29,14.188669,-6.38506,12.084421,15.691546,-7.467775,2.940789,-6.424112,0.419776,-72.424569,5.361375,1.80607,-7.670847,0
4,-0.273366,0.306326,-11.352593,1.676758,2.928441,-0.616824,-16.505817,27.532281,1.199715,-4.309105,6.66753,1.965913,-28.106348,-1.25895,5.759941,0.472584,-1.150097,-14.118709,4.527964,-1.284372,-9.026317,-7.039818,-1.978748,-15.998166,asia,-0.223449,0.350781,1.811182,-4.094084,Jul,tuesday,0.921047,0.0001,-0.43164,12.165494,-0.167726,-0.341604,-620.66,-12.578926,1.133798,30.004727,-13.911297,-5.229937,1.783928,3.957801,-0.096988,-14.085435,-0.208351,-0.894942,15.724742,1


In [60]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 160000 entries, 0 to 159999
Data columns (total 51 columns):
x0     159974 non-null float64
x1     159975 non-null float64
x2     159962 non-null float64
x3     159963 non-null float64
x4     159974 non-null float64
x5     159963 non-null float64
x6     159974 non-null float64
x7     159973 non-null float64
x8     159979 non-null float64
x9     159970 non-null float64
x10    159957 non-null float64
x11    159970 non-null float64
x12    159964 non-null float64
x13    159969 non-null float64
x14    159966 non-null float64
x15    159965 non-null float64
x16    159974 non-null float64
x17    159973 non-null float64
x18    159960 non-null float64
x19    159965 non-null float64
x20    159962 non-null float64
x21    159971 non-null float64
x22    159973 non-null float64
x23    159953 non-null float64
x24    159972 non-null object
x25    159978 non-null float64
x26    159964 non-null float64
x27    159970 non-null float64
x28    159965 non-null

### Impute Missing Values

In [61]:
# Impute missing numeric
numb = df.select_dtypes(include='number').columns
df[numb] = df[numb].fillna(df[numb].median().to_dict())
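Passing a Series of medians (or a dict, as above) to `fillna` fills each column with its own median, not a single global value. The following sketch shows this on a hypothetical toy frame. Note that the text column `c` is left untouched by the numeric fill.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two numeric columns with gaps and one text column
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 10.0],
    "b": [np.nan, 2.0, 4.0, 6.0],
    "c": ["asia", None, "asia", "america"],
})

numeric_cols = toy.select_dtypes(include="number").columns
# Column 'a' gets median(1, 3, 10) = 3.0; column 'b' gets median(2, 4, 6) = 4.0
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].median())
print(toy)
```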

In [62]:
# list columns that still have missing values (only the categorical columns remain)
for i in df.columns:
    if df.loc[df[i].isna(),i].shape[0]>0:
        print(i, df.loc[df[i].isna(),i].shape)

x24 (28,)
x29 (30,)
x30 (30,)
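The same per-column missing counts can be obtained more compactly with `isna().sum()`. Here is a sketch on a hypothetical toy frame, which keeps only the columns whose count is nonzero:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, None], "c": [1, 2, 3]})

# Number of missing entries in each column
missing_counts = toy.isna().sum()
# Keep only the columns that actually have gaps
print(missing_counts[missing_counts > 0])
```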


In [64]:
# fill missing categorical columns with the word 'Missing'
cats = df.select_dtypes(include='object').columns

df[cats] = df[cats].fillna('Missing')
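Filling categorical gaps with an explicit `'Missing'` label matters for the one-hot encoding step. With its default `dummy_na=False`, `pd.get_dummies` would otherwise encode a missing value as a row of all zeros. With the label filled in, it gets its own indicator column, such as the `x24_Missing` column that appears later in `X2`. The sketch below uses a hypothetical toy column:

```python
import pandas as pd

# Hypothetical categorical column with one missing entry
toy = pd.DataFrame({"x24": ["asia", None, "america", "asia"]})

# Give missing values their own category before encoding
toy["x24"] = toy["x24"].fillna("Missing")
encoded = pd.get_dummies(toy, columns=["x24"])

print(sorted(encoded.columns))  # ['x24_Missing', 'x24_america', 'x24_asia']
```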

In [65]:
# confirm that no columns still have missing values (prints nothing)
for i in df.columns:
    if df.loc[df[i].isna(),i].shape[0]>0:
        print(i, df.loc[df[i].isna(),i].shape)

### Create Model Datasets

In [66]:
X = df.copy().drop(columns=["y"])
print("The shape of X is: ", X.shape)

y = df.loc[:,"y"].copy()
print("The shape of y is: ", y.shape)

The shape of X is:  (160000, 50)
The shape of y is:  (160000,)


### Normalize and One-Hot Encode the data

In [67]:
from sklearn.preprocessing import StandardScaler
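def transform_data(data):
    # Identify categorical (object) and numeric columns before encoding
    categorical = data.select_dtypes(include='object').columns
    numeric = data.select_dtypes(include='number').columns

    # One-hot encode the categorical columns; the new dummy columns are not scaled
    data_OHE = pd.get_dummies(data, columns=categorical)

    # Standardize the numeric columns to zero mean and unit variance
    # (the scaler is fit on all rows, before the train/test split)
    scaler = StandardScaler()
    data_OHE[numeric] = scaler.fit_transform(data_OHE[numeric])

    return data_OHE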
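`transform_data` fits `StandardScaler` on all 160,000 rows before the train/test split, so the test rows contribute to the scaling statistics. A common alternative is to split first, fit the scaler on the training rows only, and reuse those statistics for the test rows. The sketch below illustrates that ordering on hypothetical random data. The names `X_demo`, `y_demo`, and `f0` to `f2` are illustrative only, and this is not the procedure used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical numeric features standing in for the real x columns
X_demo = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(1000, 3)), columns=["f0", "f1", "f2"])
y_demo = pd.Series(rng.integers(0, 2, size=1000))

# Split first so the test rows never influence the scaling statistics
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # mean and std estimated from training rows only
X_te_scaled = scaler.transform(X_te)      # test rows reuse the training statistics
```

With this ordering, the training features are exactly standardized. The test features are only approximately standardized, which reflects what a deployed model would see.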

In [68]:
X2 = transform_data(X)
display(X2)

Unnamed: 0,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x25,x26,x27,x28,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x24_Missing,x24_america,x24_asia,x24_euorpe,x29_Apr,x29_Aug,x29_Dec,x29_Feb,x29_Jan,x29_Jul,x29_Jun,x29_Mar,x29_May,x29_Missing,x29_Nov,x29_Oct,x29_Sep,x30_Missing,x30_friday,x30_monday,x30_thurday,x30_tuesday,x30_wednesday
0,-0.446058,-0.625059,0.434853,0.310829,-0.281989,0.103154,0.434754,-0.232398,-0.113491,-0.662595,0.785889,1.507768,1.830589,-0.561261,1.507268,-0.769972,0.423723,0.773434,-1.471070,0.232661,-0.380367,-0.181618,-0.466617,0.189905,-0.633551,1.356818,0.149007,-1.281721,-1.389773,0.001046,-1.106396,-0.685330,0.263178,-0.552575,1.313755,-0.438894,-1.010999,-0.486785,-0.438929,1.418299,-1.513949,0.403711,-0.668753,2.008931,-1.612152,0.078445,-0.489915,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0
1,-0.401142,-0.092590,2.184324,0.517973,1.007145,-0.318226,2.184158,0.031663,0.104146,0.051099,-1.488004,-0.271764,-1.605533,1.093712,-1.575223,0.459369,-0.482390,-1.230581,-0.443183,0.661100,-1.052294,1.152486,-0.089903,0.662008,0.648640,-0.761327,0.111501,0.257502,0.505044,-1.922310,1.269235,-0.556607,0.434798,0.139242,1.962659,1.584238,-1.003960,0.261950,1.584323,1.237653,0.498220,1.568880,2.539292,0.780253,-1.028765,-0.165378,1.156898,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,-0.864117,-0.225734,1.009787,0.819873,-0.831135,-1.476650,1.009666,0.614063,0.601806,-0.415214,0.199534,-0.479086,0.909405,-0.577035,1.047584,-0.670551,-0.817003,-1.015645,0.887405,-0.876716,0.241360,-0.254028,0.454422,-0.402011,-0.567844,-0.670917,0.616418,0.795881,-1.176707,-0.960632,0.243872,-0.474674,-0.321222,-1.016301,0.430150,-0.378472,1.699345,0.085290,-0.378503,2.137400,-0.448141,-0.654377,1.899509,1.191419,-1.557610,-1.080233,-0.478558,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1
3,-0.659021,0.800509,-1.732948,0.454106,1.019525,0.296857,-1.732962,-0.367422,-0.034436,-0.877112,-0.254436,-2.202916,0.835959,-0.661464,0.359497,0.394676,-0.502264,-2.078959,-0.605286,0.143918,0.279832,-1.503922,0.270205,-0.818853,-0.040855,-0.661222,1.361168,2.116902,-0.823695,0.962724,-1.966285,0.551668,0.489044,1.900138,-2.366972,0.481324,-1.244525,0.845053,0.481343,-1.102473,1.917474,-1.541250,1.056248,-1.630082,1.113870,0.933341,-0.465351,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1
4,-0.733857,0.048101,-0.768724,0.210985,0.458963,-0.082200,-0.768776,1.153419,0.138222,-0.678986,0.846885,0.220747,-1.815131,-0.141475,0.826018,0.144087,-0.232023,-1.867001,0.994524,-0.172353,-1.606310,-0.745123,-0.367206,-1.121648,-0.176202,0.417295,0.268009,-0.285753,0.334997,0.962724,-0.243241,1.518249,-0.070886,-0.218485,-0.621116,-1.103504,0.220015,1.896616,-1.103578,-0.664564,1.163705,0.951968,-0.246791,-0.036334,-0.049498,-0.462313,1.090704,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
159995,-1.309589,-0.673746,0.118113,-0.244019,0.275489,-0.154046,0.118027,1.597177,-0.212644,-0.374491,-0.097816,-1.435799,-1.079819,0.171087,-1.348465,0.581124,0.663369,-0.701101,-0.505468,-0.014671,-0.320516,-0.697041,1.055926,0.367743,0.125562,0.571108,0.054634,1.948343,0.938940,0.001046,1.324623,-0.780613,-0.133254,-1.483602,-0.892451,-0.861474,0.481424,-0.159377,-0.861534,1.943269,-0.568841,-0.338020,-0.919989,0.767514,0.905588,-1.979672,1.933069,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
159996,2.227142,0.757558,1.756477,1.413174,0.268962,0.909770,1.756328,-0.150145,-0.430371,1.055904,-0.823173,-0.986539,-1.528971,0.371886,-1.167562,1.231081,-0.008203,0.826907,1.585016,0.040006,0.234384,-1.708560,-0.906572,0.743386,-0.975301,-0.851959,-0.348621,-1.247645,2.527407,-0.960632,0.702462,-0.932502,0.385921,-1.090294,1.588480,0.326692,-1.446234,-0.361241,0.326702,0.523510,-0.443119,-1.210699,-0.047363,0.369774,1.328151,-0.710233,-0.063168,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1
159997,-2.159656,0.845621,0.632436,-0.926509,0.359744,-0.361141,0.632330,1.644046,0.163082,1.067517,-0.017163,-0.494201,-0.417264,-0.188937,0.228828,-0.064052,0.760486,0.535330,0.289341,-0.377945,-1.720478,-1.237651,0.093582,-0.627511,-0.751442,-0.431736,-0.918515,0.747972,0.294570,0.001046,0.056458,0.665110,-0.291879,0.098819,0.687173,-0.377298,-1.633326,1.466160,-0.377329,0.330655,0.482411,1.740438,0.540774,0.016396,0.676048,0.636174,0.266686,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1
159998,0.916897,1.200062,0.491160,-0.347323,0.634097,1.996178,0.491059,-0.083183,0.296669,0.668143,-0.951236,1.383922,1.514428,-0.519404,-1.142107,1.277822,-1.187514,-0.352399,-1.167210,2.005025,0.784011,-0.070308,1.852185,1.390488,0.148532,1.207218,-1.388047,0.859245,1.306812,-1.922310,1.406382,-0.451382,-0.479975,-0.758560,0.438891,-0.375215,-0.115790,-1.762733,-0.375246,0.241218,-2.242042,0.133567,0.519041,0.487295,0.349411,-0.144367,-0.087277,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1


In [69]:
X2.describe()

Unnamed: 0,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x25,x26,x27,x28,x31,x32,x33,x34,x35,x36,x37,x38,x39,x40,x41,x42,x43,x44,x45,x46,x47,x48,x49,x24_Missing,x24_america,x24_asia,x24_euorpe,x29_Apr,x29_Aug,x29_Dec,x29_Feb,x29_Jan,x29_Jul,x29_Jun,x29_Mar,x29_May,x29_Missing,x29_Nov,x29_Oct,x29_Sep,x30_Missing,x30_friday,x30_monday,x30_thurday,x30_tuesday,x30_wednesday
count,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0,160000.0
mean,4.4955710000000003e-17,8.381663e-18,-5.851916e-17,-2.65607e-17,-2.6938170000000003e-17,-5.459522e-18,2.265896e-17,-3.236716e-17,7.825962000000001e-17,-2.4664300000000002e-17,-1.6076720000000003e-17,3.346351e-17,-4.880558e-18,2.208997e-17,1.1039780000000001e-17,5.27911e-17,4.0256170000000005e-17,6.113721e-17,-1.0447890000000001e-17,2.3956180000000003e-17,3.190642e-17,2.1448120000000002e-17,4.129856e-17,2.2805370000000003e-17,-3.947676e-17,-1.1632710000000001e-17,-1.8043900000000003e-17,-3.407136e-17,2.7146340000000004e-17,-6.913511e-16,2.200323e-17,-1.72598e-17,6.24778e-18,-3.821041e-17,2.4222290000000003e-17,1.21804e-16,1.3710910000000001e-17,-1.8046680000000002e-17,-4.4989010000000004e-17,-4.980044e-18,-3.4292710000000004e-17,-4.653257000000001e-17,2.181588e-18,1.0388910000000002e-17,4.346523e-17,1.3362920000000001e-17,-2.82524e-17,0.000175,0.027931,0.868531,0.103362,0.042256,0.183787,0.000144,0.000875,5.6e-05,0.284806,0.258306,0.007694,0.137119,0.000188,0.002106,0.015044,0.067619,0.000188,0.003525,0.00305,0.183931,0.174712,0.634594
std,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,1.000003,0.013228,0.164776,0.337913,0.304433,0.201174,0.387312,0.011989,0.029568,0.0075,0.451324,0.437705,0.087376,0.343974,0.013692,0.045846,0.121727,0.251091,0.013692,0.059267,0.055143,0.387429,0.379722,0.481545
min,-4.288823,-4.14498,-4.388522,-4.396283,-4.460683,-4.412028,-4.388434,-5.691441,-4.231245,-4.404195,-4.61337,-4.347593,-4.262139,-4.326084,-4.439441,-5.197628,-5.226939,-4.54609,-4.451716,-4.695258,-4.646718,-4.620455,-4.370773,-4.51888,-5.036419,-4.573763,-4.724281,-5.051167,-4.438996,-4.807344,-4.260227,-4.507126,-4.206687,-4.314053,-4.73553,-4.757674,-4.305838,-4.209855,-4.757965,-5.107341,-4.478939,-4.317236,-4.423039,-5.165182,-4.410353,-4.386876,-4.330973,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,-0.6751325,-0.6721857,-0.6791459,-0.6730523,-0.6755947,-0.6729232,-0.6792013,-0.6426318,-0.674078,-0.671244,-0.6719356,-0.6765309,-0.6809367,-0.6741967,-0.6754506,-0.6749848,-0.6721729,-0.6714868,-0.6757744,-0.6747496,-0.6727404,-0.6729052,-0.6759158,-0.6699772,-0.674209,-0.6713979,-0.67814,-0.6740929,-0.6749454,-0.9606321,-0.6734678,-0.6738401,-0.6769885,-0.6745658,-0.6916378,-0.6694053,-0.6743219,-0.6827701,-0.6693793,-0.651322,-0.6759257,-0.6736188,-0.6739956,-0.6466441,-0.6774577,-0.68219,-0.6821369,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,-0.002746371,0.0005447652,-0.01437191,-0.0008394334,0.0002202375,6.990824e-05,-0.01440009,0.02409599,0.00153905,9.173396e-05,-0.002595107,-0.002199643,-0.01966006,-0.001244282,-0.0008092785,0.0007092059,0.001310282,0.001651351,0.000706776,0.002652317,0.02307276,0.0002924825,0.0005069986,0.02062737,-0.002308286,-0.0005161663,0.006095692,0.01472654,0.002875236,0.001045825,0.0002796334,-0.0015799,-0.001401311,-0.001761759,-0.01413743,0.007416011,0.002885359,-0.02260671,0.007441959,-0.01759591,-0.001493971,-0.001016356,0.00191706,-0.00620292,0.001512456,-0.006079897,0.006637382,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
75%,0.672405,0.6753186,0.6795468,0.6781186,0.6747798,0.6747967,0.6796256,0.6518074,0.6743741,0.6766323,0.6768308,0.6732292,0.6580844,0.6738479,0.6737908,0.6754615,0.6741725,0.6723684,0.6738985,0.6766587,0.6900914,0.6743317,0.6770309,0.691133,0.6746406,0.674139,0.686695,0.6859042,0.6728554,0.9627237,0.6784543,0.6752713,0.6732924,0.6737657,0.663818,0.6725604,0.6736552,0.6627977,0.6725914,0.6431158,0.6744227,0.6696912,0.6760229,0.6609153,0.6763977,0.6808362,0.6866128,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
max,4.316499,4.414243,4.874662,4.827667,4.113032,4.633689,4.874392,5.135737,4.39077,4.30761,4.82105,4.143131,5.058809,4.734143,4.672795,4.212661,4.40539,4.894508,4.325929,4.409923,4.739553,4.917404,4.592894,3.87528,4.206332,4.687976,4.229672,4.690586,4.439675,4.809435,4.459823,4.348097,4.157264,4.389767,5.547334,4.997588,4.195871,5.34816,4.997875,4.79463,4.354407,4.581107,4.206725,4.469743,4.346286,4.250902,4.492914,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
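The standardized columns report a `std` of 1.000003 rather than exactly 1. This is expected. `StandardScaler` divides by the population standard deviation (`ddof=0`), while pandas `describe()` reports the sample standard deviation (`ddof=1`). For n = 160,000 rows, the ratio between the two is sqrt(n / (n - 1)). The check below uses random data in place of the real columns:

```python
import math

import numpy as np

n = 160000
# Ratio between the sample (ddof=1) and population (ddof=0) standard deviations
ratio = math.sqrt(n / (n - 1))
print(ratio)  # roughly 1.000003125

rng = np.random.default_rng(0)
x = rng.normal(size=n)
z = (x - x.mean()) / x.std()  # standardize with ddof=0, as StandardScaler does
print(z.std(ddof=1))          # matches the ratio above
```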


### Create the Train/Test Split

In [52]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X2,y,test_size=.2, random_state=42)
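The target is somewhat imbalanced: the mean of `y` in `df.describe()` is 0.401, so about 40% of rows are positive. If identical class proportions in the train and test sets are wanted, `train_test_split` accepts a `stratify` argument. The sketch below shows the effect on hypothetical data with a similar positive rate. The split above does not use this option.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Hypothetical labels with roughly 40% positives, plus random features
y_demo = (rng.random(10000) < 0.4).astype(int)
X_demo = rng.normal(size=(10000, 5))

# stratify=y_demo keeps the positive rate (nearly) identical in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```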