This worksheet takes the basic random forest benchmark work from Part 1 and breaks out some additional features to see if they improve the score.

# Data Preprocessing

First I import all of fastai, which includes pandas and numpy, and then some of the other functionality I'll need. Next I load the train and test csv files as dataframes and show their heads to see what the columns contain.

In [1]:
from fastai.imports import *

from fastai.tabular.all import *
from sklearn.ensemble import RandomForestRegressor

In [2]:
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn.metrics import accuracy_score

import xgboost as xgb
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold


In [3]:
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

In [4]:
train.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,6715.0,49.0,Altark Susent,False
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,3329.0,193.0,Solam Susent,False
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,565.0,2.0,Willy Santantines,True


In [5]:
test.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name
0,0013_01,Earth,True,G/3/S,TRAPPIST-1e,27.0,False,0.0,0.0,0.0,0.0,0.0,Nelly Carsoning
1,0018_01,Earth,False,F/4/S,TRAPPIST-1e,19.0,False,0.0,9.0,0.0,2823.0,0.0,Lerome Peckers
2,0019_01,Europa,True,C/0/S,55 Cancri e,31.0,False,0.0,0.0,0.0,0.0,0.0,Sabih Unhearfus
3,0021_01,Europa,False,C/1/S,TRAPPIST-1e,38.0,False,0.0,6652.0,0.0,181.0,585.0,Meratz Caltilter
4,0023_01,Earth,False,F/5/S,TRAPPIST-1e,20.0,False,10.0,0.0,635.0,0.0,0.0,Brence Harperez


It's interesting that almost every column has missing data, and that the counts fall in a tight range of 179 to 217. Out of 8693 rows, that is only about 2% of each column. A random forest splits on each column's values, though, and many implementations can't do that when values are missing. So I'll need to replace the missing values. The easiest way to do this is to fill each column with its mode, which is the value that occurs most often in that column.

In [6]:
train.isna().sum()

PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Name            200
Transported       0
dtype: int64

In [7]:
modes = train.mode().iloc[0]
modes

PassengerId                0001_01
HomePlanet                   Earth
CryoSleep                    False
Cabin                      G/734/S
Destination            TRAPPIST-1e
Age                           24.0
VIP                          False
RoomService                    0.0
FoodCourt                      0.0
ShoppingMall                   0.0
Spa                            0.0
VRDeck                         0.0
Name            Alraium Disivering
Transported                   True
Name: 0, dtype: object

Below, fillna fills all the NaNs with the modes identified above. inplace=True simply means that the current dataframe is modified directly, so I won't need to create a new dataframe.

In [8]:
train.fillna(modes, inplace=True)

In [9]:
train.isna().sum()

PassengerId     0
HomePlanet      0
CryoSleep       0
Cabin           0
Destination     0
Age             0
VIP             0
RoomService     0
FoodCourt       0
ShoppingMall    0
Spa             0
VRDeck          0
Name            0
Transported     0
dtype: int64
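
To show why the `.iloc[0]` is there, here is a minimal sketch on a made-up frame (the values are invented for illustration, not taken from the competition data). `mode()` returns a dataframe because a column can have several tied modes, so taking the first row keeps one value per column.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps; values are invented for illustration.
toy = pd.DataFrame({
    "HomePlanet": ["Earth", "Earth", "Mars", np.nan],
    "Age": [24.0, 24.0, np.nan, 30.0],
    "Spa": [0.0, np.nan, 0.0, 5.0],
})

# mode() ignores NaN and returns one row per tied mode,
# so keep only the first row: one fill value per column.
toy_modes = toy.mode().iloc[0]

# Passing a Series to fillna fills each column with the value
# stored under that column's name.
toy_filled = toy.fillna(toy_modes)
print(toy_filled)
```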

I'll need to do the same thing with the test dataframe. 

In [10]:
test.isna().sum()

PassengerId       0
HomePlanet       87
CryoSleep        93
Cabin           100
Destination      92
Age              91
VIP              93
RoomService      82
FoodCourt       106
ShoppingMall     98
Spa             101
VRDeck           80
Name             94
dtype: int64

In [11]:
modes = test.mode().iloc[0]
modes

PassengerId              0013_01
HomePlanet                 Earth
CryoSleep                  False
Cabin                    G/160/P
Destination          TRAPPIST-1e
Age                         18.0
VIP                        False
RoomService                  0.0
FoodCourt                    0.0
ShoppingMall                 0.0
Spa                          0.0
VRDeck                       0.0
Name            Berta Barnolderg
Name: 0, dtype: object

In [12]:
test.fillna(modes, inplace=True)

In [13]:
test.isna().sum()

PassengerId     0
HomePlanet      0
CryoSleep       0
Cabin           0
Destination     0
Age             0
VIP             0
RoomService     0
FoodCourt       0
ShoppingMall    0
Spa             0
VRDeck          0
Name            0
dtype: int64
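
One alternative I could try, sketched here as an assumption rather than what this notebook does, is to compute the fill values on the training data only and reuse them for the test data, so both frames are imputed the same way. The helper name `fill_from_train` and the toy frames are hypothetical.

```python
import numpy as np
import pandas as pd

def fill_from_train(train_df, test_df):
    """Fill gaps in both frames using modes computed on train_df only."""
    fill_values = train_df.mode().iloc[0]
    train_out = train_df.fillna(fill_values)
    # The test frame has no target column, so keep only its own columns.
    test_out = test_df.fillna(fill_values[test_df.columns])
    return train_out, test_out

# Toy frames; values are invented for illustration.
toy_train = pd.DataFrame({"Age": [24.0, 24.0, np.nan],
                          "Transported": [True, False, True]})
toy_test = pd.DataFrame({"Age": [np.nan, 18.0, 18.0]})

filled_train, filled_test = fill_from_train(toy_train, toy_test)
print(filled_test)
```

With this approach the missing test Age is filled with the training mode rather than the test mode.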

I want to add new columns for the group encoded in PassengerId and the size of that group, three new columns to separate out the deck, number and side of the Cabin, a new column for last name, and a new column to sum up all the spending in the RoomService, FoodCourt, ShoppingMall, Spa and VRDeck columns.

In [14]:
train['Group'] = train['PassengerId'].str[0:4]
train['GroupSize'] = train.groupby('Group')['Group'].transform(len)
train[['Deck', 'Number', 'Side']] = train['Cabin'].str.split('/', expand=True)
splitted = train['Name'].str.split()
train['LastName'] = splitted.str[-1]
train['Spend'] = train['RoomService'] + train['FoodCourt'] + train['ShoppingMall'] + train['Spa'] + train['VRDeck']

In [15]:
train.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,...,VRDeck,Name,Transported,Group,GroupSize,Deck,Number,Side,LastName,Spend
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,...,0.0,Maham Ofracculy,False,1,1,B,0,P,Ofracculy,0.0
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,...,44.0,Juanna Vines,True,2,1,F,0,S,Vines,736.0
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,58.0,True,43.0,3576.0,0.0,...,49.0,Altark Susent,False,3,2,A,0,S,Susent,10383.0
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,33.0,False,0.0,1283.0,371.0,...,193.0,Solam Susent,False,3,2,A,0,S,Susent,5176.0
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,16.0,False,303.0,70.0,151.0,...,2.0,Willy Santantines,True,4,1,F,1,S,Santantines,1091.0


In [16]:
train.describe().round().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Age,8693.0,29.0,14.0,0.0,20.0,27.0,37.0,79.0
RoomService,8693.0,220.0,661.0,0.0,0.0,0.0,41.0,14327.0
FoodCourt,8693.0,448.0,1596.0,0.0,0.0,0.0,61.0,29813.0
ShoppingMall,8693.0,170.0,598.0,0.0,0.0,0.0,22.0,23492.0
Spa,8693.0,305.0,1126.0,0.0,0.0,0.0,53.0,22408.0
VRDeck,8693.0,298.0,1134.0,0.0,0.0,0.0,40.0,24133.0
GroupSize,8693.0,2.0,2.0,1.0,1.0,1.0,3.0,8.0
Spend,8693.0,1441.0,2803.0,0.0,0.0,716.0,1441.0,35987.0


In [17]:
train.describe(include=object).round().T

Unnamed: 0,count,unique,top,freq
PassengerId,8693,8693,0001_01,1
HomePlanet,8693,3,Earth,4803
Cabin,8693,6560,G/734/S,207
Destination,8693,3,TRAPPIST-1e,6097
Name,8693,8473,Alraium Disivering,202
Group,8693,6217,4498,8
Deck,8693,8,F,2794
Number,8693,1817,734,208
Side,8693,2,S,4487
LastName,8693,2217,Disivering,207


I also want to do the same things on the test dataset.

In [18]:
test['Group'] = test['PassengerId'].str[0:4]
test['GroupSize'] = test.groupby('Group')['Group'].transform(len)
test[['Deck', 'Number', 'Side']] = test['Cabin'].str.split('/', expand=True)
splitted = test['Name'].str.split()
test['LastName'] = splitted.str[-1]
test['Spend'] = test['RoomService'] + test['FoodCourt'] + test['ShoppingMall'] + test['Spa'] + test['VRDeck']

In [19]:
test.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Group,GroupSize,Deck,Number,Side,LastName,Spend
0,0013_01,Earth,True,G/3/S,TRAPPIST-1e,27.0,False,0.0,0.0,0.0,0.0,0.0,Nelly Carsoning,13,1,G,3,S,Carsoning,0.0
1,0018_01,Earth,False,F/4/S,TRAPPIST-1e,19.0,False,0.0,9.0,0.0,2823.0,0.0,Lerome Peckers,18,1,F,4,S,Peckers,2832.0
2,0019_01,Europa,True,C/0/S,55 Cancri e,31.0,False,0.0,0.0,0.0,0.0,0.0,Sabih Unhearfus,19,1,C,0,S,Unhearfus,0.0
3,0021_01,Europa,False,C/1/S,TRAPPIST-1e,38.0,False,0.0,6652.0,0.0,181.0,585.0,Meratz Caltilter,21,1,C,1,S,Caltilter,7418.0
4,0023_01,Earth,False,F/5/S,TRAPPIST-1e,20.0,False,10.0,0.0,635.0,0.0,0.0,Brence Harperez,23,1,F,5,S,Harperez,645.0
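
Since the same steps run on both dataframes, they could be wrapped in one function. This is just a sketch: the name `add_features` is my own, and it uses `sum(axis=1)`, which treats any remaining NaN as 0 instead of propagating it the way `+` does. The two sample rows are copied from the train.head() output above.

```python
import pandas as pd

SPEND_COLS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]

def add_features(df):
    """Return a copy of df with the derived columns added."""
    out = df.copy()
    # First four characters of PassengerId identify the travel group.
    out["Group"] = out["PassengerId"].str[:4]
    out["GroupSize"] = out.groupby("Group")["Group"].transform("size")
    # Cabin is formatted as deck/number/side.
    out[["Deck", "Number", "Side"]] = out["Cabin"].str.split("/", expand=True)
    out["LastName"] = out["Name"].str.split().str[-1]
    out["Spend"] = out[SPEND_COLS].sum(axis=1)
    return out

sample = pd.DataFrame({
    "PassengerId": ["0003_01", "0003_02"],
    "Cabin": ["A/0/S", "A/0/S"],
    "Name": ["Altark Susent", "Solam Susent"],
    "RoomService": [43.0, 0.0],
    "FoodCourt": [3576.0, 1283.0],
    "ShoppingMall": [0.0, 371.0],
    "Spa": [6715.0, 3329.0],
    "VRDeck": [49.0, 193.0],
})

features = add_features(sample)
print(features[["Group", "GroupSize", "Deck", "Side", "LastName", "Spend"]])
```

Calling `add_features(train)` and `add_features(test)` would then keep the two dataframes in step.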


Next I want to start looking at transport rates by various features.

I want to do the easiest separation of continuous and categorical variables possible, so I'm using cont_cat_split. 

In [20]:
cont,cat = cont_cat_split(train)

We can see that now the continuous and categorical columns are identified. 

In [21]:
cont

['Age', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'Spend']

In [22]:
cat

['PassengerId',
 'HomePlanet',
 'CryoSleep',
 'Cabin',
 'Destination',
 'VIP',
 'Name',
 'Transported',
 'Group',
 'GroupSize',
 'Deck',
 'Number',
 'Side',
 'LastName']
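
GroupSize is a number but still lands in the categorical list. As I understand it, the split treats float columns as continuous and integer columns as continuous only when they have more unique values than a cardinality threshold (20 by default). Below is a rough pure-pandas approximation of that rule. The function `rough_cont_cat_split` is hypothetical and is not fastai's implementation.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

def rough_cont_cat_split(df, max_card=20, dep_var=None):
    """Approximate split of column names into continuous and categorical."""
    cont_names, cat_names = [], []
    for col in df.columns:
        if col == dep_var:
            continue  # leave the target out of both lists
        high_card_int = is_integer_dtype(df[col]) and df[col].nunique() > max_card
        if is_float_dtype(df[col]) or high_card_int:
            cont_names.append(col)
        else:
            cat_names.append(col)
    return cont_names, cat_names

# Toy frame; values are invented for illustration.
toy = pd.DataFrame({
    "Age": [39.0, 24.0, 58.0],
    "GroupSize": [1, 1, 2],
    "HomePlanet": ["Europa", "Earth", "Europa"],
    "Transported": [False, True, False],
})

toy_cont, toy_cat = rough_cont_cat_split(toy, dep_var="Transported")
print(toy_cont, toy_cat)
```

fastai's `cont_cat_split` also accepts a `dep_var` argument, which would keep Transported out of the categorical list.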

In [23]:
train[cat]

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,VIP,Name,Transported,Group,GroupSize,Deck,Number,Side,LastName
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,False,Maham Ofracculy,False,0001,1,B,0,P,Ofracculy
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,False,Juanna Vines,True,0002,1,F,0,S,Vines
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,True,Altark Susent,False,0003,2,A,0,S,Susent
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,False,Solam Susent,False,0003,2,A,0,S,Susent
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,False,Willy Santantines,True,0004,1,F,1,S,Santantines
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8688,9276_01,Europa,False,A/98/P,55 Cancri e,True,Gravior Noxnuther,False,9276,1,A,98,P,Noxnuther
8689,9278_01,Earth,True,G/1499/S,PSO J318.5-22,False,Kurta Mondalley,False,9278,1,G,1499,S,Mondalley
8690,9279_01,Earth,False,G/1500/S,TRAPPIST-1e,False,Fayey Connon,True,9279,1,G,1500,S,Connon
8691,9280_01,Europa,False,E/608/S,55 Cancri e,False,Celeon Hontichre,False,9280,2,E,608,S,Hontichre


Then I loop through each of the categorical columns and convert it with pd.Categorical, which stores an integer code for every value.

In [24]:
for i in cat:
    train[i] = pd.Categorical(train[i])

Each item still appears to be text.

In [25]:
train[cat].head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,VIP,Name,Transported,Group,GroupSize,Deck,Number,Side,LastName
0,0001_01,Europa,False,B/0/P,TRAPPIST-1e,False,Maham Ofracculy,False,1,1,B,0,P,Ofracculy
1,0002_01,Earth,False,F/0/S,TRAPPIST-1e,False,Juanna Vines,True,2,1,F,0,S,Vines
2,0003_01,Europa,False,A/0/S,TRAPPIST-1e,True,Altark Susent,False,3,2,A,0,S,Susent
3,0003_02,Europa,False,A/0/S,TRAPPIST-1e,False,Solam Susent,False,3,2,A,0,S,Susent
4,0004_01,Earth,False,F/1/S,TRAPPIST-1e,False,Willy Santantines,True,4,1,F,1,S,Santantines


But behind the scenes pandas has stored them as numbers, as you can see with Cabin and HomePlanet. The number is just an index for looking up the value in the sorted list of all unique values. From the describe function above there are 6560 unique values for Cabin and only 3 unique values for HomePlanet. That's why some of the codes are so high for Cabin.

In [26]:
train.Cabin.cat.codes.head()

0     149
1    2184
2       1
3       1
4    2186
dtype: int16

In [27]:
train.HomePlanet.cat.codes.head()

0    1
1    0
2    1
3    1
4    0
dtype: int8
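
Here is a small sketch of that lookup on a made-up list of planets, just to show how the codes relate to the sorted unique values.

```python
import pandas as pd

# Toy column; values are invented for illustration.
planets = pd.Series(["Europa", "Earth", "Europa", "Mars", "Earth"])
as_cat = pd.Categorical(planets)

# The categories are the sorted unique values.
print(list(as_cat.categories))

# Each code is the position of the value in that list.
print(list(as_cat.codes))

# Indexing the categories with the codes recovers the original text.
decoded = as_cat.categories[as_cat.codes]
print(list(decoded))
```

Earth sorts before Europa, which is why Europa gets code 1 and Earth gets code 0 in the output above.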

Since I will also need the dependent variable of Transported to be a number, I want to confirm that is the case. 

In [28]:
train.Transported.cat.codes.head()

0    0
1    1
2    0
3    0
4    1
dtype: int8

In [29]:
cat

['PassengerId',
 'HomePlanet',
 'CryoSleep',
 'Cabin',
 'Destination',
 'VIP',
 'Name',
 'Transported',
 'Group',
 'GroupSize',
 'Deck',
 'Number',
 'Side',
 'LastName']

Now I also need to convert the categories in the test dataset to numbers. First I'm creating the test_passid dataframe to keep the real passenger IDs linked to these numbers, since I will need to convert the numbers back to the original IDs for my Kaggle submission.

In [30]:
test_passid = pd.DataFrame(test['PassengerId'])

In [31]:
test_passid.head()

Unnamed: 0,PassengerId
0,0013_01
1,0018_01
2,0019_01
3,0021_01
4,0023_01


Next I break out the categorical columns as test_cat and the continuous columns as test_cont. Then I convert each column in test_cat with pd.Categorical so it can be turned into numbers, since the ML model can only work with numbers.

In [32]:
test_cont,test_cat = cont_cat_split(test)
for i in test_cat:
    test[i] = pd.Categorical(test[i])

Again, the categorical data still appears to be text, but behind the scenes each value is stored as an integer code.

In [33]:
test[test_cat].head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,VIP,Name,Group,GroupSize,Deck,Number,Side,LastName
0,0013_01,Earth,True,G/3/S,TRAPPIST-1e,False,Nelly Carsoning,13,1,G,3,S,Carsoning
1,0018_01,Earth,False,F/4/S,TRAPPIST-1e,False,Lerome Peckers,18,1,F,4,S,Peckers
2,0019_01,Europa,True,C/0/S,55 Cancri e,False,Sabih Unhearfus,19,1,C,0,S,Unhearfus
3,0021_01,Europa,False,C/1/S,TRAPPIST-1e,False,Meratz Caltilter,21,1,C,1,S,Caltilter
4,0023_01,Earth,False,F/5/S,TRAPPIST-1e,False,Brence Harperez,23,1,F,5,S,Harperez


In [34]:
test.HomePlanet.cat.codes.head()

0    0
1    0
2    1
3    1
4    0
dtype: int8

In [35]:
test.PassengerId.cat.codes.head()

0    0
1    1
2    2
3    3
4    4
dtype: int16

I created a PassIdCode column to hold the codes so that I can match them back to the original passenger IDs when I create the submission file for the Kaggle contest.

In [36]:
test_passid['PassIdCode'] = test.PassengerId.cat.codes

In [37]:
test_passid.head()

Unnamed: 0,PassengerId,PassIdCode
0,0013_01,0
1,0018_01,1
2,0019_01,2
3,0021_01,3
4,0023_01,4


In [38]:
train[cat] = train[cat].apply(lambda x: x.cat.codes)
test[test_cat] = test[test_cat].apply(lambda x: x.cat.codes)

In [40]:
train.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,...,VRDeck,Name,Transported,Group,GroupSize,Deck,Number,Side,LastName,Spend
0,0,1,0,149,2,39.0,0,0.0,0.0,0.0,...,0.0,5252,0,0,0,1,0,0,1431,0.0
1,1,0,0,2184,2,24.0,0,109.0,9.0,25.0,...,44.0,4502,1,1,0,5,0,1,2109,736.0
2,2,1,0,1,2,58.0,1,43.0,3576.0,0.0,...,49.0,457,0,2,1,0,0,1,1990,10383.0
3,3,1,0,1,2,33.0,0,0.0,1283.0,371.0,...,193.0,7149,0,2,1,0,0,1,1990,5176.0
4,4,0,0,2186,2,16.0,0,303.0,70.0,151.0,...,2.0,8319,1,3,0,5,1,1,1778,1091.0


In [41]:
test.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Group,GroupSize,Deck,Number,Side,LastName,Spend
0,0,0,1,2784,2,27.0,0,0.0,0.0,0.0,0.0,0.0,2912,0,0,6,820,1,275,0.0
1,1,0,0,1867,2,19.0,0,0.0,9.0,0.0,2823.0,0.0,2406,1,0,5,927,1,1190,2832.0
2,2,1,1,257,0,31.0,0,0.0,0.0,0.0,0.0,0.0,3376,2,0,2,0,1,1604,0.0
3,3,1,0,259,2,38.0,0,0.0,6652.0,0.0,181.0,585.0,2711,3,0,2,1,1,262,7418.0
4,4,0,0,1940,2,20.0,0,10.0,0.0,635.0,0.0,0.0,668,4,0,5,1029,1,736,645.0
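
One thing to watch: because train and test are converted with pd.Categorical separately, each dataframe gets its own lookup list, so the same cabin or name string can end up with a different code in each. A possible fix, sketched here as an assumption and not something this notebook does, is to build one category list from both dataframes. The helper name `encode_with_shared_categories` and the toy values are hypothetical.

```python
import pandas as pd

def encode_with_shared_categories(train_col, test_col):
    """Encode two text columns using one category list built from both."""
    categories = sorted(set(train_col) | set(test_col))
    dtype = pd.CategoricalDtype(categories=categories)
    return train_col.astype(dtype).cat.codes, test_col.astype(dtype).cat.codes

# Toy cabin columns; values are invented for illustration.
toy_train = pd.Series(["B/0/P", "F/0/S", "A/0/S"])
toy_test = pd.Series(["F/0/S", "G/3/S"])

# Encoded separately, F/0/S is code 2 in train but code 0 in test.
separate_train = pd.Categorical(toy_train).codes
separate_test = pd.Categorical(toy_test).codes

# Encoded together, F/0/S gets the same code in both.
shared_train, shared_test = encode_with_shared_categories(toy_train, toy_test)
print(list(shared_train), list(shared_test))
```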


# Utilizing the Model

In [42]:
def get_score(model,X,y):
    n = cross_val_score(model,X,y,scoring ='accuracy',cv=20)
    return n

In [43]:
params_XGB_best ={'lambda': 3.0610042624477543, 
             'alpha': 4.581902571574289, 
             'colsample_bytree': 0.9241969052729379, 
             'subsample': 0.9527591724824661, 
             'learning_rate': 0.06672065863100594, 
             'n_estimators': 725, #initial value is 651
             'max_depth': 5, 
             'min_child_weight': 1, 
             'num_parallel_tree': 1}

In [44]:
def t_fold(X,y,n_splits): 
    # Same settings as params_XGB_best, but with a large tree budget
    # that early stopping cuts back on each fold.
    params= {'lambda': 3.0610042624477543, 
             'alpha': 4.581902571574289, 
             'colsample_bytree': 0.9241969052729379, 
             'subsample': 0.9527591724824661, 
             'learning_rate': 0.06672065863100594, 
             'n_estimators': 7250, #initial value is 725
             'max_depth': 5, 
             'min_child_weight': 1, 
             'num_parallel_tree': 1,
             'early_stopping_rounds':200,}
    results=[]
    n_iterations=[]
    skf = StratifiedKFold(n_splits=n_splits)
    for train_index, test_index in skf.split(X, y):
        train_X, valid_X = X.iloc[train_index], X.iloc[test_index]
        train_y, valid_y = y.iloc[train_index], y.iloc[test_index]
        # The validation fold is used as the early stopping set.
        model = xgb.XGBClassifier(**params).fit(train_X,train_y,
                                      eval_set=[(valid_X,valid_y)],
                                      verbose=0
                                     )  
        n_iterations.append(model.get_booster().best_iteration)
        results.append(accuracy_score(valid_y, model.predict(valid_X)))
    # Average best iteration across folds and its relative spread.
    i=int(sum(n_iterations)/len(n_iterations))
    print("Average n_ite=" + str(i))
    print("Relative std of n_ite =" + str(np.std(n_iterations)/i))    
    n=sum(results)/len(results) 
    print(n)
    return n

In [45]:
X = train.drop('Transported',axis=1)
y = train.Transported

In [46]:
X,y = shuffle(X,y)
X = X.reset_index(drop=True)
y = y.reset_index(drop=True)

In [47]:
y.head()

0    0
1    0
2    1
3    0
4    1
Name: Transported, dtype: int8

In [48]:
print(get_score(xgb.XGBClassifier(**params_XGB_best),X,y).mean())

0.8044477991419037


In [55]:
test['Transported'] = (xgb.XGBClassifier(**params_XGB_best).fit(X,y)).predict(test)

In [56]:
test.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,...,VRDeck,Name,Group,GroupSize,Deck,Number,Side,LastName,Spend,Transported
0,0,0,1,2784,2,27.0,0,0.0,0.0,0.0,...,0.0,2912,0,0,6,820,1,275,0.0,1
1,1,0,0,1867,2,19.0,0,0.0,9.0,0.0,...,0.0,2406,1,0,5,927,1,1190,2832.0,0
2,2,1,1,257,0,31.0,0,0.0,0.0,0.0,...,0.0,3376,2,0,2,0,1,1604,0.0,1
3,3,1,0,259,2,38.0,0,0.0,6652.0,0.0,...,585.0,2711,3,0,2,1,1,262,7418.0,1
4,4,0,0,1940,2,20.0,0,10.0,0.0,635.0,...,0.0,668,4,0,5,1029,1,736,645.0,1


In [59]:
test['PassengerId'] = np.where(test_passid['PassIdCode'] == test['PassengerId'], test_passid['PassengerId'], 'NaN')

In [60]:
test['Transported'] = np.where(test['Transported'] == 1, 'True', 'False')

In [61]:
test.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,...,VRDeck,Name,Group,GroupSize,Deck,Number,Side,LastName,Spend,Transported
0,0013_01,0,1,2784,2,27.0,0,0.0,0.0,0.0,...,0.0,2912,0,0,6,820,1,275,0.0,True
1,0018_01,0,0,1867,2,19.0,0,0.0,9.0,0.0,...,0.0,2406,1,0,5,927,1,1190,2832.0,False
2,0019_01,1,1,257,0,31.0,0,0.0,0.0,0.0,...,0.0,3376,2,0,2,0,1,1604,0.0,True
3,0021_01,1,0,259,2,38.0,0,0.0,6652.0,0.0,...,585.0,2711,3,0,2,1,1,262,7418.0,True
4,0023_01,0,0,1940,2,20.0,0,10.0,0.0,635.0,...,0.0,668,4,0,5,1029,1,736,645.0,True


In [62]:
submit = test[['PassengerId', 'Transported']]

In [63]:
submit.head()

Unnamed: 0,PassengerId,Transported
0,0013_01,True
1,0018_01,False
2,0019_01,True
3,0021_01,True
4,0023_01,True
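
The lookup above works because the row order never changes between saving the IDs and predicting. Under that same assumption, the submission could also be assembled directly from the saved IDs and the predictions. This is a sketch with three rows taken from the output above. The variable names are my own.

```python
import numpy as np
import pandas as pd

# IDs saved before encoding, and 0/1 predictions in the same row order.
saved_ids = pd.Series(["0013_01", "0018_01", "0019_01"], name="PassengerId")
predictions = np.array([1, 0, 1])

submission = pd.DataFrame({
    "PassengerId": saved_ids,
    # Booleans are written to the csv as True/False.
    "Transported": predictions.astype(bool),
})

csv_text = submission.to_csv(index=False)
print(csv_text)
```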


In [64]:
submit.to_csv('submit_xbs_mydata.csv', index=False)

This submission received a Kaggle score of 0.77951, which is below the cross-validated accuracy of about 0.804. I probably need to do a full grid search for the best XGBoost parameters.
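
A grid search could look something like the sketch below. To keep it runnable without xgboost, it uses scikit-learn's GradientBoostingClassifier as a stand-in on synthetic data, and the grid values are arbitrary placeholders, not tuned suggestions. XGBClassifier follows the scikit-learn estimator interface, so it could be swapped in with its own parameter names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data so the example runs on its own.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Small placeholder grid; a real search would cover more values.
param_grid = {"max_depth": [2, 3], "learning_rate": [0.05, 0.1]}

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=30, random_state=0),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```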