# Lab: Titanic Survival Exploration with Decision Trees

## Getting Started
In the introductory project, you studied the Titanic survival data and made predictions about passenger survival. In that project, you built a decision tree by hand that, at each stage, picked the feature most correlated with survival. Lucky for us, this is essentially how decision trees work! In this lab, we'll do the same thing much more quickly by training a decision tree with sklearn.
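By default (`criterion='gini'`), sklearn's `DecisionTreeClassifier` chooses at each node the split that most reduces the weighted Gini impurity, 1 - sum(p_k^2), of the child nodes. The sketch below computes that reduction by hand for two binary features on a small made-up dataset. `gini` and `split_gain` are illustrative helper names, not sklearn functions, and this is a simplified illustration of the criterion rather than sklearn's implementation.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels: 1 - sum(p_k^2)."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gain(feature, labels):
    """Impurity decrease from splitting on a binary (0/1) feature."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    left, right = labels[feature == 0], labels[feature == 1]
    n = labels.size
    # Children are weighted by the fraction of samples they receive
    weighted_children = (left.size / n) * gini(left) + (right.size / n) * gini(right)
    return gini(labels) - weighted_children

# Hypothetical toy data (not the Titanic file): 1 = survived
survived  = np.array([1, 1, 1, 0, 0, 0, 0, 1])
is_female = np.array([1, 1, 1, 0, 0, 0, 0, 0])
is_upper  = np.array([1, 0, 0, 1, 0, 0, 1, 1])

# The tree would split on whichever feature gives the larger decrease
print(split_gain(is_female, survived))  # about 0.3
print(split_gain(is_upper, survived))   # 0.0: this split leaves both children as mixed as the parent
```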

We'll start by loading the dataset and displaying some of its rows.

Recall that these are the various features present for each passenger on the ship:
- **Pclass**: Socio-economic class (1 = Upper class; 2 = Middle class; 3 = Lower class)
- **Name**: Name of passenger
- **Sex**: Sex of the passenger
- **Age**: Age of the passenger (Some entries contain `NaN`)
- **SibSp**: Number of siblings and spouses of the passenger aboard
- **Parch**: Number of parents and children of the passenger aboard
- **Ticket**: Ticket number of the passenger
- **Fare**: Fare paid by the passenger
- **Cabin**: Cabin number of the passenger (Some entries contain `NaN`)
- **Embarked**: Port of embarkation of the passenger (C = Cherbourg; Q = Queenstown; S = Southampton)  
- (Target variable) **Survived**: Outcome of survival (0 = No; 1 = Yes)  



In [1]:
# Import libraries necessary for this project
import numpy as np
import pandas as pd

In [2]:
# Load the dataset
df = pd.read_csv('titanic_data.csv')

# Print the first few entries of the Titanic data
df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [3]:
# Define the outcome (target) and the features
# Note: the Name column is excluded from the features
outcomes = df['Survived']
features_raw = df.drop(['Survived', 'Name'], axis=1)

# Show the new dataset with 'Survived' and 'Name' removed
features_raw.head()

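|   | PassengerId | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 3 | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |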


In [4]:
# Data exploration: count the missing values in each column
features_raw.isnull().sum()

PassengerId      0
Pclass           0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [5]:
# Data cleaning: fill missing values with zero
features_raw.fillna(0, inplace=True)

features_raw.isnull().sum()

PassengerId    0
Pclass         0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64
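Filling with zero is the simplest option, but it has side effects: the 177 passengers (about 20% of 891) with no recorded age all get an age of 0, and Cabin (missing for 687 passengers, about 77%) and Embarked gain an artificial `0` category. A common alternative, sketched below on a hypothetical toy frame and not what this notebook does, fills Age with the median, Embarked with the most frequent port, and replaces the mostly-missing Cabin column with a flag saying whether a cabin was recorded. The `HasCabin` column name is an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same kinds of gaps as the Titanic data
raw = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Embarked": ["S", "C", np.nan, "S", "S"],
})

cleaned = raw.copy()
# Numeric column: fill with the median of the observed values
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())
# Categorical column with few gaps: fill with the most frequent value
cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode()[0])
# Mostly-missing column: keep only whether a cabin was recorded (hypothetical name)
cleaned["HasCabin"] = cleaned["Cabin"].notna().astype(int)
cleaned = cleaned.drop(columns=["Cabin"])

print(cleaned)
```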

## Preprocessing the data


In [6]:
features_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Pclass       891 non-null    int64  
 2   Sex          891 non-null    object 
 3   Age          891 non-null    float64
 4   SibSp        891 non-null    int64  
 5   Parch        891 non-null    int64  
 6   Ticket       891 non-null    object 
 7   Fare         891 non-null    float64
 8   Cabin        891 non-null    object 
 9   Embarked     891 non-null    object 
dtypes: float64(2), int64(4), object(4)
memory usage: 69.7+ KB


In [7]:
# Transformation: standardize the numerical columns (zero mean, unit variance)
# First, create the StandardScaler object
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
numerical = ['PassengerId', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']

# Second, fit the scaler and replace the numerical columns with their scaled values
features_raw[numerical] = scaler.fit_transform(features_raw[numerical])
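Decision trees don't actually need scaled features: each split compares a single feature against a threshold, and standardizing is an increasing affine map, so it shouldn't change which splits are chosen. The scaling step is harmless here rather than required. The sketch below, on hypothetical synthetic data, checks StandardScaler's formula z = (x - mean) / std and compares the predictions of a tree fit on raw features with one fit on the scaled features.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical integer-valued features and a label that depends on two of them
X = rng.integers(0, 10, size=(200, 3)).astype(float)
y = (X[:, 0] + X[:, 1] > 9).astype(int)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# StandardScaler computes z = (x - mean) / std with the population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_scaled, manual))

# Same tree settings on raw vs. scaled inputs; the predictions should agree
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)
print((tree_raw.predict(X) == tree_scaled.predict(X_scaled)).all())
```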

In [8]:
features_raw.head()

|   | PassengerId | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.730108 | 0.827377 | male | -0.102313 | 0.432793 | -0.473674 | A/5 21171 | -0.502445 | 0 | S |
| 1 | -1.72622 | -1.566107 | female | 0.807492 | 0.432793 | -0.473674 | PC 17599 | 0.786845 | C85 | C |
| 2 | -1.722332 | 0.827377 | female | 0.125138 | -0.474545 | -0.473674 | STON/O2. 3101282 | -0.488854 | 0 | S |
| 3 | -1.718444 | -1.566107 | female | 0.636903 | 0.432793 | -0.473674 | 113803 | 0.42073 | C123 | S |
| 4 | -1.714556 | 0.827377 | male | 0.636903 | -0.474545 | -0.473674 | 373450 | -0.486337 | 0 | S |


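Next, we'll one-hot encode the categorical features (Sex, Ticket, Cabin and Embarked), since the decision tree expects numerical input.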

In [9]:
# Dummy variables: convert the categorical (object) columns to numerical ones
# using one-hot encoding with pandas.get_dummies()
features = pd.get_dummies(features_raw)

In [10]:
features.shape

(891, 841)
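The jump from 10 columns to 841 happens because `get_dummies` creates one indicator column per distinct value of every object column. Sex and Embarked add only a handful, but Ticket and Cabin each have well over a hundred distinct values (plus the `0` placeholder from `fillna`), so they account for almost all of the new columns. The toy frame below uses a few hypothetical rows to show the mechanics.

```python
import pandas as pd

# Hypothetical three-row frame mirroring a few Titanic columns
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
})

encoded = pd.get_dummies(toy)
print(list(encoded.columns))

# Numeric columns pass through; each object column becomes one column per distinct value
expected = 1 + toy["Sex"].nunique() + toy["Ticket"].nunique()
print(encoded.shape[1] == expected)
```

Because almost every passenger has their own ticket, those indicator columns give the tree many near-unique features to memorize; dropping Ticket, as was done with Name, may be worth trying.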

## Training the model

Now we're ready to train a model in sklearn. First, let's split the data into training and testing sets. Then we'll train the model on the training set.

In [11]:
# Split the data into a training set (80%) and a testing set (20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, outcomes, test_size=0.2, random_state=42)
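With 891 rows and `test_size=0.2`, scikit-learn rounds the test-set size up, giving 179 test rows and 712 training rows. The sketch below checks that on a stand-in array (the labels are hypothetical). It also shows the optional `stratify` argument, which this notebook does not use, for keeping the class balance similar in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 891-row feature matrix and outcome vector
X_demo = np.arange(891).reshape(-1, 1)
y_demo = np.tile([0, 1], 446)[:891]   # hypothetical labels

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(len(X_tr), len(X_te))

# Optional variant: stratify keeps the class proportions close in both sets
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)
print(y_tr_s.mean(), y_te_s.mean())
```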

In [12]:
# Import the classifier from sklearn
from sklearn.tree import DecisionTreeClassifier

# Define the classifier and fit it to the training data
# (no random_state is set, so the exact tree can differ between runs)
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

DecisionTreeClassifier()

## Testing the model
Now let's see how our model does by calculating its accuracy on both the training and the testing sets.

In [13]:
# Make predictions on the training and testing sets
predictions_train = model.predict(X_train)
predictions_test = model.predict(X_test)

from sklearn.metrics import accuracy_score

train_accuracy = accuracy_score(y_train, predictions_train)
test_accuracy = accuracy_score(y_test, predictions_test)

print('The training accuracy is', train_accuracy)
print('The test accuracy is', test_accuracy)

The training accuracy is 1.0
The test accuracy is 0.8212290502793296
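`accuracy_score` is simply the fraction of predictions that match the true labels, so these numbers can be read as counts: a training accuracy of 1.0 means every training passenger was classified correctly, and with 179 test rows, 0.8212... corresponds to 147 correct test predictions. The small sketch below uses hypothetical labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Accuracy = (number of matching predictions) / (number of predictions)
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])
print(accuracy_score(y_true, y_pred), np.mean(y_true == y_pred))   # both 0.8

# The test accuracy reported above, expressed as correct / total
print(147 / 179)
```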


## Improving the model

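The training accuracy is perfect (1.0) while the test accuracy is lower (about 0.82), so the model is likely overfitting: it has memorized the training set rather than learning patterns that generalize.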
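An unconstrained tree keeps splitting until its leaves are pure, which is why it reaches perfect training accuracy. The sketch below reproduces the pattern on synthetic data from `make_classification` (not the Titanic data, and with some labels assigned at random so the problem is noisy). The exact numbers will vary, but the fully grown tree (`max_depth=None`) should score 1.0 on its own training set while the shallower trees trade some training accuracy for simplicity.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.1 assigns the class of ~10% of samples at random
X, y = make_classification(n_samples=600, n_features=8, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for depth in [2, 4, None]:          # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # .score() returns accuracy for classifiers
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
```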

So now it's your turn to shine! Train a new model, and try to specify some parameters in order to improve the testing accuracy, such as:
- `max_depth`: the maximum depth (number of levels) of the tree.
- `min_samples_leaf`: the minimum number of samples required in each leaf.
- `min_samples_split`: the minimum number of samples a node must contain before it can be split.
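Each of these parameters puts a hard limit on how finely the tree can carve up the training data. The sketch below fits a constrained tree on synthetic data (the values 3, 10 and 25 are arbitrary choices for illustration) and reads the fitted `tree_` arrays to confirm the limits hold: leaves are the nodes whose `children_left` entry is -1, and `n_node_samples` counts the training samples that reach each node.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10,
                              min_samples_split=25, random_state=1).fit(X, y)

# Leaves have no children; internal (split) nodes do
is_leaf = tree.tree_.children_left == -1
leaf_sizes = tree.tree_.n_node_samples[is_leaf]
internal_sizes = tree.tree_.n_node_samples[~is_leaf]

print("depth:", tree.get_depth())             # at most max_depth
print("smallest leaf:", leaf_sizes.min())     # at least min_samples_leaf
if internal_sizes.size:
    print("smallest split node:", internal_sizes.min())  # at least min_samples_split
```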



We'll use grid search to find a good combination of these parameters.
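Grid search fits and cross-validates one model for every combination in the parameter grid. sklearn's `ParameterGrid` enumerates the same combinations that `GridSearchCV` tries, which makes the cost easy to check before running it: three parameters with five values each give 125 candidates, and the default 5-fold cross-validation turns that into 625 fits.

```python
from sklearn.model_selection import ParameterGrid

# The same grid used in the grid-search cell
parameters = {'max_depth': [2, 4, 6, 8, 10],
              'min_samples_leaf': [2, 4, 6, 8, 10],
              'min_samples_split': [2, 4, 6, 8, 10]}

grid = ParameterGrid(parameters)
n_folds = 5   # GridSearchCV's default cv when none is given (scikit-learn >= 0.22)

print(len(grid))            # 5 * 5 * 5 candidate settings
print(len(grid) * n_folds)  # total model fits
print(grid[0])              # one combination, as a dict
```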



In [14]:
# Grid search
# Import the scoring helpers and GridSearchCV
from sklearn.metrics import f1_score
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

# Define the base decision tree classifier
clf = DecisionTreeClassifier(random_state=42)

# Define the parameter grid to search over
# HINT: parameters = {'parameter_1': [value1, value2], 'parameter_2': [value1, value2]}
parameters = {'max_depth': [2, 4, 6, 8, 10], 'min_samples_leaf': [2, 4, 6, 8, 10], 'min_samples_split': [2, 4, 6, 8, 10]}

# Define the scoring function (F1 score) using make_scorer()
scorer = make_scorer(f1_score)

# Define the grid search object; with no cv argument, GridSearchCV uses
# 5-fold cross-validation (pass cv=3 for 3-fold)
grid_obj = GridSearchCV(clf, parameters, scoring=scorer, verbose=3)

# Fit the grid search on the training data
grid_fit = grid_obj.fit(X_train, y_train)

# Get the best estimator found by the search
best_clf = grid_fit.best_estimator_

Fitting 5 folds for each of 125 candidates, totalling 625 fits
[CV] max_depth=2, min_samples_leaf=2, min_samples_split=2 ............
[CV]  max_depth=2, min_samples_leaf=2, min_samples_split=2, score=0.600, total=   0.0s
[CV] max_depth=2, min_samples_leaf=2, min_samples_split=2 ............
[CV]  max_depth=2, min_samples_leaf=2, min_samples_split=2, score=0.706, total=   0.0s
[CV] max_depth=2, min_samples_leaf=2, min_samples_split=2 ............
[CV]  max_depth=2, min_samples_leaf=2, min_samples_split=2, score=0.667, total=   0.0s
[CV] max_depth=2, min_samples_leaf=2, min_samples_split=2 ............
[CV]  max_depth=2, min_samples_leaf=2, min_samples_split=2, score=0.560, total=   0.0s
[CV] max_depth=2, min_samples_leaf=2, min_samples_split=2 ............
[CV]  max_depth=2, min_samples_leaf=2, min_samples_split=2, score=0.642, total=   0.0s

[Parallel(n_jobs=1)]: Done 625 out of 625 | elapsed:   21.0s finished
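This search ranks candidates by F1 (through `make_scorer(f1_score)`), while the notebook reports accuracy, so the chosen tree is the one with the best cross-validated F1, not necessarily the best accuracy. F1 is the harmonic mean of precision and recall for the positive class (Survived = 1). The hypothetical labels below show how the two metrics can disagree; passing `scoring='accuracy'` to `GridSearchCV` instead would align the search with the metric reported in the next cell.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 4 survivors, and the model finds only 2 of them
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]

p = precision_score(y_true, y_pred)   # 2 / (2 + 0) = 1.0
r = recall_score(y_true, y_pred)      # 2 / (2 + 2) = 0.5
f1 = f1_score(y_true, y_pred)         # 2 * p * r / (p + r) = 2/3

print(p, r, f1, 2 * p * r / (p + r))
print(accuracy_score(y_true, y_pred))  # 6 / 8 = 0.75
```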


In [15]:
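# Optional: GridSearchCV (refit=True by default) has already refit best_clf
# on the whole training set, so this extra fit is redundant but harmless
best_clf.fit(X_train, y_train)

# Make predictions using the new model
y_train_pred = best_clf.predict(X_train)
y_test_pred = best_clf.predict(X_test)

# Calculate the accuracies
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)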

print('The training accuracy is', train_accuracy)
print('The test accuracy is', test_accuracy)

The training accuracy is 0.8637640449438202
The test accuracy is 0.8379888268156425
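The notebook doesn't print which parameter combination won. A fitted search exposes it as `best_params_`, together with the winning candidate's mean cross-validated score in `best_score_`; in this lab that would be `grid_fit.best_params_` and `grid_fit.best_score_`. The sketch below shows these attributes on synthetic data with a smaller, hypothetical grid, since the Titanic run's values aren't shown here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and a reduced, hypothetical grid
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      {'max_depth': [2, 4, 6], 'min_samples_leaf': [2, 6, 10]},
                      scoring='f1', cv=5)
search.fit(X, y)

print(search.best_params_)   # winning combination
print(search.best_score_)    # its mean cross-validated F1

# best_estimator_ was refit on all of X, y, so it can predict right away
best = search.best_estimator_
print(best.predict(X[:5]))
```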
