# Decision Tree Regressor

A decision tree is a non-linear model that can be used for both regression and classification. Its structure consists of a root node, internal (decision) nodes, leaf nodes, and the branches that connect them.


# Training Process

Decision trees make predictions by stratifying (segmenting) the predictor space into regions, with each split chosen according to a criterion. The tree is grown with a top-down, greedy approach called recursive binary splitting.

The first split (at the root node) is the one that most reduces the error on the training data, so the feature used there is typically the most influential in predicting the target variable. At each step the model considers all predictors and all possible cut points for each predictor. For classification problems, the tree chooses the cut point that minimizes a measure of node impurity, such as the misclassification rate. For regression problems, the tree chooses the cut point that minimizes the residual sum of squares (RSS).
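The exhaustive search over predictors and cut points can be sketched as follows. This is a minimal illustration for the regression case, not scikit-learn's actual implementation; the function name `best_split` and the toy arrays are assumptions made up for this example.

```python
import numpy as np

def best_split(X, y):
    """Search every feature and cut point for the split that
    minimizes the total residual sum of squares (RSS)."""
    best = {"feature": None, "threshold": None, "rss": np.inf}
    n_features = X.shape[1]
    for j in range(n_features):
        # Candidate cut points: midpoints between consecutive unique values
        values = np.unique(X[:, j])
        thresholds = (values[:-1] + values[1:]) / 2
        for t in thresholds:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            # RSS when each side is predicted by its own mean
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best["rss"]:
                best = {"feature": j, "threshold": t, "rss": rss}
    return best

# Toy data (hypothetical): the response jumps when feature 0 crosses 3.5,
# while feature 1 carries no signal
X_toy = np.array([[1, 7], [2, 3], [3, 9], [4, 1], [5, 8], [6, 2]], dtype=float)
y_toy = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])

split = best_split(X_toy, y_toy)
print(split)
```

Recursive binary splitting would apply the same search again to the left and right subsets until a stopping criterion is reached.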

The model continues to make splits until a stopping criterion is met; for example, until no region contains more than 10 observations. For regression, the prediction is the mean response of the training observations in the terminal node. For classification, the prediction is the most frequently occurring class in that region. For classification trees, the Gini index and entropy are preferred over the misclassification rate for evaluating node purity because they are more sensitive to changes in node composition. Values closer to zero indicate a purer node, and a value of zero means that all training instances in the node belong to the same class. A completely pure region cannot be split further.
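To make the two purity measures concrete, here is a small sketch that computes them from the class labels in a node. The helper names `gini` and `entropy` and the example label arrays are assumptions for illustration only.

```python
import numpy as np

def gini(labels):
    """Gini index: 1 - sum_k p_k^2, where p_k is the share of class k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

pure = np.array(["a", "a", "a", "a"])    # every instance has the same class
mixed = np.array(["a", "a", "b", "b"])   # evenly split between two classes

print("Gini    pure / mixed:", gini(pure), gini(mixed))
print("Entropy pure / mixed:", entropy(pure), entropy(mixed))
```

Both measures are zero for the pure node and reach their two-class maximum for the evenly mixed node.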

If left unconstrained, the model tends to fit the training data too closely (overfitting). However, various hyperparameters can be tuned to restrict the model's freedom and control overfitting.


# Decision Tree Hyperparameters

Increasing the min_* hyperparameters or reducing the max_* hyperparameters regularizes the model.

- min_samples_split: The minimum number of samples a node must contain before it can be split

- min_samples_leaf: The minimum number of samples required in a leaf

- max_leaf_nodes: The maximum number of leaf nodes

- max_features: The maximum number of features considered when searching for the best split at each node
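The effect of these settings can be seen by comparing an unconstrained tree with a restricted one. The sketch below uses synthetic data (an assumption, not the lung capacity dataset), and the hyperparameter values are arbitrary choices for illustration rather than recommendations.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy sine curve (hypothetical data for illustration)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.3, size=200)

# No constraints: the tree keeps splitting until every leaf is pure
unconstrained = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# Larger min_* values and a smaller max_* value restrict the tree
regularized = DecisionTreeRegressor(
    min_samples_split=20,
    min_samples_leaf=10,
    max_leaf_nodes=8,
    random_state=0,
).fit(X_demo, y_demo)

print("Unconstrained leaves  :", unconstrained.get_n_leaves())
print("Regularized leaves    :", regularized.get_n_leaves())
print("Unconstrained train R2:", unconstrained.score(X_demo, y_demo))
print("Regularized train R2  :", regularized.score(X_demo, y_demo))
```

The unconstrained tree fits the training data almost perfectly, which is the overfitting behavior that regularization is meant to limit.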




# Decision Trees Pros and Cons


**Pros**

- Easy to interpret

- Allows us to determine feature importance, which can be used as a method of feature selection.
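As a brief sketch of how feature importance can be read from a fitted tree, the example below uses scikit-learn's `feature_importances_` attribute on synthetic data. The data are an assumption for illustration: only the first of three features drives the target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (hypothetical): only feature 0 influences the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=300)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_demo, y_demo)

# Impurity-based importances, normalized to sum to 1
importances = tree.feature_importances_
for i, value in enumerate(importances):
    print("Feature {}: {:.3f}".format(i, value))
```

Features with importance near zero are candidates for removal, although impurity-based importances should be treated as a rough guide rather than a definitive ranking.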


**Cons**

- Sensitive to variation in training data

- Is prone to overfitting
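The sensitivity to variation in the training data can be illustrated by fitting two unconstrained trees on different bootstrap resamples of the same data and comparing their predictions. This is a minimal sketch on synthetic data; the dataset and the comparison grid are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy sine curve (hypothetical data for illustration)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.3, size=200)

# Fixed grid of points on which the two trees are compared
X_grid = np.linspace(0, 10, 50).reshape(-1, 1)

preds = []
for _ in range(2):
    # Bootstrap resample: same data source, slightly different training set
    idx = rng.choice(len(X_demo), size=len(X_demo), replace=True)
    tree = DecisionTreeRegressor(random_state=0).fit(X_demo[idx], y_demo[idx])
    preds.append(tree.predict(X_grid))

# Average disagreement between the two trees on the same inputs
diff = np.abs(preds[0] - preds[1]).mean()
print("Mean absolute difference between the two trees:", diff)
```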



# 1. Libraries

In [1]:
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

In [2]:
# Import Data
df = pd.read_csv('LungCapData.csv')
df.head()

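| | LungCap | Age | Height | Smoke | Gender | Caesarean |
|---|---|---|---|---|---|---|
| 0 | 6.475 | 6 | 62.1 | no | male | no |
| 1 | 10.125 | 18 | 74.7 | yes | female | no |
| 2 | 9.55 | 16 | 69.7 | no | female | yes |
| 3 | 11.125 | 14 | 71.0 | no | male | no |
| 4 | 4.8 | 5 | 56.9 | no | male | no |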


# 2. Preprocessing

In [3]:
# Predictors and Target
X = df.drop(columns = ['LungCap'])
y = df['LungCap']

# Instantiate one-hot encoder
ohe = OneHotEncoder()

# Columns to be one-hot encoded; the remaining columns pass through unchanged
ct = make_column_transformer(
    (ohe, ['Smoke', 'Gender', 'Caesarean']),
    remainder = 'passthrough')

# predictors and target variable
X = np.array(ct.fit_transform(X))
y = np.array(y)

# Check input and target variable shape
X.shape, y.shape

((725, 8), (725,))

In [4]:
# Training and Testing subsets 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 911)

# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print('Standardized feature Mean:',  X_train.mean().round())
print('Standardized feature SD :',   X_train.std().round())

Standardized feature Mean: 0.0
Standardized feature SD : 1.0


In [5]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

((580, 8), (580,), (145, 8), (145,))

# 3. Training

In [6]:
# Training the Decision Tree Regressor with default parameters
dt = DecisionTreeRegressor(random_state = 0)
dt.fit(X_train, y_train)

DecisionTreeRegressor(random_state=0)

# 4. Testing

In [7]:
# Predicting the Test set results
y_pred = dt.predict(X_test)

# Mean squared error
print('Mean Squared Error :', mean_squared_error(y_test, y_pred))

Mean Squared Error : 1.98515625


# 5. K-Fold Cross Validation

In [8]:
# 10-fold cross-validation (a regressor is scored with R2 by default)
R2 = cross_val_score(estimator = DecisionTreeRegressor(),
                             X = X,
                             y = y,
                             cv = 10)

# Cross-validation R2 scores, mean, and standard deviation
print(R2)
print("R2: {:.3f} %".format(R2.mean()*100))
print("R2 Standard Deviation: {:.3f} %".format(R2.std()*100))

[0.48061774 0.71070523 0.66352599 0.5700009  0.68686148 0.77086239
 0.67520901 0.73646311 0.63624062 0.52950831]
R2: 64.600 %
R2 Standard Deviation: 8.808 %


# 6. Hyperparameter Tuning

In [9]:
# Grid Search CV

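# Hyperparameters
# Note: max_features='auto' was removed in scikit-learn 1.3; for a regression
# tree it was equivalent to max_features=None (use all features)
param_grid = [{
     'max_depth': [6, 10],
     'max_features': ['auto'],
     'min_samples_leaf': [3, 5],
     'min_samples_split': [4, 6]}]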


# Configure GridSearchCV
grid_search = GridSearchCV(DecisionTreeRegressor(),
                           param_grid, cv=5, scoring = 'r2',
                            n_jobs=-1)
# Initiate Search
grid_search.fit(X_train, y_train)

# Extract tuned parameters and the best cross-validated R2
tuned_params = grid_search.best_params_
tuned_score = grid_search.best_score_
best_estimator = grid_search.best_estimator_

# Print Results
print("Best R2: {:.2f} %".format(tuned_score*100))
print("Best Parameters:", tuned_params)

Best R2: 77.88 %
Best Parameters: {'max_depth': 6, 'max_features': 'auto', 'min_samples_leaf': 5, 'min_samples_split': 4}


In [10]:
# Randomized Search

# Hyperparameters
# ('auto' means all features for a regressor and requires scikit-learn < 1.3)
param_grid = {
        "max_depth": [6, 8, 10, 12, 14],
        "max_features": ['auto'],
        "min_samples_leaf": [2, 3, 4],
        "min_samples_split": [2, 3, 4, 5]}

# Randomized Search initialization
random_search = RandomizedSearchCV(DecisionTreeRegressor(), param_grid, n_iter=32,
                                        scoring="r2", cv=5,
                                        n_jobs=-1, random_state=911)
# Initiate Search
random_search.fit(X_train, y_train)

# Extract tuned parameters and the best cross-validated R2
tuned_params = random_search.best_params_
tuned_score = random_search.best_score_
best_estimator = random_search.best_estimator_

# Print Results
print("Best R2: {:.2f} %".format(tuned_score*100))
print("Best Parameters:", tuned_params)

Best R2: 77.87 %
Best Parameters: {'min_samples_split': 5, 'min_samples_leaf': 4, 'max_features': 'auto', 'max_depth': 6}
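
The scores reported by the searches are cross-validated on the training subset, so a natural follow-up is to evaluate the tuned estimator on the held-out test set. The sketch below shows one way to do that. It is self-contained and uses synthetic data from `make_regression` as a stand-in (an assumption, not the lung capacity data), with an illustrative parameter grid.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical; replace with the real X and y)
X_demo, y_demo = make_regression(n_samples=400, n_features=5,
                                 noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2, random_state=0)

# Small illustrative grid, searched with 5-fold cross-validation
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    {"max_depth": [4, 6], "min_samples_leaf": [3, 5]},
    cv=5, scoring="r2")
search.fit(X_tr, y_tr)

# best_estimator_ is refit on the full training subset by default
y_hat = search.best_estimator_.predict(X_te)

print("Best Parameters:", search.best_params_)
print("Test MSE:", mean_squared_error(y_te, y_hat))
print("Test MAE:", mean_absolute_error(y_te, y_hat))
print("Test R2 :", r2_score(y_te, y_hat))
```

Comparing the test R2 with the cross-validated R2 gives a rough check on whether the tuned tree generalizes beyond the data used for the search.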
