## 5.4.2 Model evaluation of the decision tree
In this exercise, we evaluate the decision tree model built and trained earlier, using the performance metrics explained previously. Perform the following steps:

0. Import the required libraries and rerun the earlier steps (load the data, split it, fit the model, and predict on the test set):

In [12]:
import numpy as np
import pandas as pd
df = pd.read_csv('../datasets/clean_creditcard.csv')

from sklearn.tree import DecisionTreeClassifier

dt_object = DecisionTreeClassifier(max_depth=3)

X = df.drop(['Class_Category'], axis=1)
y = df[['Class_Category']]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

dt_object.fit(X_train, y_train.values.ravel())
y_pred = dt_object.predict(X_test)

1. Calculate the classification accuracy via Python, as follows:

In [13]:
is_correct = y_pred == y_test.values.ravel()
print(np.mean(is_correct))

0.8955223880597015


2. Calculate the classification accuracy via Scikit-learn, as follows:

In [7]:
print(dt_object.score(X_test,y_test))

# or by running this code:

from sklearn import metrics
print(metrics.accuracy_score(y_test,y_pred))

0.8955223880597015
0.8955223880597015


3. Calculate the true positive, false negative, true negative, and false positive rates:

In [8]:
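# Number of actual positives (samples labeled 1)
P = sum(y_test.values.ravel() == 1)
print(P)

# True positives: actual 1, predicted 1
TP = sum((y_test.values.ravel() == 1) & (y_pred == 1))
print(TP)

TPR = TP / P
print(TPR)

# False negatives: actual 1, predicted 0
FN = sum((y_test.values.ravel() == 1) & (y_pred == 0))
print(FN)

FNR = FN / P
print(FNR)

# Number of actual negatives (samples labeled 0)
N = sum(y_test.values.ravel() == 0)
print(N)

# True negatives: actual 0, predicted 0
TN = sum((y_test.values.ravel() == 0) & (y_pred == 0))
print(TN)

# False positives: actual 0, predicted 1
FP = sum((y_test.values.ravel() == 0) & (y_pred == 1))
print(FP)

TNR = TN / N
FPR = FP / N
print('the true negative rate is {} and the false positive rate is {}'.format(TNR, FPR))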

132
109
0.8257575757575758
23
0.17424242424242425
136
131
5
the true negative rate is 0.9632352941176471 and the false positive rate is 0.03676470588235294
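As a quick consistency check, the counts printed above can be combined by hand. The sketch below hard-codes the four counts from that output (instead of recomputing them from `y_test` and `y_pred`) and checks that each pair of rates sums to 1 and that the accuracy from steps 1 and 2 equals (TP + TN) / (P + N).

```python
# Counts copied from the output above (hard-coded for illustration only).
TP, FN = 109, 23   # actual positives: predicted 1 / predicted 0
TN, FP = 131, 5    # actual negatives: predicted 0 / predicted 1

P = TP + FN        # all actual positives
N = TN + FP        # all actual negatives

TPR, FNR = TP / P, FN / P
TNR, FPR = TN / N, FP / N

# Every actual positive is either a TP or an FN, so TPR and FNR are complements.
print(TPR + FNR)
# The same holds for the actual negatives.
print(TNR + FPR)

# Accuracy is the share of all samples that were predicted correctly.
accuracy = (TP + TN) / (P + N)
print(accuracy)
```

Because of this relationship, it is enough to report one rate from each pair.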


4. Calculate the confusion matrix:

In [9]:
from sklearn.metrics import confusion_matrix
print(f"Confusion Matrix : \n {confusion_matrix(y_test, y_pred)}")

Confusion Matrix : 
 [[131   5]
 [ 23 109]]
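
For a binary problem, Scikit-learn's `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, so the matrix above reads `[[TN, FP], [FN, TP]]`, and `ravel()` unpacks it in that order. The sketch below rebuilds label arrays from the counts in step 3; `y_true_demo` and `y_pred_demo` are stand-ins introduced for illustration, not the real `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Stand-in labels rebuilt from the counts above: 136 negatives, then 132 positives.
y_true_demo = np.array([0] * 136 + [1] * 132)
# Predictions: 131 TN and 5 FP for the negatives, 23 FN and 109 TP for the positives.
y_pred_demo = np.array([0] * 131 + [1] * 5 + [0] * 23 + [1] * 109)

# Rows are true labels, columns are predicted labels.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print(tn, fp, fn, tp)
```

With the real data, the same unpacking would be applied to `confusion_matrix(y_test, y_pred)`.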


5. Calculate the precision, recall, and F1 score:

In [10]:
Precision = TP/ (TP+FP)
print(Precision)

Recall = TP/ (TP+FN)
print(Recall)

F1Score = 2*((Precision * Recall)/(Precision+Recall))
print(F1Score)

0.956140350877193
0.8257575757575758
0.8861788617886179
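
The same three values can be cross-checked with Scikit-learn's `precision_score`, `recall_score`, and `f1_score`. This is a minimal sketch that again uses stand-in label lists rebuilt from the counts in step 3 (the names `y_true_demo` and `y_pred_demo` are illustrative); with the real data, one would pass `y_test` and `y_pred` instead.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Stand-in labels rebuilt from the counts in step 3 (not the real test set).
y_true_demo = [0] * 136 + [1] * 132
y_pred_demo = [0] * 131 + [1] * 5 + [0] * 23 + [1] * 109

# By default these functions score the positive class (label 1).
precision = precision_score(y_true_demo, y_pred_demo)
recall = recall_score(y_true_demo, y_pred_demo)
f1 = f1_score(y_true_demo, y_pred_demo)

print(precision)
print(recall)
print(f1)
```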


6. Generate the classification report:

In [11]:
from sklearn.metrics import classification_report
print(f"Classification Report : \n {classification_report(y_test, y_pred)}")

Classification Report : 
               precision    recall  f1-score   support

           0       0.85      0.96      0.90       136
           1       0.96      0.83      0.89       132

    accuracy                           0.90       268
   macro avg       0.90      0.89      0.89       268
weighted avg       0.90      0.90      0.89       268
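
The `macro avg` and `weighted avg` rows of the report can be reproduced by hand, which makes the difference between them concrete: the macro average treats both classes equally, while the weighted average weights each class by its support. The sketch below hard-codes the counts from step 3 and covers precision and recall only; it is an illustration of the arithmetic, not a replacement for `classification_report`.

```python
# Counts copied from step 3 (hard-coded for illustration only).
TP, FN, TN, FP = 109, 23, 131, 5
support = {0: TN + FP, 1: TP + FN}   # 136 and 132 samples
total = sum(support.values())

# Per-class precision: correct predictions of a class / all predictions of that class.
precision = {0: TN / (TN + FN), 1: TP / (TP + FP)}
# Per-class recall: correct predictions of a class / all actual members of that class.
recall = {0: TN / (TN + FP), 1: TP / (TP + FN)}

def macro(scores):
    # Unweighted mean over the classes.
    return sum(scores.values()) / len(scores)

def weighted(scores):
    # Mean over the classes, weighted by the number of actual samples per class.
    return sum(scores[c] * support[c] for c in support) / total

macro_precision, weighted_precision = macro(precision), weighted(precision)
macro_recall, weighted_recall = macro(recall), weighted(recall)

print(round(macro_precision, 2), round(weighted_precision, 2))
print(round(macro_recall, 2), round(weighted_recall, 2))
```

Note that the weighted recall works out to (TN + TP) / total, which is the accuracy, so those two figures in the report always agree.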



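We can see from these results that the decision tree model performs well on the cleaned data: the accuracy is about 0.90, and the precision, recall, and F1 score lie between 0.83 and 0.96 for both classes. The weakest figure is the recall for class 1 (0.83), which reflects the 23 of 132 positive samples that the model misclassified as negative.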