# __Applying Decision Tree__

Let's look at how to construct a decision tree model.

## Step 1: Import Required Libraries and Read the Dataset

- Import NumPy, pandas, Seaborn, and matplotlib.pyplot libraries
- Configure matplotlib settings
- Read the dataset and display the first few rows


In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df = pd.read_csv('balance-scale.data',sep=',')
df.head()

__Observation__
- In the above output, the first few rows of the dataset can be seen.

## Step 2: Analyze the Dataset

- Display information about the dataset


In [None]:
df.info()

__Observations__
- Class Name is the target variable that we are going to predict.
- The output also shows that none of the columns contain null values.
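The two checks above (missing values and the target column) can also be made explicit. The sketch below does not read `balance-scale.data`; it builds a synthetic stand-in so that it runs on its own. The feature column names and the rule used to label each row (compare left weight × left distance with right weight × right distance) are assumptions made for illustration, and only `Class Name` comes from the notebook.

```python
from itertools import product

import pandas as pd

# Hypothetical feature names; only 'Class Name' appears in the notebook.
cols = ["Left-Weight", "Left-Distance", "Right-Weight", "Right-Distance"]

# Synthetic stand-in: every combination of the values 1..5 for four features.
demo = pd.DataFrame(list(product(range(1, 6), repeat=4)), columns=cols)

# Assumed labelling rule: the side with the larger weight * distance wins.
left = demo["Left-Weight"] * demo["Left-Distance"]
right = demo["Right-Weight"] * demo["Right-Distance"]
demo["Class Name"] = "B"                      # balanced by default
demo.loc[left > right, "Class Name"] = "L"    # tips to the left
demo.loc[left < right, "Class Name"] = "R"    # tips to the right

# Count missing values per column and the number of rows per class.
missing_per_column = demo.isnull().sum()
class_counts = demo["Class Name"].value_counts()

print(missing_per_column)
print(class_counts)
```

On the real dataset, the same two calls (`df.isnull().sum()` and `df['Class Name'].value_counts()`) show whether any column needs cleaning and whether the classes are balanced.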

## Step 3: Split the Dataset

- Split the dataset into training and testing sets


In [None]:
from sklearn.model_selection import train_test_split
X = df.drop('Class Name',axis=1)
y = df[['Class Name']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3,random_state=42)

## Step 4: Train the Decision Tree Classifier

- Import DecisionTreeClassifier from sklearn.tree
- Train the model using the training dataset


In [None]:
from sklearn.tree import DecisionTreeClassifier
clf_model = DecisionTreeClassifier(criterion="gini", random_state=42, max_depth=3, min_samples_leaf=5)   
clf_model.fit(X_train,y_train)

## Step 5: Make Predictions

- Predict the outcomes using the testing dataset


In [None]:
y_predict = clf_model.predict(X_test)

## Step 6: Evaluate the Model

- Import accuracy_score, classification_report, and confusion_matrix from sklearn.metrics
- Calculate the accuracy score


In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Assuming y_test contains the true labels and y_predict contains the predicted labels
accuracy = accuracy_score(y_test, y_predict)
print("Accuracy:", accuracy)

cm = confusion_matrix(y_test, y_predict)
print("Confusion Matrix:")
print(cm)

report = classification_report(y_test, y_predict, zero_division=1)
print("Classification Report:")
print(report)

__Observation__
- In the above output, you can see the confusion matrix and the values for accuracy, precision, recall, f1-score, and support.
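To make the confusion matrix easier to read, it helps to fix the label order and relate the matrix back to accuracy and recall. The sketch below uses a small hand-made set of labels (not the notebook's predictions) purely for illustration; the variable names are hypothetical.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Illustrative labels only; these are not results from the notebook.
y_true_demo = ["L", "L", "L", "R", "R", "R", "B", "B"]
y_pred_demo = ["L", "L", "R", "R", "R", "L", "L", "R"]
labels = ["B", "L", "R"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
cm_table = pd.DataFrame(
    cm_demo,
    index=[f"true {c}" for c in labels],
    columns=[f"pred {c}" for c in labels],
)
print(cm_table)

# Accuracy is the sum of the diagonal divided by the total number of samples.
accuracy_demo = cm_demo.trace() / cm_demo.sum()

# Recall per class is the diagonal entry divided by its row total.
recall_demo = cm_demo.diagonal() / cm_demo.sum(axis=1)

print("Accuracy:", accuracy_demo)
print("Recall per class:", dict(zip(labels, recall_demo)))
```

Passing `labels=` explicitly avoids guessing which row belongs to which class when the matrix is printed as a bare array.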

## Step 7: Display the Decision Tree

- Create a list of target and feature names
- Import export_text from sklearn.tree
- Display the decision tree as text


In [None]:
target = list(df['Class Name'].unique())
feature_names = list(X.columns)

__Observation__
- The target class labels and the feature names are now stored in lists.

Let's display the decision tree as text using export_text.

In [None]:
from sklearn.tree import export_text
r = export_text(clf_model, feature_names=feature_names)
print(r)

In [None]:
from sklearn.tree import plot_tree

In [None]:
plt.figure(figsize=(20,20))
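# Pass the feature names as a plain list; some scikit-learn versions reject a pandas Index here
plot_tree(clf_model, feature_names=list(X_train.columns))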
plt.show()

In [None]:
clf_model.get_depth()

In [None]:
clf_model.criterion

In [None]:
clf_model.feature_importances_
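The `feature_importances_` array is ordered like the training columns, so pairing it with the column names makes it readable. The sketch below is self-contained and therefore trains on a synthetic stand-in rather than `balance-scale.data`; the feature column names and the labelling rule are assumptions for illustration.

```python
from itertools import product

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names; only 'Class Name' appears in the notebook.
cols = ["Left-Weight", "Left-Distance", "Right-Weight", "Right-Distance"]
demo = pd.DataFrame(list(product(range(1, 6), repeat=4)), columns=cols)

# Assumed labelling rule: compare weight * distance on each side.
left = demo["Left-Weight"] * demo["Left-Distance"]
right = demo["Right-Weight"] * demo["Right-Distance"]
demo["Class Name"] = "B"
demo.loc[left > right, "Class Name"] = "L"
demo.loc[left < right, "Class Name"] = "R"

X_demo = demo.drop("Class Name", axis=1)
y_demo = demo["Class Name"]

# Same hyperparameters as the notebook's first model.
tree_demo = DecisionTreeClassifier(
    criterion="gini", random_state=42, max_depth=3, min_samples_leaf=5
)
tree_demo.fit(X_demo, y_demo)

# Attach column names to the importances and sort from most to least important.
importances = pd.Series(
    tree_demo.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

With the notebook's own objects, the equivalent call would be `pd.Series(clf_model.feature_importances_, index=X_train.columns)`.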

In [None]:
y_train.value_counts(normalize=True)

In [None]:
1 - y_train.value_counts(normalize=True)
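The class proportions shown above are the inputs to the two split criteria used in this notebook. For proportions p_k, Gini impurity is 1 - Σ p_k² and entropy is -Σ p_k log2(p_k). The sketch below computes both from class counts; the counts are illustrative values, not output from the notebook, and the function names are hypothetical.

```python
import math


def gini_impurity(counts):
    """Gini impurity, 1 - sum(p_k ** 2), for a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy in bits, -sum(p_k * log2(p_k)), skipping empty classes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Illustrative class counts for three classes (assumed, not taken from the notebook).
example_counts = [288, 49, 288]
gini_value = gini_impurity(example_counts)
entropy_value = entropy(example_counts)

print("Gini impurity:", gini_value)
print("Entropy:", entropy_value)

# A pure node (one class only) has zero impurity under both measures.
print("Pure node:", gini_impurity([10, 0]), entropy([10, 0]))
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why the tree picks the split that reduces them the most.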

In [None]:
from sklearn.tree import DecisionTreeClassifier
clf_model = DecisionTreeClassifier(criterion="gini", random_state=42, max_depth=3, min_samples_leaf=5)   
clf_model.fit(X_train,y_train)

In [None]:
from sklearn.tree import DecisionTreeClassifier
clf_model = DecisionTreeClassifier(criterion="entropy", random_state=42, max_depth=3, min_samples_leaf=5)   
clf_model.fit(X_train,y_train)

In [None]:
from sklearn.model_selection import GridSearchCV

params = {"criterion":['gini','entropy'],
         "max_depth":[2,3,4,5,6],
         "min_samples_leaf": [2,3,4,5,6,7,8]}

grid = GridSearchCV(DecisionTreeClassifier(), param_grid=params, cv=5,verbose=1)

In [None]:
grid.fit(X_train,y_train)

In [None]:
grid.best_estimator_

In [None]:
grid.best_score_
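Note that `best_score_` is the mean cross-validated score on the training folds, not a score on the held-out test set. A minimal sketch of checking the tuned model on test data follows. It is self-contained, so it uses a synthetic stand-in for the dataset; the feature column names and the labelling rule are assumptions, and fixing `random_state` on the estimator is an extra choice made here for reproducibility.

```python
from itertools import product

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names; only 'Class Name' appears in the notebook.
cols = ["Left-Weight", "Left-Distance", "Right-Weight", "Right-Distance"]
demo = pd.DataFrame(list(product(range(1, 6), repeat=4)), columns=cols)

# Assumed labelling rule: compare weight * distance on each side.
left = demo["Left-Weight"] * demo["Left-Distance"]
right = demo["Right-Weight"] * demo["Right-Distance"]
demo["Class Name"] = "B"
demo.loc[left > right, "Class Name"] = "L"
demo.loc[left < right, "Class Name"] = "R"

X_demo = demo.drop("Class Name", axis=1)
y_demo = demo["Class Name"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Same search space as the notebook's grid.
params = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_leaf": [2, 3, 4, 5, 6, 7, 8],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid=params, cv=5)
search.fit(X_tr, y_tr)

# best_estimator_ is refit on the whole training set; score it on unseen data.
best_tree = search.best_estimator_
test_accuracy = accuracy_score(y_te, best_tree.predict(X_te))

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", search.best_score_)
print("Test accuracy:", test_accuracy)
```

With the notebook's objects, the same check would be `accuracy_score(y_test, grid.best_estimator_.predict(X_test))`.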

In [None]:
# Recommended: apply GridSearchCV to tune the decision tree's hyperparameters

__Observations__
- In the plotted decision tree, you can see how the data is split at each node.
- For example, the left weight is split at 2.5 into two branches: values less than 2.5 and values greater than 2.5.
- The right distance is split into values less than 1.5 and values greater than 5.
- Each leaf at the bottom of the tree holds a predicted class; a prediction is made by following the splits from the top of the tree down to a leaf.