# 🌳 Decision Tree Classifier: Professional Machine Learning Notebook

This notebook mirrors the structure, depth, and analytical rigor of the **Random Forest** work. It delivers a corporate-grade Decision Tree implementation covering evaluation (classification metrics, confusion matrix, ROC-AUC), interpretability (feature importance, tree visualization), and disciplined hyperparameter tuning.

## 1️⃣ Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score, roc_curve
plt.style.use('seaborn-v0_8')
print("Libraries Loaded")

## 2️⃣ Load Dataset
Update dataset path if required.

In [None]:
# Update the path to point at your dataset
df = pd.read_csv("your_dataset.csv")
# Display a preview
df.head()

## 3️⃣ Data Preparation

In [None]:
# Update the target column name to match your dataset
X = df.drop("target", axis=1)
y = df["target"]
# Hold out 20% for testing; stratify=y keeps class proportions similar in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
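
`stratify=y` makes `train_test_split` keep each class's share roughly the same in the train and test sets, which matters for imbalanced targets. The sketch below illustrates that idea in plain Python under simple assumptions (shuffle within each class, then cut off `test_size` of it); scikit-learn's own implementation differs (it delegates to `StratifiedShuffleSplit`), and the helper name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # test count computed separately per class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Imbalanced toy labels: 80 negatives, 20 positives
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_size=0.2)
test_pos = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_pos)  # 80 20 4
```

The test set keeps the original 20% positive rate (4 of 20), so test metrics are not distorted by an unlucky class mix.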

## 4️⃣ Baseline Model Training

In [None]:
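# Baseline with default hyperparameters: max_depth=None lets the tree grow until leaves are pure
# (or hold fewer than min_samples_split samples), so it tends to overfit
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print("Baseline Model Trained")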
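
With its default `criterion="gini"`, `DecisionTreeClassifier` chooses at each node the feature and threshold whose split most reduces the weighted Gini impurity $G = 1 - \sum_k p_k^2$, where $p_k$ is the share of class $k$ in the node. The plain-Python sketch below scans midpoint thresholds for a single numeric feature; it is a simplified illustration (one feature, exhaustive scan), not scikit-learn's optimized splitter, and both function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Hypothetical helper: try the midpoint between each pair of consecutive
    unique values; return (threshold, weighted child impurity) of the best split."""
    uniq = sorted(set(values))
    n = len(values)
    best = (None, float("inf"))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]  # x <= t goes left, as in sklearn
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]
print(gini(y))               # 0.5
print(best_threshold(x, y))  # (3.5, 0.0)
```

The split at 3.5 yields two pure children, dropping impurity from 0.5 to 0.0; an unconstrained tree keeps applying this greedy step until its leaves are pure.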

## 5️⃣ Evaluation: Classification Metrics

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
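
For each class, `classification_report` lists precision = TP / (TP + FP), recall = TP / (TP + FN), F1 (the harmonic mean of the two), and support (the number of true samples of that class). The plain-Python sketch below applies those definitions to a small made-up example to show what the report's columns mean; `per_class_report` is a hypothetical helper, and scikit-learn's `zero_division` handling is simplified here to returning 0.0.

```python
def per_class_report(y_true, y_pred):
    """Hypothetical helper: {class: (precision, recall, f1, support)}."""
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = sum(t == c for t in y_true)
        report[c] = (precision, recall, f1, support)
    return report


y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.625
for label, row in per_class_report(y_true, y_pred).items():
    print(label, [round(v, 3) for v in row])
```

Accuracy alone hides that class 0 is recalled only half the time here, which is why the per-class view is worth reading alongside it.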

## 6️⃣ Confusion Matrix Visualization

In [None]:
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(5,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
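
In scikit-learn's `confusion_matrix`, entry `[i, j]` counts samples whose true label is the i-th class and whose predicted label is the j-th class, with classes in sorted order; that is why the heatmap above puts "Actual" on the y-axis and "Predicted" on the x-axis. A minimal plain-Python sketch of that counting rule (the helper name `confusion_counts` is hypothetical):

```python
def confusion_counts(y_true, y_pred):
    """Hypothetical helper: rows = actual class, columns = predicted class (sorted labels)."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]
labels, cm = confusion_counts(y_true, y_pred)
print(labels)  # ['cat', 'dog']
print(cm)      # [[2, 0], [1, 2]]
```

The diagonal holds correct predictions (4 of 5 here, i.e. accuracy 0.8); the off-diagonal `1` is a true "dog" predicted as "cat".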

## 7️⃣ ROC-AUC Curve (binary targets only)

In [None]:
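# ROC-AUC is computed only for binary targets; multiclass would need a one-vs-rest setup
if len(np.unique(y_test)) == 2:
    # predict_proba columns follow dt.classes_, so column 1 is the positive class
    y_prob = dt.predict_proba(X_test)[:, 1]
    # pos_label is required by roc_curve when labels are not 0/1 or -1/1 (e.g. strings)
    fpr, tpr, _ = roc_curve(y_test, y_prob, pos_label=dt.classes_[1])
    plt.figure(figsize=(6, 4))
    plt.plot(fpr, tpr, label=f'AUC={roc_auc_score(y_test, y_prob):.3f}')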
    plt.plot([0,1],[0,1],'--')
    plt.title("ROC Curve")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend()
    plt.show()
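
ROC-AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes that quantity by comparing every positive/negative pair (quadratic cost, so for illustration only); it assumes 0/1 labels, and `pairwise_auc` is a hypothetical helper. A practical note for this model: a tree's `predict_proba` returns the class fractions in each leaf, so it yields only a few distinct scores, and a fully grown tree often outputs mostly 0 and 1, giving a ROC curve with very few steps.

```python
def pairwise_auc(y_true, scores):
    """Hypothetical helper: P(score_pos > score_neg), ties count 0.5. Labels must be 0/1."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores cannot separate the pair
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, scores))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly, hence 0.75; `roc_auc_score` should agree on this input.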

## 8️⃣ Feature Importance Analysis

In [None]:
importances = pd.Series(dt.feature_importances_, index=X.columns).sort_values(ascending=False)
plt.figure(figsize=(8,5))
sns.barplot(x=importances, y=importances.index)
plt.title("Feature Importance")
plt.show()
importances
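
scikit-learn documents `feature_importances_` as the normalized total reduction of the split criterion brought by each feature (Gini importance). The sketch below reproduces that bookkeeping on a tiny hand-written tree: each internal node credits its split feature with `N_t * impurity - N_left * impurity_left - N_right * impurity_right`, and the totals are normalized to sum to 1. The node layout is a hypothetical stand-in for a fitted tree, not the real `tree_` object.

```python
# Hypothetical tree: each node is (feature, n_samples, gini_impurity, left_child, right_child);
# leaves use feature=None and child=-1.
nodes = [
    (0, 10, 0.5, 1, 2),       # root: classes [5, 5], split on feature 0
    (None, 4, 0.0, -1, -1),   # leaf: classes [4, 0]
    (1, 6, 10 / 36, 3, 4),    # classes [1, 5], split on feature 1
    (None, 5, 0.0, -1, -1),   # leaf: classes [0, 5]
    (None, 1, 0.0, -1, -1),   # leaf: classes [1, 0]
]


def impurity_importances(nodes, n_features):
    """Sum each feature's weighted impurity decrease, then normalize to 1.
    (Dividing by the root sample count would cancel out in the normalization.)"""
    totals = [0.0] * n_features
    for feature, n, imp, left, right in nodes:
        if feature is None:
            continue  # leaves do not split, so they contribute nothing
        _, n_l, imp_l, _, _ = nodes[left]
        _, n_r, imp_r, _, _ = nodes[right]
        totals[feature] += n * imp - n_l * imp_l - n_r * imp_r
    grand_total = sum(totals)
    return [t / grand_total for t in totals] if grand_total > 0 else totals


print([round(v, 4) for v in impurity_importances(nodes, n_features=2)])  # [0.6667, 0.3333]
```

The root split removes twice as much weighted impurity as the second split, so feature 0 receives two thirds of the importance.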

## 9️⃣ Hyperparameter Tuning: GridSearchCV

In [None]:
params = {
    'max_depth':[3,5,7,10,None],
    'min_samples_split':[2,5,10],
    'min_samples_leaf':[1,2,5]
}
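# 45 parameter combinations x 5 folds = 225 fits (folds are stratified for classifiers);
# the best combination is then refit on the full training set (refit=True by default)
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), params, cv=5, scoring='accuracy', n_jobs=-1)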
grid.fit(X_train, y_train)
print("Best Parameters:", grid.best_params_)

best_model = grid.best_estimator_
y_pred_best = best_model.predict(X_test)
print("\nTuned Model Accuracy:", accuracy_score(y_test, y_pred_best))
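
`GridSearchCV` evaluates every combination in the Cartesian product of `params`: here 5 × 3 × 3 = 45 candidates, each scored on 5 folds, before the candidate with the best mean CV score is refit. The plain-Python sketch below mimics only the candidate expansion and the mean-score selection; `fake_fold_score` is a made-up stand-in for "train on four folds, score on the fifth", not a real model, so the chosen parameters say nothing about your data.

```python
from itertools import product
from statistics import mean

params = {
    "max_depth": [3, 5, 7, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

# Expand the grid into one dict per candidate (keys sorted, as sklearn's ParameterGrid does)
keys = sorted(params)
candidates = [dict(zip(keys, combo)) for combo in product(*(params[k] for k in keys))]
n_folds = 5
print(len(candidates), len(candidates) * n_folds)  # 45 225


def fake_fold_score(candidate, fold):
    """Made-up score that pretends max_depth=5 with min_samples_leaf=2 generalizes best."""
    depth = candidate["max_depth"] or 20  # treat None (unlimited depth) as very deep
    return (1.0 - 0.01 * abs(depth - 5)
            - 0.005 * abs(candidate["min_samples_leaf"] - 2)
            - 0.001 * fold)


# Pick the candidate with the highest mean score across folds (first one wins ties)
best = max(candidates, key=lambda c: mean(fake_fold_score(c, f) for f in range(n_folds)))
print(best)
```

Because cost grows multiplicatively with each added parameter list, keeping the grid small (or switching to a randomized search) matters on larger datasets.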

## 🔎 1️⃣0️⃣ Visualizing the Decision Tree Structure

In [None]:
plt.figure(figsize=(18,8))
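# Label nodes with real feature and class names (classes_ is sorted, matching plot_tree's expectation);
# for deep trees, plot_tree's max_depth argument limits how many levels are drawn
plot_tree(best_model, filled=True, feature_names=list(X.columns),
          class_names=[str(c) for c in best_model.classes_], fontsize=8)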
plt.show()
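
The plotted tree is read top-down: at each internal node a sample goes left when `feature <= threshold` and right otherwise, until it reaches a leaf whose majority class becomes the prediction. The sketch below walks a hand-built tree stored in parallel arrays named after scikit-learn's fitted `tree_` attributes (`children_left`, `children_right`, `feature`, `threshold`, with `-1` marking a missing child); the array contents, `leaf_counts`, and class names are hypothetical, not taken from `best_model`.

```python
# Hypothetical fitted tree in sklearn-style parallel arrays (node 0 is the root).
children_left = [1, -1, 3, -1, -1]    # -1 means "no child", i.e. the node is a leaf
children_right = [2, -1, 4, -1, -1]
feature = [0, -2, 1, -2, -2]          # -2 marks "undefined" at leaves, as in sklearn's tree_
threshold = [2.5, -2.0, 0.7, -2.0, -2.0]
leaf_counts = {1: [8, 0], 3: [4, 1], 4: [0, 5]}  # hypothetical per-class sample counts
classes = ["no", "yes"]


def predict_one(x):
    """Route one sample from the root to a leaf; return (leaf_id, predicted class)."""
    node = 0
    while children_left[node] != -1:
        if x[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    counts = leaf_counts[node]
    return node, classes[counts.index(max(counts))]


print(predict_one([1.0, 0.9]))  # (1, 'no')
print(predict_one([3.0, 0.5]))  # (3, 'no')
print(predict_one([3.0, 0.9]))  # (4, 'yes')
```

On the fitted model, the corresponding arrays are exposed as `best_model.tree_.children_left`, `.children_right`, `.feature`, and `.threshold`, and `best_model.apply(X_test)` returns the leaf index each test sample lands in.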

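## 🧠 Insights & Business Interpretation
- Decision Trees are transparent: every prediction follows a readable chain of threshold rules from the root to a leaf.
- Feature importance highlights the key drivers, measured as each feature's share of the total impurity reduction across the tree's splits.
- Hyperparameter tuning (`max_depth`, `min_samples_split`, `min_samples_leaf`) restricts tree growth to curb overfitting; compare tuned and baseline test accuracy to confirm the gain on your data.
- Well suited to operational ML and stakeholder communication, where decisions must be explainable.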