# Decision Tree Classifier from Scratch
***

In [64]:
from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Classification trees and regression trees are collectively referred to as **CART**, which stands for **Classification and Regression Trees**.

- **Classification Trees** (often simply called decision trees, as in this notebook) are used for classification tasks, where the target variable is *categorical*.

- **Regression Trees** are used for regression tasks, where the target variable is *continuous*.

CART is a popular algorithm that can handle both types of tasks by optimising for different criteria:

- For classification, CART minimises classification error, Gini impurity, or entropy.

- For regression, CART minimises the mean squared error (MSE) or mean absolute error (MAE).
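
As a rough illustration of the regression criterion, the sketch below scores two candidate splits by the weighted MSE of their child nodes. The toy targets and the helper name `weighted_mse` are assumptions made for this example; they are not part of the classifier built in this notebook.

```python
import numpy as np


def weighted_mse(y_left, y_right):
    """Weighted mean squared error of two child nodes (hypothetical helper)."""
    n = len(y_left) + len(y_right)
    # Each child predicts its own mean, so its MSE equals its variance.
    mse_left = np.mean((y_left - np.mean(y_left)) ** 2)
    mse_right = np.mean((y_right - np.mean(y_right)) ** 2)
    return len(y_left) / n * mse_left + len(y_right) / n * mse_right


# Toy continuous targets, ordered by a single made-up feature.
y_toy = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])

parent_mse = np.mean((y_toy - y_toy.mean()) ** 2)
good_split = weighted_mse(y_toy[:3], y_toy[3:])  # separates low from high targets
poor_split = weighted_mse(y_toy[:1], y_toy[1:])  # isolates a single sample

print(f"Parent MSE: {parent_mse:.3f}")
print(f"Good split: {good_split:.3f}")
print(f"Poor split: {poor_split:.3f}")
```

The split that separates the low targets from the high ones leaves far less error in the children, so a regression tree would prefer it.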

Both types of trees follow the same core idea of splitting the data based on conditions to create homogeneous subsets, but their objectives differ depending on the problem type.
In this notebook, we will build a predictive model using Decision Trees on the breast cancer dataset from the scikit-learn library.

## 1. Load Data

In [65]:
# Load the dataset
data = load_breast_cancer()
# data = datasets.load_iris()
X, y = data.data, data.target
feature_names = data.feature_names

# Check the shape of the data
print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"Features: \n{feature_names}")

Features shape: (569, 30)
Target shape: (569,)
Features: 
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


In [66]:
df = pd.DataFrame(data=X, columns=feature_names)
df['diagnosis'] = y
df.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | diagnosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


In [67]:
df['diagnosis'].value_counts()

diagnosis
1    357
0    212
Name: count, dtype: int64

In this dataset, the features describe measured characteristics of each tumour sample (e.g., radius, texture), while the target is a binary value indicating whether the tumour is malignant (0) or benign (1). The 357 samples labelled 1 are therefore benign and the 212 labelled 0 are malignant.

## 2. Train Test Split
Train-test split is a fundamental model validation technique in machine learning. It divides a dataset into two separate portions: a **training set** used to train a model, and a **testing set** used to evaluate how well the model performs on unseen data.

A typical split ratio is 80% for training and 20% for testing, though this can vary (70/30 and 90/10 are also common). The key principle is that the test set must remain completely separate during the model training process and should never be used to make decisions about the model or to tune parameters.

The split is usually done randomly so that both sets are representative of the overall dataset, and many libraries (such as scikit-learn) provide built-in functions that handle this process automatically while maintaining proper randomisation.


In [68]:
def train_test_split(X: np.ndarray, y: np.ndarray, test_size: float = 0.2,
                     random_state: int = None) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """
    Split arrays or matrices into random train and test subsets.

    Args:
        X (np.ndarray): Input features, a 2D array with rows (samples) and columns (features).
        y (np.ndarray): Target values/labels, a 1D array with one entry per sample.
        test_size (float): Proportion of the dataset to include in the test split. Must be between 0.0 and 1.0. Defaults to 0.2.
        random_state (int, optional): Seed for the random number generator to ensure reproducible results. Defaults to None.

    Returns:
        tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
        A tuple containing:
            - X_train (np.ndarray): Training set features.
            - X_test (np.ndarray): Testing set features.
            - y_train (np.ndarray): Training set target values.
            - y_test (np.ndarray): Testing set target values.
    """
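    # Seed the global NumPy random number generator if a seed is given.
    # Compare against None so that a seed of 0 is also honoured.
    if random_state is not None:
        np.random.seed(random_state)

    # Create an array of indices from 0 to len(X) - 1
    indices = np.arange(len(X))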

    # Shuffle the indices
    np.random.shuffle(indices)

    # Define the size of our test data from len(X)
    test_size = int(test_size * len(X))

    # Generate indices for test and train data
    test_indices = indices[:test_size]
    train_indices = indices[test_size:]

    # Return: X_train, X_test, y_train, y_test
    return X[train_indices], X[test_indices], y[train_indices], y[test_indices]
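
A quick way to sanity-check a split like this is to confirm that the two index sets are disjoint and together cover every sample. The sketch below is a compact, self-contained variant that uses NumPy's `default_rng` generator instead of the global seed, so the exact rows it selects differ from those chosen by the function above. The name `split_indices` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np


def split_indices(n_samples, test_size=0.2, seed=None):
    """Return shuffled (train_indices, test_indices) for n_samples rows (hypothetical helper)."""
    rng = np.random.default_rng(seed)     # local generator, no global state
    indices = rng.permutation(n_samples)  # shuffled copy of 0..n_samples-1
    n_test = int(test_size * n_samples)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(569, test_size=0.3, seed=42)

# No sample appears in both sets, and every sample appears in one of them.
assert len(np.intersect1d(train_idx, test_idx)) == 0
assert len(train_idx) + len(test_idx) == 569
print(len(train_idx), len(test_idx))  # 399 170
```

Because `int()` truncates, a 30% test split of 569 samples yields 170 test rows and 399 training rows.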

## 3. Gini Impurity and Entropy

### Gini Impurity
Gini impurity measures the probability that a randomly chosen sample from a node would be misclassified if it were labelled at random according to the node's class distribution. It quantifies how impure a node is, with values ranging from $0$ (minimum impurity, all samples in one class) to $0.5$ (maximum impurity) for binary classification. For multi-class problems with $k$ classes, the maximum occurs when all classes are equally probable and equals $1 - \frac{1}{k}$. The formula for Gini impurity is:

\begin{align*}
G = 1 - \sum_{i=1}^{k} p_{i}^{2}
\end{align*}

where:
- $k$: Number of classes.
- $p_{i}$: Proportion of samples belonging to class $i$ in the node.
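
As a worked example, consider a hypothetical node holding 10 samples, 8 from one class and 2 from the other, so that $p_{1} = 0.8$ and $p_{2} = 0.2$:

\begin{align*}
G = 1 - (0.8^{2} + 0.2^{2}) = 1 - (0.64 + 0.04) = 0.32
\end{align*}

For comparison, a pure node gives $G = 1 - 1^{2} = 0$, and an evenly mixed binary node gives $G = 1 - (0.5^{2} + 0.5^{2}) = 0.5$.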

In [69]:
def gini(y):
    proportions = np.bincount(y) / len(y)
    return 1 - np.sum(proportions**2)

### Entropy
Entropy measures the amount of uncertainty or randomness in the data. It is based on information theory and represents the expected amount of information required to classify a sample. For binary classification, it ranges from $0$ (minimum entropy) to $1$ (maximum entropy). For $k$ classes, the range is from $0$ to $\log_{2}(k)$. The formula for entropy is:


\begin{align*}
H = - \sum_{i=1}^{k} p_{i} \log_{2}(p_{i})
\end{align*}

where:
- $k$: Number of classes.
- $p_{i}$: Proportion of samples belonging to class $i$ in the node.

In practice, the two criteria usually lead to very similar trees. Gini impurity tends to favour splits that isolate the most frequent class, while entropy can be more sensitive to differences in cases with many classes or highly imbalanced distributions. Gini is often preferred because it avoids computing logarithms and is therefore slightly cheaper to evaluate.
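
To make the comparison concrete, the sketch below evaluates both measures for a binary node at a few values of $p$, the proportion of one class. The helper names `binary_gini` and `binary_entropy` are assumptions for this illustration and are separate from the notebook's own functions.

```python
import numpy as np


def binary_gini(p):
    """Gini impurity of a two-class node with class proportions p and 1 - p."""
    return 1 - (p ** 2 + (1 - p) ** 2)


def binary_entropy(p):
    """Entropy (in bits) of a two-class node; zero proportions are skipped."""
    probs = np.array([p, 1 - p])
    probs = probs[probs > 0]  # avoid log(0)
    # sum(p * log2(1 / p)) is the same quantity as -sum(p * log2(p)).
    return float(np.sum(probs * np.log2(1 / probs)))


for p in [0.0, 0.1, 0.3, 0.5]:
    print(f"p={p:.1f}  gini={binary_gini(p):.3f}  entropy={binary_entropy(p):.3f}")
```

Both measures are zero for a pure node and peak at $p = 0.5$, which is why they tend to rank candidate splits in a similar order.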

In [70]:
def entropy(y):
    proportions = np.bincount(y) / len(y)
    proportions = proportions[proportions > 0]  # Avoid log(0)
    return -np.sum(proportions * np.log2(proportions))

In [71]:
print(df['diagnosis'].value_counts())

diagnosis
1    357
0    212
Name: count, dtype: int64


In [72]:
print(f"Gini Impurity: {gini(y):.5f}")
print(f"Entropy: {entropy(y):.5f}")

Gini Impurity: 0.46753
Entropy: 0.95264


## 4. Information Gain
Information gain measures how effectively a feature (and threshold) splits a dataset into subsets that are purer with respect to the target variable. It quantifies the reduction in entropy or Gini impurity achieved by the split, and a higher information gain indicates a better split.

\begin{align*}
IG(S, A) = H(S) - \sum_{i=1}^{n} \dfrac{|S_i|}{|S|}H(S_{i})
\end{align*}

where:
- $H(S)$: Entropy (or Gini impurity) of the original dataset $S$.
- $n$: Number of subsets produced by the split (two for the binary splits used in this notebook).
- $S_{i}$: Subset of $S$ created by splitting on feature $A$ for the $i$-th value or range of the feature.
- $\dfrac{|S_i|}{|S|}$: Proportion of samples in subset $S_{i}$.
- $H(S_{i})$: Entropy (or Gini impurity) of subset $S_{i}$.



The following `information_gain` function calculates the difference between the metric for the parent node and the weighted average of the metrics for the child nodes (left and right splits).

In [73]:
def information_gain(y, y_left, y_right, metric="gini"):
    """
    Calculate the information gain of a split.

    Args:
        y (array-like): Labels of the parent node.
        y_left (array-like): Labels of the left child node after the split.
        y_right (array-like): Labels of the right child node after the split.
        metric (str, optional): Splitting criterion, either "gini" or "entropy". Defaults to "gini".

    Returns:
        float: Information gain resulting from the split.
    """
    if metric == "gini":
        parent_metric = gini(y)
        left_metric = gini(y_left)
        right_metric = gini(y_right)
    else:  # metric == "entropy"
        parent_metric = entropy(y)
        left_metric = entropy(y_left)
        right_metric = entropy(y_right)

    weighted_metric = (
        len(y_left) / len(y) * left_metric
        + len(y_right) / len(y) * right_metric
    )
    return parent_metric - weighted_metric
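
As a small worked example of this calculation, the sketch below scores one hypothetical split of a 10-sample node using the Gini criterion. It is self-contained, so it re-implements the impurity in a helper named `gini_impurity`; that name and the toy labels are assumptions for this illustration.

```python
import numpy as np


def gini_impurity(labels):
    """Gini impurity of an array of non-negative integer class labels."""
    proportions = np.bincount(labels) / len(labels)
    return 1 - np.sum(proportions ** 2)


# Hypothetical parent node: five samples of each class.
parent = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
left = np.array([0, 0, 0, 0])         # pure child
right = np.array([0, 1, 1, 1, 1, 1])  # mostly class 1

weighted_children = (
    len(left) / len(parent) * gini_impurity(left)
    + len(right) / len(parent) * gini_impurity(right)
)
gain = gini_impurity(parent) - weighted_children

# 0.5 - (0.4 * 0 + 0.6 * 0.2778) = 0.3333
print(f"Information gain: {gain:.4f}")
```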

## 5. Find the Best Split
This function identifies the best feature and threshold to split the data using the specified metric (Gini or Entropy).

Steps are:

1. Loop through all features.

2. For each feature, iterate over all unique thresholds.

3. Split the data into left and right subsets based on the threshold (skip invalid ones).

4. Compute the Gini/Entropy for both subsets and calculate Information Gain.

5. If the newly computed `info_gain` exceeds `best_info_gain`, update `best_info_gain` and record the feature and threshold as the best split so far.

In [74]:
def best_split(X, y, metric="gini", feature_names=None):
    """
    Find the best split for a dataset.

    Args:
        X (array-like): Input features (2D array of shape [n_samples, n_features]).
        y (array-like): Labels (1D array of shape [n_samples]).
        metric (str, optional): Splitting criterion, either "gini" or "entropy". Defaults to "gini".
        feature_names (list, optional): List of feature names. If None, indices are used. Defaults to None.

    Returns:
        dict or None: Dictionary describing the best split, or None if no
        valid split exists. The dictionary has keys:
              - 'feature_index' (int): Index of the feature used for the split.
              - 'feature_name' (str/int): Name or index of the feature.
              - 'threshold' (float): Threshold value for the split.
    """
    best_info_gain = float("-inf")
    best_split = None
    n_features = X.shape[1]

    # Iterate over all features.
    for feature in range(n_features):
        # Iterate over all unique thresholds for each feature.
        thresholds = np.unique(X[:, feature])
        for threshold in thresholds:
            # Split the data into left and right subsets based on the threshold.
            left_mask = X[:, feature] <= threshold
            right_mask = X[:, feature] > threshold

            # Skip invalid splits (one side would be empty).
            if left_mask.sum() == 0 or right_mask.sum() == 0:
                continue

            # Compute IG.
            info_gain = information_gain(
                y, y[left_mask], y[right_mask], metric)

            # Update `best_info_gain` if `info_gain` > `best_info_gain`.
            if info_gain > best_info_gain:
                best_info_gain = info_gain
                best_split = {
                    "feature_index": feature,
                    "feature_name": feature_names[feature] if feature_names is not None else feature,
                    "threshold": threshold,
                }

    return best_split

In [75]:
split = best_split(X, y, metric="gini", feature_names=feature_names)
print("Best Split:", split)

Best Split: {'feature_index': 20, 'feature_name': np.str_('worst radius'), 'threshold': np.float64(16.77)}
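
The same search can be traced by hand on a tiny one-feature dataset. The sketch below is a simplified, self-contained version of the loop (Gini only, a single feature); the data and the names `gini_impurity` and `best_threshold` are made up for illustration.

```python
import numpy as np


def gini_impurity(labels):
    """Gini impurity of an array of non-negative integer class labels."""
    proportions = np.bincount(labels) / len(labels)
    return 1 - np.sum(proportions ** 2)


def best_threshold(x, y):
    """Try every unique value of x as a threshold and keep the highest gain."""
    best_gain, best_value = float("-inf"), None
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue  # the largest value sends every sample to the left
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / len(y)
        gain = gini_impurity(y) - weighted
        if gain > best_gain:
            best_gain, best_value = gain, threshold
    return best_value, best_gain


x_toy = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y_toy = np.array([0, 0, 0, 1, 1, 1])
threshold, gain = best_threshold(x_toy, y_toy)
print(threshold, gain)
```

Here the threshold 3.0 separates the two classes perfectly, so both children are pure and the gain reaches 0.5, the full impurity of the balanced parent node.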


## 6. Build Tree
This function recursively creates the tree structure as a nested dictionary with conditions (`feature` and `threshold`) and leaf nodes.

In [76]:
def build_tree(X, y, max_depth=None, depth=0, metric="gini", feature_names=None):
    """
    Build a decision tree using recursive splitting.

    Args:
        X (array-like): Input features (2D array of shape [n_samples, n_features]).
        y (array-like): Labels (1D array of shape [n_samples]).
        max_depth (int, optional): Maximum depth of the tree. Defaults to None (unlimited depth).
        depth (int, optional): Current depth of the tree. Used internally for recursion. Defaults to 0.
        metric (str, optional): Splitting criterion, either "gini" or "entropy". Defaults to "gini".
        feature_names (list, optional): List of feature names. If None, indices are used. Defaults to None.

    Returns:
        dict: Nested dictionary representing the tree structure.
              Nodes contain keys: 'type', 'feature', 'threshold', 'left', 'right'.
              Leaf nodes contain keys: 'type', 'value'.
    """

    # Stop the recursion if all labels are identical or the maximum depth is reached.
    if len(set(y)) == 1 or (max_depth is not None and depth == max_depth):
        return {"type": "leaf", "value": np.argmax(np.bincount(y))}

    # Find the best split.
    split = best_split(X, y, metric, feature_names)
    if not split:
        return {"type": "leaf", "value": np.argmax(np.bincount(y))}

    # Split the data into left and right subsets.
    # Use feature_index for calculations.
    left_mask = X[:, split["feature_index"]] <= split["threshold"]
    right_mask = X[:, split["feature_index"]] > split["threshold"]

    # Recursively build the left and right subtrees.
    left_tree = build_tree(X[left_mask], y[left_mask],
                           max_depth, depth + 1, metric, feature_names)
    right_tree = build_tree(X[right_mask], y[right_mask],
                            max_depth, depth + 1, metric, feature_names)

    # Return the tree structure as a nested dictionary.
    return {
        "type": "node",
        "feature": split["feature_name"],
        "threshold": split["threshold"],
        "left": left_tree,
        "right": right_tree,
    }
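
Because the tree is made of plain nested dictionaries, it can be inspected with short recursive helpers. The sketch below counts leaves and measures depth on a small hand-written tree in the same format; the tree contents and the names `count_leaves` and `tree_depth` are hypothetical.

```python
def count_leaves(tree):
    """Number of leaf nodes in a nested-dictionary tree."""
    if tree["type"] == "leaf":
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


def tree_depth(tree):
    """Number of splits on the longest root-to-leaf path."""
    if tree["type"] == "leaf":
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))


# Hand-written example using the same keys that build_tree produces.
toy_tree = {
    "type": "node", "feature": "worst radius", "threshold": 16.77,
    "left": {"type": "leaf", "value": 1},
    "right": {
        "type": "node", "feature": "mean texture", "threshold": 15.7,
        "left": {"type": "leaf", "value": 1},
        "right": {"type": "leaf", "value": 0},
    },
}

print(count_leaves(toy_tree), tree_depth(toy_tree))  # 3 2
```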

## 7. Traverse the Tree For Prediction
This function traverses the tree to make predictions by following the tree from the root to a leaf node.

In [77]:
def traverse_tree(x, tree, feature_names=None):
    """
    Traverse a decision tree to make a prediction for a single sample.

    Args:
        x (array-like): Single sample (1D array of features).
        tree (dict): Decision tree structure.
        feature_names (list, optional): List of feature names. Needed for name-to-index mapping. Defaults to None.

    Returns:
        int: Predicted label.
    """
    if tree["type"] == "leaf":
        return tree["value"]

    # Resolve feature index if feature_names is provided
    feature_index = feature_names.index(
        tree["feature"]) if feature_names is not None else tree["feature"]

    if x[feature_index] <= tree["threshold"]:
        return traverse_tree(x, tree["left"], feature_names)
    else:
        return traverse_tree(x, tree["right"], feature_names)
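
The same walk from root to leaf can also be written as a loop instead of a recursion. The sketch below is one such variant on a hand-written tree whose nodes store column indices directly rather than feature names; `traverse_iteratively` and the toy tree are assumptions for illustration, not part of the notebook's functions.

```python
import numpy as np


def traverse_iteratively(x, tree):
    """Walk from the root to a leaf without recursion (hypothetical helper)."""
    node = tree
    while node["type"] != "leaf":
        # Here "feature" holds a column index rather than a feature name.
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


toy_tree = {
    "type": "node", "feature": 0, "threshold": 5.0,
    "left": {"type": "leaf", "value": 1},
    "right": {
        "type": "node", "feature": 1, "threshold": 2.0,
        "left": {"type": "leaf", "value": 1},
        "right": {"type": "leaf", "value": 0},
    },
}

print(traverse_iteratively(np.array([3.0, 9.0]), toy_tree))  # 1
print(traverse_iteratively(np.array([7.0, 9.0]), toy_tree))  # 0
```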

## 8. Accuracy
We will also compute accuracy, the fraction of predictions that match the true labels, to evaluate our custom decision tree model.

In [78]:
def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)
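
One caveat worth checking: the elementwise comparison relies on NumPy arrays. With plain Python lists, `==` compares the lists as a whole and yields a single boolean. The sketch below shows a defensive variant that converts its inputs first; the name `safe_accuracy` is hypothetical.

```python
import numpy as np


def safe_accuracy(y_true, y_pred):
    """Fraction of matching labels; accepts lists as well as arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


labels_true = [1, 0, 1, 1]
labels_pred = [1, 0, 0, 1]

print(safe_accuracy(labels_true, labels_pred))     # 0.75 (3 of 4 labels match)
print(float(np.mean(labels_true == labels_pred)))  # 0.0 (the lists differ as a whole)
```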

## 9. Prediction
This function predicts labels for all samples in the dataset.

In [79]:
def predict(X, tree, feature_names=None):
    """
    Predict labels for the given dataset using a decision tree.

    Args:
        X (array-like): Input features (2D array for multiple samples or 1D array for a single sample).
        tree (dict): Decision tree structure.
        feature_names (list, optional): List of feature names. Needed for name-to-index mapping. Defaults to None.

    Returns:
        array-like: Predicted labels (1D array for multiple samples or a single label for one sample).
    """
    if len(X.shape) == 1:  # If a single sample is provided
        return traverse_tree(X, tree, feature_names)
    return np.array([traverse_tree(x, tree, feature_names) for x in X])

In [80]:
import pprint

# Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names.tolist()

# Build the tree using Gini impurity
tree_gini = build_tree(X, y, max_depth=3, metric="gini",
                       feature_names=feature_names)

# Display the tree
pprint.pprint(tree_gini)

{'feature': 'worst radius',
 'left': {'feature': 'worst concave points',
          'left': {'feature': 'radius error',
                   'left': {'type': 'leaf', 'value': np.int64(1)},
                   'right': {'type': 'leaf', 'value': np.int64(0)},
                   'threshold': np.float64(0.8811),
                   'type': 'node'},
          'right': {'feature': 'worst texture',
                    'left': {'type': 'leaf', 'value': np.int64(1)},
                    'right': {'type': 'leaf', 'value': np.int64(0)},
                    'threshold': np.float64(25.5),
                    'type': 'node'},
          'threshold': np.float64(0.1357),
          'type': 'node'},
 'right': {'feature': 'mean texture',
           'left': {'feature': 'mean concave points',
                    'left': {'type': 'leaf', 'value': np.int64(1)},
                    'right': {'type': 'leaf', 'value': np.int64(0)},
                    'threshold': np.float64(0.06211),
                    'type': 'nod

## 10. Evaluate Model

In [81]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Build the tree on training data
tree_gini = build_tree(X_train, y_train, max_depth=3,
                       metric="gini", feature_names=feature_names)

# Predict on the test set
y_pred = predict(X_test, tree_gini, feature_names=feature_names)

# Calculate accuracy
test_accuracy = accuracy(y_test, y_pred)
print("Test Accuracy:", test_accuracy)

# Predict a single sample
single_sample = X_test[0]
single_prediction = predict(
    single_sample, tree_gini, feature_names=feature_names)
print("Predicted label for single sample:", single_prediction)
print("Actual label for single sample:", y_test[0])

Test Accuracy: 0.9647058823529412
Predicted label for single sample: 1
Actual label for single sample: 1
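
For context, it can help to compare this accuracy with a trivial baseline that always predicts the most frequent class. The sketch below computes that baseline on the full dataset from the class counts reported earlier by `value_counts()` (357 and 212). It is an illustrative check only, and the baseline on the test split itself may differ slightly.

```python
import numpy as np

# Rebuild a label vector with the class counts reported by value_counts().
y_all = np.concatenate([np.ones(357, dtype=int), np.zeros(212, dtype=int)])

# Always predict the most frequent class.
majority_class = np.argmax(np.bincount(y_all))
baseline_pred = np.full_like(y_all, majority_class)
baseline_accuracy = np.mean(baseline_pred == y_all)

print(f"Majority class: {majority_class}")
print(f"Baseline accuracy: {baseline_accuracy:.4f}")  # 357 / 569 = 0.6274
```

A depth-3 tree that reaches roughly 96% test accuracy is therefore well above what the class imbalance alone would give.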


## 11. Encapsulation

To encapsulate the code above, it is cleaner to organise it into two classes:
1. `CustomDecisionTree`, containing the core logic for building the tree, traversing it, and making predictions.
2. `Node`, representing a single node in the decision tree, either an internal node or a leaf node.

In [82]:
class Node:
    """
    A class representing a node in the decision tree.
    """

    def __init__(self, node_type, value=None, feature=None, threshold=None, left=None, right=None):
        """
        Initialise a Node instance.

        Args:
            node_type (str): Type of the node ("leaf" or "node").
            value (int or None): Predicted label for leaf nodes. None for internal nodes.
            feature (int or None): Index of the feature for internal nodes. None for leaf nodes.
            threshold (float or None): Threshold value for internal nodes. None for leaf nodes.
            left (Node or None): Left child node. None for leaf nodes.
            right (Node or None): Right child node. None for leaf nodes.
        """
        self.type = node_type  # "leaf" or "node"
        self.value = value  # For leaf nodes
        self.feature = feature  # For internal nodes
        self.threshold = threshold  # For internal nodes
        self.left = left  # Left child
        self.right = right  # Right child
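
As a design alternative, the same node could be expressed with a dataclass, which generates `__init__` and a readable `repr` automatically. The sketch below is only an illustration; the class name `TreeNode` and the `is_leaf` property are assumptions and are not used by `CustomDecisionTree`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    """Dataclass variant of a decision tree node (hypothetical)."""
    node_type: str                      # "leaf" or "node"
    value: Optional[int] = None         # predicted label, leaf nodes only
    feature: Optional[int] = None       # feature index, internal nodes only
    threshold: Optional[float] = None   # split threshold, internal nodes only
    left: Optional["TreeNode"] = None   # child for x[feature] <= threshold
    right: Optional["TreeNode"] = None  # child for x[feature] > threshold

    @property
    def is_leaf(self) -> bool:
        """True when this node carries a prediction rather than a split."""
        return self.node_type == "leaf"


root = TreeNode(
    node_type="node", feature=0, threshold=5.0,
    left=TreeNode(node_type="leaf", value=1),
    right=TreeNode(node_type="leaf", value=0),
)
print(root.is_leaf, root.left.is_leaf)  # False True
```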

In [83]:
class CustomDecisionTree:
    """
    A class representing the decision tree model.
    """

    def __init__(self, max_depth=None, metric='gini'):
        """
        Initialise a CustomDecisionTree instance.

        Args:
            max_depth (int or None): Maximum depth of the tree. None for unlimited depth.
            metric (str): Splitting criterion, either "gini" or "entropy".

        Attributes:
            root (Node or None): Root node of the decision tree. None until `fit` is called.
            feature_names (list or None): List of feature names. None if not provided.
        """
        self.max_depth = max_depth
        self.metric = metric
        self.root = None
        self.feature_names = None

    def gini(self, y):
        """
        Calculate the Gini impurity.

        Args:
            y (array-like): Array of labels.

        Returns:
            float: Gini impurity.
        """
        if len(y) == 0:
            return 0
        proportions = np.bincount(y) / len(y)
        return 1 - np.sum(proportions ** 2)

    def entropy(self, y):
        """
        Calculate the entropy.

        Args:
            y (array-like): Array of labels.

        Returns:
            float: Entropy value.
        """
        if len(y) == 0:
            return 0
        proportions = np.bincount(y) / len(y)
        proportions = proportions[proportions > 0]  # Avoid log(0)
        return -np.sum(proportions * np.log2(proportions))

    def information_gain(self, y, y_left, y_right):
        """
        Compute the information gain of a split.

        Args:
            y (array-like): Labels of the parent node.
            y_left (array-like): Labels of the left child node.
            y_right (array-like): Labels of the right child node.

        Returns:
            float: Information gain from the split.
        """
        if self.metric == "gini":
            parent_metric = self.gini(y)
            left_metric = self.gini(y_left)
            right_metric = self.gini(y_right)
        else:  # metric == "entropy"
            parent_metric = self.entropy(y)
            left_metric = self.entropy(y_left)
            right_metric = self.entropy(y_right)

        weighted_metric = (
            len(y_left) / len(y) * left_metric
            + len(y_right) / len(y) * right_metric
        )
        return parent_metric - weighted_metric

    def best_split(self, X, y):
        """
        Find the best feature and threshold to split the dataset.

        Args:
            X (array-like): Input features.
            y (array-like): Labels.

        Returns:
            dict or None: Best split details with keys 'feature_index' and 'threshold',
            or None if no valid split exists.
        """
        best_info_gain = float("-inf")
        best_split = None
        n_features = X.shape[1]

        # Iterate over all features.
        for feature in range(n_features):
            # Iterate over all unique thresholds for each feature.
            thresholds = np.unique(X[:, feature])
            for threshold in thresholds:
                # Split the data into left and right subsets based on the threshold.
                left_mask = X[:, feature] <= threshold
                right_mask = X[:, feature] > threshold

                # Skip invalid splits (one side would be empty).
                if left_mask.sum() == 0 or right_mask.sum() == 0:
                    continue

                # Compute IG.
                info_gain = self.information_gain(
                    y, y[left_mask], y[right_mask])

                # Update `best_info_gain` if `info_gain` > `best_info_gain`.
                if info_gain > best_info_gain:
                    best_info_gain = info_gain
                    best_split = {
                        "feature_index": feature,
                        "threshold": threshold,
                    }

        return best_split

    def build_tree(self, X, y, depth=0):
        """
        Build the decision tree recursively.

        Args:
            X (array-like): Input features.
            y (array-like): Labels.
            depth (int): Current depth of the tree.

        Returns:
            Node: Root node of the subtree built from X and y.
        """
        # Stop recursion if all labels are identical or max depth is reached
        if len(set(y)) == 1 or (self.max_depth is not None and depth == self.max_depth):
            return Node(node_type="leaf", value=np.argmax(np.bincount(y)))

        # Find the best split
        split = self.best_split(X, y)
        if not split:
            return Node(node_type="leaf", value=np.argmax(np.bincount(y)))

        # Split the data
        left_mask = X[:, split["feature_index"]] <= split["threshold"]
        right_mask = X[:, split["feature_index"]] > split["threshold"]

        # Recursively build the left and right subtrees
        left_tree = self.build_tree(X[left_mask], y[left_mask], depth + 1)
        right_tree = self.build_tree(X[right_mask], y[right_mask], depth + 1)

        # Store the feature index directly for easier traversal
        feature_index = split["feature_index"]

        return Node(node_type="node", feature=feature_index, threshold=split["threshold"], left=left_tree, right=right_tree)

    def fit(self, X, y, feature_names=None):
        """
        Fit the decision tree model to the given data.

        Args:
            X (array-like): Input features.
            y (array-like): Labels.
            feature_names (list, optional): Names of the features. Defaults to None.
        """
        self.feature_names = feature_names
        self.root = self.build_tree(X, y)

    def traverse_tree(self, x, node):
        """
        Traverse the decision tree to make a prediction for a single sample.

        Args:
            x (array-like): Single sample.
            node (Node): Current node.

        Returns:
            int: Predicted label.
        """
        if node.type == "leaf":
            return node.value

        # node.feature is now the feature index
        feature_index = node.feature
        if x[feature_index] <= node.threshold:
            return self.traverse_tree(x, node.left)
        else:
            return self.traverse_tree(x, node.right)

    def predict(self, X):
        """
        Predict labels for the given dataset.

        Args:
            X (array-like): Input features.

        Returns:
            array-like: Predicted labels.
        """
        if len(X.shape) == 1:
            return self.traverse_tree(X, self.root)
        return np.array([self.traverse_tree(x, self.root) for x in X])

    def accuracy(self, y_true, y_pred):
        """
        Calculate the accuracy of the predictions.

        Args:
            y_true (array-like): True labels.
            y_pred (array-like): Predicted labels.

        Returns:
            float: Accuracy score.
        """
        return np.mean(y_true == y_pred)

    def print_tree(self, node=None, depth=0, prefix="Root: "):
        """
        Print the tree structure in a readable format.

        Args:
            node (Node, optional): Current node. Defaults to the root node.
            depth (int, optional): Current depth. Defaults to 0.
            prefix (str, optional): Prefix for the current node. Defaults to "Root: ".
        """
        if node is None:
            node = self.root

        if node.type == "leaf":
            print("  " * depth + prefix + f"Predict {node.value}")
        else:
            feature_name = self.feature_names[
                node.feature] if self.feature_names is not None else f"Feature_{node.feature}"
            print("  " * depth + prefix +
                  f"{feature_name} <= {node.threshold:.4f}")
            if node.left:
                self.print_tree(node.left, depth + 1, "├─ True: ")
            if node.right:
                self.print_tree(node.right, depth + 1, "└─ False: ")

In [84]:
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train the decision tree
tree = CustomDecisionTree(max_depth=3, metric="gini")
tree.fit(X_train, y_train, feature_names)

# Predict and evaluate
y_pred = tree.predict(X_test)
# Use a distinct name so the earlier `accuracy` function is not overwritten.
test_accuracy = tree.accuracy(y_test, y_pred)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Single prediction
sample = X_test[0]
single_prediction = tree.predict(sample)
print(f"Predicted: {single_prediction}, Actual: {y_test[0]}")

# Print the tree structure
print("\nDecision Tree Structure:")
tree.print_tree()

Test Accuracy: 0.9647
Predicted: 1, Actual: 1

Decision Tree Structure:
Root: mean concave points <= 0.0507
  ├─ True: worst radius <= 16.7700
    ├─ True: radius error <= 0.6061
      ├─ True: Predict 1
      └─ False: Predict 0
    └─ False: mean texture <= 15.7000
      ├─ True: Predict 1
      └─ False: Predict 0
  └─ False: worst texture <= 19.4900
    ├─ True: mean concave points <= 0.0853
      ├─ True: Predict 1
      └─ False: Predict 0
    └─ False: worst area <= 709.0000
      ├─ True: Predict 1
      └─ False: Predict 0
