## Classification tree
- A classification tree is a sequence of if-else questions about individual features.
- Its objective is to infer class labels.
- It can capture non-linear relationships between features and labels.
- It doesn't require feature scaling (e.g., standardization).
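
The scaling point can be checked with a small experiment. The sketch below is not from the original notebook: it uses synthetic integer-valued data and a hypothetical change of units (multiplying each feature by a positive constant) as a simple stand-in for standardization. Because a tree split depends only on the ordering of a feature's values, both fits should make the same predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic, integer-valued data (an assumption made for illustration)
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(200, 2)).astype(float)
y = ((X[:, 0] > 50) ^ (X[:, 1] > 30)).astype(int)  # non-linear (XOR-like) labels

# Rescale each feature by a different positive constant (a change of units)
X_rescaled = X * np.array([1000.0, 0.5])

tree_raw = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)
tree_rescaled = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_rescaled, y)

# Both trees ask the same questions, only with rescaled thresholds
same_predictions = np.array_equal(tree_raw.predict(X), tree_rescaled.predict(X_rescaled))
print(same_predictions)
```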

An example of a decision tree looks like this:

```mermaid
graph TD
    A[concave points_mean <= 0.051]
    A -->|True| B[radius_mean <= 14.98]
    A -->|False| C[radius_mean <= 11.345]
    B -->|True| D["257 benign, 7 malignant\nPredict -> benign"]
    B -->|False| E["9 benign, 11 malignant\nPredict -> malignant"]
    C -->|True| F["4 benign, 0 malignant\nPredict -> benign"]
    C -->|False| G["15 benign, 152 malignant\nPredict -> malignant"]
```
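
The tree in the diagram can also be read as nested if-else statements. The function below is a hand-written sketch of that reading; the name `predict_diagnosis` is hypothetical, and the thresholds and leaf counts are copied from the diagram.

```python
def predict_diagnosis(concave_points_mean: float, radius_mean: float) -> str:
    """Follow the tree's questions from the root down to a leaf."""
    if concave_points_mean <= 0.051:
        if radius_mean <= 14.98:
            return "benign"      # leaf: 257 benign, 7 malignant
        return "malignant"       # leaf: 9 benign, 11 malignant
    if radius_mean <= 11.345:
        return "benign"          # leaf: 4 benign, 0 malignant
    return "malignant"           # leaf: 15 benign, 152 malignant


print(predict_diagnosis(concave_points_mean=0.03, radius_mean=12.0))  # benign
print(predict_diagnosis(concave_points_mean=0.09, radius_mean=18.0))  # malignant
```

Each leaf predicts the majority class among the training samples that reached it.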

### Classification tree in scikit-learn

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X (feature matrix) and y (class labels) are assumed to be loaded already
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a shallow tree (at most two levels of splits)
dtree = DecisionTreeClassifier(max_depth=2, random_state=1)
dtree.fit(X_train, y_train)

# Evaluate accuracy on the held-out test set
y_pred = dtree.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
```
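
To see which questions a fitted tree actually asks, scikit-learn's `export_text` prints the learned rules. The sketch below is self-contained and rests on an assumption: it uses scikit-learn's bundled breast cancer dataset as a stand-in for `X` and `y`, so the column names (`mean radius`, `mean concave points`) differ from those used elsewhere in these notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: scikit-learn's bundled breast cancer data stands in for X and y
data = load_breast_cancer(as_frame=True)
X = data.data[["mean radius", "mean concave points"]]
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

dtree = DecisionTreeClassifier(max_depth=2, random_state=1)
dtree.fit(X_train, y_train)

# Print the learned if-else rules as indented text
rules = export_text(dtree, feature_names=list(X.columns))
print(rules)

# Fraction of correctly classified test samples
test_accuracy = dtree.score(X_test, y_test)
```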

## Decision regions

Decision regions are regions in the feature space where all instances are assigned to one class label.

Decision boundaries are surfaces separating different decision regions.
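
One way to make decision regions concrete is to predict a label at every point of a grid laid over the feature space. The sketch below uses synthetic two-feature data (an assumption, not the notebook's dataset). Because each split in a tree tests a single feature against a threshold, the resulting regions are axis-aligned rectangles.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data (an assumption made for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)

# Build a grid covering the feature space and predict a label at every point
xx, yy = np.meshgrid(np.linspace(0.0, 1.0, 200), np.linspace(0.0, 1.0, 200))
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
region_labels = clf.predict(grid_points).reshape(xx.shape)

# Grid cells sharing a label belong to that class's decision region;
# the decision boundary lies where neighbouring cells change label
```

If matplotlib is available, `plt.contourf(xx, yy, region_labels)` would shade the regions for plotting.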

## Train your first classification tree

```python
import pandas as pd

# Import DecisionTreeClassifier from sklearn.tree
from sklearn.tree import DecisionTreeClassifier

# Import train_test_split from sklearn.model_selection
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')
X, y = data[["radius_mean", "concave points_mean"]], data["diagnosis"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate a DecisionTreeClassifier 'dt' with a maximum depth of 6
dt = DecisionTreeClassifier(max_depth=6, random_state=1)

# Fit dt to the training set
dt.fit(X_train, y_train)

# Predict test set labels
y_pred = dt.predict(X_test)
print(y_pred[0:5])
```

```
['B' 'M' 'M' 'B' 'B']
```
