# Chapter 6 - Decision Trees

## Classification

In [1]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # extract petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

### Export graph

We can use scikit-learn's `export_graphviz` function to write the trained decision tree to a Graphviz DOT file:

In [3]:
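import os

from sklearn.tree import export_graphviz

# export_graphviz does not create missing directories, so make sure "output/" exists
os.makedirs("output", exist_ok=True)

export_graphviz(tree_clf,
                out_file="output/iris_tree.dot",
                feature_names=iris.feature_names[2:],  # petal length and width
                class_names=iris.target_names,
                rounded=True,   # draw node boxes with rounded corners
                filled=True)    # color each node by its majority class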

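Convert the DOT file to PNG format with the `dot` command-line tool from the Graphviz package (which must be installed separately):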

In [7]:
! dot -Tpng output/iris_tree.dot -o output/iris_tree.png

![Decision tree](output/iris_tree.png)
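
If the Graphviz `dot` tool is not available, a plain-text dump of the same kind of tree can serve as a quick check. The sketch below assumes scikit-learn 0.21 or later, where `sklearn.tree.export_text` is available. The `random_state=42` argument is an arbitrary choice added here only for reproducibility, and the variable name `report` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

# random_state is an assumption added for reproducibility; any seed works
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Render the split rules as indented text instead of a Graphviz file
report = export_text(tree_clf, feature_names=list(iris.feature_names[2:]))
print(report)
```

Each indentation level in the printed report corresponds to one level of depth in the tree, and each leaf line shows the predicted class.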

## Purity

A node is 'pure' (i.e. gini = 0) if all the training instances it applies to belong to the same class.

Gini impurity is calculated as:

$$
G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2
$$

where $p_{i,k}$ is the ratio of class $k$ instances among the training instances in the $i^{\text{th}}$ node, and $n$ is the number of classes.
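
To make the formula concrete, here is a minimal sketch that computes the Gini impurity of a node directly from the class labels of its training instances. The helper `gini_impurity` and the two example nodes are hypothetical, chosen only for illustration; they are not read from the fitted tree.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum_k p_k^2, where p_k is the fraction of labels in class k."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical nodes, used only to illustrate the formula
pure_node = ["setosa"] * 50                             # a single class
mixed_node = ["versicolor"] * 49 + ["virginica"] * 5    # two classes

print(gini_impurity(pure_node))             # 0.0
print(round(gini_impurity(mixed_node), 3))  # 0.168
```

For the mixed node, the calculation is $1 - (49/54)^2 - (5/54)^2 \approx 0.168$. Classes with no instances in the node contribute nothing to the sum.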