### Decision Trees - Iris dataset

We will now apply decision trees to one of the best-known datasets in machine learning. It was introduced by R.A. Fisher in 1936, and the goal is to classify iris plants into three species: iris-virginica, iris-setosa and iris-versicolor. The attributes describing each instance are the length and width of the petal and of the sepal.
The dataset contains 150 examples, exactly 50 for each of the 3 classes. 

<img src="iris.jpg">

In order to implement the models we will use the class <a href="http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier">
DecisionTreeClassifier</a> from *sklearn.tree*. 

First we import the libraries we will need. In addition we will use the first code cell to activate the *inline* mode for the graphics generated by *matplotlib*. We also initialize the seed of the random generator. 

In [None]:
import numpy as np
import numpy.matlib as matl
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from matplotlib.colors import ListedColormap

%matplotlib inline
np.random.seed(19)

Now we will load the Iris problem dataset. This problem is so famous that the dataset is included in the module *sklearn.datasets*.

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()
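
Before plotting anything, it can help to check what the loaded object contains. The following minimal sketch prints the size of the data matrix, the attribute and class names, and the number of examples per class; the variable name `class_counts` is ours and is not used elsewhere in the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# iris.data holds the attributes, iris.target the class index of each example
print("data shape:", iris.data.shape)
print("attributes:", iris.feature_names)
print("classes:", list(iris.target_names))

# Number of examples per class, in the same order as iris.target_names
class_counts = np.bincount(iris.target)
print("examples per class:", class_counts)
```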

Now we visualize the data. As we can observe, the problem is not too hard: one of the classes (setosa) is completely separated from the other two, which overlap slightly.

In [None]:
plt.figure(figsize=(12,15))
n_classes = 3
plot_colors = "bry"

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]):
    X = iris.data[:, pair]
    y = iris.target

    plt.subplot(3, 2, pairidx + 1)
    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])
    plt.grid(True)
        
    plt.plot(X[y==0,0], X[y==0,1], 'bo', label=iris.target_names[0][0:3])
    plt.plot(X[y==1,0], X[y==1,1], 'ro', label=iris.target_names[1][0:3])
    plt.plot(X[y==2,0], X[y==2,1], 'yo', label=iris.target_names[2][0:3])

plt.legend(loc=2)
plt.show()
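
The visual impression from the scatter plots can also be checked numerically. As a rough sketch, the cell below compares the range of a single attribute (petal length) across the three classes; the names `ranges`, `setosa_separated` and `others_overlap` are introduced here for illustration only, and an overlap of ranges on one attribute is just an indication, not a full analysis of separability.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]  # column 2 is the petal length

# Range of petal length for each class
ranges = {}
for k, name in enumerate(iris.target_names):
    values = petal_length[iris.target == k]
    ranges[str(name)] = (values.min(), values.max())
    print("%-10s min = %.1f  max = %.1f" % (name, values.min(), values.max()))

# setosa is separated on this attribute if its largest value lies below
# the smallest value of the other two classes
setosa_separated = ranges["setosa"][1] < min(ranges["versicolor"][0],
                                             ranges["virginica"][0])

# versicolor and virginica overlap on this attribute if their ranges intersect
others_overlap = ranges["virginica"][0] <= ranges["versicolor"][1]

print("setosa separated:", setosa_separated)
print("versicolor/virginica overlap:", others_overlap)
```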

Now we will train a <a href="http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier">
DecisionTreeClassifier</a>. The most important arguments of the DecisionTreeClassifier constructor are the following: 

- *criterion:* criterion used to split the tree nodes. It can be 'gini' or 'entropy' (the latter corresponds to information gain).

- *max_depth:* maximum depth of the decision tree.

The examples reaching a tree node are used to compute the statistics that
estimate the quality of the candidate splits at that node. They are also
used to compute the statistics of the class to be predicted in case the node
is not split any further. The number of examples should be large enough for
these statistics to be reliable.
Thus, requirements on the minimum number of examples are needed:

- *min_samples_split:* minimum number of examples a tree node must contain in order to be split.

- *min_samples_leaf:* minimum number of examples required in a leaf (classification) node.
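
To make the two values of *criterion* listed above more concrete, here is a small sketch that computes both impurity measures from the class counts of a node, using the usual textbook definitions (Gini impurity $1 - \sum_k p_k^2$ and entropy $-\sum_k p_k \log_2 p_k$). The helper functions `gini`, `entropy` and `impurity_decrease` are written for this illustration and are not part of scikit-learn; the example split, which sends all 50 setosa examples to one child, is also only an assumption for the demonstration.

```python
import numpy as np

def gini(counts):
    """Gini impurity 1 - sum(p_k^2) of a node with the given class counts."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy(counts):
    """Entropy sum(p_k * log2(1 / p_k)) in bits; empty classes contribute 0."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]
    return np.sum(p * np.log2(1.0 / p))

def impurity_decrease(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = float(np.sum(parent))
    weighted = sum(np.sum(c) / n * impurity(c) for c in children)
    return impurity(parent) - weighted

root = [50, 50, 50]   # class counts of the full Iris dataset
left = [50, 0, 0]     # hypothetical child containing only setosa
right = [0, 50, 50]   # hypothetical child with the remaining examples

for name, counts in [("root", root), ("left", left), ("right", right)]:
    print("%-5s gini = %.3f  entropy = %.3f" % (name, gini(counts), entropy(counts)))

print("gini decrease    = %.3f" % impurity_decrease(root, [left, right], gini))
print("information gain = %.3f" % impurity_decrease(root, [left, right], entropy))
```

Both measures are zero for a pure node and largest when the classes are equally represented, so they often, though not always, rank candidate splits similarly.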

In the next example we use the Gini criterion and let the tree grow without imposing any maximum depth.
To visualize the results more easily, we use only the *petal_length* and *petal_width* attributes.

In [None]:
x = iris.data[:,[2,3]] # attributes 2 and 3 are petal_length and petal_width
y = iris.target

clf = DecisionTreeClassifier(criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1)
clf = clf.fit(x, y)
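
Once the tree is fitted, a few of its properties can be inspected directly. The sketch below refits the same model in a self-contained way and reports its depth, its number of leaves, the accuracy on the training data and the relative importance of the two attributes. The `random_state` argument is an addition made here only so that the sketch is reproducible; the cell above does not set it.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x = iris.data[:, [2, 3]]  # petal length and petal width
y = iris.target

# random_state is added in this sketch for reproducibility
clf = DecisionTreeClassifier(criterion='gini', max_depth=None,
                             min_samples_split=2, min_samples_leaf=1,
                             random_state=19)
clf.fit(x, y)

depth = clf.get_depth()            # length of the longest root-to-leaf path
n_leaves = clf.get_n_leaves()      # number of leaf (classification) nodes
train_accuracy = clf.score(x, y)   # accuracy on the data used for training

print("depth:", depth)
print("leaves:", n_leaves)
print("training accuracy: %.4f" % train_accuracy)
print("feature importances:", clf.feature_importances_)
```

Keep in mind that the accuracy reported here is measured on the training data itself, so it says little about how the tree behaves on unseen examples.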

The next cell visualizes the resulting decision tree:

In [None]:
from libreria_aux_arboles import tree_to_code, tree_to_pseudo

startbold = '\033[1m'
endbold = '\033[0m'

tree_to_code(clf, [iris.feature_names[2], iris.feature_names[3]],
             start_bold=startbold, end_bold=endbold)

#from graphviz import Source
#Source( tree.export_graphviz(clf, out_file=None,
#                             feature_names=[iris.feature_names[2], iris.feature_names[3]],
#                             class_names=iris.target_names,
#                             filled=True, rounded=True,
#                             special_characters=True))
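
If neither the auxiliary module nor graphviz is available, a plain-text view of the tree can be obtained with `export_text` from *sklearn.tree*. This is only an alternative sketch, not the visualization used in the rest of the notebook; the `random_state` argument and the variable names `feature_names` and `tree_text` are additions made here.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
feature_names = [iris.feature_names[2], iris.feature_names[3]]
x = iris.data[:, [2, 3]]
y = iris.target

# random_state is added in this sketch for reproducibility
clf = DecisionTreeClassifier(criterion='gini', random_state=19).fit(x, y)

# One line per node: the test applied at internal nodes, the class at leaves
tree_text = export_text(clf, feature_names=feature_names)
print(tree_text)
```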

Finally, we show the classifier's decision boundaries.

In [None]:
plt.figure(figsize=(8, 8))

plot_step = 0.02
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))

z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
cs = plt.contourf(xx, yy, z, cmap=plt.cm.Paired)

plt.xlabel(iris.feature_names[2])
plt.ylabel(iris.feature_names[3])

plt.plot(x[y==0,0], x[y==0,1], 'bo', label=iris.target_names[0])
plt.plot(x[y==1,0], x[y==1,1], 'ro', label=iris.target_names[1])
plt.plot(x[y==2,0], x[y==2,1], 'yo', label=iris.target_names[2])
    
plt.legend(loc=2)
plt.show()

Next we make the following improvements:

- We gather all the previous code into a few cells, so that it is easier to run different experiments.
- We split the dataset into a training set and a test set in order to validate the model properly. This lets us measure the predictive quality of the model by means of the score (accuracy) and the confusion matrix.

Try different parameters and answer the questions at the end.

In [None]:
from sklearn.model_selection import train_test_split

# training/test split
testsize = 0.3 # fraction of the examples kept for testing, in the [0,1] range (1 means 100%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=testsize, random_state=5)

# Decision tree construction ---------------------------------------------------------------------
clf = DecisionTreeClassifier(criterion='gini', max_depth=2, min_samples_split=2, min_samples_leaf=1)
clf = clf.fit(x_train, y_train)

In [None]:
tree_to_code(clf, [iris.feature_names[2], iris.feature_names[3]], start_bold=startbold, end_bold=endbold)


# Tree visualization --------------------------------------------------------------------
#Source( tree.export_graphviz(clf, out_file=None,
#                             feature_names=[iris.feature_names[2], iris.feature_names[3]],
#                             class_names=iris.target_names,
#                             filled=True, rounded=True,
#                             special_characters=True))

In [None]:
# Decision boundary -----------------------------------------------------------------------
plt.figure(figsize=(8, 8))

plot_step = 0.02
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))

z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
cs = plt.contourf(xx, yy, z, cmap=plt.cm.Paired)

plt.xlabel(iris.feature_names[2])
plt.ylabel(iris.feature_names[3])

plt.plot(x[y==0,0], x[y==0,1], 'bo', label=iris.target_names[0][:3])
plt.plot(x[y==1,0], x[y==1,1], 'ro', label=iris.target_names[1][:3])
plt.plot(x[y==2,0], x[y==2,1], 'yo', label=iris.target_names[2][:3])
    
plt.legend(loc=2)
plt.show()

In [None]:
# Predictive quality of the model

print("Score training = %f" % (clf.score(x_train, y_train)))
print("Score test = %f" % (clf.score(x_test, y_test)))

from sklearn.metrics import confusion_matrix
print()
print("Confusion matrix in test:")
print()
print(confusion_matrix(y_test, clf.predict(x_test))) # row: real class; column: predicted class
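
The confusion matrix contains more information than the single score. As a minimal sketch, assuming the same split and tree parameters as in the cells above, the code below recovers the test accuracy from the diagonal of the matrix and computes, for each true class, the fraction of its examples that are classified correctly (the recall). The `random_state` passed to the classifier and the names `cm`, `accuracy` and `row_totals` are additions made here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x = iris.data[:, [2, 3]]
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3,
                                                    random_state=5)

# random_state is added in this sketch for reproducibility
clf = DecisionTreeClassifier(criterion='gini', max_depth=2,
                             random_state=19).fit(x_train, y_train)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, clf.predict(x_test), labels=[0, 1, 2])

# Correct predictions lie on the diagonal, so this matches clf.score
accuracy = np.trace(cm) / cm.sum()
print("test accuracy = %.3f" % accuracy)

# Each row sum is the number of test examples of that true class
row_totals = cm.sum(axis=1)
for k, name in enumerate(iris.target_names):
    if row_totals[k] > 0:
        print("%-10s recall = %.3f" % (name, cm[k, k] / row_totals[k]))
```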

### Questions:


**(1)** Do we obtain different results if the *entropy* criterion is used to split the nodes instead of the *gini* criterion?

**(2)** Using the *gini* criterion and *max_depth=None* (no limit on the depth), try *min_samples_split=50* and *min_samples_leaf=50* (independently). What happens in each case? Did you expect it?