# Decision Trees

Decision Trees are versatile: they can perform both classification and regression tasks.

Benefits:
- Don’t require feature scaling or centering at all.
- Easier to interpret than many other models.
- Make very few assumptions about the data.
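The claim that trees need no feature scaling can be checked directly. The sketch below trains one tree on the raw petal features and another on standardized copies of them. The `StandardScaler` choice, `max_depth=2` and `random_state=0` are illustrative assumptions. Standardization preserves the ordering of values within each feature, so both trees should choose the same partitions of the training data and make the same training predictions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width (cm)
y = iris.target

# Standardize each feature: (value - mean) / std, an order-preserving transform.
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed; only the feature scale differs.
raw_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
scaled_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_scaled, y)

# Splits only depend on how values are ordered, so predictions agree.
same = np.array_equal(raw_tree.predict(X), scaled_tree.predict(X_scaled))
print(same)
```

Only the split thresholds differ between the two trees (they are expressed in different units); the partitions of the training set are the same.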

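```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x = iris.data[:, 2:]  # petal length and width
y = iris.target

# Fixed seed: scikit-learn shuffles features at each split, so ties between
# equally good splits are only reproducible with a fixed random_state.
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(x, y)
```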


```python
# Export the trained tree to Graphviz .dot format
from sklearn.tree import export_graphviz

export_graphviz(
    tree,
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)

# Then convert the .dot file to PNG from a shell (requires Graphviz):
# dot -Tpng iris_tree.dot -o iris_tree.png
```

![Decision tree trained on iris petal length and width (max_depth=2)](images/iris_tree.png)
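If the Graphviz `dot` tool is not installed, scikit-learn's `export_text` prints the same decision rules as indented plain text. This is a minimal sketch that retrains the same kind of tree so the cell runs on its own; the `random_state` value is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
x = iris.data[:, 2:]  # petal length and width
y = iris.target
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(x, y)

# One line per node: split conditions, with "class: ..." at each leaf.
report = export_text(tree, feature_names=iris.feature_names[2:])
print(report)
```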

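Because Decision Trees make so few assumptions about the data, an unrestricted tree adapts closely to the training set and is prone to overfitting. To avoid this, one can at least restrict the maximum depth of the tree (`max_depth`).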

Other regularization parameters:
- `min_samples_split`: the minimum number of samples a node must have before it can be split.
- `min_samples_leaf`: the minimum number of samples a leaf node must have.
- ...
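The effect of these parameters can be seen by comparing an unrestricted tree with a regularized one. This sketch assumes the noisy `make_moons` toy dataset and the specific values `min_samples_split=10` and `min_samples_leaf=5`, chosen only for illustration. The unrestricted tree typically fits the training set perfectly while the regularized one stays much smaller; compare the printed test scores rather than assuming which one wins.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class toy data on which an unrestricted tree tends to overfit.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

unrestricted = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
regularized = DecisionTreeClassifier(
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=42,
).fit(X_train, y_train)

for name, model in [("unrestricted", unrestricted), ("regularized", regularized)]:
    print(
        name,
        "depth:", model.get_depth(),
        "leaves:", model.get_n_leaves(),
        "train acc:", model.score(X_train, y_train),
        "test acc:", model.score(X_test, y_test),
    )
```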

# Ensemble Methods

By aggregating the predictions of a group of predictors (such as classifiers or regressors), one will often get better predictions than with the best individual predictor. A group of predictors is called an ensemble.

As an example of an ensemble method, one can train a group of Decision Tree classifiers, each on a different random subset of the training set. To make a prediction, obtain the predictions of all the individual trees, then predict the class that gets the most votes. Such an ensemble of Decision Trees is called a Random Forest.
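The voting idea can be written by hand in a few lines. This is a minimal sketch, not a full Random Forest: it only trains trees on random subsets (drawn with replacement here) and takes a majority vote, while a real Random Forest additionally randomizes the features considered at each split, as described later in this section. The dataset, `n_trees` and `subset_size` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
n_trees, subset_size = 25, 200  # illustrative choices

trees = []
for i in range(n_trees):
    # Each tree sees a different random subset of the training set.
    idx = rng.integers(0, len(X_train), size=subset_size)
    trees.append(DecisionTreeClassifier(random_state=i).fit(X_train[idx], y_train[idx]))

# Per-tree predictions stacked into shape (n_trees, n_test).
all_preds = np.stack([t.predict(X_test) for t in trees])

# Count votes per class for every test instance, then pick the most voted class.
votes = np.apply_along_axis(np.bincount, 0, all_preds, minlength=2)
ensemble_pred = votes.argmax(axis=0)

accuracy = (ensemble_pred == y_test).mean()
print(accuracy)
```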

The following are the most popular ensemble methods:
1) Voting Classifiers
- Suppose you have trained a few classifiers, each achieving about 80% accuracy: for example, a Logistic Regression classifier, an SVM classifier, a Random Forest classifier, and a few more. Each classifier's prediction counts as one vote, and the class with the most votes is the final prediction (**hard voting**). Such a voting ensemble often reaches a higher accuracy than the best classifier in the group.
2) Average/Weighted Average
- For regression problems, one can train multiple models and combine their predictions with a weighted average; that average is the final prediction. With equal weights, this reduces to a plain average.
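A weighted average of regressors can be computed directly with NumPy. In this sketch the dataset, the three model types and the weights `[0.5, 0.25, 0.25]` are hypothetical choices; in practice the weights would be tuned on validation data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = [
    LinearRegression(),
    DecisionTreeRegressor(max_depth=5, random_state=42),
    KNeighborsRegressor(),
]
weights = np.array([0.5, 0.25, 0.25])  # hypothetical weights, summing to 1

# Predictions stacked into shape (n_models, n_test).
preds = np.stack([m.fit(X_train, y_train).predict(X_test) for m in models])

# Weighted average per test instance: sum over models of weight * prediction.
ensemble = weights @ preds
print(ensemble[:5])
```

scikit-learn also provides `VotingRegressor` with a `weights` parameter, which averages its estimators' predictions in the same weighted way.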

Ensemble methods work best when the predictors are as independent from one another as possible. One
way to get diverse classifiers is to train them using very different algorithms. This increases the chance
that they will make very different types of errors, improving the ensemble’s accuracy.
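A hard-voting ensemble of the three very different algorithms named above can be built with scikit-learn's `VotingClassifier`. The `make_moons` dataset, default hyperparameters and `random_state` values are assumptions for illustration; the printed scores let you compare each member with the ensemble.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",  # each classifier casts one vote; the majority class wins
)
voting_clf.fit(X_train, y_train)

# Accuracy of each fitted member, then of the ensemble.
for name, clf in voting_clf.named_estimators_.items():
    print(name, clf.score(X_test, y_test))
ensemble_acc = voting_clf.score(X_test, y_test)
print("voting", ensemble_acc)
```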

3) Bagging and Pasting
- Another approach is to use the same training algorithm for every predictor but train each one on a different random subset of the training set. When sampling is performed with replacement (each drawn instance is returned to the pool before the next one is drawn), it is called **bagging** (short for bootstrap aggregating). When sampling is performed without replacement, it is called **pasting**.
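- With bagging, some instances may be sampled several times for a given predictor while others may not be sampled at all. When each predictor draws as many instances as the training set contains, roughly 37% of the instances are never drawn for that predictor. These out-of-bag instances can serve as a validation set without setting aside separate data. This is called **Out-of-bag (OOB) Evaluation**.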
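scikit-learn's `BaggingClassifier` implements both variants (`bootstrap=True` for bagging, `bootstrap=False` for pasting) and can compute the OOB score. The dataset, `n_estimators=200` and `random_state` are assumptions for this sketch; the base estimator is passed positionally because its keyword name changed across scikit-learn versions (`base_estimator` in older releases, `estimator` in newer ones).

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),  # base predictor, passed positionally
    n_estimators=200,
    bootstrap=True,   # sample with replacement (bagging); False would be pasting
    oob_score=True,   # evaluate each instance only with trees that never saw it
    random_state=42,
)
bag_clf.fit(X_train, y_train)

# OOB accuracy estimated from the training set alone, vs. held-out test accuracy.
print(bag_clf.oob_score_, bag_clf.score(X_test, y_test))
```

The OOB score is usually close to the test accuracy, which is what makes it useful as a validation estimate.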

A **Random Forest** is an ensemble of Decision Trees, generally trained via the bagging (or sometimes pasting) method. Instead of searching for the very best feature when splitting a node, each tree searches for the best feature among a random subset of features. This results in greater tree diversity.
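In scikit-learn, `RandomForestClassifier` combines bagging with the per-split random feature subset, controlled by `max_features`. The dataset and hyperparameter values below are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rnd_clf = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",  # each split considers a random subset of sqrt(n_features) features
    bootstrap=True,       # each tree is trained on a bootstrap sample (bagging)
    random_state=42,
)
rnd_clf.fit(X_train, y_train)

forest_acc = rnd_clf.score(X_test, y_test)
print(forest_acc)
```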

4) AdaBoost
- The general idea of most boosting methods is to train predictors sequentially, each trying to correct its predecessor. T