# Decision Trees

Decision trees are a common baseline model for classification tasks because they are easy to visualize and interpret. This module walks you through the theory behind decision trees and a few hands-on examples of building decision tree models for classification. You will learn the main pros and cons of these techniques. This background will be useful when you encounter decision tree ensembles in the next module.

## Learning Objectives
- Describe and use decision trees and decision-tree ensemble models for classification
- Identify and implement common ensemble models for classification, including bagging, boosting, stacking, and random forest
- Become familiar with the pros and cons of decision tree methods
- Build decision tree models with sklearn

## Overview of Classifiers

### Learning Goals

In this section, we will cover:
- Overview of Classification problems
- Decision Tree Classification algorithm
- Splitting Decision Trees: entropy and information gain
- Pruning Decision Trees to address overfitting

## Introduction to Decision Trees

![](./images/36_DecisionTree.png)

![](./images/37_DCContinuousValues.png)

![](./images/38_DCCV_depth.png)



## Building a Decision Tree

![](./images/39_DCDepth.png)

![](./images/40_bestDC.png)

![](./images/41_Spliting.png)

$$
E(t) = 1 - \max_i \left[ p(i \mid t) \right]
$$
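
To make the classification error concrete, here is a minimal sketch that evaluates $E(t)$ for a parent node and for the weighted average of its two children. The class counts (8 vs. 4 in the parent, split into 2 vs. 2 and 6 vs. 2) and the helper names are assumptions chosen for illustration, not values taken from the figures.

```python
def classification_error(counts):
    """E(t) = 1 - max_i p(i|t), computed from per-class counts at a node."""
    total = sum(counts)
    return 1.0 - max(counts) / total


def weighted_average(children, impurity):
    """Average the children's impurity, weighted by each child's share of samples."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * impurity(child) for child in children)


# Hypothetical class counts (assumption for illustration)
parent = [8, 4]
children = [[2, 2], [6, 2]]

parent_error = classification_error(parent)
child_error = weighted_average(children, classification_error)

print(f"{parent_error:.4f} {child_error:.4f}")
# prints: 0.3333 0.3333
```

With these counts the weighted classification error after the split equals the error before it, so this criterion alone would give no reason to make the split.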

## Entropy-based Splitting

![](./images/43_EntropySpliting.png)

$$
H(t) = - \sum_{i=1}^{n} p(i \mid t) \log_2 \left[ p(i \mid t) \right]
$$

![](./images/44_EntropyErrorFunction.png)

We can then take the weighted average of the child-node entropies, as we did with the classification error, weighting each child by the fraction of the parent node's samples that it receives. Unlike the classification error, which stayed exactly the same after the split, the weighted entropy has decreased by 0.0441.
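
The weighted-average calculation can be sketched in a few lines of Python. The class counts below (8 vs. 4 in the parent, split into children with 2 vs. 2 and 6 vs. 2) are an assumption chosen for illustration because they produce a decrease of about 0.0441; the helper names are hypothetical as well.

```python
from math import log2


def entropy(counts):
    """H(t) = -sum_i p(i|t) * log2 p(i|t), computed from per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def weighted_average(children, impurity):
    """Average the children's impurity, weighted by each child's share of samples."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * impurity(child) for child in children)


# Hypothetical class counts (assumption for illustration)
parent = [8, 4]
children = [[2, 2], [6, 2]]

parent_entropy = entropy(parent)
child_entropy = weighted_average(children, entropy)
information_gain = parent_entropy - child_entropy

print(f"{parent_entropy:.4f} {child_entropy:.4f} {information_gain:.4f}")
# prints: 0.9183 0.8742 0.0441
```

The difference between the parent entropy and the weighted child entropy is the information gain of the split.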

![](./images/45_EntropySumary.png)


## Other Decision Tree Splitting Criteria

![](./images/46_ClassificationErrorVsEntropy.png)

![](./images/47_vsEntropy.png)

![](./images/48_InfomationGain.png)

![](./images/49_GiniIndex.png)
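
As a rough companion to the Gini index slide, the sketch below uses the standard definition $G(t) = 1 - \sum_{i=1}^{n} p(i \mid t)^2$ and measures how much a split reduces it. The class counts and helper names are assumptions for illustration only.

```python
def gini(counts):
    """G(t) = 1 - sum_i p(i|t)^2, computed from per-class counts at a node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_average(children, impurity):
    """Average the children's impurity, weighted by each child's share of samples."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * impurity(child) for child in children)


# Hypothetical class counts (assumption for illustration)
parent = [8, 4]
children = [[2, 2], [6, 2]]

parent_gini = gini(parent)
child_gini = weighted_average(children, gini)
gini_decrease = parent_gini - child_gini

print(f"{parent_gini:.4f} {child_gini:.4f} {gini_decrease:.4f}")
# prints: 0.4444 0.4167 0.0278
```

Like entropy, the Gini index registers an improvement for this split, and it avoids computing logarithms.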


## Pros and Cons of Decision Trees

![](./images/50_HighVariance.png)

![](./images/51_Strengths.png)

## DecisionTreeClassifier: The Syntax
```python

# Import the class containing the classification method
from sklearn.tree import DecisionTreeClassifier

# Create an instance of the class (tree parameters)
DTC = DecisionTreeClassifier(criterion='gini', max_features=10, max_depth=5)

# Fit the instance on the data and then predict the expected value
# (X_train, y_train, and X_test are assumed to be defined already)
DTC = DTC.fit(X_train, y_train)
y_predict = DTC.predict(X_test)

# Tune parameters with cross-validation. Use DecisionTreeRegressor for regression.
```
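
One possible way to tune the tree parameters with cross-validation is a grid search. The sketch below is only an illustration: the iris dataset, the train/test split, and the candidate values in `param_grid` are assumptions, not recommendations from this module.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data (assumption for illustration): the built-in iris dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Candidate parameter values to search over (illustrative only)
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, 5],
}

# 5-fold cross-validated grid search over the tree parameters
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Best parameter combination and accuracy of the refit model on held-out data
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Limiting `max_depth` in the grid is a simple form of pre-pruning: shallower trees are less likely to overfit the training data.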

# Summary/Review

Decision trees split your data using impurity measures. They are built with a greedy algorithm and are not based on statistical assumptions.

The most common splitting impurity measures are entropy and the Gini index. Decision trees tend to overfit and have high variance: small changes in the training data can produce a very different tree.

Cross-validation and pruning can help reduce overfitting.

Great advantages of decision trees are that they are really easy to interpret and require little to no data preprocessing, such as feature scaling.

