# Decision Tree

A decision tree is a supervised learning algorithm that is mostly used for classification problems, although it works for both categorical and continuous dependent variables. The algorithm splits the population into two or more homogeneous sets, basing each split on the most significant attribute (independent variable) so that the resulting groups are as distinct as possible.

## Decision Tree Regressor

```python
import pandas as pd

df = pd.read_csv('50_Startups.csv')
df.head()
```

|   | R&D Spend | Administration | Marketing Spend | State | Profit |
|---|---|---|---|---|---|
| 0 | 165349.2 | 136897.8 | 471784.1 | New York | 192261.83 |
| 1 | 162597.7 | 151377.59 | 443898.53 | California | 191792.06 |
| 2 | 153441.51 | 101145.55 | 407934.54 | Florida | 191050.39 |
| 3 | 144372.41 | 118671.85 | 383199.62 | New York | 182901.99 |
| 4 | 142107.34 | 91391.77 | 366168.42 | Florida | 166187.94 |


```python
from sklearn.preprocessing import LabelEncoder

# Encode the categorical 'State' column as integers (California=0, Florida=1, New York=2)
le = LabelEncoder()
df['State'] = le.fit_transform(df['State'])
df.head(2)
```

|   | R&D Spend | Administration | Marketing Spend | State | Profit |
|---|---|---|---|---|---|
| 0 | 165349.2 | 136897.8 | 471784.1 | 2 | 192261.83 |
| 1 | 162597.7 | 151377.59 | 443898.53 | 0 | 191792.06 |


```python
# Features: every column except the last; target: the last column (Profit)
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
```

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hold out 20% of the rows as a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)

model = DecisionTreeRegressor()
model.fit(x_train, y_train)
```

```
DecisionTreeRegressor()
```

```python
model.predict(x_test)
```

```
array([ 96712.8 ,  81005.76, 110352.25,  71498.49,  99937.59, 134307.35,
        71498.49,  71498.49, 103282.38,  81005.76])
```
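Some predictions in the output above repeat exactly: 71498.49 appears three times. This is expected. A regression tree predicts one constant per leaf, namely the mean of the training targets that fell into that leaf. The sketch below shows the effect on a tiny made-up dataset, not on `50_Startups.csv`. It uses a depth-1 tree, so there are only two leaves.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (made up for illustration): two well-separated groups of x values
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])

# A depth-1 tree ("stump") makes a single split, so it has exactly two leaves
stump = DecisionTreeRegressor(max_depth=1).fit(X, y)

# Every query on the same side of the split gets that leaf's mean target
preds = stump.predict(np.array([[0], [2.5], [11.5], [100]]))
print(preds)  # left-leaf mean (6.0) twice, then right-leaf mean (21.0) twice
```

A deeper tree has more leaves and therefore more distinct values, but it still returns only as many different predictions as it has leaves.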

## Decision Tree Classifier

```python
import pandas as pd

data = pd.read_csv('gini_index.csv')
data.head()
```

|   | outlook | temp | humidity | wind | decision |
|---|---|---|---|---|---|
| 0 | sunny | hot | high | weak | no |
| 1 | sunny | hot | high | strong | no |
| 2 | overcast | hot | high | weak | yes |
| 3 | rain | mild | high | weak | yes |
| 4 | rain | cool | normal | weak | yes |


```python
data.shape
```

```
(14, 5)
```

```python
data.columns
```

```
Index(['outlook', 'temp', 'humidity', 'wind', 'decision'], dtype='object')
```

```python
from sklearn.preprocessing import LabelEncoder

# Every column, including the target 'decision', holds categorical strings.
# With a pandas DataFrame, refer to the columns by name as below; with a NumPy
# array, loop over range(n_columns) and use X[:, column] instead.
categorical_columns = ['outlook', 'temp', 'humidity', 'wind', 'decision']

for column in categorical_columns:
    le = LabelEncoder()  # a separate encoder for each column
    data[column] = le.fit_transform(data[column])
```

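```python
data
```

|   | outlook | temp | humidity | wind | decision |
|---|---|---|---|---|---|
| 0 | 2 | 1 | 0 | 1 | 0 |
| 1 | 2 | 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 1 | 1 |
| 3 | 1 | 2 | 0 | 1 | 1 |
| 4 | 1 | 0 | 1 | 1 | 1 |
| 5 | 1 | 0 | 1 | 0 | 0 |
| 6 | 0 | 0 | 1 | 0 | 1 |
| 7 | 2 | 2 | 0 | 1 | 0 |
| 8 | 2 | 0 | 1 | 1 | 1 |
| 9 | 1 | 2 | 1 | 1 | 1 |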


```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Features: all columns but the last; target: 'decision'
x = data.iloc[:, :-1]
y = data.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)

dtc = DecisionTreeClassifier()
dtc.fit(x_train, y_train)
```

```
DecisionTreeClassifier()
```

```python
dtc.predict(x_test)
```

```
array([0, 0, 1])
```
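With only 14 rows, a 20% test split leaves 3 rows to predict, so the output above reveals little about the tree itself. The sketch below fits on all 14 rows instead and prints the learned rules with `sklearn.tree.export_text`. It assumes the full file is the classic 14-row weather ("play tennis") dataset, because only the first 10 rows appear in the notebook output. `criterion="entropy"` is chosen to match the entropy walkthrough that follows, while scikit-learn's default is `"gini"`. Note that scikit-learn makes binary threshold splits on the integer codes, so its tree need not match the multiway ID3 tree.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed: the classic 14-row weather dataset (only 10 rows are displayed above)
rows = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
weather = pd.DataFrame(rows, columns=["outlook", "temp", "humidity", "wind", "decision"])

# Same encoding as above: one LabelEncoder per column, codes assigned alphabetically
for column in weather.columns:
    weather[column] = LabelEncoder().fit_transform(weather[column])

X, y = weather.iloc[:, :-1], weather.iloc[:, -1]

# criterion="entropy" splits by information gain; scikit-learn's default is "gini"
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Human-readable rules; thresholds refer to the integer codes, not category names
rules = export_text(clf, feature_names=list(X.columns))
print(rules)
print("training accuracy:", clf.score(X, y))
```

No two rows share the same feature values, so a fully grown tree should fit all 14 training rows exactly. That says nothing about how well it generalizes.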

![](dt1.jpg)

![](dt2.jpg)

![](dt3.jpg)

![](dt4.jpg)


### Entropy formula

![](ef1.png)

a) Entropy using the frequency table of one attribute:

![](ef.png)
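Written out as the standard definition, which the image above is assumed to show: for a set $S$ whose samples fall into $c$ classes with proportions $p_1, \dots, p_c$,

$$
E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i
$$

A pure set has entropy 0, and an even two-class split has entropy 1.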

b) Entropy using the frequency table of two attributes:

![](ef2.png)
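In the same notation, the entropy of the target $T$ after splitting on attribute $X$ weights each branch $c$ (one value of $X$) by the fraction $P(c)$ of samples it receives:

$$
E(T, X) = \sum_{c \in X} P(c)\, E(c)
$$

The information gain of $X$ is then $\text{Gain}(T, X) = E(T) - E(T, X)$.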


Step 1: Calculate the entropy of the target.

![](ef0.png)
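Step 1 takes only a few lines of Python. The label counts, 9 "yes" and 5 "no", are those of the 14-row weather data used in the Gini calculation further down. The function below is a plain implementation of the one-attribute entropy formula, not code from the original notebook.

```python
import math
from collections import Counter


def entropy(labels):
    """Base-2 Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())


# Target column of the weather data: 9 "yes" and 5 "no" out of 14 rows
decision = ["yes"] * 9 + ["no"] * 5
print(f"{entropy(decision):.3f}")  # 0.940
```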

Step 2: The dataset is then split on each attribute, and the entropy of each branch is calculated. The size-weighted sum of the branch entropies is subtracted from the entropy before the split. The result is the Information Gain.

![](Entropy_attributes.png)

![](ef5.png)
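Step 2 can be written as a short sketch that works from (yes, no) counts per attribute value. The counts assume the full table is the classic 14-row weather dataset, since only 10 rows are displayed earlier. `SPLITS` and `information_gain` are illustrative names.

```python
import math


def entropy(counts):
    """Base-2 entropy of a node given its per-class sample counts."""
    total = sum(counts)
    return sum(k / total * math.log2(total / k) for k in counts if k > 0)


# (yes, no) counts per attribute value, assuming the classic 14-row weather data
SPLITS = {
    "outlook": {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temp": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity": {"high": (3, 4), "normal": (6, 1)},
    "wind": {"weak": (6, 2), "strong": (3, 3)},
}


def information_gain(branches):
    """Entropy before the split minus the size-weighted entropy of the branches."""
    parent = [sum(col) for col in zip(*branches.values())]  # class totals, e.g. (9, 5)
    n = sum(parent)
    after = sum(sum(c) / n * entropy(c) for c in branches.values())
    return entropy(parent) - after


gains = {attr: information_gain(b) for attr, b in SPLITS.items()}
for attr, gain in gains.items():
    print(f"{attr:9s} {gain:.3f}")
```

With these counts, Outlook has the largest gain (about 0.247), ahead of Humidity (about 0.152), Wind (about 0.048) and Temp (about 0.029).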

Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.

![](ef6.png)

![](ef7.png)

Step 4a: A branch with entropy of 0 is a leaf node.

![](ef8.png)

Step 4b: A branch with entropy greater than 0 needs further splitting.

![](ef9.png)
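Steps 4a and 4b come down to a test on each branch's entropy. This sketch applies it to the three Outlook branches, using (yes, no) counts that assume the classic 14-row weather data. The `status` dictionary is just an illustrative way to record the outcome.

```python
import math


def entropy(counts):
    """Base-2 entropy of a branch given its per-class sample counts."""
    total = sum(counts)
    return sum(k / total * math.log2(total / k) for k in counts if k > 0)


# (yes, no) counts in each branch after splitting on Outlook (assumed weather data)
outlook_branches = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}

status = {}
for value, counts in outlook_branches.items():
    e = entropy(counts)
    # Step 4a: entropy 0 means the branch is pure, so it becomes a leaf.
    # Step 4b: entropy above 0 means the branch must be split again.
    status[value] = "leaf" if e == 0 else "split further"
    print(f"{value:8s} entropy={e:.3f} -> {status[value]}")
```

Overcast is pure (4 yes, 0 no) and becomes a "yes" leaf. Sunny and rain (entropy about 0.971 each) are split again.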

Step 5: The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
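Steps 3 to 5 fit into a compact recursive sketch in plain Python. Like the walkthrough above, it uses multiway splits (one branch per attribute value). It assumes the classic 14-row weather data, including the rows not shown earlier. It breaks gain ties by `max` order and ignores the practical issues listed next: continuous attributes, missing values and pruning. The helper names `id3` and `classify` are illustrative.

```python
import math
from collections import Counter

# Assumed: the classic 14-row weather dataset (outlook, temp, humidity, wind, decision)
COLUMNS = ["outlook", "temp", "humidity", "wind", "decision"]
RAW = """
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rain mild high weak yes
rain cool normal weak yes
rain cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rain mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rain mild high strong no
"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]


def entropy(rows, target):
    """Base-2 entropy of the target column over a list of rows."""
    n = len(rows)
    counts = Counter(row[target] for row in rows)
    return sum(k / n * math.log2(n / k) for k in counts.values())


def info_gain(rows, attr, target):
    """Entropy before splitting on attr minus the weighted entropy of its branches."""
    n = len(rows)
    after = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        after += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - after


def id3(rows, attrs, target):
    """Return a leaf label or a nested dict {attr: {value: subtree}}."""
    labels = Counter(row[target] for row in rows)
    if len(labels) == 1:  # step 4a: entropy 0, so this branch is a leaf
        return next(iter(labels))
    if not attrs:  # no attributes left: fall back to the majority label
        return labels.most_common(1)[0][0]
    # Step 3: split on the attribute with the largest information gain
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    # Steps 4b and 5: recurse into every branch of the chosen attribute
    return {best: {value: id3([r for r in rows if r[best] == value], rest, target)
                   for value in sorted({r[best] for r in rows})}}


def classify(tree, row):
    """Walk the nested dict until a leaf label is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]
    return tree


tree = id3(ROWS, COLUMNS[:-1], "decision")
print(tree)
```

On this data the sketch puts Outlook at the root, gives a "yes" leaf for overcast, and places Humidity under sunny and Wind under rain. `classify(tree, row)` reproduces the label of every training row.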

### Decision Trees - Issues
- Working with continuous attributes (binning)
- Avoiding overfitting
- Super Attributes (attributes with many unique values)
- Working with missing values
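For overfitting, one common remedy in scikit-learn is to limit how far the tree grows, for example with `max_depth`, `min_samples_leaf`, or cost-complexity pruning via `ccp_alpha`. This sketch uses synthetic data, not the datasets above, to compare an unrestricted tree with one capped at depth 3. The unrestricted tree fits the noisy training labels perfectly, which is the typical symptom of overfitting. How the two trees compare on the test set depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (illustrative only); flip_y=0.1 assigns ~10% of labels at random
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

trees = {}
for depth in (None, 3):
    # max_depth=None grows until the leaves are pure; max_depth=3 caps the tree
    trees[depth] = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    t = trees[depth]
    print(f"max_depth={depth}: depth={t.get_depth()}, "
          f"train acc={t.score(X_train, y_train):.2f}, test acc={t.score(X_test, y_test):.2f}")
```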

![](ef3.png)

## Gini Index

As the first step, find the root node of the decision tree. To do that, calculate the Gini index of the class variable, which has 9 "yes" and 5 "no" values out of 14:
- Gini(S) = 1 - [(9/14)² + (5/14)²] ≈ 0.4592

- First, consider the Outlook attribute:

![](gn1.png)

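Here Gini(a, b) denotes the Gini index of a branch containing a samples of one class and b of the other:

$$
\begin{aligned}
\text{Gini}(S, \text{outlook}) &= \tfrac{5}{14}\,\text{Gini}(3,2) + \tfrac{4}{14}\,\text{Gini}(4,0) + \tfrac{5}{14}\,\text{Gini}(2,3) \\
&= \tfrac{5}{14}\left(1 - \left(\tfrac{3}{5}\right)^2 - \left(\tfrac{2}{5}\right)^2\right) + \tfrac{4}{14}\cdot 0 + \tfrac{5}{14}\left(1 - \left(\tfrac{2}{5}\right)^2 - \left(\tfrac{3}{5}\right)^2\right) \\
&= \tfrac{6}{35} + 0 + \tfrac{6}{35} = \tfrac{12}{35} \approx 0.343
\end{aligned}
$$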

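Repeat this calculation for every attribute.
- Choose the attribute with the lowest weighted Gini index (equivalently, the highest Gini gain, Gini(S) - Gini(S, attribute)). Outlook has the lowest weighted Gini index, so it becomes the root node.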
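The same search can be scripted. This sketch uses per-branch (yes, no) counts and assumes the full table is the classic 14-row weather dataset, since only 10 rows are displayed earlier. `gini` and `weighted_gini` are illustrative helper names. scikit-learn's `DecisionTreeClassifier` uses the Gini criterion by default (`criterion="gini"`), although it only makes binary splits.

```python
def gini(counts):
    """Gini index 1 - sum(p_i^2) of a node, given its per-class sample counts."""
    total = sum(counts)
    return 1 - sum((k / total) ** 2 for k in counts)


def weighted_gini(branches):
    """Size-weighted Gini index of the branches produced by one attribute."""
    n = sum(sum(counts) for counts in branches.values())
    return sum(sum(counts) / n * gini(counts) for counts in branches.values())


# (yes, no) counts per attribute value, assuming the classic 14-row weather data
SPLITS = {
    "outlook": {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)},
    "temp": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity": {"high": (3, 4), "normal": (6, 1)},
    "wind": {"weak": (6, 2), "strong": (3, 3)},
}

print(f"Gini(S) = {gini((9, 5)):.4f}")
scores = {attr: weighted_gini(branches) for attr, branches in SPLITS.items()}
for attr, score in scores.items():
    print(f"Gini(S, {attr}) = {score:.3f}")

# The root is the attribute with the lowest weighted Gini index
root = min(scores, key=scores.get)
print("root node:", root)
```

With these counts, Outlook scores about 0.343, Humidity about 0.367, Wind about 0.429 and Temp about 0.440. Outlook is therefore chosen as the root, which matches the hand calculation.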

Repeat the same steps on each branch of the root node:

---

![](e1.jpg)

![](e2.jpg)

![](e3.jpg)

![](e4.jpg)

![](e5.jpg)

![](e6.jpg)

![](e7.jpg)

![](gi1.jpg)

![](gi2.jpg)

![](gi3.jpg)

![](gi4.jpg)

![](gi5.jpg)

### By Nikhil Yadav 