## Random Forests - Visualization

In [None]:
import sklearn.datasets as datasets
import pandas as pd
import numpy as np
# import pydotplus

from sklearn.ensemble import RandomForestClassifier
from io import StringIO  # sklearn.externals.six is no longer shipped with scikit-learn
from IPython.display import Image  
from sklearn.tree import export_graphviz
from IPython.display import display, HTML

import RandomForestHelper as RFH

%config IPCompleter.greedy=True

## Example

<img alt="" src="Random-Forest-Introduction.jpg" style="width:900px" />

### Main Parameters for a Random Forest:

* **n_estimators** - number of trees

* **max_features** - maximal number of features considered when searching for a split

* **max_depth** - maximal tree depth

For a full description of all parameters see [Reference](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
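
As a minimal sketch of how these three parameters are passed, the cell below builds a small forest on the Iris data. The specific values (10 trees, `max_features="sqrt"`, depth 3) and the fixed `random_state=0` are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=10,      # number of trees
    max_features="sqrt",  # features considered at each split (sqrt of 4 features = 2)
    max_depth=3,          # maximal depth of each tree
    random_state=0,       # fixed seed so the run is repeatable
)
forest.fit(X, y)

# One fitted tree per estimator, none deeper than max_depth
print(len(forest.estimators_))
print(max(tree.get_depth() for tree in forest.estimators_))
```

The fitted trees are available in `forest.estimators_`, which is handy for checking that the parameters had the intended effect.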

## The Iris flower dataset

<ul>
    <li>Consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). </li>
    <li>Four features are recorded for each sample: the length and the width of the sepals and petals, in centimetres. </li>
</ul>
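
The description above can be checked directly against the copy of the dataset bundled with scikit-learn. This is a small sketch; the variable names are only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 rows (50 per species) and 4 feature columns
print(iris.data.shape)

# Names of the four measurements
print(iris.feature_names)

# Number of samples per species, in the order of iris.target_names
counts = np.bincount(iris.target)
print(dict(zip(iris.target_names, counts)))
```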

In [None]:
#Loading the Iris dataset from sklearn
iris=datasets.load_iris()
df=pd.DataFrame(iris.data, columns=iris.feature_names)
y=iris.target

In [None]:
#IRIS Dataset
print(df.sample(10))

In [None]:
#Sampling 5 random samples
sample = df.sample(5)
sample

## The random forest classifier

Parameters being changed:
1. **max_depth**: the maximum depth a decision tree is allowed to reach
2. **n_estimators**: the number of decision trees that are generated and then used for majority voting

In [None]:
rf = RandomForestClassifier(max_depth=2, n_estimators=7)
rf = rf.fit(df,y)

## Predicting Category

1. Several decision trees are generated from the input data, each trained on a bootstrap sample of it.
2. For each sample, every tree predicts a category on its own.
3. The final prediction combines the outputs of the individual trees. In this scenario we use majority voting: the category that gets the most votes is predicted.
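
The three steps above can be sketched with plain scikit-learn and NumPy. This is not the `RandomForestHelper` implementation (which is not shown in this notebook), only an assumed equivalent that counts hard votes over `estimators_`. Note that `predict` on the forest itself averages the trees' class probabilities, so it can occasionally differ from a hard vote count. The chosen rows and `random_state=0` are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Step 1: build the trees (same settings as the notebook, plus a fixed seed)
forest = RandomForestClassifier(max_depth=2, n_estimators=7, random_state=0).fit(X, y)

samples = X[[0, 60, 120]]  # three arbitrary rows to classify

# Step 2: every tree predicts a class index for every sample.
# Resulting shape: (n_trees, n_samples)
tree_votes = np.array(
    [tree.predict(samples) for tree in forest.estimators_]
).astype(int)

# Step 3: count the votes per sample and keep the most frequent class
n_classes = len(forest.classes_)
vote_counts = np.array(
    [np.bincount(votes, minlength=n_classes) for votes in tree_votes.T]
)
majority = forest.classes_[vote_counts.argmax(axis=1)]

print(vote_counts)  # rows: samples, columns: votes per category
print(majority)     # winning category for each sample
```

Each row of `vote_counts` sums to the number of trees, which makes it easy to see how close a vote was.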

In [None]:
pred = RFH.predict_category(rf, sample)
html = RFH.generate_html(rf, sample, pred)

In [None]:
HTML(html)

## Visualizing a specific Decision Tree from a Random Forest

In [None]:
legend = """Attributes: 
    <ul>
        <li><b>X0, X1, X2, X3</b>: sepal length, sepal width, petal length, petal width</li>
        <li><b>Samples</b>: Number of samples considered for building this decision tree</li>
        <li><b>Value</b>: Number of samples in each of the three categories of flowers in order</li>
    </ul>
    """

In [None]:
RFH.generate_tree(rf, 0, height=500, width=400) # i-th decision tree, where i is the tree number shown in the table above
HTML(legend)
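
If the `RandomForestHelper` module is not at hand, a single tree can also be exported with scikit-learn's own `export_graphviz`. The sketch below only produces the DOT description as text. Rendering it to an image needs Graphviz (for example through `pydotplus`), which is assumed to be installed separately. `tree_index` and `random_state=0` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

iris = load_iris()
forest = RandomForestClassifier(max_depth=2, n_estimators=7, random_state=0)
forest.fit(iris.data, iris.target)

tree_index = 0  # which tree of the forest to export

dot_source = export_graphviz(
    forest.estimators_[tree_index],
    out_file=None,                        # return the DOT text instead of writing a file
    feature_names=iris.feature_names,     # real names instead of X[0]..X[3]
    class_names=list(iris.target_names),  # species names in the node labels
    filled=True,                          # colour nodes by majority class
)

print(dot_source[:300])
```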

In [None]:
#Inspect the row at position 77 of the pandas DataFrame
df.iloc[77]

**You are encouraged to explore the RandomForestClassifier by altering the parameters passed to it.**