# Random Forest

In [1]:
using DecisionTree

## Load data

In [2]:
features, labels = DecisionTree.load_data("iris");

## Casting

In [3]:
features = float.(features)
labels = string.(labels)

150-element Array{String,1}:
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 ⋮               
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"

## Model

In [4]:
model = DecisionTree.RandomForestClassifier(n_trees=50, max_depth=2)

RandomForestClassifier
n_trees:             50
n_subfeatures:       -1
partial_sampling:    0.7
max_depth:           2
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             nothing
ensemble:            nothing

Available models:

* `DecisionTreeClassifier`
* `DecisionTreeRegressor`
* `RandomForestClassifier`
* `RandomForestRegressor`
* `AdaBoostStumpClassifier`
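The models listed above share the same `fit!`/`predict` interface, so swapping one for another mostly means changing the constructor. A minimal sketch using `DecisionTreeClassifier` on the same iris data; the `max_depth=2` setting and the variable names `tree` and `tree_prediction` are illustrative choices, not part of the original notebook:

```julia
using DecisionTree

# Load and cast the iris data as in the cells above
features, labels = DecisionTree.load_data("iris")
features = float.(features)
labels = string.(labels)

# A single decision tree instead of a forest (max_depth=2 is an arbitrary choice)
tree = DecisionTree.DecisionTreeClassifier(max_depth=2)
DecisionTree.fit!(tree, features, labels)

# Predict the class of one new sample
new_iris = [5.9, 3.0, 5.1, 1.9]
tree_prediction = DecisionTree.predict(tree, new_iris)
println(tree_prediction)
```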

In [5]:
?RandomForestClassifier

search: RandomForestClassifier



```
RandomForestClassifier(; n_subfeatures::Int=-1,
                       n_trees::Int=10,
                       partial_sampling::Float=0.7,
                       max_depth::Int=-1,
                       rng=Random.GLOBAL_RNG)
```

Random forest classification. See [DecisionTree.jl's documentation](https://github.com/bensadeghi/DecisionTree.jl)

Hyperparameters:

  * `n_subfeatures`: number of features to consider at random per split (default: -1, sqrt(# features))
  * `n_trees`: number of trees to train (default: 10)
  * `partial_sampling`: fraction of samples to train each tree on (default: 0.7)
  * `max_depth`: maximum depth of the decision trees (default: no maximum)
  * `min_samples_leaf`: the minimum number of samples each leaf needs to have
  * `min_samples_split`: the minimum number of samples needed for a split
  * `min_purity_increase`: minimum purity needed for a split
  * `rng`: the random number generator to use. Can be an `Int`, which will be used to seed and create a new random number generator.

Implements `fit!`, `predict`, `predict_proba`, `get_classes`
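Since `rng` accepts an `Int` seed, fixing it should make training repeatable. The sketch below trains two forests with the same seed and compares their predictions; `train_seeded` is a hypothetical helper and the seed `42` is arbitrary. Repeatability is assumed here for a single-process session and may not hold across package versions or parallel setups:

```julia
using DecisionTree

# Load and cast the iris data as in the cells above
features, labels = DecisionTree.load_data("iris")
features = float.(features)
labels = string.(labels)

# Hypothetical helper: train a forest whose randomness is driven by an integer seed
function train_seeded(seed)
    model = DecisionTree.RandomForestClassifier(n_trees=50, max_depth=2, rng=seed)
    DecisionTree.fit!(model, features, labels)
    return model
end

model_a = train_seeded(42)
model_b = train_seeded(42)

# Compare the predictions of both forests on the training data
same = DecisionTree.predict(model_a, features) == DecisionTree.predict(model_b, features)
println("identical predictions: ", same)
```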


## Training

In [6]:
DecisionTree.fit!(model, features, labels)

RandomForestClassifier
n_trees:             50
n_subfeatures:       -1
partial_sampling:    0.7
max_depth:           2
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
ensemble:            Ensemble of Decision Trees
Trees:      50
Avg Leaves: 3.22
Avg Depth:  2.0

## Prediction

In [7]:
new_iris = [5.9, 3.0, 5.1, 1.9]
DecisionTree.predict(model, new_iris)

"Iris-virginica"

In [8]:
DecisionTree.predict_proba(model, new_iris)

3-element Array{Float64,1}:
 0.0 
 0.16
 0.84
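The probability vector carries no labels by itself. One way to make it readable is to pair each entry with the class labels returned by `get_classes`, as sketched below. The names `probs`, `classes` and `best` are illustrative, and the most probable class is expected, though not guaranteed in the case of ties, to match what `predict` returns:

```julia
using DecisionTree

# Load, cast and train as in the cells above
features, labels = DecisionTree.load_data("iris")
features = float.(features)
labels = string.(labels)
model = DecisionTree.RandomForestClassifier(n_trees=50, max_depth=2)
DecisionTree.fit!(model, features, labels)

new_iris = [5.9, 3.0, 5.1, 1.9]
probs = DecisionTree.predict_proba(model, new_iris)
classes = DecisionTree.get_classes(model)

# Print each class label next to its predicted probability
for (class, prob) in zip(classes, probs)
    println(rpad(class, 18), prob)
end

# Pick the label with the highest probability
best = classes[argmax(probs)]
println("most probable class: ", best)
```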

### Ordering of the classes in `predict_proba`'s output

In [9]:
DecisionTree.get_classes(model)

3-element Array{String,1}:
 "Iris-setosa"    
 "Iris-versicolor"
 "Iris-virginica" 

## Save model

In [10]:
using JLD2

In [11]:
mkpath("models")  # create the target directory if it does not exist yet
@save "models/random-forest.jld2" model
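To check that a saved model can be used later, it can be read back and compared against the original. The sketch below is a self-contained round trip that writes to a temporary directory rather than `models/`; the names `path` and `restored` are illustrative. It assumes `DecisionTree` is loaded in the reading session so that JLD2 can reconstruct the model type:

```julia
using DecisionTree  # must be loaded when reading so the model type can be rebuilt
using JLD2

# Load, cast and train as in the cells above
features, labels = DecisionTree.load_data("iris")
features = float.(features)
labels = string.(labels)
model = DecisionTree.RandomForestClassifier(n_trees=50, max_depth=2)
DecisionTree.fit!(model, features, labels)

# Save the model under the key "model" in a temporary file
path = joinpath(mktempdir(), "random-forest.jld2")
@save path model

# Read the stored object back under a different variable name
restored = jldopen(path, "r") do file
    file["model"]
end

# The restored forest should predict the same class as the original
new_iris = [5.9, 3.0, 5.1, 1.9]
println(DecisionTree.predict(restored, new_iris) == DecisionTree.predict(model, new_iris))
```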