# Decision Tree

In [1]:
using DecisionTree

## Load data

In [2]:
features, labels = DecisionTree.load_data("iris");

## Casting

In [3]:
# Convert the loaded values to concrete element types before training
features = float.(features)
labels = string.(labels)

150-element Array{String,1}:
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 "Iris-setosa"   
 ⋮               
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"

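The cast above can be illustrated on a tiny array. This is a minimal sketch that assumes the raw data arrives with element type `Any` (which the need for a cast suggests); `raw_features` and `raw_labels` are hypothetical stand-ins, not the real dataset.

```julia
# Hypothetical stand-ins for two rows of raw data with element type Any.
raw_features = Any[5.1 3.5 1.4 0.2; 5.9 3.0 5.1 1.8]
raw_labels = Any["Iris-setosa", "Iris-virginica"]

# Broadcasting `float` and `string` narrows the element types.
cast_features = float.(raw_features)
cast_labels = string.(raw_labels)

println(eltype(raw_features), " -> ", eltype(cast_features))
println(eltype(raw_labels), " -> ", eltype(cast_labels))
```

Concrete element types let the training code work with typed arrays instead of boxed values.
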
## Model

In [4]:
model = DecisionTree.DecisionTreeClassifier(max_depth=2)

DecisionTreeClassifier
max_depth:                2
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  nothing
root:                     nothing

Available models:

* `DecisionTreeClassifier`
* `DecisionTreeRegressor`
* `RandomForestClassifier`
* `RandomForestRegressor`
* `AdaBoostStumpClassifier`

In [5]:
?DecisionTreeClassifier

search: DecisionTreeClassifier

```
DecisionTreeClassifier(; pruning_purity_threshold=0.0,
                       max_depth::Int=-1,
                       min_samples_leaf::Int=1,
                       min_samples_split::Int=2,
                       min_purity_increase::Float=0.0,
                       n_subfeatures::Int=0,
                       rng=Random.GLOBAL_RNG)
```

Decision tree classifier. See [DecisionTree.jl's documentation](https://github.com/bensadeghi/DecisionTree.jl)

Hyperparameters:

  * `pruning_purity_threshold`: (post-pruning) merge leaves having `>=thresh` combined purity (default: no pruning)
  * `max_depth`: maximum depth of the decision tree (default: no maximum)
  * `min_samples_leaf`: the minimum number of samples each leaf needs to have (default: 1)
  * `min_samples_split`: the minimum number of samples needed for a split (default: 2)
  * `min_purity_increase`: minimum purity needed for a split (default: 0.0)
  * `n_subfeatures`: number of features to select at random (default: keep all)
  * `rng`: the random number generator to use. Can be an `Int`, which will be used to seed and create a new random number generator.

Implements `fit!`, `predict`, `predict_proba`, `get_classes`
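
The keyword arguments listed above can be combined in one constructor call. The following is a small sketch; the specific values are arbitrary choices for illustration, and `small_tree` is a hypothetical name.

```julia
using DecisionTree

# Illustrative values only: a shallow tree with larger leaves.
# An Int passed as `rng` seeds a new random number generator (per the docstring).
small_tree = DecisionTree.DecisionTreeClassifier(max_depth=3,
                                                 min_samples_leaf=5,
                                                 rng=42)

println(small_tree.max_depth)
println(small_tree.min_samples_leaf)
```
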


## Training

In [6]:
DecisionTree.fit!(model, features, labels)

DecisionTreeClassifier
max_depth:                2
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
root:                     Decision Tree
Leaves: 3
Depth:  2

### Pretty-print the tree, to a depth of 5 levels

In [7]:
DecisionTree.print_tree(model, 5)

Feature 4, Threshold 0.8
L-> Iris-setosa : 50/50
R-> Feature 4, Threshold 1.75
    L-> Iris-versicolor : 49/54
    R-> Iris-virginica : 45/46

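The printed tree reads as two nested comparisons on feature 4. As a hand-written sketch (not the library's implementation), and assuming the left branch is taken when the feature value is strictly below the threshold, the same rule can be written as a plain function; `classify_iris` is a hypothetical name.

```julia
# Hand-coded version of the printed tree.
# Assumption: "L" is followed when the feature value is < threshold.
function classify_iris(x::AbstractVector{<:Real})
    if x[4] < 0.8
        return "Iris-setosa"
    elseif x[4] < 1.75
        return "Iris-versicolor"
    else
        return "Iris-virginica"
    end
end

# Training accuracy implied by the leaf counts in the printout:
# 50/50, 49/54 and 45/46 samples match their leaf's majority label.
training_accuracy = (50 + 49 + 45) / 150

println(classify_iris([5.1, 3.5, 1.4, 0.2]))
println(training_accuracy)
```

Only the fourth feature appears in the rule because the tree was limited to `max_depth=2` and both splits chose that feature.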

## Prediction

In [8]:
new_iris = [5.9, 3.0, 5.1, 1.9]
DecisionTree.predict(model, new_iris)

"Iris-virginica"

In [9]:
DecisionTree.predict_proba(model, new_iris)

3-element Array{Float64,1}:
 0.0                 
 0.021739130434782608
 0.9782608695652174  
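
These values match the class frequencies of the leaf that the sample lands in. The sketch below reproduces them by hand; the counts are inferred from the tree printout (45 of 46 samples in the leaf are the majority class) and from the output above (the remaining sample falls in the second class), so treat them as an assumption rather than values read from the model.

```julia
# Inferred class counts in the right-most leaf, one entry per class.
leaf_counts = [0, 1, 45]

# Normalising the counts gives the per-class probabilities.
leaf_proba = leaf_counts ./ sum(leaf_counts)

println(leaf_proba)
```
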

### Class order of the entries in `predict_proba`'s output

In [10]:
DecisionTree.get_classes(model)

3-element Array{String,1}:
 "Iris-setosa"    
 "Iris-versicolor"
 "Iris-virginica" 

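Because the two outputs share the same order, they can be paired up directly. A minimal sketch, with the values copied from the outputs above rather than read from the model; `class_names`, `class_proba`, `proba_by_class` and `most_likely` are hypothetical names.

```julia
# Values copied from the `get_classes` and `predict_proba` outputs.
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
class_proba = [0.0, 0.021739130434782608, 0.9782608695652174]

# Pair each class name with its probability.
proba_by_class = Dict(zip(class_names, class_proba))

# The most likely class is the one at the position of the largest probability.
most_likely = class_names[argmax(class_proba)]

println(proba_by_class)
println(most_likely)
```
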
## Save model

In [11]:
using JLD2

In [12]:
mkpath("models")  # make sure the target directory exists before saving
@save "models/decision-tree.jld2" model
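
A saved object can be read back under the name it was stored with. The sketch below shows the round trip on a stand-in `Dict` written to a temporary directory, so it runs without retraining; with the real classifier, `DecisionTree` most likely has to be loaded before reading the file so that the stored types can be resolved. `stand_in`, `path` and `restored` are hypothetical names.

```julia
using JLD2

# Stand-in for the fitted classifier, so the sketch is self-contained.
stand_in = Dict("max_depth" => 2,
                "classes" => ["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# Save to a temporary location instead of "models/".
path = joinpath(mktempdir(), "decision-tree.jld2")
@save path stand_in

# Open the file read-only and fetch the object by its saved name.
restored = jldopen(path, "r") do file
    file["stand_in"]
end

println(restored == stand_in)
```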