# ranger "classification" mode still looks like "probability" #546

## The problem

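When I use a parsnip `rand_forest()` specification with the "ranger" engine in "classification" mode, the fitted model is still a "probability" forest. parsnip calls `ranger::ranger()` with `probability = TRUE` instead of fitting a plain classification forest.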

## Reproducible example

``` r
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(modeldata)
library(skimr)
data(cells, package = "modeldata")
library(ranger)
tidymodels_prefer()

cells$case = NULL
set.seed(1234)
ranger(class ~ ., data = cells, min.node.size = 10, classification = TRUE)
#> Ranger result
#> 
#> Call:
#>  ranger(class ~ ., data = cells, min.node.size = 10, classification = TRUE) 
#> 
#> Type:                             Classification 
#> Number of trees:                  500 
#> Sample size:                      2019 
#> Number of independent variables:  56 
#> Mtry:                             7 
#> Target node size:                 10 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             17.29 %


# OK, so the OOB misclassification rate is about 17%, which isn't too shabby!
# Now let's fit the same model via tidymodels.

rf_spec = rand_forest() %>%
  set_engine("ranger") %>%
  set_mode("classification")

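# `class, -class` selects no columns; all_nominal_predictors() expresses the
# intent. All cells predictors are numeric, so this step is a no-op either way.
rf_recipe = recipe(class ~ ., data = cells) %>%
  step_dummy(all_nominal_predictors())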

set.seed(1234)
workflow() %>%
  add_recipe(rf_recipe) %>%
  add_model(rf_spec) %>%
  fit(data = cells)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: rand_forest()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 1 Recipe Step
#> 
#> • step_dummy()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1,      verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE) 
#> 
#> Type:                             Probability estimation 
#> Number of trees:                  500 
#> Sample size:                      2019 
#> Number of independent variables:  56 
#> Mtry:                             7 
#> Target node size:                 10 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error (Brier s.):  0.1198456

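# This is important! The tidymodels fit is a "Probability estimation" forest (`probability = TRUE`), even though we asked for "classification" mode!
# The reported OOB error is also lower (0.12 vs. 0.17), but it is now a Brier score rather than a misclassification rate, so the two numbers are not on the same scale.
# The gap isn't large here, but on other datasets I've seen much bigger differences (e.g. 20% vs. 45%).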

# Now let's run ranger directly again, this time with `probability = TRUE`.

set.seed(1234)
ranger(class ~ ., data = cells, min.node.size = 10, probability = TRUE)
#> Ranger result
#> 
#> Call:
#>  ranger(class ~ ., data = cells, min.node.size = 10, probability = TRUE) 
#> 
#> Type:                             Probability estimation 
#> Number of trees:                  500 
#> Sample size:                      2019 
#> Number of independent variables:  56 
#> Mtry:                             7 
#> Target node size:                 10 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error (Brier s.):  0.119976

# Now the OOB error (Brier score) is at the same level as the tidymodels result.
# Also, ranger's default `min.node.size` is 1 for classification but 10 for probability estimation, which is why the tidymodels fit shows a target node size of 10 even though no `min_n` was set.
```

<sup>Created on 2021-08-25 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>
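The two OOB figures in the reprex are on different scales: ranger reports a misclassification rate for a classification forest but a Brier score for a probability forest. A minimal sketch for putting them on the same scale, assuming (as the `ranger()` documentation describes) that the `predictions` element of a probability forest holds the OOB class probabilities with one column per outcome level, is to turn those probabilities into hard labels and compute the misclassification rate directly:

```r
library(ranger)
library(modeldata)

data(cells, package = "modeldata")
cells$case <- NULL

set.seed(1234)
prob_fit <- ranger(class ~ ., data = cells, min.node.size = 10, probability = TRUE)

# For a probability forest, `predictions` is a matrix of OOB class
# probabilities with one column per level of the outcome.
oob_prob <- prob_fit$predictions

# Convert probabilities to hard labels by picking the most probable class.
oob_class <- colnames(oob_prob)[max.col(oob_prob, ties.method = "first")]

# OOB misclassification rate, comparable to "OOB prediction error: 17.29 %".
# Rows that were never out-of-bag (NA probabilities) are dropped.
oob_error <- mean(oob_class != as.character(cells$class), na.rm = TRUE)
oob_error
```

No exact value is claimed here; the point is that this number, not the Brier score, is the one to compare against the classification forest's 17.29%.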

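parsnip passes extra ranger arguments through `set_engine()`, and engine arguments appear to take precedence over parsnip's built-in defaults for the model call. A minimal sketch of requesting a plain classification forest, assuming that `probability = FALSE` supplied this way replaces the `probability = TRUE` default seen in the `Call:` line above (worth confirming against the parsnip version in use):

```r
library(tidymodels)
library(modeldata)

data(cells, package = "modeldata")
cells$case <- NULL

# `min_n` is parsnip's name for ranger's `min.node.size`.
# `probability = FALSE` is forwarded to ranger::ranger() as an engine
# argument; the assumption is that it overrides parsnip's default of TRUE.
rf_class_spec <- rand_forest(min_n = 10) %>%
  set_engine("ranger", probability = FALSE) %>%
  set_mode("classification")

set.seed(1234)
rf_class_fit <- fit(rf_class_spec, class ~ ., data = cells)

# Inspect the underlying ranger object: if the override worked, the tree
# type is "Classification" and the OOB error is a misclassification rate.
rf_class_fit$fit$treetype
rf_class_fit$fit$prediction.error
```

One trade-off of this choice: ranger only stores class probabilities for probability forests, so `type = "prob"` predictions should not be expected from a fit like this one.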