ranger "classification" mode still looks like "probability" #546
Comments
That is correct! We default to probability forests because that is more aligned with how the rest of tidymodels makes predictions.
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
data(cells)
library(ranger)
tidymodels_prefer()
cells$case <- NULL
rf_spec <- rand_forest(mode = "classification")
rf_form <- class ~ .
workflow(rf_form, rf_spec) %>% fit(data = cells)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> class ~ .
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#>
#> Call:
#> ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
#>
#> Type: Probability estimation
#> Number of trees: 500
#> Sample size: 2019
#> Number of independent variables: 56
#> Mtry: 7
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): 0.1202317
ranger(x = cells[2:57], y = cells$class, probability = TRUE)
#> Ranger result
#>
#> Call:
#> ranger(x = cells[2:57], y = cells$class, probability = TRUE)
#>
#> Type: Probability estimation
#> Number of trees: 500
#> Sample size: 2019
#> Number of independent variables: 56
#> Mtry: 7
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error (Brier s.): 0.120404
Created on 2021-08-26 by the reprex package (v2.0.1)
If you do not want to fit a probability forest and do not need class probability predictions, you can set that as an engine argument:
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
data(cells)
library(ranger)
tidymodels_prefer()
cells$case <- NULL
rf_spec <- rand_forest() %>%
set_engine("ranger", probability = FALSE) %>%
set_mode("classification")
rf_form <- class ~ .
workflow(rf_form, rf_spec) %>% fit(data = cells)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> class ~ .
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#>
#> Call:
#> ranger::ranger(x = maybe_data_frame(x), y = y, probability = ~FALSE, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
#>
#> Type: Classification
#> Number of trees: 500
#> Sample size: 2019
#> Number of independent variables: 56
#> Mtry: 7
#> Target node size: 1
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error: 16.84 %
ranger(x = cells[2:57], y = cells$class, probability = FALSE)
#> Ranger result
#>
#> Call:
#> ranger(x = cells[2:57], y = cells$class, probability = FALSE)
#>
#> Type: Classification
#> Number of trees: 500
#> Sample size: 2019
#> Number of independent variables: 56
#> Mtry: 7
#> Target node size: 1
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error: 16.99 %
Created on 2021-08-26 by the reprex package (v2.0.1)
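For anyone wondering what the default costs you in practice: as far as I know, a probability forest fit through parsnip can still return hard class predictions, so `probability = FALSE` is usually only needed if you specifically want ranger's classification forest. Here is a minimal sketch, assuming the same `cells` data as above; the object names (`rf_fit`, `class_preds`, `prob_preds`) are just for illustration.

```r
library(tidymodels)

data(cells)
cells$case <- NULL

# Default spec: parsnip passes probability = TRUE to ranger
rf_spec <- rand_forest(mode = "classification") %>%
  set_engine("ranger")

rf_fit <- workflow(class ~ ., rf_spec) %>% fit(data = cells)

new_obs <- head(cells)

# Hard class predictions from the probability forest
class_preds <- predict(rf_fit, new_data = new_obs, type = "class")

# Class probabilities, one column per outcome level
prob_preds <- predict(rf_fit, new_data = new_obs, type = "prob")

class_preds
prob_preds
```

Both calls return a tibble with one row per row of `new_data`, which is the convention the rest of tidymodels relies on.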
Great, thanks @juliasilge!! I guess that makes sense. For a first-pass random forest classifier, I find the OOB error often gives me a good idea of the expected AUC, but I'd like to keep the tidymodels workflow everywhere for consistency and to help with muscle memory.
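In case it helps others who want the OOB error while staying inside a workflow, one possible approach is to pull the underlying ranger object out of the fitted workflow and read the error from it. This is only a sketch: it assumes a version of workflows that provides `extract_fit_engine()`, and the names `rf_fit`, `ranger_fit`, and `oob_error` are illustrative.

```r
library(tidymodels)

data(cells)
cells$case <- NULL

# Classification forest, as in the engine-argument example above
rf_spec <- rand_forest() %>%
  set_engine("ranger", probability = FALSE) %>%
  set_mode("classification")

rf_fit <- workflow(class ~ ., rf_spec) %>% fit(data = cells)

# Pull the raw ranger object out of the fitted workflow
ranger_fit <- extract_fit_engine(rf_fit)

# OOB misclassification rate as a fraction (the printout shows it as a percent)
oob_error <- ranger_fit$prediction.error
oob_error

# OOB confusion matrix stored by ranger for classification forests
ranger_fit$confusion.matrix
```

With the default probability forest, the same `prediction.error` element holds the OOB Brier score instead, matching the "Brier s." label in the printed output.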
The problem
When using a parsnip model specification with the "ranger" engine in "classification" mode, it looks like the underlying ranger fit is still of type "probability" instead of "classification".
Reproducible example
Created on 2021-08-25 by the reprex package (v2.0.0)