predict(type = "prob") should error if outcome level is named "class" #720

Closed
simonpcouch opened this issue May 9, 2022 · 1 comment · Fixed by #723
simonpcouch commented May 9, 2022

predict(type = "prob") and predict(type = "class") both return a column named .pred_class if the outcome has a level named "class".

library(parsnip)
library(tibble)

x <- tibble(
  class = factor(sample(c("class", "class_1"), 100, replace = TRUE)),
  a = rnorm(100),
  b = rnorm(100)
)

mod <- logistic_reg() %>%
  set_mode(mode = "classification") %>%
  fit(class ~ a + b, data = x)

predict(mod, type = "class", new_data = x)
#> # A tibble: 100 × 1
#>    .pred_class
#>    <fct>      
#>  1 class_1    
#>  2 class_1    
#>  3 class      
#>  4 class_1    
#>  5 class_1    
#>  6 class      
#>  7 class      
#>  8 class      
#>  9 class      
#> 10 class      
#> # … with 90 more rows

predict(mod, type = "prob", new_data = x)
#> # A tibble: 100 × 2
#>    .pred_class .pred_class_1
#>          <dbl>         <dbl>
#>  1       0.498         0.502
#>  2       0.475         0.525
#>  3       0.556         0.444
#>  4       0.457         0.543
#>  5       0.490         0.510
#>  6       0.520         0.480
#>  7       0.516         0.484
#>  8       0.525         0.475
#>  9       0.550         0.450
#> 10       0.562         0.438
#> # … with 90 more rows

Created on 2022-05-09 by the reprex package (v2.0.1)

Some packages downstream from parsnip join these two tibbles together, resulting in issues like tidymodels/stacks#125 and tidymodels/tune#487.
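
The collision is visible from the column names alone. Here is a minimal base-R sketch, assuming (as the output above suggests) that each probability column is named `.pred_` followed by the outcome level. The vectors below are stand-ins for illustration, not parsnip internals.

```r
# Outcome levels from the reprex above
lvls <- c("class", "class_1")

# Column names as they appear in the two predict() outputs above
class_cols <- ".pred_class"
prob_cols <- paste0(".pred_", lvls)

# Names present in both outputs. A non-empty result means the two
# tibbles cannot be joined or column-bound without a name clash.
shared <- intersect(class_cols, prob_cols)
shared
```

Any outcome without a level named "class" gives an empty `shared`, which is why this only shows up as an edge case.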

@DavisVaughan and I spent some time on this this morning and concluded that erroring in predict(type = "prob") when an outcome level is named "class" is likely the best route. Erroring in parsnip, before the predictions are generated, means that downstream packages (tune, stacks, and possibly others) need not anticipate this edge case when joining predictions. It also lets us raise the same informative error every time this issue comes up.
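
A rough sketch of the kind of check described above, in base R. The helper name `check_no_class_level()` and the error wording are hypothetical and only illustrate the idea. The fix that actually landed in parsnip may look quite different.

```r
# Hypothetical helper, NOT parsnip's actual implementation.
# `lvls` is assumed to be the character vector of outcome levels.
check_no_class_level <- function(lvls) {
  if ("class" %in% lvls) {
    stop(
      "The outcome has a level named \"class\", so `predict(type = \"prob\")` ",
      "would return a `.pred_class` column that conflicts with the output of ",
      "`predict(type = \"class\")`. Please rename that level.",
      call. = FALSE
    )
  }
  # Return the input invisibly so the check can sit in a pipeline
  invisible(lvls)
}

# Passes silently when no level is named "class"
check_no_class_level(c("yes", "no"))

# Errors for the levels used in the reprex; capture the message to inspect it
res <- tryCatch(
  check_no_class_level(c("class", "class_1")),
  error = function(e) conditionMessage(e)
)
res
```

Running a check like this before any predictions are computed is what would let tune and stacks skip their own handling of the clash.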

This solution doesn't feel very satisfying. Some alternatives:

These didn't sound very satisfying either.🤷

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators May 28, 2022