parsnip should raise an error when users use logistic_reg() for multiclass classification #545

Closed
zenggyu opened this issue Aug 23, 2021 · 2 comments · Fixed by #916
Labels: documentation, feature (a feature request or enhancement)

Comments

zenggyu commented Aug 23, 2021

Currently, parsnip does not throw an error if a user chooses logistic_reg() to fit a multiclass classification model, but the result is wrong and misleading. Moreover, the documentation doesn't mention that logistic_reg() should only be used for binary classification. It took me a while to figure out that I should use multinom_reg() instead. It would be nice if parsnip could prevent this usage explicitly.

library(parsnip)
fit <- logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm") %>%
  fit(Species ~ ., data = iris)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
predict(fit, new_data = iris)
#> # A tibble: 150 x 1
#>    .pred_class
#>    <fct>      
#>  1 setosa     
#>  2 setosa     
#>  3 setosa     
#>  4 setosa     
#>  5 setosa     
#>  6 setosa     
#>  7 setosa     
#>  8 setosa     
#>  9 setosa     
#> 10 setosa     
#> # … with 140 more rows

Created on 2021-08-23 by the reprex package (v2.0.0)
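
For readers wondering why the fit above is misleading rather than merely noisy, here is a minimal base R sketch. It assumes the "glm" engine hands the factor outcome to stats::glm() with a binomial family. With a factor response, glm() treats the first level as "failure" and every other level as "success", so the three-class problem silently collapses to "setosa vs. not setosa". The names fit_glm, prob_not_first, and pred are illustrative only.

```r
# Base R only; no parsnip required.
# Assumption: this mirrors what the "glm" engine does with a factor outcome.
fit_glm <- suppressWarnings(
  glm(Species ~ ., data = iris, family = binomial)
)

# For a factor response, the fitted probability is P(outcome is NOT the first level).
prob_not_first <- predict(fit_glm, newdata = iris, type = "response")

# Only two outcomes can ever be distinguished, regardless of the number of levels.
pred <- ifelse(prob_not_first > 0.5, "not setosa", "setosa")

# Cross-tabulate the collapsed predictions against the true three-level outcome.
table(predicted = pred, actual = iris$Species)
```

The convergence warnings in the reprex are consistent with this reading: setosa is perfectly separable from the other two species, so the binomial fit diverges toward fitted probabilities of 0 and 1.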

juliasilge (Member) commented

We recently revamped our documentation so that logistic_reg() says:

logistic_reg() defines a generalized linear model for binary outcomes. A linear combination of the predictors is used to model the log odds of an event.

We already highlight that it is for binary outcomes, but we could add a pointer to multinom_reg() on that landing page.

Sidenote: Should we also mention that some engines like xgboost and ranger handle multiclass classification on their own? Probably not here.

In translate.logistic_reg we could check that the outcome is a factor with exactly two levels.
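
A check along these lines could look roughly like the sketch below. It is a standalone illustration in base R, not parsnip's actual implementation: check_binary_outcome() is a hypothetical helper name, the error wording is made up, and where such a check would be wired in (translate or fit time) is left open here.

```r
# Hypothetical helper; NOT part of parsnip's API.
# Errors unless the outcome is a factor with exactly two levels.
check_binary_outcome <- function(y, fn = "logistic_reg()") {
  if (!is.factor(y)) {
    stop("For `", fn, "`, the outcome must be a factor.", call. = FALSE)
  }
  n_lvl <- nlevels(y)
  if (n_lvl != 2) {
    stop(
      "`", fn, "` is for binary outcomes, but the outcome has ", n_lvl,
      " levels. Consider `multinom_reg()` for multiclass outcomes.",
      call. = FALSE
    )
  }
  invisible(y)
}

# Two levels: passes silently.
check_binary_outcome(factor(c("a", "b", "a")))

# Three levels (iris$Species): the error is captured here so the script keeps running.
res <- tryCatch(
  check_binary_outcome(iris$Species),
  error = function(e) conditionMessage(e)
)
```

Counting levels with nlevels() rather than unique observed values means a factor that declares three levels but happens to contain only two in the data would still be rejected; whether that is the desired behavior is a design choice.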

github-actions (bot) commented

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators on Mar 25, 2023