Feature
I want to preface this by saying I'm unsure if this is technically a bug or a feature request.
I think parsnip could benefit from adding an additional check in set_mode to ensure that the user does not add an improper mode to a model Specification. For example, when initializing a linear_reg specification, it will check if mode is either c("unknown", "regression"):
library(tidymodels)
# correct specification
mod1 <- linear_reg() %>%
set_engine('lm')
mod1
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
#returns an error - lienar_reg cannot be used for classification
mod_error <- linear_reg('classification') %>%
set_engine('lm')
#> Error: `mode` should be one of: 'unknown', 'regression'
However, the user can still do set_mode("classification") without any warnings or errors.
mod2 <- mod1 %>%
set_mode('classification') #should this be allowed?
mod2
#> Linear Regression Model Specification (classification)
#>
#> Computational engine: lm
As a result, the user gets a really confusing downstream error when trying to use that "classification" model specification:
mod2 %>%
fit(Species ~ ., data = iris)
#> Error: formula_ is unknown.
To someone who has just discovered tidymodels and has followed many online tutorials in which setting your mode with set_mode() seems to be a common API practice, they would be under the impression that mod2 is a perfectly acceptable model spec.
I would imagine it isn't intended for users to add an incompatible mode to a model specification. So, adding something like the following function to validate proper spec modes might be desirable:
check_valid_spec_mode <- function(model, mode) {
spec_modes <- rlang::env_get(get_model_env(), paste0(model, "_modes"))
if (!(mode %in% spec_modes))
rlang::abort(glue::glue("`mode` should be one of: ",
glue::glue_collapse(glue::glue("'{spec_modes}'"),
sep = ", ")))
invisible(TRUE)
}
Incase the reader was curious, I found the source of the downstream error. The"formula_" comes from a switch statement in fit.model_spec, which is attributed to no fit$interface existing on object$method. For example, add_method is called, which then calls get_model_spec, which is then unable to find a record in linear_reg_fit of a mode equivalent to "classification" and an engine of "lm".
If the above check was in place, the user would have more feedback on how to use the model specification functions as they were intended. Similar to how tidymodels will not allow you to do a regression model where the outcome is a factor.
mod1 %>%
fit(Species ~ ., data = iris)
#> Error: For a regression model, the outcome should be numeric.
#explicitly change Species to numeric for linear model of a factor.
mod1 %>% fit(as.numeric(Species) ~ ., data = iris)
#> parsnip model object
#>
#> Fit time: 2ms
#>
#> Call:
#> stats::lm(formula = as.numeric(Species) ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1.18650 -0.11191 -0.04008 0.22865 0.60925
If this is something the tidymodels team thinks should be added, I wouldn't mind starting a pull request.
Feature
I want to preface this by saying I'm unsure if this is technically a bug or a feature request.
I think
parsnipcould benefit from adding an additional check inset_modeto ensure that the user does not add an improper mode to a model Specification. For example, when initializing alinear_regspecification, it will check ifmodeis eitherc("unknown", "regression"):However, the user can still do
set_mode("classification")without any warnings or errors.As a result, the user gets a really confusing downstream error when trying to use that "classification" model specification:
To someone who has just discovered
tidymodelsand has followed many online tutorials in which setting your mode withset_mode()seems to be a common API practice, they would be under the impression thatmod2is a perfectly acceptable model spec.I would imagine it isn't intended for users to add an incompatible mode to a model specification. So, adding something like the following function to validate proper spec modes might be desirable:
Incase the reader was curious, I found the source of the downstream error. The"
formula_" comes from aswitchstatement infit.model_spec, which is attributed to nofit$interfaceexisting onobject$method. For example,add_methodis called, which then callsget_model_spec, which is then unable to find a record inlinear_reg_fitof a mode equivalent to "classification" and an engine of "lm".If the above check was in place, the user would have more feedback on how to use the model specification functions as they were intended. Similar to how
tidymodelswill not allow you to do a regression model where the outcome is a factor.If this is something the tidymodels team thinks should be added, I wouldn't mind starting a pull request.