Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional checks in set_mode prevent improper use of model specs #467

Closed
jtlandis opened this issue Apr 8, 2021 · 2 comments
Closed

Additional checks in set_mode prevent improper use of model specs #467

jtlandis opened this issue Apr 8, 2021 · 2 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@jtlandis
Copy link
Contributor

jtlandis commented Apr 8, 2021

Feature

I want to preface this by saying I'm unsure if this is technically a bug or a feature request.

I think parsnip could benefit from adding an additional check in set_mode to ensure that the user does not add an improper mode to a model Specification. For example, when initializing a linear_reg specification, it will check if mode is either c("unknown", "regression"):

library(tidymodels)

# correct specification
mod1 <- linear_reg() %>%
  set_engine('lm') 

mod1
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm

#returns an error - lienar_reg cannot be used for classification
mod_error <- linear_reg('classification') %>%
  set_engine('lm')
#> Error: `mode` should be one of: 'unknown', 'regression'

However, the user can still do set_mode("classification") without any warnings or errors.

mod2 <- mod1 %>%
  set_mode('classification') #should this be allowed?

mod2
#> Linear Regression Model Specification (classification)
#> 
#> Computational engine: lm

As a result, the user gets a really confusing downstream error when trying to use that "classification" model specification:

mod2 %>%
  fit(Species ~ ., data = iris)
#> Error: formula_ is unknown.

To someone who has just discovered tidymodels and has followed many online tutorials in which setting your mode with set_mode() seems to be a common API practice, they would be under the impression that mod2 is a perfectly acceptable model spec.

I would imagine it isn't intended for users to add an incompatible mode to a model specification. So, adding something like the following function to validate proper spec modes might be desirable:

check_valid_spec_mode <- function(model, mode) {
    spec_modes <- rlang::env_get(get_model_env(), paste0(model, "_modes"))
        if (!(mode %in% spec_modes)) 
            rlang::abort(glue::glue("`mode` should be one of: ", 
                glue::glue_collapse(glue::glue("'{spec_modes}'"), 
                    sep = ", ")))
    invisible(TRUE)
}

Incase the reader was curious, I found the source of the downstream error. The"formula_" comes from a switch statement in fit.model_spec, which is attributed to no fit$interface existing on object$method. For example, add_method is called, which then calls get_model_spec, which is then unable to find a record in linear_reg_fit of a mode equivalent to "classification" and an engine of "lm".

If the above check was in place, the user would have more feedback on how to use the model specification functions as they were intended. Similar to how tidymodels will not allow you to do a regression model where the outcome is a factor.

mod1 %>%
  fit(Species ~ ., data = iris)
#> Error: For a regression model, the outcome should be numeric.

#explicitly change Species to numeric for linear model of a factor.

mod1 %>% fit(as.numeric(Species) ~ ., data = iris)
#> parsnip model object
#> 
#> Fit time:  2ms 
#> 
#> Call:
#> stats::lm(formula = as.numeric(Species) ~ ., data = data)
#> 
#> Coefficients:
#> (Intercept)  Sepal.Length   Sepal.Width  Petal.Length   Petal.Width  
#>      1.18650      -0.11191      -0.04008       0.22865       0.60925  

If this is something the tidymodels team thinks should be added, I wouldn't mind starting a pull request.

@topepo
Copy link
Member

topepo commented Apr 13, 2021

A PR for check_valid_spec_mode() would be great.

I think that there is a separate bug. When parsnip determines how the model should be fitted, it is not picking up the right information during fit().

@topepo topepo added the bug an unexpected problem or unintended behavior label Apr 13, 2021
@jtlandis jtlandis closed this as completed May 3, 2021
@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators May 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

2 participants