Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion to use tidyselect as recipes selection API #572

Closed
mattwarkentin opened this issue Sep 18, 2020 · 12 comments · Fixed by #595
Closed

Suggestion to use tidyselect as recipes selection API #572

mattwarkentin opened this issue Sep 18, 2020 · 12 comments · Fixed by #595
Labels
feature a feature request or enhancement

Comments

@mattwarkentin
Copy link
Contributor

mattwarkentin commented Sep 18, 2020

Hi,

I love the {recipes} package but I still find myself confused about the best way to select variables for a given step. I often forget whether the variables selected for a step are the intersection or the union of the selectors passed to .... I also find it to be a slightly extra amount of mental friction that often I find myself selecting variables I want by defining the variables I don't want.

For example, if I want to apply a step to all numeric predictors, the way I usually do it is by selecting numeric NOT outcomes. It just feels a little backwards to me.

recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric(), -all_outcomes())

I'm wondering if it should be possible to nest the selector functions to form AND statements, and then comma separated selectors passed to ... be the OR statements. The suggested API would look something like:

recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors(all_numeric()))

# or via the pipe
recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric() %>% all_predictors())

This may just be a me problem, but I think it could be a nice addition and allow for more intuitive specifications of variable selections within steps. It wouldn't be a breaking change, I don't think, the selectors would just need to accept ... as an arbitrary number of additional selectors.

Perhaps to really ramp up the selection possibilities, the role selectors could even accept tidy selectors:

recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors(ends_with("p"))) # select disp and hp among predictors
@mattwarkentin
Copy link
Contributor Author

Also, I could probably spin up a PR if this seems like a promising idea...

@topepo
Copy link
Member

topepo commented Sep 18, 2020

We are going to do a larger review of how tidyselect is used with recipes. Our use here is pretty rudimentary since it was based on the original version of that package. It has a lot more features at this point.

I think that the changes that come out of that would solve this issue.

Do you feel like being about to use & and | would be more intuitive that what is currently available? Those don't currently work but we might be able to use them.

@mattwarkentin
Copy link
Contributor Author

mattwarkentin commented Sep 18, 2020

Thanks for the reply, Max.

Do you feel like being about to use & and | would be more intuitive that what is currently available? Those don't currently work but we might be able to use them.

I actually thought about exactly this as a possible solution right after submitting my original post. I think being able to use the full suite of tidy selection helpers that are available within dplyr::select() would be optimal and consolidate the selection APIs. This could include the operators like !, :, &, |, and c().

In addition to the main selection helpers (e.g. starts_with(), etc.), adding support for the newly added where() helper would work nicely if there were some new functions like is.predictor(), is.outcome(), is.nominal(), has_role(), etc. I'm just thinking out loud at this point. I definitely like the idea of using a consistent selection API across both {dplyr} and {recipes}.

EDIT: Since there already exists a whole bunch of good is.*() functions, this would expand the {recipes} selection API greatly (e.g. is.character(), is.numeric(), is.integer(), is.double(), is.logical(), etc.)

@kylegilde
Copy link

kylegilde commented Sep 28, 2020

@topepo
At a minimum, being able to use & and | with the current selection functions would be a huge convenience & improvement. Using & and | would be super intuitive!

@mattwarkentin
Copy link
Contributor Author

The more I think about this the more I like the idea of just fully embracing the {tidyselect} select helpers as the only selection API for {recipes}. It would create more cohesion between {recipes} and the {tidyverse}.

It would require some rewrites for terms_select using tidyselect::eval_select() and would require the creation of (at least) is.predictor() and is.outcome(), which should fail outside of a recipe pipeline unlike the other is.* predicates.

If you decided to soft-deprecate some of the old selectors, you could avoid breaking changes by rewriting the all_* functions in recipes to essentially return the equivalent tidyselect code. Something like:

all_numeric <- function() expr(where(is.numeric))

@mattwarkentin mattwarkentin changed the title Suggestion to enhance role selectors to accept nested selectors Suggestion to use tidyselect as recipes selection API Sep 30, 2020
@topepo
Copy link
Member

topepo commented Oct 1, 2020

@lionel- is giving this a look and will be giving us technical feedback on how we can/should update recipes.

@lionel-
Copy link
Collaborator

lionel- commented Oct 2, 2020

Do you feel like being about to use & and | would be more intuitive that what is currently available? Those don't currently work but we might be able to use them.

Maybe I'm missing something but these tidyselect features should already work as long as you have tidyselect 1.0 installed:

recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric() & !all_outcomes())
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor         10
#>
#> Operations:
#>
#> Centering and scaling for all_numeric() & !all_outcomes()

You can also use all_of() and any_of(). The only feature that requires changing the backend is where(). Changing the backend should be very simple and doesn't require any changes to the user interface.

@lionel-
Copy link
Collaborator

lionel- commented Oct 2, 2020

Maybe I'm missing something

@DavisVaughan informed me that & and | eventually produce an error because they are not part of the recipes:::selectors whitelist. Simply whitelisting them should be sufficient to add support. They should work with the existing tidymodels selections helpers without changes.

@juliasilge juliasilge added the feature a feature request or enhancement label Oct 2, 2020
@mattwarkentin
Copy link
Contributor Author

Thanks for looking into this so quickly, @lionel-.

@topepo Please let me know if there is any way I can contribute to this proposed update.

@kylegilde
Copy link

Hello,

  1. I was wondering if there is a way for the user to update the recipes whitelist with R. This approach doesn't work:

recipes:::selectors <- c(recipes:::selectors, "&")
Error in recipes:::selectors <- c(recipes:::selectors, "&") :
object 'recipes' not found

  1. If I am able to update the whitelist in this way, is this going to be reliable?

Maybe I'm missing something

@DavisVaughan informed me that & and | eventually produce an error because they are not part of the recipes:::selectors whitelist. Simply whitelisting them should be sufficient to add support. They should work with the existing tidymodels selections helpers without changes.

@kylegilde
Copy link

kylegilde commented Oct 29, 2020

for # 1, it looks like assignInNamespace("selectors", c(recipes:::selectors, "&"), "recipes") will do that.

but for # 2, is this going to be somewhat reliable?

@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org

@github-actions github-actions bot locked and limited conversation to collaborators Feb 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants