Suggestion to use tidyselect as recipes selection API #572

@mattwarkentin

Hi,

I love the {recipes} package, but I still get confused about the best way to select variables for a given step. I often forget whether the variables selected for a step are the intersection or the union of the selectors passed to .... It also adds a bit of mental friction that I often end up selecting the variables I want by specifying the variables I don't want.

For example, if I want to apply a step to all numeric predictors, I usually select all numeric variables and then exclude the outcomes. It just feels a little backwards to me.

recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric(), -all_outcomes())
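
For reference, the selection above can be read as plain set arithmetic on column names. The following is a minimal base-R sketch, not the {recipes} implementation; it assumes the selectors combine the way comma-separated terms do in dplyr::select(), where positive terms add columns and negated terms remove them. The variable names are made up for illustration.

```r
# Column names and the role implied by `mpg ~ .` on mtcars
vars <- names(mtcars)
is_outcome <- vars == "mpg"
is_numeric <- vapply(mtcars, is.numeric, logical(1))

# all_numeric(), -all_outcomes(): start from the numeric columns,
# then subtract the outcome columns (assumed semantics, see lead-in)
selected <- setdiff(vars[is_numeric], vars[is_outcome])
print(selected)
```

Under that reading, the result is every column of mtcars except mpg.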

I'm wondering whether it should be possible to nest the selector functions to form AND statements, while comma-separated selectors passed to ... act as OR statements. The suggested API would look something like:

recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors(all_numeric()))

# or via the pipe
recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric() %>% all_predictors())
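
To make the intended AND semantics concrete, here is a rough base-R sketch. The helpers pick_numeric() and pick_predictors() and the `within` argument are hypothetical and are not part of {recipes}; each helper simply narrows whatever set of column names it is handed, so nesting them yields an intersection.

```r
# Role lookup implied by `mpg ~ .` (illustrative only)
roles <- ifelse(names(mtcars) == "mpg", "outcome", "predictor")
names(roles) <- names(mtcars)

# Hypothetical selectors: each keeps only the columns in `within`
# that also satisfy its own condition, which gives AND when nested
pick_numeric <- function(within = names(mtcars)) {
  within[vapply(mtcars[within], is.numeric, logical(1))]
}
pick_predictors <- function(within = names(mtcars)) {
  within[roles[within] == "predictor"]
}

# Nesting order should not matter for an intersection
nested <- pick_predictors(pick_numeric())
flipped <- pick_numeric(pick_predictors())
print(identical(nested, flipped))
```

Because each helper only filters its input, the same functions also work unnested, which is why this style would not have to break existing calls.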

This may just be a me problem, but I think it would be a nice addition that allows more intuitive variable selections within steps. I don't think it would be a breaking change: the selectors would just need to accept ... as an arbitrary number of additional selectors.

Perhaps to really ramp up the selection possibilities, the role selectors could even accept tidy selectors:

recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors(ends_with("p"))) # select disp and hp among predictors
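
As a quick check on the comment above, the same selection can be written with base R alone. This is only a sketch of the expected result, not of how {recipes} or tidyselect would evaluate it; the variable names are illustrative.

```r
# Predictors implied by `mpg ~ .`: every column except the outcome
predictors <- setdiff(names(mtcars), "mpg")

# Keep only the predictors whose names end in "p"
ends_in_p <- predictors[endsWith(predictors, "p")]
print(ends_in_p)  # "disp" "hp"
```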

Labels: feature (a feature request or enhancement)