Hi,
I love the {recipes} package, but I still find myself confused about the best way to select variables for a given step. I often forget whether the variables selected for a step are the intersection or the union of the selectors passed to `...`. It also adds a bit of mental friction that I often end up selecting the variables I want by specifying the variables I don't want.
For example, if I want to apply a step to all numeric predictors, I usually select all numeric variables and then exclude the outcomes. It just feels a little backwards to me.
library(recipes)
recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric(), -all_outcomes())
I'm wondering whether it should be possible to nest the selector functions to form AND statements, while comma-separated selectors passed to `...` remain OR statements. The suggested API would look something like:
# proposed syntax: nesting means AND
recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors(all_numeric()))
# or, equivalently, via the pipe
recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric() %>% all_predictors())
This may just be a me problem, but I think it would be a nice addition that allows more intuitive variable selection within steps. I don't think it would be a breaking change; the selectors would just need to accept `...` as an arbitrary number of additional selectors.
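To make the AND/OR distinction concrete, here is a minimal base-R sketch that models selectors as plain functions returning column names of `mtcars`. The role table and the helpers `sel_numeric()`, `sel_predictors()` and `sel_outcomes()` are hypothetical stand-ins, not part of {recipes}; the sketch only assumes the proposed semantics, where nesting means intersection and comma separation means union.

```r
# Hypothetical stand-in for the variable/role information a recipe tracks:
# mpg is the outcome, every other mtcars column is a predictor.
vars   <- names(mtcars)
roles  <- ifelse(vars == "mpg", "outcome", "predictor")
is_num <- vapply(mtcars, is.numeric, logical(1))

# Hypothetical selector helpers, each returning a character vector of names
sel_numeric    <- function() vars[is_num]
sel_predictors <- function() vars[roles == "predictor"]
sel_outcomes   <- function() vars[roles == "outcome"]

# Current idiom, all_numeric(), -all_outcomes():
# take the numeric variables, then drop the outcomes
current <- setdiff(sel_numeric(), sel_outcomes())

# Proposed nesting, all_predictors(all_numeric()): an AND, i.e. an intersection
nested <- intersect(sel_predictors(), sel_numeric())

# Proposed comma separation, e.g. all_predictors(), all_outcomes():
# an OR, i.e. a union
either <- union(sel_predictors(), sel_outcomes())

print(current)
print(identical(current, nested))
```

Under these assumptions, the existing `all_numeric(), -all_outcomes()` idiom and the proposed `all_predictors(all_numeric())` pick the same ten columns, since `mpg` is the only non-predictor and every `mtcars` column is numeric.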
To really expand the selection possibilities, perhaps the role selectors could even accept tidyselect helpers:
recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors(ends_with("p"))) # selects disp and hp among the predictors
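Along the same lines, a role selector that narrows its result with a name filter could be sketched in base R as below. Both `sel_predictors()` and `name_ends_with()` are hypothetical; `name_ends_with()` is only a stand-in for `tidyselect::ends_with()`, not the real helper, and the sketch assumes the filter is applied after the role selection (an AND).

```r
# Hypothetical role table: mpg is the outcome, all other columns are predictors
vars  <- names(mtcars)
roles <- ifelse(vars == "mpg", "outcome", "predictor")

# Hypothetical role selector that optionally narrows its result with a
# predicate on names, mimicking all_predictors(ends_with("p"))
sel_predictors <- function(filter = NULL) {
  out <- vars[roles == "predictor"]
  if (is.null(filter)) out else out[filter(out)]
}

# Base-R stand-in for tidyselect::ends_with(): returns a predicate on names
name_ends_with <- function(suffix) function(x) endsWith(x, suffix)

selected <- sel_predictors(name_ends_with("p"))
print(selected)
# [1] "disp" "hp"
```

Because the role filter runs first, an outcome whose name happened to end in "p" would still be excluded, which is the behavior the nested form is meant to express.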