Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support modern tidyselect #595

Merged
merged 16 commits into from
Nov 8, 2020
Merged

Support modern tidyselect #595

merged 16 commits into from
Nov 8, 2020

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Nov 2, 2020

Closes #572

Don't be alarmed by the number of files changed. Most changes are in selections.R, recipe.R, and roles.R.

With this PR, recipes now supports the entire modern tidyselect DSL. This includes:

  • The ops &, |, ! and -

  • The new where() function

  • Using c() to specify multiple variables at once

  • User created specified selector functions

  • We still support the recipes specific selector functions

This is supported in all of the step_*() functions. Additionally, it is supported in add_role(), update_role(), and remove_role(). It is also supported in bake() and juice().

Notably, step_interact() has gained support for where(), but cannot use the tidyselect ops like & or |. Mixing these with the formula syntax is just too weird. I've updated the selection docs to note that it is more restricted.

This is accomplished by essentially deprecating terms_select() in favor of eval_select_recipes(). We no longer use terms_select() or element_check() internally, but since we export terms_select(), it has to stick around.

In the process of updating bake() and juice(), I had to do a little rewriting of those functions. One side effect of this is that when you have a skipped step that would have otherwise dropped columns from the test set, the ordering of those columns that didn't get dropped has now changed slightly. It used to be relative to the training set (where the dropping happened), but now it is relative to the original recipe, which I think is more intuitive.

These are fairly big changes, so it would be nice to run full revdeps with this branch before merging. It would also be nice to run against extrachecks and maybe the book.

@DavisVaughan DavisVaughan marked this pull request as draft November 3, 2020 13:12
@DavisVaughan DavisVaughan marked this pull request as ready for review November 4, 2020 19:57
@DavisVaughan DavisVaughan marked this pull request as draft November 4, 2020 19:57
@DavisVaughan DavisVaughan changed the title Allow tidyselect &, |, !, and - in selectors Support modern tidyselect Nov 4, 2020
@DavisVaughan DavisVaughan marked this pull request as ready for review November 4, 2020 21:57
@topepo
Copy link
Member

topepo commented Nov 5, 2020

MachineShop errored in the revdeps:

  Running examples in ‘MachineShop-Ex.R’ failed
  The error most likely occurred in:
  
  > ### Name: ModeledInput
  > ### Title: ModeledInput Classes
  > ### Aliases: ModeledInput ModeledFrame ModeledRecipe ModeledInput.formula
  > ###   ModeledInput.matrix ModeledInput.ModelFrame ModeledInput.recipe
  > ###   ModeledInput.MLModel ModeledInput.MLModelFunction
  > 
  > ### ** Examples
  > 
  > ## Modeled model frame
  > mod_mf <- ModeledInput(sale_amount ~ ., data = ICHomes, model = GLMModel)
  > fit(mod_mf)
  
  Call:  stats::glm(formula = formula, family = family, data = data, weights = weights, 
      control = control)
  
  Coefficients:
                    (Intercept)                      sale_year  
                     -1.742e+07                      2.513e+03  
                     sale_month                          built  
                     -7.918e+02                      3.029e+02  
                     styleCondo      construction1 Story Brick  
                     -1.774e+04                     -1.176e+04  
      construction1 Story Condo      construction1 Story Frame  
                     -6.658e+04                     -6.416e+04  
      construction2 Story Brick      construction2 Story Condo  
                      1.131e+05                      9.514e+03  
      construction2 Story Frame  constructionSplit Foyer Frame  
                      2.323e+04                     -5.714e+04  
  constructionSplit Level Frame                      base_size  
                     -3.302e+04                      1.635e+02  
                       add_size                   garage1_size  
                      1.433e+02                      5.151e+01  
                   garage2_size                       lot_size  
                      6.838e+01                      1.232e+00  
                       bedrooms                    basementYes  
                     -1.557e+03                      2.387e+04  
                          acYes                       atticYes  
                     -2.600e+03                      4.763e+04  
                            lon                            lat  
                      5.430e+03                      2.949e+05  
  
  Degrees of Freedom: 752 Total (i.e. Null);  729 Residual
  Null Deviance:	    6.323e+12 
  Residual Deviance: 1.76e+12 	AIC: 18430
  > 
  > ## Modeled recipe
  > library(recipes)
  Loading required package: dplyr
  
  Attaching package: ‘dplyr’
  
  The following objects are masked from ‘package:stats’:
  
      filter, lag
  
  The following objects are masked from ‘package:base’:
  
      intersect, setdiff, setequal, union
  
  
  Attaching package: ‘recipes’
  
  The following object is masked from ‘package:stats’:
  
      step
  
  > 
  > rec <- recipe(sale_amount ~ ., data = ICHomes)
  > mod_rec <- ModeledInput(rec, model = GLMModel)
  > fit(mod_rec)
  Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow).
  Backtrace:
       █
    1. ├─MachineShop::fit(mod_rec)
    2. └─MachineShop:::fit.recipe(mod_rec)
    3.   ├─MachineShop:::.fit(x, model)
    4.   └─MachineShop:::.fit.ModeledRecipe(x, model)
    5.     ├─MachineShop::fit(as(x, "ModelRecipe"), model = x@model)
    6.     └─MachineShop:::fit.recipe(as(x, "ModelRecipe"), model = x@model)
    7.       ├─MachineShop:::.fit(x, model)
    8.       └─MachineShop:::.fit.recipe(x, model)
    9.         ├─MachineShop:::.fit(model, ModelRecipe(x))
   10.         └─MachineShop:::.fit.MLModel(model, ModelRecipe(x))
   11.           ├─MachineShop::ModelFrame(inputs_prep, na.rm = FALSE)
   12.           └─MachineShop:::ModelFrame.recipe(inputs_prep, na.rm = FALSE)
   13.             ├─MachineShop:::juice(x)
   14.             └─MachineShop:::juice.ModelRecipe(x)
   15.               ├─recipes::bake(x, x$template)
   16.               └─MachineShop:::bake.ModelRecipe(x, x$template)
   17.                 ├─recipes::bake(as(object, "recipe"), new_data = prep_recipe_data(new_data))
   18.                 └─recipes:::bake.recipe(as(object, "recipe"), new_data = prep_recipe_data(new_data))
   19.                   └─recipes:::eval_select_recipes(terms, new_data, info)
   20.                     ├─dplyr::left_join(data_info, info, by = "variable")
   21.                     └─dplyr:::left_join.data.frame(data_info, info, by = "variable")
   22.                       └─dplyr::auto_copy(x, y, copy = copy)
   23.                         └─dplyr:::glubort(...)
  Execution halted

@topepo topepo merged commit df7c643 into tidymodels:master Nov 8, 2020
@DavisVaughan DavisVaughan deleted the fix/tidyselect-updates branch November 9, 2020 16:15
@DavisVaughan
Copy link
Member Author

Added an issue to MachineShop brian-j-smith/MachineShop#4

@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org

@github-actions github-actions bot locked and limited conversation to collaborators Feb 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion to use tidyselect as recipes selection API
2 participants