`juice()` should be able to return a 0 column data frame rather than abort() #298
Comments
Would you mind describing how this helps out in
Example:

library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
recipe(~ ., data = iris) %>% prep() %>% juice(all_outcomes())
#> # A tibble: 0 x 0

Created on 2019-07-22 by the reprex package (v0.2.1)

For

hardhat::mold(recipes::recipe(~ ., iris), iris)
#> No variables or terms were selected.
sessionInfo(package = "recipes")
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> character(0)
#>
#> other attached packages:
#> [1] recipes_0.1.6
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.1 pillar_1.4.2 compiler_3.6.0
#> [4] gower_0.2.0 highr_0.8 methods_3.6.0
#> [7] class_7.3-15 utils_3.6.0 tools_3.6.0
#> [10] grDevices_3.6.0 zeallot_0.1.0 rpart_4.1-15
#> [13] digest_0.6.20 ipred_0.9-8 lubridate_1.7.4
#> [16] evaluate_0.14 tibble_2.1.3 lattice_0.20-38
#> [19] pkgconfig_2.0.2 rlang_0.4.0.9000 Matrix_1.2-17
#> [22] yaml_2.2.0 prodlim_2018.04.18 xfun_0.8
#> [25] withr_2.1.2 stringr_1.4.0 dplyr_0.8.3
#> [28] knitr_1.23 generics_0.0.2 vctrs_0.2.0.9000
#> [31] graphics_3.6.0 datasets_3.6.0 stats_3.6.0
#> [34] nnet_7.3-12 grid_3.6.0 tidyselect_0.2.5
#> [37] glue_1.3.1 base_3.6.0 R6_2.4.0
#> [40] survival_2.44-1.1 rmarkdown_1.14 lava_1.6.5
#> [43] tidyr_0.8.3 purrr_0.3.2 magrittr_1.5
#> [46] splines_3.6.0 backports_1.1.4 htmltools_0.3.6
#> [49] MASS_7.3-51.4 assertthat_0.2.1 hardhat_0.0.0.9000
#> [52] timeDate_3043.102 stringi_1.4.3 crayon_1.3.4

and after

hardhat::mold(recipes::recipe(~ ., iris), iris)
#> $predictors
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>
#>  1          5.1         3.5          1.4         0.2 setosa
#>  2          4.9         3            1.4         0.2 setosa
#>  3          4.7         3.2          1.3         0.2 setosa
#>  4          4.6         3.1          1.5         0.2 setosa
#>  5          5           3.6          1.4         0.2 setosa
#>  6          5.4         3.9          1.7         0.4 setosa
#>  7          4.6         3.4          1.4         0.3 setosa
#>  8          5           3.4          1.5         0.2 setosa
#>  9          4.4         2.9          1.4         0.2 setosa
#> 10          4.9         3.1          1.5         0.1 setosa
#> # … with 140 more rows
#>
#> $outcomes
#> # A tibble: 0 x 0
#>
#> $blueprint
#> Recipe blueprint:
#>
#> # Predictors: 5
#> # Outcomes: 0
#> Intercept: FALSE
#>
#> $extras
#> $extras$roles
#> NULL
sessionInfo(package = "recipes")
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> character(0)
#>
#> other attached packages:
#> [1] recipes_0.1.6.9000
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.1 pillar_1.4.2 compiler_3.6.0
#> [4] gower_0.2.0 highr_0.8 methods_3.6.0
#> [7] class_7.3-15 utils_3.6.0 tools_3.6.0
#> [10] grDevices_3.6.0 zeallot_0.1.0 rpart_4.1-15
#> [13] digest_0.6.20 ipred_0.9-8 lubridate_1.7.4
#> [16] evaluate_0.14 tibble_2.1.3 lattice_0.20-38
#> [19] pkgconfig_2.0.2 rlang_0.4.0.9000 Matrix_1.2-17
#> [22] cli_1.1.0 yaml_2.2.0 prodlim_2018.04.18
#> [25] xfun_0.8 withr_2.1.2 stringr_1.4.0
#> [28] dplyr_0.8.3 knitr_1.23 generics_0.0.2
#> [31] vctrs_0.2.0.9000 graphics_3.6.0 datasets_3.6.0
#> [34] stats_3.6.0 nnet_7.3-12 grid_3.6.0
#> [37] tidyselect_0.2.5 glue_1.3.1 base_3.6.0
#> [40] R6_2.4.0 fansi_0.4.0 survival_2.44-1.1
#> [43] rmarkdown_1.14 lava_1.6.5 tidyr_0.8.3
#> [46] purrr_0.3.2 magrittr_1.5 splines_3.6.0
#> [49] backports_1.1.4 htmltools_0.3.6 MASS_7.3-51.4
#> [52] assertthat_0.2.1 hardhat_0.0.0.9000 timeDate_3043.102
#> [55] utf8_1.1.4 stringi_1.4.3 crayon_1.3.4

I think it would be appropriate for `juice()` to return a 0 column tibble rather than abort when you try and use a selector that doesn't return any columns. This would match the behavior of `dplyr::select()` and would be useful for me in hardhat.

If you look at `dplyr::select()`, a wrongly spelled column is an error, but a selector that returns 0 cols is fine. `juice()` would still maintain the ability to error if someone did `juice(rec, non_existant_column)`.

I think we can get this behavior by simply removing the abort from `terms_select()` here:
https://github.com/tidymodels/recipes/blob/master/R/selections.R#L196

That is a pretty commonly used function though, so maybe we'd want a version that is strict (the current behavior), and a version that isn't as strict (the suggested behavior).
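
To make the `dplyr::select()` comparison and the strict / not-strict idea concrete, here is a rough sketch. The first half only uses `dplyr::select()` on `iris`. The second half is a hypothetical helper, `select_names()`, with a `strict` argument; it is an illustration of the idea and is not the `terms_select()` implementation in recipes.

```r
library(dplyr)

# A selector that matches nothing is fine: every row is kept and the
# result simply has zero columns.
empty <- select(iris, starts_with("not_a_prefix"))
dim(empty)

# A column name that does not exist is still an error.
misspelled <- tryCatch(
  select(iris, non_existant_column),
  error = function(e) "error"
)
misspelled

# Hypothetical helper (an assumption for illustration, not part of recipes):
# one selection routine with a `strict` switch, so callers such as hardhat
# could opt in to the zero-column result while the default keeps aborting.
select_names <- function(data, ..., strict = TRUE) {
  out <- names(select(data, ...))
  if (strict && length(out) == 0) {
    stop("No variables or terms were selected.", call. = FALSE)
  }
  out
}

select_names(iris, starts_with("Sepal"))
select_names(iris, starts_with("zzz"), strict = FALSE)
```

With `strict = TRUE` (the default in this sketch) an empty selection still stops with the current message, so existing callers would see no change; only callers that pass `strict = FALSE` get an empty result back.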