juice() should be able to return a 0 column data frame rather than abort() #298

Closed
DavisVaughan opened this issue Mar 15, 2019 · 3 comments

@DavisVaughan
Member

I think it would be appropriate for juice() to return a 0-column tibble rather than abort when you use a selector that doesn't return any columns. This would match the behavior of dplyr::select() and would be useful for me in hardhat.

suppressPackageStartupMessages(library(recipes))

rec <- recipe(~ Sepal.Width, iris) %>%
  prep(iris)

juice(rec, all_predictors())
#> # A tibble: 150 x 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # … with 140 more rows

# should return tibble with 0 cols and 150 rows
juice(rec, all_outcomes())
#> Error: No variables or terms were selected.

If you look at dplyr::select(), a misspelled column name is an error, but a selector that matches 0 columns is fine.

dplyr::select(iris, "x")
#> Error: Unknown column `x`

dplyr::select(iris, dplyr::matches("x"))
#> data frame with 0 columns and 150 rows
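
For comparison, base R subsetting appears to draw the same line, and this can be checked without dplyr or recipes. A minimal sketch (the object names `empty` and `unknown` are only for illustration):

```r
# Selecting with an empty character vector keeps every row and drops every column
empty <- iris[, character(0), drop = FALSE]
dim(empty)

# Asking for a column that does not exist is an error, which we catch here
unknown <- tryCatch(
  iris[, "x"],
  error = function(e) "error"
)
unknown
```

The empty selection still carries the row count, which is the property the 0-column tibble request relies on.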

juice() would still error if someone did juice(rec, non_existent_column).

I think we can get this behavior by simply removing the abort from terms_select() here:
https://github.com/tidymodels/recipes/blob/master/R/selections.R#L196

That is a pretty commonly used function though, so maybe we'd want a version that is strict (the current behavior), and a version that isn't as strict (the suggested behavior).
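
To make the strict/lenient split concrete, here is a rough base-R sketch. `select_columns()` and its `strict` argument are hypothetical names invented for this illustration; they are not part of recipes, and the matching is a plain regular expression rather than the selector machinery behind terms_select().

```r
# Hypothetical helper: NOT the recipes API, only an illustration of the idea.
select_columns <- function(data, pattern, strict = TRUE) {
  # Column names that match the regular expression
  hits <- grep(pattern, names(data), value = TRUE)

  if (length(hits) == 0 && strict) {
    # Strict mode mirrors the current behavior: an empty selection is an error
    stop("No variables or terms were selected.", call. = FALSE)
  }

  # Lenient mode: an empty selection yields a 0-column data frame
  # that keeps every row
  data[, hits, drop = FALSE]
}

# Lenient call: no column name starts with "zzz", so nothing is selected
lenient <- select_columns(iris, "^zzz", strict = FALSE)
dim(lenient)

# Strict call: the same empty selection raises an error, captured as a message
strict_result <- tryCatch(
  select_columns(iris, "^zzz"),
  error = function(e) conditionMessage(e)
)
strict_result
```

Defaulting `strict` to TRUE in this sketch would leave existing callers unaffected, while a caller such as hardhat could opt into the lenient path.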

@alexpghayes
Contributor

Would you mind describing how this helps out in hardhat? I think returning a 0-column tibble is dangerous: it is the kind of thing you want to error on as early as possible, so that debugging your pipeline stays easy.

@topepo
Member

topepo commented Jul 22, 2019

Example:

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
recipe(~ ., data = iris) %>% prep() %>% juice(all_outcomes())
#> # A tibble: 0 x 0

Created on 2019-07-22 by the reprex package (v0.2.1)

For hardhat... before:

hardhat::mold(recipes::recipe(~ ., iris), iris)
#> No variables or terms were selected.
sessionInfo(package = "recipes")
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> character(0)
#> 
#> other attached packages:
#> [1] recipes_0.1.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.1         pillar_1.4.2       compiler_3.6.0    
#>  [4] gower_0.2.0        highr_0.8          methods_3.6.0     
#>  [7] class_7.3-15       utils_3.6.0        tools_3.6.0       
#> [10] grDevices_3.6.0    zeallot_0.1.0      rpart_4.1-15      
#> [13] digest_0.6.20      ipred_0.9-8        lubridate_1.7.4   
#> [16] evaluate_0.14      tibble_2.1.3       lattice_0.20-38   
#> [19] pkgconfig_2.0.2    rlang_0.4.0.9000   Matrix_1.2-17     
#> [22] yaml_2.2.0         prodlim_2018.04.18 xfun_0.8          
#> [25] withr_2.1.2        stringr_1.4.0      dplyr_0.8.3       
#> [28] knitr_1.23         generics_0.0.2     vctrs_0.2.0.9000  
#> [31] graphics_3.6.0     datasets_3.6.0     stats_3.6.0       
#> [34] nnet_7.3-12        grid_3.6.0         tidyselect_0.2.5  
#> [37] glue_1.3.1         base_3.6.0         R6_2.4.0          
#> [40] survival_2.44-1.1  rmarkdown_1.14     lava_1.6.5        
#> [43] tidyr_0.8.3        purrr_0.3.2        magrittr_1.5      
#> [46] splines_3.6.0      backports_1.1.4    htmltools_0.3.6   
#> [49] MASS_7.3-51.4      assertthat_0.2.1   hardhat_0.0.0.9000
#> [52] timeDate_3043.102  stringi_1.4.3      crayon_1.3.4

Created on 2019-07-22 by the reprex package (v0.2.1)

and after:

hardhat::mold(recipes::recipe(~ ., iris), iris)
#> $predictors
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows
#> 
#> $outcomes
#> # A tibble: 0 x 0
#> 
#> $blueprint
#> Recipe blueprint: 
#>  
#> # Predictors: 5 
#>   # Outcomes: 0 
#>    Intercept: FALSE 
#> 
#> $extras
#> $extras$roles
#> NULL
sessionInfo(package = "recipes")
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> character(0)
#> 
#> other attached packages:
#> [1] recipes_0.1.6.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.1         pillar_1.4.2       compiler_3.6.0    
#>  [4] gower_0.2.0        highr_0.8          methods_3.6.0     
#>  [7] class_7.3-15       utils_3.6.0        tools_3.6.0       
#> [10] grDevices_3.6.0    zeallot_0.1.0      rpart_4.1-15      
#> [13] digest_0.6.20      ipred_0.9-8        lubridate_1.7.4   
#> [16] evaluate_0.14      tibble_2.1.3       lattice_0.20-38   
#> [19] pkgconfig_2.0.2    rlang_0.4.0.9000   Matrix_1.2-17     
#> [22] cli_1.1.0          yaml_2.2.0         prodlim_2018.04.18
#> [25] xfun_0.8           withr_2.1.2        stringr_1.4.0     
#> [28] dplyr_0.8.3        knitr_1.23         generics_0.0.2    
#> [31] vctrs_0.2.0.9000   graphics_3.6.0     datasets_3.6.0    
#> [34] stats_3.6.0        nnet_7.3-12        grid_3.6.0        
#> [37] tidyselect_0.2.5   glue_1.3.1         base_3.6.0        
#> [40] R6_2.4.0           fansi_0.4.0        survival_2.44-1.1 
#> [43] rmarkdown_1.14     lava_1.6.5         tidyr_0.8.3       
#> [46] purrr_0.3.2        magrittr_1.5       splines_3.6.0     
#> [49] backports_1.1.4    htmltools_0.3.6    MASS_7.3-51.4     
#> [52] assertthat_0.2.1   hardhat_0.0.0.9000 timeDate_3043.102 
#> [55] utf8_1.1.4         stringi_1.4.3      crayon_1.3.4

Created on 2019-07-22 by the reprex package (v0.2.1)

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 23, 2021