Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors in many bake methods when newdata has class grouped_df #125

Closed
jrnold opened this issue Feb 15, 2018 · 4 comments
Closed

Errors in many bake methods when newdata has class grouped_df #125

jrnold opened this issue Feb 15, 2018 · 4 comments

Comments

@jrnold
Copy link
Contributor

jrnold commented Feb 15, 2018

Many bake methods will either raise an error or return an empty dataset if newdata is a grouped data frame (classgrouped_df as returned by dplyr::group_by). The reproducible example below provides a few examples. This appears to affect any bake method which uses the pattern,

newdata <- cbind(newdata, ...)

The problem occurs because when newdata is "grouped_df" then this code will return a list rather than the expected data frame object.

Replacing instances of cbind with dplyr::bind_cols seems to solve the problem - it passes all tests and produces the desired results. I could submit a pull request, but I haven't fully grokked recipes, and I don't know if grouped data frames have other implications for recipes. Should groupings be removed? Have special treatment?

Minimal, runnable code:

library("tidyverse")
#> ── Attaching packages ────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1          ✔ purrr   0.2.4     
#> ✔ tibble  1.4.2          ✔ dplyr   0.7.4.9000
#> ✔ tidyr   0.8.0          ✔ stringr 1.2.0     
#> ✔ readr   1.1.1          ✔ forcats 0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library("recipes")
#> Loading required package: broom
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#> 
#>     fixed
#> The following object is masked from 'package:stats':
#> 
#>     step
foo <- tibble(bar = letters, qux = runif(length(bar)), baz = rep(1:2, length = length(bar))) %>% 
  group_by(baz)

# error in step_dummy
rec <- recipe(~bar, data = foo) %>% step_dummy(bar)

trained_rec <- prep(rec, training = foo)

# error in merging indicators
try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(indicators)): number of rows of result
#> is not a multiple of vector length (arg 1)

# error step_poly
rec <- recipe(~qux, data = foo) %>% step_poly(qux, options = list(degree = 2))

trained_rec <- prep(rec, training = foo)

# No error: but all
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0


# error step_bs
rec <- recipe(~qux, data = foo) %>% step_bs(qux)

trained_rec <- prep(rec, training = foo)

try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(bs_values)): number of rows of result
#> is not a multiple of vector length (arg 1)
#> # A tibble: 3 x 0

# error in step_ns
rec <- recipe(~qux, data = foo) %>% step_ns(qux)

trained_rec <- prep(rec, training = foo)

# No error: but all rows dropped
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0

Session Info:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recipes_0.1.2    broom_0.4.3      forcats_0.2.0    stringr_1.2.0   
 [5] dplyr_0.7.4.9000 purrr_0.2.4      readr_1.1.1      tidyr_0.8.0     
 [9] tibble_1.4.2     ggplot2_2.2.1    tidyverse_1.2.1  fortunes_1.5-4  
[13] testthat_2.0.0   rmarkdown_1.8    knitr_1.19       roxygen2_6.0.1  
[17] devtools_1.13.4 

loaded via a namespace (and not attached):
 [1] httr_1.3.1        ddalpha_1.3.1.1   splines_3.4.3     sfsmisc_1.1-1    
 [5] jsonlite_1.5      prodlim_1.6.1     modelr_0.1.1      assertthat_0.2.0 
 [9] DRR_0.0.3         cellranger_1.1.0  yaml_2.1.16       robustbase_0.92-8
[13] ipred_0.9-6       pillar_1.1.0      backports_1.1.2   lattice_0.20-35  
[17] glue_1.2.0        digest_0.6.15     rvest_0.3.2       colorspace_1.3-2 
[21] htmltools_0.3.6   Matrix_1.2-12     plyr_1.8.4        psych_1.7.8      
[25] timeDate_3042.101 pkgconfig_2.0.1   CVST_0.2-1        haven_1.1.1      
[29] scales_0.5.0      gower_0.1.2       lava_1.6          withr_2.1.1      
[33] nnet_7.3-12       lazyeval_0.2.1    cli_1.0.0         mnormt_1.5-5     
[37] survival_2.41-3   magrittr_1.5      crayon_1.3.4      readxl_1.0.0     
[41] memoise_1.1.0     evaluate_0.10.1   nlme_3.1-131      MASS_7.3-48      
[45] xml2_1.2.0        dimRed_0.1.0      foreign_0.8-69    class_7.3-14     
[49] tools_3.4.3       hms_0.4.1         kernlab_0.9-25    munsell_0.4.3    
[53] bindrcpp_0.2      compiler_3.4.3    RcppRoll_0.2.2    rlang_0.1.6.9003 
[57] grid_3.4.3        rstudioapi_0.7    gtable_0.2.0      reshape2_1.4.3   
[61] R6_2.2.2          lubridate_1.7.2   utf8_1.1.3        bindr_0.1        
[65] commonmark_1.4    rprojroot_1.3-2   stringi_1.1.6     parallel_3.4.3   
[69] Rcpp_0.12.15      rpart_4.1-12      DEoptimR_1.0-8    tidyselect_0.2.3 
jrnold added a commit to jrnold/recipes that referenced this issue Feb 15, 2018
Replace all cases of cbind with bind_cols.

See issue tidymodels#125.
jrnold added a commit to jrnold/recipes that referenced this issue Feb 15, 2018
Replace all cases of cbind with bind_cols.

See issue tidymodels#125.
@topepo
Copy link
Member

topepo commented Feb 21, 2018

Thanks for catching this. That does solve the issue. Fire away if you want to submit the PR otherwise I can commit my changes (but if you want the credit...)

@jrnold
Copy link
Contributor Author

jrnold commented Feb 21, 2018

Will do. The changes are already in the linked commits.

jrnold added a commit to jrnold/recipes that referenced this issue Feb 21, 2018
For Issue tidymodels#125. This is only one test, though the original bug affected multiple functions.
@jrnold jrnold mentioned this issue Feb 21, 2018
@jrnold
Copy link
Contributor Author

jrnold commented Feb 21, 2018

Opened PR #128

@jrnold jrnold closed this as completed Feb 21, 2018
jrnold added a commit to jrnold/recipes that referenced this issue Feb 21, 2018
For Issue tidymodels#125. This is only one test, though the original bug affected multiple functions.
@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants