Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors in many `bake` methods when `newdata` has class `grouped_df` #125

Closed
jrnold opened this issue Feb 15, 2018 · 3 comments
Closed

Errors in many `bake` methods when `newdata` has class `grouped_df` #125

jrnold opened this issue Feb 15, 2018 · 3 comments

Comments

@jrnold
Copy link
Contributor

@jrnold jrnold commented Feb 15, 2018

Many bake methods will either raise an error or return an empty dataset if newdata is a grouped data frame (classgrouped_df as returned by dplyr::group_by). The reproducible example below provides a few examples. This appears to affect any bake method which uses the pattern,

newdata <- cbind(newdata, ...)

The problem occurs because when newdata is "grouped_df" then this code will return a list rather than the expected data frame object.

Replacing instances of cbind with dplyr::bind_cols seems to solve the problem - it passes all tests and produces the desired results. I could submit a pull request, but I haven't fully grokked recipes, and I don't know if grouped data frames have other implications for recipes. Should groupings be removed? Have special treatment?

Minimal, runnable code:

library("tidyverse")
#> ── Attaching packages ────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1          ✔ purrr   0.2.4     
#> ✔ tibble  1.4.2          ✔ dplyr   0.7.4.9000
#> ✔ tidyr   0.8.0          ✔ stringr 1.2.0     
#> ✔ readr   1.1.1          ✔ forcats 0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library("recipes")
#> Loading required package: broom
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#> 
#>     fixed
#> The following object is masked from 'package:stats':
#> 
#>     step
foo <- tibble(bar = letters, qux = runif(length(bar)), baz = rep(1:2, length = length(bar))) %>% 
  group_by(baz)

# error in step_dummy
rec <- recipe(~bar, data = foo) %>% step_dummy(bar)

trained_rec <- prep(rec, training = foo)

# error in merging indicators
try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(indicators)): number of rows of result
#> is not a multiple of vector length (arg 1)

# error step_poly
rec <- recipe(~qux, data = foo) %>% step_poly(qux, options = list(degree = 2))

trained_rec <- prep(rec, training = foo)

# No error: but all
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0


# error step_bs
rec <- recipe(~qux, data = foo) %>% step_bs(qux)

trained_rec <- prep(rec, training = foo)

try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(bs_values)): number of rows of result
#> is not a multiple of vector length (arg 1)
#> # A tibble: 3 x 0

# error in step_ns
rec <- recipe(~qux, data = foo) %>% step_ns(qux)

trained_rec <- prep(rec, training = foo)

# No error: but all rows dropped
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0

Session Info:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recipes_0.1.2    broom_0.4.3      forcats_0.2.0    stringr_1.2.0   
 [5] dplyr_0.7.4.9000 purrr_0.2.4      readr_1.1.1      tidyr_0.8.0     
 [9] tibble_1.4.2     ggplot2_2.2.1    tidyverse_1.2.1  fortunes_1.5-4  
[13] testthat_2.0.0   rmarkdown_1.8    knitr_1.19       roxygen2_6.0.1  
[17] devtools_1.13.4 

loaded via a namespace (and not attached):
 [1] httr_1.3.1        ddalpha_1.3.1.1   splines_3.4.3     sfsmisc_1.1-1    
 [5] jsonlite_1.5      prodlim_1.6.1     modelr_0.1.1      assertthat_0.2.0 
 [9] DRR_0.0.3         cellranger_1.1.0  yaml_2.1.16       robustbase_0.92-8
[13] ipred_0.9-6       pillar_1.1.0      backports_1.1.2   lattice_0.20-35  
[17] glue_1.2.0        digest_0.6.15     rvest_0.3.2       colorspace_1.3-2 
[21] htmltools_0.3.6   Matrix_1.2-12     plyr_1.8.4        psych_1.7.8      
[25] timeDate_3042.101 pkgconfig_2.0.1   CVST_0.2-1        haven_1.1.1      
[29] scales_0.5.0      gower_0.1.2       lava_1.6          withr_2.1.1      
[33] nnet_7.3-12       lazyeval_0.2.1    cli_1.0.0         mnormt_1.5-5     
[37] survival_2.41-3   magrittr_1.5      crayon_1.3.4      readxl_1.0.0     
[41] memoise_1.1.0     evaluate_0.10.1   nlme_3.1-131      MASS_7.3-48      
[45] xml2_1.2.0        dimRed_0.1.0      foreign_0.8-69    class_7.3-14     
[49] tools_3.4.3       hms_0.4.1         kernlab_0.9-25    munsell_0.4.3    
[53] bindrcpp_0.2      compiler_3.4.3    RcppRoll_0.2.2    rlang_0.1.6.9003 
[57] grid_3.4.3        rstudioapi_0.7    gtable_0.2.0      reshape2_1.4.3   
[61] R6_2.2.2          lubridate_1.7.2   utf8_1.1.3        bindr_0.1        
[65] commonmark_1.4    rprojroot_1.3-2   stringi_1.1.6     parallel_3.4.3   
[69] Rcpp_0.12.15      rpart_4.1-12      DEoptimR_1.0-8    tidyselect_0.2.3 
jrnold added a commit to jrnold/recipes that referenced this issue Feb 15, 2018
Replace all cases of cbind with bind_cols.

See issue tidymodels#125.
jrnold added a commit to jrnold/recipes that referenced this issue Feb 15, 2018
Replace all cases of cbind with bind_cols.

See issue tidymodels#125.
@topepo
Copy link
Collaborator

@topepo topepo commented Feb 21, 2018

Thanks for catching this. That does solve the issue. Fire away if you want to submit the PR otherwise I can commit my changes (but if you want the credit...)

@jrnold
Copy link
Contributor Author

@jrnold jrnold commented Feb 21, 2018

Will do. The changes are already in the linked commits.

jrnold added a commit to jrnold/recipes that referenced this issue Feb 21, 2018
For Issue tidymodels#125. This is only one test, though the original bug affected multiple functions.
@jrnold jrnold mentioned this issue Feb 21, 2018
@jrnold
Copy link
Contributor Author

@jrnold jrnold commented Feb 21, 2018

Opened PR #128

@jrnold jrnold closed this Feb 21, 2018
jrnold added a commit to jrnold/recipes that referenced this issue Feb 21, 2018
For Issue tidymodels#125. This is only one test, though the original bug affected multiple functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.