Skip to content

Errors in many bake methods when newdata has class grouped_df #125

@jrnold

Description

@jrnold

Many bake methods will either raise an error or return an empty dataset if newdata is a grouped data frame (classgrouped_df as returned by dplyr::group_by). The reproducible example below provides a few examples. This appears to affect any bake method which uses the pattern,

newdata <- cbind(newdata, ...)

The problem occurs because when newdata is "grouped_df" then this code will return a list rather than the expected data frame object.

Replacing instances of cbind with dplyr::bind_cols seems to solve the problem - it passes all tests and produces the desired results. I could submit a pull request, but I haven't fully grokked recipes, and I don't know if grouped data frames have other implications for recipes. Should groupings be removed? Have special treatment?

Minimal, runnable code:

library("tidyverse")
#> ── Attaching packages ────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1          ✔ purrr   0.2.4     
#> ✔ tibble  1.4.2          ✔ dplyr   0.7.4.9000
#> ✔ tidyr   0.8.0          ✔ stringr 1.2.0     
#> ✔ readr   1.1.1          ✔ forcats 0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library("recipes")
#> Loading required package: broom
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#> 
#>     fixed
#> The following object is masked from 'package:stats':
#> 
#>     step
foo <- tibble(bar = letters, qux = runif(length(bar)), baz = rep(1:2, length = length(bar))) %>% 
  group_by(baz)

# error in step_dummy
rec <- recipe(~bar, data = foo) %>% step_dummy(bar)

trained_rec <- prep(rec, training = foo)

# error in merging indicators
try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(indicators)): number of rows of result
#> is not a multiple of vector length (arg 1)

# error step_poly
rec <- recipe(~qux, data = foo) %>% step_poly(qux, options = list(degree = 2))

trained_rec <- prep(rec, training = foo)

# No error: but all
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0


# error step_bs
rec <- recipe(~qux, data = foo) %>% step_bs(qux)

trained_rec <- prep(rec, training = foo)

try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(bs_values)): number of rows of result
#> is not a multiple of vector length (arg 1)
#> # A tibble: 3 x 0

# error in step_ns
rec <- recipe(~qux, data = foo) %>% step_ns(qux)

trained_rec <- prep(rec, training = foo)

# No error: but all rows dropped
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0

Session Info:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recipes_0.1.2    broom_0.4.3      forcats_0.2.0    stringr_1.2.0   
 [5] dplyr_0.7.4.9000 purrr_0.2.4      readr_1.1.1      tidyr_0.8.0     
 [9] tibble_1.4.2     ggplot2_2.2.1    tidyverse_1.2.1  fortunes_1.5-4  
[13] testthat_2.0.0   rmarkdown_1.8    knitr_1.19       roxygen2_6.0.1  
[17] devtools_1.13.4 

loaded via a namespace (and not attached):
 [1] httr_1.3.1        ddalpha_1.3.1.1   splines_3.4.3     sfsmisc_1.1-1    
 [5] jsonlite_1.5      prodlim_1.6.1     modelr_0.1.1      assertthat_0.2.0 
 [9] DRR_0.0.3         cellranger_1.1.0  yaml_2.1.16       robustbase_0.92-8
[13] ipred_0.9-6       pillar_1.1.0      backports_1.1.2   lattice_0.20-35  
[17] glue_1.2.0        digest_0.6.15     rvest_0.3.2       colorspace_1.3-2 
[21] htmltools_0.3.6   Matrix_1.2-12     plyr_1.8.4        psych_1.7.8      
[25] timeDate_3042.101 pkgconfig_2.0.1   CVST_0.2-1        haven_1.1.1      
[29] scales_0.5.0      gower_0.1.2       lava_1.6          withr_2.1.1      
[33] nnet_7.3-12       lazyeval_0.2.1    cli_1.0.0         mnormt_1.5-5     
[37] survival_2.41-3   magrittr_1.5      crayon_1.3.4      readxl_1.0.0     
[41] memoise_1.1.0     evaluate_0.10.1   nlme_3.1-131      MASS_7.3-48      
[45] xml2_1.2.0        dimRed_0.1.0      foreign_0.8-69    class_7.3-14     
[49] tools_3.4.3       hms_0.4.1         kernlab_0.9-25    munsell_0.4.3    
[53] bindrcpp_0.2      compiler_3.4.3    RcppRoll_0.2.2    rlang_0.1.6.9003 
[57] grid_3.4.3        rstudioapi_0.7    gtable_0.2.0      reshape2_1.4.3   
[61] R6_2.2.2          lubridate_1.7.2   utf8_1.1.3        bindr_0.1        
[65] commonmark_1.4    rprojroot_1.3-2   stringi_1.1.6     parallel_3.4.3   
[69] Rcpp_0.12.15      rpart_4.1-12      DEoptimR_1.0-8    tidyselect_0.2.3 

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions