Closed
Description
Many bake methods will either raise an error or return an empty dataset if newdata is a grouped data frame (class grouped_df, as returned by dplyr::group_by). The reproducible example below shows several cases. This appears to affect any bake method which uses the pattern,
newdata <- cbind(newdata, ...)
The problem occurs because, when newdata is a grouped_df, this cbind call returns a list rather than the expected data frame object.
Replacing instances of cbind with dplyr::bind_cols seems to solve the problem: it passes all tests and produces the desired results. I could submit a pull request, but I haven't fully grokked recipes, and I don't know if grouped data frames have other implications for recipes. Should groupings be removed, or do they need special treatment?
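For illustration, here is a minimal sketch of the two options raised above, written outside of recipes. The objects newdata and indicators are hypothetical stand-ins for what a bake method would be holding at that point; this is not the package's actual code, only an assumption about its shape.

```r
library(dplyr)

# Hypothetical stand-in for the grouped `newdata` passed to a bake method.
newdata <- tibble(
  bar = letters[1:4],
  baz = rep(1:2, length.out = 4)
) %>%
  group_by(baz)

# Hypothetical stand-in for the columns a step computes (e.g. dummy indicators).
indicators <- tibble(
  bar_b = c(0, 1, 0, 0),
  bar_c = c(0, 0, 1, 0)
)

# Option 1: use dplyr::bind_cols() where the bake method currently uses cbind().
combined <- bind_cols(newdata, indicators)

# Option 2: drop the grouping first, then bind the new columns.
combined_ungrouped <- bind_cols(ungroup(newdata), indicators)

# Both results are data frames with the original rows and all four columns.
is.data.frame(combined)
dim(combined)
is_grouped_df(combined_ungrouped)
```

Option 2 sidesteps the question of what grouping should mean inside a recipe, at the cost of silently discarding the groups from the baked output.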
Minimal, runnable code:
library("tidyverse")
#> ── Attaching packages ────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
#> ✔ tibble 1.4.2 ✔ dplyr 0.7.4.9000
#> ✔ tidyr 0.8.0 ✔ stringr 1.2.0
#> ✔ readr 1.1.1 ✔ forcats 0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
library("recipes")
#> Loading required package: broom
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
foo <- tibble(bar = letters, qux = runif(length(bar)), baz = rep(1:2, length = length(bar))) %>%
group_by(baz)
# error in step_dummy
rec <- recipe(~bar, data = foo) %>% step_dummy(bar)
trained_rec <- prep(rec, training = foo)
# error in merging indicators
try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(indicators)): number of rows of result
#> is not a multiple of vector length (arg 1)
# error in step_poly
rec <- recipe(~qux, data = foo) %>% step_poly(qux, options = list(degree = 2))
trained_rec <- prep(rec, training = foo)
# No error: but all rows dropped
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0
# error in step_bs
rec <- recipe(~qux, data = foo) %>% step_bs(qux)
trained_rec <- prep(rec, training = foo)
try(bake(trained_rec, newdata = foo))
#> Warning in cbind(newdata, as_tibble(bs_values)): number of rows of result
#> is not a multiple of vector length (arg 1)
#> # A tibble: 3 x 0
# error in step_ns
rec <- recipe(~qux, data = foo) %>% step_ns(qux)
trained_rec <- prep(rec, training = foo)
# No error: but all rows dropped
try(bake(trained_rec, newdata = foo))
#> # A tibble: 2 x 0
Session Info:
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] recipes_0.1.2 broom_0.4.3 forcats_0.2.0 stringr_1.2.0
[5] dplyr_0.7.4.9000 purrr_0.2.4 readr_1.1.1 tidyr_0.8.0
[9] tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1 fortunes_1.5-4
[13] testthat_2.0.0 rmarkdown_1.8 knitr_1.19 roxygen2_6.0.1
[17] devtools_1.13.4
loaded via a namespace (and not attached):
[1] httr_1.3.1 ddalpha_1.3.1.1 splines_3.4.3 sfsmisc_1.1-1
[5] jsonlite_1.5 prodlim_1.6.1 modelr_0.1.1 assertthat_0.2.0
[9] DRR_0.0.3 cellranger_1.1.0 yaml_2.1.16 robustbase_0.92-8
[13] ipred_0.9-6 pillar_1.1.0 backports_1.1.2 lattice_0.20-35
[17] glue_1.2.0 digest_0.6.15 rvest_0.3.2 colorspace_1.3-2
[21] htmltools_0.3.6 Matrix_1.2-12 plyr_1.8.4 psych_1.7.8
[25] timeDate_3042.101 pkgconfig_2.0.1 CVST_0.2-1 haven_1.1.1
[29] scales_0.5.0 gower_0.1.2 lava_1.6 withr_2.1.1
[33] nnet_7.3-12 lazyeval_0.2.1 cli_1.0.0 mnormt_1.5-5
[37] survival_2.41-3 magrittr_1.5 crayon_1.3.4 readxl_1.0.0
[41] memoise_1.1.0 evaluate_0.10.1 nlme_3.1-131 MASS_7.3-48
[45] xml2_1.2.0 dimRed_0.1.0 foreign_0.8-69 class_7.3-14
[49] tools_3.4.3 hms_0.4.1 kernlab_0.9-25 munsell_0.4.3
[53] bindrcpp_0.2 compiler_3.4.3 RcppRoll_0.2.2 rlang_0.1.6.9003
[57] grid_3.4.3 rstudioapi_0.7 gtable_0.2.0 reshape2_1.4.3
[61] R6_2.2.2 lubridate_1.7.2 utf8_1.1.3 bindr_0.1
[65] commonmark_1.4 rprojroot_1.3-2 stringi_1.1.6 parallel_3.4.3
[69] Rcpp_0.12.15 rpart_4.1-12 DEoptimR_1.0-8 tidyselect_0.2.3