step_scale() not respecting prior dropped cols when run after a prep() step #143

Closed
stephlocke opened this issue Apr 9, 2018 · 2 comments


@stephlocke

The following code removes a variable before a prep() step. step_scale() then ignores the fact that the column was already dropped: the output of the final prep() shows the center step working on 3 columns and the scale step on 4 (the dev version reports that Sepal.Width is being scaled). Comment out the first prep() and the scale step correctly works on 3 columns. Very weird issue!

This is done in a single pipeline here, but it represents building a recipe in stages.

library(recipes)

iris %>% 
  recipe() %>% 
  step_rm(Sepal.Width) %>% 
  # comment out the next line to see the correct behaviour
  prep() %>% 
  step_center(all_numeric()) %>% 
  step_scale(all_numeric()) %>% 
  prep()

Session Info:

Tried on the latest CRAN version with everything fully up to date, and then on the latest GitHub version of recipes.

Session info ─────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.4.4 (2018-03-15)
 os       Windows >= 8 x64            
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United Kingdom.1252 
 tz       Europe/London               
 date     2018-04-09                  

Packages ─────────────────────────────────────────────────────────────
 package     * version    date       source                            
 abind         1.4-5      2016-07-21 CRAN (R 3.4.1)                    
 assertthat    0.2.0      2017-04-11 CRAN (R 3.4.1)                    
 bindr         0.1.1      2018-03-13 CRAN (R 3.4.4)                    
 bindrcpp      0.2.2      2018-03-29 CRAN (R 3.4.4)                    
 broom       * 0.4.4      2018-03-29 CRAN (R 3.4.4)                    
 class         7.3-14     2015-08-30 CRAN (R 3.4.4)                    
 clisymbols    1.2.0      2017-05-21 CRAN (R 3.4.0)                    
 CVST          0.2-1      2013-12-10 CRAN (R 3.4.1)                    
 ddalpha       1.3.2      2018-04-08 CRAN (R 3.4.4)                    
 DEoptimR      1.0-8      2016-11-19 CRAN (R 3.4.0)                    
 dimRed        0.1.0.9001 2018-04-09 Github (gdkrmr/dimRed@cae270c)    
 dplyr       * 0.7.4      2017-09-28 CRAN (R 3.4.3)                    
 DRR           0.0.3      2018-01-06 CRAN (R 3.4.3)                    
 foreign       0.8-69     2017-06-21 CRAN (R 3.4.0)                    
 geometry      0.3-6      2015-09-09 CRAN (R 3.4.4)                    
 glue          1.2.0      2017-10-29 CRAN (R 3.4.2)                    
 gower         0.1.2      2017-02-23 CRAN (R 3.4.1)                    
 ipred         0.9-6      2017-03-01 CRAN (R 3.4.1)                    
 kernlab       0.9-25     2016-10-03 CRAN (R 3.4.0)                    
 lattice       0.20-35    2017-03-25 CRAN (R 3.4.4)                    
 lava          1.6.1      2018-03-28 CRAN (R 3.4.4)                    
 lubridate     1.7.3      2018-02-27 CRAN (R 3.4.4)                    
 magic         1.5-8      2018-01-26 CRAN (R 3.4.3)                    
 magrittr      1.5        2014-11-22 CRAN (R 3.4.1)                    
 MASS          7.3-49     2018-02-23 CRAN (R 3.4.4)                    
 Matrix        1.2-13     2018-04-02 CRAN (R 3.4.4)                    
 mnormt        1.5-5      2016-10-15 CRAN (R 3.4.0)                    
 nlme          3.1-131.1  2018-02-16 CRAN (R 3.4.4)                    
 nnet          7.3-12     2016-02-02 CRAN (R 3.4.4)                    
 pillar        1.2.1      2018-02-27 CRAN (R 3.4.4)                    
 pkgconfig     2.0.1      2017-03-21 CRAN (R 3.4.1)                    
 plyr          1.8.4      2016-06-08 CRAN (R 3.4.1)                    
 prodlim       1.6.1      2017-03-06 CRAN (R 3.4.1)                    
 psych         1.8.3.3    2018-03-30 CRAN (R 3.4.4)                    
 purrr         0.2.4      2017-10-18 CRAN (R 3.4.2)                    
 R6            2.2.2      2017-06-17 CRAN (R 3.4.1)                    
 Rcpp          0.12.16    2018-03-13 CRAN (R 3.4.4)                    
 RcppRoll      0.2.2      2015-04-05 CRAN (R 3.4.1)                    
 recipes     * 0.1.2.9000 2018-04-09 Github (topepo/recipes@a95175e)   
 reshape2      1.4.3      2017-12-11 CRAN (R 3.4.3)                    
 rlang         0.2.0      2018-02-20 CRAN (R 3.4.3)                    
 robustbase    0.92-8     2017-11-01 CRAN (R 3.4.2)                    
 rpart         4.1-13     2018-02-23 CRAN (R 3.4.4)                    
 sessioninfo   1.0.1.9000 2017-11-22 Github (r-lib/sessioninfo@c871d01)
 sfsmisc       1.1-2      2018-03-05 CRAN (R 3.4.4)                    
 stringi       1.1.7      2018-03-12 CRAN (R 3.4.4)                    
 stringr       1.3.0      2018-02-19 CRAN (R 3.4.3)                    
 survival      2.41-3     2017-04-04 CRAN (R 3.4.1)                    
 tibble        1.4.2      2018-01-22 CRAN (R 3.4.3)                    
 tidyr         0.8.0      2018-01-29 CRAN (R 3.4.3)                    
 tidyselect    0.2.4      2018-02-26 CRAN (R 3.4.4)                    
 timeDate      3043.102   2018-02-21 CRAN (R 3.4.3)                    
 withr         2.1.2      2018-03-15 CRAN (R 3.4.4)                    
 yaml          2.1.18     2018-03-08 CRAN (R 3.4.4)
@topepo
Member

topepo commented Jun 2, 2018

It turns out that we need to use retain = TRUE in the initial prep call. I've added an error trap for when this happens. With the devel version:

> iris %>% 
+     recipe() %>% 
+     step_rm(Sepal.Width) %>% 
+     # comment out the next line to see the correct behaviour
+     prep() %>% 
+     step_center(all_numeric()) %>% 
+     step_scale(all_numeric()) %>% 
+     prep()
Error: To prep new steps after prepping the original recipe, `retain = TRUE` must be set each time that the recipe is trained.
> library(recipes)
> 
> iris %>% 
+     recipe() %>% 
+     step_rm(Sepal.Width) %>% 
+     # comment out the next line to see the correct behaviour
+     prep(retain = TRUE) %>% 
+     step_center(all_numeric()) %>% 
+     step_scale(all_numeric()) %>% 
+     prep()
Data Recipe

Inputs:

  5 variables (no declared roles)

Training data contained 150 data points and no missing data.

Operations:

Variables removed Sepal.Width [trained]
Centering for Sepal.Length, Petal.Length, Petal.Width [trained]
Scaling for Sepal.Length, Petal.Length, Petal.Width [trained]
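
The staged workflow described in the original report can also be written as two separate objects. This is only a sketch: the names stage_one, stage_two and processed are made up for illustration, and retain = TRUE is passed to both prep() calls because the error message above asks for it each time the recipe is trained (passing it explicitly should be harmless if your installed version already defaults to it).

```r
library(recipes)

# Stage 1: drop a column and train, keeping the processed training set
# so that steps added later can be trained on it.
stage_one <- recipe(iris) %>%
  step_rm(Sepal.Width) %>%
  prep(retain = TRUE)

# Stage 2: add new steps to the already-trained recipe and train again.
# Only the new steps need training; they see the data without Sepal.Width.
stage_two <- stage_one %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(retain = TRUE)

# Extract the processed training set retained by the last prep().
processed <- juice(stage_two)
names(processed)
```

If the fix behaves as shown above, Sepal.Width should be absent from names(processed), and the three remaining numeric columns should be centered and scaled.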

@topepo topepo closed this as completed in edb5d06 Jun 2, 2018