`step_scale()` not respecting prior dropped cols when run after a `prep()` step #143
Comments
It turns out that we need to use `retain = TRUE` in the first `prep()`. Without it, the second `prep()` fails:
> iris %>%
+ recipe() %>%
+ step_rm(Sepal.Width) %>%
+ # comment out the next line to see the correct behaviour
+ prep() %>%
+ step_center(all_numeric()) %>%
+ step_scale(all_numeric()) %>%
+ prep()
Error: To prep new steps after prepping the original recipe, `retain = TRUE` must be set each time that the recipe is trained.
> library(recipes)
>
> iris %>%
+ recipe() %>%
+ step_rm(Sepal.Width) %>%
+ # comment out the next line to see the correct behaviour
+ prep(retain = TRUE) %>%
+ step_center(all_numeric()) %>%
+ step_scale(all_numeric()) %>%
+ prep()
Data Recipe
Inputs:
5 variables (no declared roles)
Training data contained 150 data points and no missing data.
Operations:
Variables removed Sepal.Width [trained]
Centering for Sepal.Length, Petal.Length, Petal.Width [trained]
Scaling for Sepal.Length, Petal.Length, Petal.Width [trained]
The following code removes a variable before a `prep()` step. The `step_scale()` function ignores the fact that the column was previously dropped: the output of the final `prep()` shows the center step working on 3 columns and the scale step on 4 columns (the dev version reports that Sepal.Width is being scaled). Comment out the first `prep()` and the scale step shows it is working on 3 columns. Very weird issue!
This is done in a single pipeline here, but it represents building different recipes in stages.
Session Info:
Tried on the latest CRAN version with everything fully up to date, and then on the latest GitHub version of recipes.
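A minimal sketch of the staged workflow, assuming the `retain = TRUE` workaround from the comment and a `recipes` release in which `juice()` is available. The names `stage_one`, `stage_two`, and `processed` are illustrative and do not come from the original report. Inspecting the column names of the processed training data is one way to check that `Sepal.Width` stays dropped after the second `prep()`.

```r
library(recipes)

# Stage one: drop a column and train the recipe, retaining the processed
# training set so that more steps can be added and prepped later.
stage_one <- recipe(iris) %>%
  step_rm(Sepal.Width) %>%
  prep(retain = TRUE)

# Stage two: add centering and scaling to the already-trained recipe,
# then train again. Only the new steps need to be estimated.
stage_two <- stage_one %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(retain = TRUE)

# Extract the processed training data and list the remaining columns.
processed <- juice(stage_two)
names(processed)
```

If `Sepal.Width` still appears in the result, or the printed recipe lists it under the scaling step, the installed version likely still has the behaviour described in this issue.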