The skip = TRUE argument of step_rm() doesn't seem to work with bake(): the variable is still removed from the baked dataset. Additionally, predict() throws an error even though the model doesn't really need the removed variable, and the interaction term that involves it, which the model does need, is available in the dataset being predicted on.
library(tidyverse)
library(recipes)
library(caret)
rec_1 <- recipe(Species ~ ., data = iris) %>%
  step_interact(terms = ~ Sepal.Length:Sepal.Width) %>%
  step_rm(Sepal.Length, skip = TRUE)
train_ctrl <- trainControl(method = "none", classProbs = TRUE)
rf_1 <- train(rec_1, data=iris, method = "rf", trControl = train_ctrl)
#> Loading required namespace: randomForest
prep_rec <- prep(rec_1, iris, retain=TRUE)
iris_juiced <- juice(prep_rec)
iris_baked <- bake(prep_rec, newdata = iris)
colnames(iris_juiced)
#> [1] "Sepal.Width" "Petal.Length"
#> [3] "Petal.Width" "Species"
#> [5] "Sepal.Length_x_Sepal.Width"
colnames(iris_baked)
#> [1] "Sepal.Width" "Petal.Length"
#> [3] "Petal.Width" "Species"
#> [5] "Sepal.Length_x_Sepal.Width"
predict(rf_1, iris_juiced, type = "response")
#> Error in eval(predvars, data, env): object 'Sepal.Length' not found
predict(rf_1, iris_baked, type = "response")
#> Error in eval(predvars, data, env): object 'Sepal.Length' not found
Created on 2018-11-05 by the reprex package (v0.2.1)
Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.5.1 (2018-07-02)
#>  os       macOS High Sierra 10.13.6
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>
#> ─ Packages ──────────────────────────────────────────────
#>  package   * version    date       lib
#>  tidyverse * 1.2.1      2017-11-14 [1]
#>  recipes   * 0.1.3.9002 2018-10-26 [1]
#>  caret     * 6.0-80     2018-05-26 [1]