step_rm(skip=TRUE) not working with bake() leading predict() to fail #239
Internally, the step is skipped. The issue is that the list of variables that should be retained after processing the steps is defined in

keepers <- terms_select(terms = terms, info = object$term_info)

and object$term_info contains only x2, because that was all that was left after prepping the recipe:

> object$term_info
# A tibble: 1 x 4
  variable type    role      source
  <chr>    <chr>   <chr>     <chr>
1 x2       numeric predictor original

That is clearly wrong in this case, so we'll need to think this through.
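To make the mechanism concrete, here is a minimal base-R toy model. It is not the recipes source code: the helpers `rm_step()`, `toy_prep()` and `toy_bake()` and the columns `x1`/`x2` are hypothetical, chosen to mirror the term_info printout above. It contrasts a keep-list taken from the prepped term info (which drops `x1` even though the step removing it is skipped) with a keep-list taken from the data after the non-skipped steps have run.

```r
# Hypothetical toy model of the skip/keepers interaction; not recipes internals.
# A "step" is a list holding a transformation function and a skip flag.
rm_step <- function(col, skip = FALSE) {
  force(col)
  list(fn = function(d) d[, setdiff(names(d), col), drop = FALSE], skip = skip)
}

# prep(): every step runs on the training data; record the surviving columns
toy_prep <- function(steps, training) {
  for (s in steps) training <- s$fn(training)
  list(steps = steps, term_info = names(training))
}

# bake(): steps flagged with skip = TRUE are not applied to new data
toy_bake <- function(prepped, new_data, keepers_from = c("term_info", "result")) {
  keepers_from <- match.arg(keepers_from)
  for (s in prepped$steps) {
    if (!s$skip) new_data <- s$fn(new_data)
  }
  keepers <- if (keepers_from == "term_info") {
    prepped$term_info   # buggy: only the columns left after prep()
  } else {
    names(new_data)     # computed after the non-skipped steps have run
  }
  new_data[, intersect(keepers, names(new_data)), drop = FALSE]
}

train <- data.frame(x1 = 1:3, x2 = 4:6)
prepped <- toy_prep(list(rm_step("x1", skip = TRUE)), train)

names(toy_bake(prepped, train, "term_info"))  # x1 is lost despite skip = TRUE
names(toy_bake(prepped, train, "result"))     # x1 is kept
```

In this toy, computing the keep-list from the baked result rather than from the prep-time term info is what lets a skipped removal leave its column in place.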
I think that the line

keepers <- terms_select(terms = terms, info = object$term_info)

needs to occur after
If my understanding is correct, this would keep the variable after bake() when skip = TRUE. But why does predict() still need the variable to be present in the data, even though it was removed at the end of the recipe and isn't used in the model?
It was used in the model, since skip only affects the processing of new data (via
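One concrete reason new data still needs Sepal.Length can be sketched with a small base-R toy (hypothetical helper `toy_interact()`, not recipes internals): when new data is processed, only step_rm(Sepal.Length, skip = TRUE) is skipped, while the interaction step from the reprex below still runs and rebuilds Sepal.Length_x_Sepal.Width from the raw columns. A precomputed interaction column in the new data does not remove that requirement.

```r
# Hypothetical toy: replay the non-skipped interaction step on new data.
# Not recipes internals; the interaction is rebuilt from raw columns each time.
toy_interact <- function(d) {
  # Like step_interact(terms = ~ Sepal.Length:Sepal.Width), this needs both parents
  missing <- setdiff(c("Sepal.Length", "Sepal.Width"), names(d))
  if (length(missing) > 0) {
    stop("missing column(s): ", paste(missing, collapse = ", "))
  }
  d$Sepal.Length_x_Sepal.Width <- d$Sepal.Length * d$Sepal.Width
  d
}

# New data that already carries the interaction but lacks Sepal.Length
new_dat <- head(iris, 3)
new_dat$Sepal.Length_x_Sepal.Width <- new_dat$Sepal.Length * new_dat$Sepal.Width
new_dat$Sepal.Length <- NULL

# The skipped step_rm() is irrelevant here; the interaction step still fails
result <- tryCatch(toy_interact(new_dat), error = function(e) conditionMessage(e))
print(result)
```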
I'll merge in a fix for this in a minute. Here is what things look like after the changes:

library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(caret)
#> Loading required package: lattice
#>
#> Attaching package: 'caret'
#> The following object is masked from 'package:purrr':
#>
#> lift
rec_1 <- recipe(Species ~ ., data = iris) %>%
step_interact(terms = ~ Sepal.Length:Sepal.Width) %>%
step_rm(Sepal.Length, skip = TRUE)
train_ctrl <- trainControl(method = "none", classProbs = TRUE)
rf_1 <- train(rec_1, data=iris, method = "rf", trControl = train_ctrl)
#> Loading required namespace: randomForest
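prep_rec <- prep(rec_1, iris, retain = TRUE)
# juice() returns the processed training set; step_rm() was applied during prep()
iris_juiced <- juice(prep_rec)
# bake() processes new data; the step_rm() with skip = TRUE is not applied
iris_baked <- bake(prep_rec, newdata = iris)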
colnames(iris_juiced)
#> [1] "Sepal.Width" "Petal.Length"
#> [3] "Petal.Width" "Species"
#> [5] "Sepal.Length_x_Sepal.Width"
colnames(iris_baked)
#> [1] "Sepal.Length" "Sepal.Width"
#> [3] "Petal.Length" "Petal.Width"
#> [5] "Species" "Sepal.Length_x_Sepal.Width"

Created on 2018-11-12 by the reprex package (v0.2.1)
Works great! Thank you Max.
The skip = TRUE argument of step_rm() doesn't seem to work with bake(): the variable is still removed from the baked data set. Additionally, predict() throws an error, even though the removed variable isn't needed by the model, and the interaction term built from that variable (which the model does need) is present in the data being predicted on.
Created on 2018-11-05 by the reprex package (v0.2.1)