-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update names of imputation steps to have common text first #614
Conversation
I used the CRAN version of recipes and ran the example code for tidy(ratio_recipe2). # <- object from CRAN version
# A tibble: 1 x 6
number operation type trained skip id
<int> <chr> <chr> <lgl> <lgl> <chr>
1 1 step knnimpute TRUE FALSE knnimpute_iyPXM
> bake(ratio_recipe2, biomass_te)
Error in UseMethod("bake") :
no applicable method for 'bake' applied to an object of class "c('step_knnimpute', 'step')"
> methods(class = "step_knnimpute")
no methods found We'll need to keep S3 methods around for A collection of objects from the old class names should be kept around and used in tests to make sure that they still work. |
Ah yes, good catch! I did not think enough about backward compatibility for existing #' @export
prep.step_knnimpute <- prep.step_impute_knn and so forth. |
I believe that will work (but we'll see) |
Here is a reprex for testing the backward compatibility: install.packages("recipes", quiet = TRUE)
library(recipes)
library(modeldata)
data(biomass)
biomass_tr <- biomass[biomass$dataset == "Training", ]
biomass_te <- biomass[biomass$dataset == "Testing", ]
biomass_te_whole <- biomass_te
# induce some missing data at random
set.seed(9039)
carb_missing <- sample(1:nrow(biomass_te), 3)
nitro_missing <- sample(1:nrow(biomass_te), 3)
biomass_te$carbon[carb_missing] <- NA
biomass_te$nitrogen[nitro_missing] <- NA
rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
data = biomass_tr)
ratio_recipe <- rec %>%
step_knnimpute(all_predictors(), neighbors = 3)
ratio_recipe2 <- prep(ratio_recipe, training = biomass_tr)
imputed <- bake(ratio_recipe2, biomass_te)
# how well did it work?
summary(biomass_te_whole$carbon)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 27.80 44.24 47.30 47.96 49.00 79.34
cbind(before = biomass_te_whole$carbon[carb_missing],
after = imputed$carbon[carb_missing])
#> before after
#> [1,] 46.83 47.43000
#> [2,] 47.80 47.53333
#> [3,] 46.40 46.21000
summary(biomass_te_whole$nitrogen)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.010 0.295 0.690 1.092 1.450 4.790
cbind(before = biomass_te_whole$nitrogen[nitro_missing],
after = imputed$nitrogen[nitro_missing])
#> before after
#> [1,] 1.24 0.59333333
#> [2,] 0.30 0.92333333
#> [3,] 0.06 0.04666667
tidy(ratio_recipe, number = 1)
#> # A tibble: 1 x 4
#> terms predictors neighbors id
#> <chr> <chr> <dbl> <chr>
#> 1 all_predictors() <NA> 3 knnimpute_iyPXM
tidy(ratio_recipe2, number = 1)
#> # A tibble: 20 x 4
#> terms predictors neighbors id
#> <chr> <chr> <dbl> <chr>
#> 1 carbon hydrogen 3 knnimpute_iyPXM
#> 2 carbon oxygen 3 knnimpute_iyPXM
#> 3 carbon nitrogen 3 knnimpute_iyPXM
#> 4 carbon sulfur 3 knnimpute_iyPXM
#> 5 hydrogen carbon 3 knnimpute_iyPXM
#> 6 hydrogen oxygen 3 knnimpute_iyPXM
#> 7 hydrogen nitrogen 3 knnimpute_iyPXM
#> 8 hydrogen sulfur 3 knnimpute_iyPXM
#> 9 oxygen carbon 3 knnimpute_iyPXM
#> 10 oxygen hydrogen 3 knnimpute_iyPXM
#> 11 oxygen nitrogen 3 knnimpute_iyPXM
#> 12 oxygen sulfur 3 knnimpute_iyPXM
#> 13 nitrogen carbon 3 knnimpute_iyPXM
#> 14 nitrogen hydrogen 3 knnimpute_iyPXM
#> 15 nitrogen oxygen 3 knnimpute_iyPXM
#> 16 nitrogen sulfur 3 knnimpute_iyPXM
#> 17 sulfur carbon 3 knnimpute_iyPXM
#> 18 sulfur hydrogen 3 knnimpute_iyPXM
#> 19 sulfur oxygen 3 knnimpute_iyPXM
#> 20 sulfur nitrogen 3 knnimpute_iyPXM
saveRDS(ratio_recipe2, "~/Desktop/recipe.rds")
remotes::install_github("mattwarkentin/recipes", quiet = TRUE)
x <- readRDS("~/Desktop/recipe.rds")
tidy(x)
#> # A tibble: 1 x 6
#> number operation type trained skip id
#> <int> <chr> <chr> <lgl> <lgl> <chr>
#> 1 1 step knnimpute TRUE FALSE knnimpute_iyPXM
bake(x, biomass_te)
#> # A tibble: 80 x 6
#> carbon hydrogen oxygen nitrogen sulfur HHV
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 46.4 5.67 47.2 0.3 0.22 18.3
#> 2 43.2 5.5 48.1 2.85 0.34 17.6
#> 3 42.7 5.5 49.1 2.4 0.3 17.2
#> 4 46.2 6.1 37.3 1.8 0.5 18.9
#> 5 48.8 6.32 42.8 0.2 0 20.5
#> 6 44.3 5.5 41.7 0.7 0.2 18.5
#> 7 38.9 5.23 54.1 1.19 0.51 15.1
#> 8 42.1 4.66 33.8 0.95 0.2 16.2
#> 9 29.2 4.4 31.1 0.14 4.9 11.1
#> 10 27.8 3.77 23.7 4.63 1.05 10.8
#> # … with 70 more rows
methods(class = "step_knnimpute")
#> [1] bake prep tidy tunable
#> see '?methods' for accessing help and source code Created on 2020-12-14 by the reprex package (v0.3.0) |
Thanks for doing this! |
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org |
Summary
Updated all of the imputation steps to use snake-case and common text first
step_lmimpute
#555 (comment)Also renamed the files themselves for organizational reasons
Soft-deprecated the old names and also added a descriptive note in
@details
sectionsHad to update a few tests to clean up check errors