Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update names of imputation steps to have common text first #614

Merged
merged 2 commits into from
Dec 29, 2020

Conversation

mattwarkentin
Copy link
Contributor

Summary

@topepo topepo self-requested a review December 7, 2020 21:22
@topepo
Copy link
Member

topepo commented Dec 14, 2020

I used the CRAN version of recipes and ran the example code for step_knnimpute(), saved them, installed this PR, loaded the old files and ran the rest of the examples. There are a lot of missing S3 methods for the old step classes. For example:

 tidy(ratio_recipe2). # <- object from CRAN version
# A tibble: 1 x 6
  number operation type      trained skip  id             
   <int> <chr>     <chr>     <lgl>   <lgl> <chr>          
1      1 step      knnimpute TRUE    FALSE knnimpute_iyPXM
> bake(ratio_recipe2, biomass_te)
 Error in UseMethod("bake") : 
  no applicable method for 'bake' applied to an object of class "c('step_knnimpute', 'step')" 
> methods(class = "step_knnimpute")
no methods found

We'll need to keep S3 methods around for bake(), prep(), tidy(), tunable(), and print() for the old methods. There may be more that I've missed too so I'll keep looking.

A collection of objects from the old class names should be kept around and used in tests to make sure that they still work.

@mattwarkentin
Copy link
Contributor Author

Ah yes, good catch! I did not think enough about backward compatibility for existing recipes. This shouldn't be too difficult. Should just need to create the old methods and point them to the new methods, right?

#' @export
prep.step_knnimpute <- prep.step_impute_knn

and so forth.

@topepo
Copy link
Member

topepo commented Dec 14, 2020

I believe that will work (but we'll see)

@mattwarkentin
Copy link
Contributor Author

mattwarkentin commented Dec 14, 2020

Here is a reprex for testing the backward compatibility:

install.packages("recipes", quiet = TRUE)

library(recipes)
library(modeldata)
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training", ]
biomass_te <- biomass[biomass$dataset == "Testing", ]
biomass_te_whole <- biomass_te

# induce some missing data at random
set.seed(9039)
carb_missing <- sample(1:nrow(biomass_te), 3)
nitro_missing <- sample(1:nrow(biomass_te), 3)

biomass_te$carbon[carb_missing] <- NA
biomass_te$nitrogen[nitro_missing] <- NA

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

ratio_recipe <- rec %>%
  step_knnimpute(all_predictors(), neighbors = 3)
ratio_recipe2 <- prep(ratio_recipe, training = biomass_tr)
imputed <- bake(ratio_recipe2, biomass_te)

# how well did it work?
summary(biomass_te_whole$carbon)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   27.80   44.24   47.30   47.96   49.00   79.34
cbind(before = biomass_te_whole$carbon[carb_missing],
      after = imputed$carbon[carb_missing])
#>      before    after
#> [1,]  46.83 47.43000
#> [2,]  47.80 47.53333
#> [3,]  46.40 46.21000

summary(biomass_te_whole$nitrogen)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.010   0.295   0.690   1.092   1.450   4.790
cbind(before = biomass_te_whole$nitrogen[nitro_missing],
      after = imputed$nitrogen[nitro_missing])
#>      before      after
#> [1,]   1.24 0.59333333
#> [2,]   0.30 0.92333333
#> [3,]   0.06 0.04666667

tidy(ratio_recipe, number = 1)
#> # A tibble: 1 x 4
#>   terms            predictors neighbors id             
#>   <chr>            <chr>          <dbl> <chr>          
#> 1 all_predictors() <NA>               3 knnimpute_iyPXM
tidy(ratio_recipe2, number = 1)
#> # A tibble: 20 x 4
#>    terms    predictors neighbors id             
#>    <chr>    <chr>          <dbl> <chr>          
#>  1 carbon   hydrogen           3 knnimpute_iyPXM
#>  2 carbon   oxygen             3 knnimpute_iyPXM
#>  3 carbon   nitrogen           3 knnimpute_iyPXM
#>  4 carbon   sulfur             3 knnimpute_iyPXM
#>  5 hydrogen carbon             3 knnimpute_iyPXM
#>  6 hydrogen oxygen             3 knnimpute_iyPXM
#>  7 hydrogen nitrogen           3 knnimpute_iyPXM
#>  8 hydrogen sulfur             3 knnimpute_iyPXM
#>  9 oxygen   carbon             3 knnimpute_iyPXM
#> 10 oxygen   hydrogen           3 knnimpute_iyPXM
#> 11 oxygen   nitrogen           3 knnimpute_iyPXM
#> 12 oxygen   sulfur             3 knnimpute_iyPXM
#> 13 nitrogen carbon             3 knnimpute_iyPXM
#> 14 nitrogen hydrogen           3 knnimpute_iyPXM
#> 15 nitrogen oxygen             3 knnimpute_iyPXM
#> 16 nitrogen sulfur             3 knnimpute_iyPXM
#> 17 sulfur   carbon             3 knnimpute_iyPXM
#> 18 sulfur   hydrogen           3 knnimpute_iyPXM
#> 19 sulfur   oxygen             3 knnimpute_iyPXM
#> 20 sulfur   nitrogen           3 knnimpute_iyPXM

saveRDS(ratio_recipe2, "~/Desktop/recipe.rds")

remotes::install_github("mattwarkentin/recipes", quiet = TRUE)
x <- readRDS("~/Desktop/recipe.rds")
tidy(x)
#> # A tibble: 1 x 6
#>   number operation type      trained skip  id             
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>          
#> 1      1 step      knnimpute TRUE    FALSE knnimpute_iyPXM
bake(x, biomass_te)
#> # A tibble: 80 x 6
#>    carbon hydrogen oxygen nitrogen sulfur   HHV
#>     <dbl>    <dbl>  <dbl>    <dbl>  <dbl> <dbl>
#>  1   46.4     5.67   47.2     0.3    0.22  18.3
#>  2   43.2     5.5    48.1     2.85   0.34  17.6
#>  3   42.7     5.5    49.1     2.4    0.3   17.2
#>  4   46.2     6.1    37.3     1.8    0.5   18.9
#>  5   48.8     6.32   42.8     0.2    0     20.5
#>  6   44.3     5.5    41.7     0.7    0.2   18.5
#>  7   38.9     5.23   54.1     1.19   0.51  15.1
#>  8   42.1     4.66   33.8     0.95   0.2   16.2
#>  9   29.2     4.4    31.1     0.14   4.9   11.1
#> 10   27.8     3.77   23.7     4.63   1.05  10.8
#> # … with 70 more rows
methods(class = "step_knnimpute")
#> [1] bake    prep    tidy    tunable
#> see '?methods' for accessing help and source code

Created on 2020-12-14 by the reprex package (v0.3.0)

@mattwarkentin mattwarkentin changed the title Updated names of imputation steps to have common text first Update names of imputation steps to have common text first Dec 14, 2020
@topepo topepo merged commit a5d4405 into tidymodels:master Dec 29, 2020
@topepo
Copy link
Member

topepo commented Dec 29, 2020

Thanks for doing this!

@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org

@github-actions github-actions bot locked and limited conversation to collaborators Feb 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants