Update names of imputation steps to have common text first #614

mattwarkentin · 2020-12-07T18:29:58Z

Summary

Updated all of the imputation steps to use snake-case and common text first
- Related to Harmonize recipe steps to use snake-case and common-text-first format #585 and PR: step_lmimpute #555 (comment)
Also renamed the files themselves for organizational reasons
Soft-deprecated the old names and also added a descriptive note in @details sections
Had to update a few tests to clean up check errors

…ed old names

topepo · 2020-12-14T17:57:49Z

I used the CRAN version of recipes and ran the example code for step_knnimpute(), saved them, installed this PR, loaded the old files and ran the rest of the examples. There are a lot of missing S3 methods for the old step classes. For example:

 tidy(ratio_recipe2). # <- object from CRAN version
# A tibble: 1 x 6
  number operation type      trained skip  id             
   <int> <chr>     <chr>     <lgl>   <lgl> <chr>          
1      1 step      knnimpute TRUE    FALSE knnimpute_iyPXM
> bake(ratio_recipe2, biomass_te)
 Error in UseMethod("bake") : 
  no applicable method for 'bake' applied to an object of class "c('step_knnimpute', 'step')" 
> methods(class = "step_knnimpute")
no methods found

We'll need to keep S3 methods around for bake(), prep(), tidy(), tunable(), and print() for the old methods. There may be more that I've missed too so I'll keep looking.

A collection of objects from the old class names should be kept around and used in tests to make sure that they still work.

mattwarkentin · 2020-12-14T18:04:29Z

Ah yes, good catch! I did not think enough about backward compatibility for existing recipes. This shouldn't be too difficult. Should just need to create the old methods and point them to the new methods, right?

#' @export
prep.step_knnimpute <- prep.step_impute_knn

and so forth.

topepo · 2020-12-14T18:56:58Z

I believe that will work (but we'll see)

mattwarkentin · 2020-12-14T22:55:50Z

Here is a reprex for testing the backward compatibility:

install.packages("recipes", quiet = TRUE)

library(recipes)
library(modeldata)
data(biomass)

biomass_tr <- biomass[biomass$dataset == "Training", ]
biomass_te <- biomass[biomass$dataset == "Testing", ]
biomass_te_whole <- biomass_te

# induce some missing data at random
set.seed(9039)
carb_missing <- sample(1:nrow(biomass_te), 3)
nitro_missing <- sample(1:nrow(biomass_te), 3)

biomass_te$carbon[carb_missing] <- NA
biomass_te$nitrogen[nitro_missing] <- NA

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

ratio_recipe <- rec %>%
  step_knnimpute(all_predictors(), neighbors = 3)
ratio_recipe2 <- prep(ratio_recipe, training = biomass_tr)
imputed <- bake(ratio_recipe2, biomass_te)

# how well did it work?
summary(biomass_te_whole$carbon)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   27.80   44.24   47.30   47.96   49.00   79.34
cbind(before = biomass_te_whole$carbon[carb_missing],
      after = imputed$carbon[carb_missing])
#>      before    after
#> [1,]  46.83 47.43000
#> [2,]  47.80 47.53333
#> [3,]  46.40 46.21000

summary(biomass_te_whole$nitrogen)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.010   0.295   0.690   1.092   1.450   4.790
cbind(before = biomass_te_whole$nitrogen[nitro_missing],
      after = imputed$nitrogen[nitro_missing])
#>      before      after
#> [1,]   1.24 0.59333333
#> [2,]   0.30 0.92333333
#> [3,]   0.06 0.04666667

tidy(ratio_recipe, number = 1)
#> # A tibble: 1 x 4
#>   terms            predictors neighbors id             
#>   <chr>            <chr>          <dbl> <chr>          
#> 1 all_predictors() <NA>               3 knnimpute_iyPXM
tidy(ratio_recipe2, number = 1)
#> # A tibble: 20 x 4
#>    terms    predictors neighbors id             
#>    <chr>    <chr>          <dbl> <chr>          
#>  1 carbon   hydrogen           3 knnimpute_iyPXM
#>  2 carbon   oxygen             3 knnimpute_iyPXM
#>  3 carbon   nitrogen           3 knnimpute_iyPXM
#>  4 carbon   sulfur             3 knnimpute_iyPXM
#>  5 hydrogen carbon             3 knnimpute_iyPXM
#>  6 hydrogen oxygen             3 knnimpute_iyPXM
#>  7 hydrogen nitrogen           3 knnimpute_iyPXM
#>  8 hydrogen sulfur             3 knnimpute_iyPXM
#>  9 oxygen   carbon             3 knnimpute_iyPXM
#> 10 oxygen   hydrogen           3 knnimpute_iyPXM
#> 11 oxygen   nitrogen           3 knnimpute_iyPXM
#> 12 oxygen   sulfur             3 knnimpute_iyPXM
#> 13 nitrogen carbon             3 knnimpute_iyPXM
#> 14 nitrogen hydrogen           3 knnimpute_iyPXM
#> 15 nitrogen oxygen             3 knnimpute_iyPXM
#> 16 nitrogen sulfur             3 knnimpute_iyPXM
#> 17 sulfur   carbon             3 knnimpute_iyPXM
#> 18 sulfur   hydrogen           3 knnimpute_iyPXM
#> 19 sulfur   oxygen             3 knnimpute_iyPXM
#> 20 sulfur   nitrogen           3 knnimpute_iyPXM

saveRDS(ratio_recipe2, "~/Desktop/recipe.rds")

remotes::install_github("mattwarkentin/recipes", quiet = TRUE)
x <- readRDS("~/Desktop/recipe.rds")
tidy(x)
#> # A tibble: 1 x 6
#>   number operation type      trained skip  id             
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>          
#> 1      1 step      knnimpute TRUE    FALSE knnimpute_iyPXM
bake(x, biomass_te)
#> # A tibble: 80 x 6
#>    carbon hydrogen oxygen nitrogen sulfur   HHV
#>     <dbl>    <dbl>  <dbl>    <dbl>  <dbl> <dbl>
#>  1   46.4     5.67   47.2     0.3    0.22  18.3
#>  2   43.2     5.5    48.1     2.85   0.34  17.6
#>  3   42.7     5.5    49.1     2.4    0.3   17.2
#>  4   46.2     6.1    37.3     1.8    0.5   18.9
#>  5   48.8     6.32   42.8     0.2    0     20.5
#>  6   44.3     5.5    41.7     0.7    0.2   18.5
#>  7   38.9     5.23   54.1     1.19   0.51  15.1
#>  8   42.1     4.66   33.8     0.95   0.2   16.2
#>  9   29.2     4.4    31.1     0.14   4.9   11.1
#> 10   27.8     3.77   23.7     4.63   1.05  10.8
#> # … with 70 more rows
methods(class = "step_knnimpute")
#> [1] bake    prep    tidy    tunable
#> see '?methods' for accessing help and source code

^{Created on 2020-12-14 by the reprex package (v0.3.0)}

topepo · 2020-12-29T19:27:41Z

Thanks for doing this!

github-actions · 2021-02-20T00:02:54Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org

Updated names of imputation steps to have common text first; deprecat…

25d0045

…ed old names

topepo self-requested a review December 7, 2020 21:22

Fixed backwards compatibility issue by adding back old S3 methods

9c7d277

mattwarkentin changed the title ~~Updated names of imputation steps to have common text first~~ Update names of imputation steps to have common text first Dec 14, 2020

topepo merged commit a5d4405 into tidymodels:master Dec 29, 2020

juliasilge mentioned this pull request Jan 7, 2021

Harmonize recipe steps to use snake-case and common-text-first format #585

Closed

github-actions bot locked and limited conversation to collaborators Feb 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update names of imputation steps to have common text first #614

Update names of imputation steps to have common text first #614

mattwarkentin commented Dec 7, 2020

topepo commented Dec 14, 2020

mattwarkentin commented Dec 14, 2020

topepo commented Dec 14, 2020

mattwarkentin commented Dec 14, 2020 •

edited

Loading

topepo commented Dec 29, 2020

github-actions bot commented Feb 20, 2021

Update names of imputation steps to have common text first #614

Update names of imputation steps to have common text first #614

Conversation

mattwarkentin commented Dec 7, 2020

Summary

topepo commented Dec 14, 2020

mattwarkentin commented Dec 14, 2020

topepo commented Dec 14, 2020

mattwarkentin commented Dec 14, 2020 • edited Loading

topepo commented Dec 29, 2020

github-actions bot commented Feb 20, 2021

mattwarkentin commented Dec 14, 2020 •

edited

Loading