Bug on step_num2factor() #425

Closed
hermandr opened this issue Dec 8, 2019 · 1 comment

Comments

@hermandr hermandr commented Dec 8, 2019

step_num2factor() ignores parameter levels.

Minimal, reproducible example:

library(tidymodels)
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo
#> -- Attaching packages ------------------------------------------------------------ tidymodels 0.0.3 --
#> v broom     0.5.2       v purrr     0.3.2  
#> v dials     0.0.3       v recipes   0.1.7  
#> v dplyr     0.8.3       v rsample   0.0.5  
#> v ggplot2   3.2.1       v tibble    2.1.3  
#> v infer     0.5.0       v yardstick 0.0.4  
#> v parsnip   0.0.3.1
#> -- Conflicts --------------------------------------------------------------- tidymodels_conflicts() --
#> x purrr::discard()  masks scales::discard()
#> x dplyr::filter()   masks stats::filter()
#> x dplyr::lag()      masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x dials::offset()   masks stats::offset()
#> x recipes::step()   masks stats::step()
library(caret)
#> Loading required package: lattice
#> 
#> Attaching package: 'caret'
#> The following objects are masked from 'package:yardstick':
#> 
#>     precision, recall
#> The following object is masked from 'package:purrr':
#> 
#>     lift

data(PimaIndiansDiabetes, package="mlbench")

# Change the target variable from factor to numeric
PimaIndiansDiabetes$diabetes <- as.numeric(PimaIndiansDiabetes$diabetes)

data_split <- initial_split(PimaIndiansDiabetes)

recipe_obj <- training(data_split) %>%
  recipe(diabetes ~ .) %>%
  step_num2factor(diabetes,levels=c("pos","neg"))

recipe_obj %>% prep() %>% juice()   
#> # A tibble: 576 x 9
#>    pregnant glucose pressure triceps insulin  mass pedigree   age diabetes
#>       <dbl>   <dbl>    <dbl>   <dbl>   <dbl> <dbl>    <dbl> <dbl> <fct>   
#>  1        6     148       72      35       0  33.6    0.627    50 2       
#>  2        1      85       66      29       0  26.6    0.351    31 1       
#>  3        8     183       64       0       0  23.3    0.672    32 2       
#>  4        1      89       66      23      94  28.1    0.167    21 1       
#>  5        0     137       40      35     168  43.1    2.29     33 2       
#>  6        5     116       74       0       0  25.6    0.201    30 1       
#>  7        3      78       50      32      88  31      0.248    26 2       
#>  8       10     115        0       0       0  35.3    0.134    29 1       
#>  9        2     197       70      45     543  30.5    0.158    53 2       
#> 10        8     125       96       0       0   0      0.232    54 2       
#> # ... with 566 more rows

Created on 2019-12-08 by the reprex package (v0.3.0)
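
For reference, a minimal base R sketch of what the reprex appears to expect: the numeric codes produced by as.numeric() on a factor are replaced by the label at the matching position in levels. The short vector below is a stand-in for the real column, and the assumption that the original factor's levels are ordered "neg", "pos" is mine, not something stated in the report.

```r
# Stand-in for the first few values of PimaIndiansDiabetes$diabetes
# (assumption: the original factor's levels are ordered "neg", "pos").
diabetes <- factor(c("pos", "neg", "pos", "neg"), levels = c("neg", "pos"))

# as.numeric() on a factor returns the underlying integer codes, not the labels.
codes <- as.numeric(diabetes)
print(codes)

# Expected behaviour: each code is replaced by the label at that position.
lv <- c("pos", "neg")
expected <- factor(lv[codes], levels = lv)
print(expected)

# Under the level-order assumption above, c("neg", "pos") is the ordering
# that restores the original labels; c("pos", "neg") swaps them.
restored <- factor(c("neg", "pos")[codes], levels = c("neg", "pos"))
print(restored)
```

If that assumption holds for the real data, passing levels = c("pos", "neg") would label code 1 as "pos" even though it originally meant "neg", so the order of levels matters independently of the bug reported here.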

Minimal Reproducible Dataset:

data(PimaIndiansDiabetes, package="mlbench")

Session Info:

sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252   
#> [3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=English_Singapore.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.6.1  magrittr_1.5    tools_3.6.1     htmltools_0.4.0
#>  [5] yaml_2.2.0      Rcpp_1.0.2      stringi_1.4.3   rmarkdown_1.16 
#>  [9] highr_0.8       knitr_1.25      stringr_1.4.0   xfun_0.10      
#> [13] digest_0.6.21   rlang_0.4.0     evaluate_0.14

Created on 2019-12-08 by the reprex package (v0.3.0)


@topepo topepo (Collaborator) commented Dec 17, 2019

I've checked in changes that fix the issue. However, this required some surgery on the code, and there are some breaking changes:

  • If the inputs are not already integers, the transform function should convert them to integers.

  • The levels are required.
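
The two breaking changes above can be sketched in base R as follows. This is an illustration of the described behaviour only, not the recipes implementation: num2factor_sketch() and its error message are hypothetical, and the argument names are chosen to mirror the comment.

```r
# Hypothetical helper illustrating the described semantics; not recipes code.
num2factor_sketch <- function(x, levels, transform = function(x) x, ordered = FALSE) {
  # Breaking change 2: levels must be supplied.
  if (missing(levels) || !is.character(levels)) {
    stop("`levels` must be supplied as a character vector.")
  }
  # Breaking change 1: the transformed input is treated as integer positions.
  idx <- as.integer(transform(x))
  factor(levels[idx], levels = levels, ordered = ordered)
}

# Codes 1/2 index directly into the supplied levels.
out <- num2factor_sketch(c(2, 1, 2, 1), levels = c("neg", "pos"))
print(out)

# A transform can shift values first, for example 0/1 codes to positions 1/2.
zero_based <- num2factor_sketch(c(1, 0, 1), levels = c("neg", "pos"),
                                transform = function(x) x + 1)
print(zero_based)
```

In a recipes release that includes this change, the corresponding call would presumably be along the lines of step_num2factor(diabetes, transform = function(x) as.integer(x), levels = c("neg", "pos")); check the installed version's documentation before relying on that.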
