Better error message when step_dummy on two or more variables and one of them have NA #133

Closed
LluisRamon opened this issue Mar 19, 2018 · 4 comments


@LluisRamon LluisRamon commented Mar 19, 2018

Thank you for the package, I find it very useful!

There is an error when using step_dummy with two or more variables when one of them contains NA values (I am not sure whether this is the expected behaviour).

The error message refers to a differing number of rows, which makes it hard to tell what the actual problem is, especially in a long or complex recipe.

I attach a reproducible example.

library("recipes")
library("dplyr")

data(okc)
okc <- okc[complete.cases(okc),]

# Two variables to dummy -> Works fine
okc$sunny_location <- sample(c("Florida", "Barcelona", "California"), nrow(okc), replace = TRUE)

rec <- recipe(age ~ ., data = okc)

dummies <- rec %>% step_dummy(diet, sunny_location)
dummies <- prep(dummies, training = okc)

dummy_data <- bake(dummies, newdata = okc)

# One variable with some missing values -> unclear error
okc$sunny_location <- sample(c("Florida", "Barcelona", "California"), nrow(okc), replace = TRUE)

# Add some missing values
okc$sunny_location[1:5] <- NA

rec <- recipe(age ~ ., data = okc)

dummies <- rec %>% step_dummy(diet, sunny_location)
dummies <- prep(dummies, training = okc)

dummy_data <- bake(dummies, newdata = okc)
# Error in data.frame(..., check.names = FALSE) : 
#   arguments imply differing number of rows: 35495, 35490
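
Editor's sketch of how a row-count mismatch like this can arise, using base R only. It assumes (the thread does not confirm this) that the indicator columns are built per variable with something like model.matrix(), which drops rows containing NA under the default na.action. The toy data frame and the names mm_diet, mm_loc and combined are illustrative, not taken from recipes.

```r
# Toy data: one complete factor and one factor with a missing value.
df <- data.frame(
  diet = factor(c("vegan", "other", "vegan", "other")),
  sunny_location = factor(c("Florida", NA, "Barcelona", "California"))
)

# With the default option na.action = na.omit, model.matrix() silently
# drops rows that contain NA, so the two blocks end up with different heights.
mm_diet <- model.matrix(~ diet, data = df)
mm_loc  <- model.matrix(~ sunny_location, data = df)

nrow(mm_diet)  # 4
nrow(mm_loc)   # 3

# Binding blocks of different heights fails with a "differing number of rows"
# error, similar in kind to the one reported above.
combined <- try(
  data.frame(mm_diet, mm_loc, check.names = FALSE),
  silent = TRUE
)
inherits(combined, "try-error")
```

In this toy case the difference in row counts equals the number of NA values, which matches the gap of 5 rows in the reported error.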

My sessionInfo in case you need it.

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] recipes_0.1.2 broom_0.4.2   dplyr_0.7.4  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      ddalpha_1.3.1     gower_0.1.2       pillar_1.2.1      compiler_3.4.3    DEoptimR_1.0-8    plyr_1.8.4        bindr_0.1        
 [9] class_7.3-14      tools_3.4.3       rpart_4.1-11      ipred_0.9-6       lubridate_1.6.0   tibble_1.4.2      nlme_3.1-131      lattice_0.20-35  
[17] pkgconfig_2.0.1   rlang_0.2.0       Matrix_1.2-12     psych_1.7.8       cli_1.0.0         rstudioapi_0.7    yaml_2.1.14       parallel_3.4.3   
[25] RcppRoll_0.2.2    prodlim_1.6.1     bindrcpp_0.2      stringr_1.2.0     tidyselect_0.2.2  nnet_7.3-12       CVST_0.2-1        grid_3.4.3       
[33] glue_1.2.0        robustbase_0.92-7 R6_2.2.2          survival_2.41-3   foreign_0.8-69    lava_1.5.1        kernlab_0.9-25    DRR_0.0.2        
[41] tidyr_0.7.2       reshape2_1.4.2    purrr_0.2.4       magrittr_1.5      splines_3.4.3     MASS_7.3-47       sfsmisc_1.1-1     dimRed_0.1.0     
[49] assertthat_0.2.0  mnormt_1.5-5      timeDate_3012.100 utf8_1.1.3        stringi_1.1.5     crayon_1.3.4   

@jlopezper jlopezper commented May 30, 2018

Hi @LluisRamon, how did you finally deal with it? Is there any way to work around this issue without giving up on recipes? Thanks!
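
Editor's sketch of one possible workaround, not something proposed by the participants: recode NA as an explicit level before creating dummy variables, so no row contains a missing value. The level name "unknown" and the toy data are assumptions made for illustration. The example uses base R's model.matrix() rather than recipes, so it only demonstrates the recoding idea.

```r
# Toy character column with one missing value.
df <- data.frame(
  sunny_location = c("Florida", NA, "Barcelona", "California"),
  stringsAsFactors = FALSE
)

# Replace NA with an explicit placeholder level ("unknown" is an arbitrary name).
df$sunny_location[is.na(df$sunny_location)] <- "unknown"
df$sunny_location <- factor(df$sunny_location)

# No rows are dropped now, because nothing is missing.
mm <- model.matrix(~ sunny_location, data = df)
nrow(mm)  # 4
levels(df$sunny_location)
```

The same recoding could presumably be applied to okc$sunny_location before calling recipe(), at the cost of treating missingness as its own category.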

topepo added a commit that referenced this issue Jun 1, 2018
Collaborator

@topepo topepo commented Jun 1, 2018

This should solve the problem by not generating an error but instead assigning missing values to the resulting indicator variables.
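
Editor's sketch of what "assigning missing values to the resulting indicator variables" can look like in base R. This is not the code from the referenced commit; it only shows the general technique of keeping NA rows by building the model frame with na.action = na.pass. The toy data and the names mf and indicators are illustrative.

```r
# Toy factor with one missing value in the second row.
df <- data.frame(
  sunny_location = factor(c("Florida", NA, "Barcelona", "California"))
)

# Build the model frame explicitly and keep rows with NA instead of dropping them.
mf <- model.frame(~ sunny_location, data = df, na.action = na.pass)

# model.matrix() uses the supplied model frame as-is, so all rows survive and
# the indicator columns for the missing row are NA.
indicators <- model.matrix(~ sunny_location, data = mf)

nrow(indicators)  # 4
indicators[2, ]   # indicator columns are NA for the row with the missing value
```

Because every variable then yields one row per input row, the indicator blocks can be combined without a row-count mismatch.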

@topepo topepo closed this Jun 3, 2018

@jlopezper jlopezper commented Jun 3, 2018

Thank you!

Author

@LluisRamon LluisRamon commented Jun 4, 2018

Great, thanks!
