Better error message when step_dummy on two or more variables and one of them have NA #133

Closed
LluisRamon opened this issue Mar 19, 2018 · 5 comments

Comments

@LluisRamon

Thank you for the package, I find it very useful!

An error occurs when using step_dummy with two or more variables and one of them has NA values (I am not sure whether this is the expected behaviour).

The error message refers to a differing number of rows, which makes it hard to tell what the problem is, especially in a long or complex recipe.

I attach a reproducible example.

library("recipes")
library("dplyr")

data(okc)
okc <- okc[complete.cases(okc),]

# Two variables to dummy -> Works fine
okc$sunny_location <- sample(c("Florida", "Barcelona", "California"), nrow(okc), replace = TRUE)

rec <- recipe(age ~ ., data = okc)

dummies <- rec %>% step_dummy(diet, sunny_location)
dummies <- prep(dummies, training = okc)

dummy_data <- bake(dummies, newdata = okc)

# Variable with some missing values -> Not clear error
okc$sunny_location <- sample(c("Florida", "Barcelona", "California"), nrow(okc), replace = TRUE)

# Add some missing values
okc$sunny_location[1:5] <- NA

rec <- recipe(age ~ ., data = okc)

dummies <- rec %>% step_dummy(diet, sunny_location)
dummies <- prep(dummies, training = okc)

dummy_data <- bake(dummies, newdata = okc)
# Error in data.frame(..., check.names = FALSE) : 
#   arguments imply differing number of rows: 35495, 35490
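
A message of this shape can be reproduced in base R alone. The sketch below is only an illustration under an assumption: that the row mismatch comes from rows with NA being dropped while one variable is expanded into indicator columns. It is not taken from the recipes source, and the toy data frame and its column values are hypothetical.

```r
# Toy data (hypothetical): two factors, one with a missing value
df <- data.frame(
  diet = factor(c("vegan", "anything", "vegan", "anything")),
  location = factor(c("Florida", NA, "Barcelona", "California"))
)

# model.matrix() uses the default na.action (na.omit), so rows with NA
# in the variables of the formula are dropped without a warning.
diet_dummies <- model.matrix(~ diet, data = df)          # 4 rows
location_dummies <- model.matrix(~ location, data = df)  # 3 rows

# Binding the two pieces fails because the row counts differ;
# tryCatch() captures the error message instead of stopping.
msg <- tryCatch(
  data.frame(diet_dummies, location_dummies, check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Comparing `nrow(diet_dummies)` with `nrow(location_dummies)` shows which variable lost rows, which is the information the original error message does not give.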

My sessionInfo in case you need it.

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] recipes_0.1.2 broom_0.4.2   dplyr_0.7.4  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      ddalpha_1.3.1     gower_0.1.2       pillar_1.2.1      compiler_3.4.3    DEoptimR_1.0-8    plyr_1.8.4        bindr_0.1        
 [9] class_7.3-14      tools_3.4.3       rpart_4.1-11      ipred_0.9-6       lubridate_1.6.0   tibble_1.4.2      nlme_3.1-131      lattice_0.20-35  
[17] pkgconfig_2.0.1   rlang_0.2.0       Matrix_1.2-12     psych_1.7.8       cli_1.0.0         rstudioapi_0.7    yaml_2.1.14       parallel_3.4.3   
[25] RcppRoll_0.2.2    prodlim_1.6.1     bindrcpp_0.2      stringr_1.2.0     tidyselect_0.2.2  nnet_7.3-12       CVST_0.2-1        grid_3.4.3       
[33] glue_1.2.0        robustbase_0.92-7 R6_2.2.2          survival_2.41-3   foreign_0.8-69    lava_1.5.1        kernlab_0.9-25    DRR_0.0.2        
[41] tidyr_0.7.2       reshape2_1.4.2    purrr_0.2.4       magrittr_1.5      splines_3.4.3     MASS_7.3-47       sfsmisc_1.1-1     dimRed_0.1.0     
[49] assertthat_0.2.0  mnormt_1.5-5      timeDate_3012.100 utf8_1.1.3        stringi_1.1.5     crayon_1.3.4   
@jlopezper

jlopezper commented May 30, 2018

Hi @LluisRamon, how did you finally deal with it? Is there any way to overcome this issue without giving up on recipes? Thanks!

@topepo
Member

topepo commented Jun 1, 2018

This should solve the problem by not generating an error but instead assigning missing values to the resulting indicator variables.
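
The described behaviour (missing values carried into the indicator columns instead of an error) can be sketched in base R. This is an illustration of the idea only, not the code used in recipes; the toy data and variable names are hypothetical.

```r
# Toy data (hypothetical): a factor with one missing value
df <- data.frame(
  location = factor(c("Florida", NA, "Barcelona", "California"))
)

# Build the model frame with na.pass so the row with NA is kept,
# then expand it into indicator columns.
mf <- model.frame(~ location, data = df, na.action = na.pass)
indicators <- model.matrix(~ location, data = mf)

# All four rows are present; the indicator columns for row 2 are NA.
print(nrow(indicators))
print(indicators[2, -1])
```

Because every input row is kept, the indicator columns line up with the rest of the data, and the NA values can be handled by a later imputation or filtering step.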

@topepo topepo closed this as completed Jun 3, 2018
@jlopezper

Thank you!

@LluisRamon
Author

Great, thanks!

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 25, 2021