caret::dummyVars reoccurring pattern in column name causes errors in dummy variable names #390
Comments
These changes should fix the issue. Thanks.
Thank you :)
No problem...
I ran into this issue with the latest version of caret (6.0-81):
Should return: Returns:
I reran @JanLauGe's MWE and it is indeed fixed. EDIT: function
I noticed that dummyVars produces erroneous variable names when creating (predicting) dummy variables if one column name in the original dataset is a prefix of a later column name. In these cases the new dummy variable names are split in the wrong place: the remainder of the longer column name ends up attached to the factor level name.
As far as I can tell the function still returns the correct values, just under confusing names.
Minimal dataset:
Minimal, runnable code:
The same happens with 'levelsOnly = TRUE', by the way. With this option, the dummy variable names become 1, 2, 3, Bar4, Bar5, Bar6, 7, 8, 9.
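To illustrate the mechanism without caret, here is a minimal base R sketch. It is not caret's implementation and not the dataset from this report: the column names `Foo`, `FooBar` and `Baz`, their levels, and the helper `strip_first_prefix` are all hypothetical, chosen only so that the output matches the names listed above. It assumes dummy names are built as column name plus level and then split by stripping the first column name that matches as a prefix.

```r
# Hypothetical columns and factor levels (assumption, not the original data).
columns <- list(Foo    = c("1", "2", "3"),
                FooBar = c("4", "5", "6"),
                Baz    = c("7", "8", "9"))

# Full dummy names: column name pasted directly onto each level.
full_names <- unlist(
  lapply(names(columns), function(nm) paste0(nm, columns[[nm]])),
  use.names = FALSE
)

# Hypothetical helper: strip the first prefix (in the order given) that matches.
strip_first_prefix <- function(x, prefixes) {
  for (p in prefixes) {
    if (startsWith(x, p)) return(substring(x, nchar(p) + 1))
  }
  x
}

# Naive: try prefixes in column order, so "Foo" wins over "FooBar".
naive <- vapply(full_names, strip_first_prefix, character(1),
                prefixes = names(columns), USE.NAMES = FALSE)

# One possible mitigation: try longer column names first.
by_length <- names(columns)[order(nchar(names(columns)), decreasing = TRUE)]
fixed <- vapply(full_names, strip_first_prefix, character(1),
                prefixes = by_length, USE.NAMES = FALSE)

print(naive)  # "1" "2" "3" "Bar4" "Bar5" "Bar6" "7" "8" "9"
print(fixed)  # "1" "2" "3" "4" "5" "6" "7" "8" "9"
```

Trying longer names first only works around this particular case. A column `Foo` with level `Bar4` and a column `FooBar` with level `4` would still be indistinguishable by name alone, so tracking which column each dummy came from is likely more robust than parsing the pasted names.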
This is my first bug report on GitHub. Please point out anything that is missing or could be done better. Thanks for all the effort that went into this fantastic and super helpful package!
Session Info:
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin14.0.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3 caret_6.0-64 ggplot2_2.0.0 lattice_0.20-33 plyr_1.8.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 magrittr_1.5 splines_3.2.3 MASS_7.3-45 munsell_0.4.2
[6] colorspace_1.2-6 R6_2.1.2 foreach_1.4.3 minqa_1.2.4 stringr_1.0.0
[11] car_2.1-1 tools_3.2.3 nnet_7.3-11 pbkrtest_0.4-6 parallel_3.2.3
[16] grid_3.2.3 gtable_0.1.2 nlme_3.1-124 mgcv_1.8-11 quantreg_5.19
[21] DBI_0.3.1 MatrixModels_0.4-1 iterators_1.0.8 assertthat_0.1 lme4_1.1-10
[26] Matrix_1.2-3 nloptr_1.0.4 reshape2_1.4.1 codetools_0.2-14 stringi_1.0-1
[31] scales_0.3.0 stats4_3.2.3 SparseM_1.7