caret::dummyVars reoccurring pattern in column name causes errors in dummy variable names #390
Comments
These changes should fix the issue. Thanks.
Thank you :)
No problem...
I ran into this issue with the latest version of caret (6.0-81):
Should return:
Returns:
I reran @JanLauGe's MWE and it is indeed fixed. EDIT: function
I noticed that dummyVars produces erroneous variable names when creating (predicting) dummy variables if one column name in the original dataset matches the start of a subsequent column name. In these cases, the new dummy variable names are split in the wrong place: the remainder of the longer column name ends up attached to the factor level name instead of staying with the variable name.
As far as I can tell the function still delivers the right result, just with a confusing name.
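The mis-split is the kind of result an unanchored prefix replacement on the generated names would produce. The sketch below uses base R only and illustrates that failure mode; it is not caret's actual implementation, and the names `Foo` and `FooBar`, their levels, and the `vars` lookup are hypothetical.

```r
# Hypothetical raw dummy names: variable name pasted directly onto the level.
raw_names <- c("Foo1", "Foo2", "Foo3", "FooBar4", "FooBar5", "FooBar6")

# Naive approach: insert the separator after the first match of "Foo".
# "Foo" is also a prefix of "FooBar", so those names are split too early.
naive <- sub("Foo", "Foo.", raw_names, fixed = TRUE)
print(naive)
# "Foo.1" "Foo.2" "Foo.3" "Foo.Bar4" "Foo.Bar5" "Foo.Bar6"

# The same prefix stripping, as an analogue of a levels-only naming scheme.
levels_only <- sub("Foo", "", raw_names, fixed = TRUE)
print(levels_only)
# "1" "2" "3" "Bar4" "Bar5" "Bar6"

# Safer approach: only rename entries that equal variable name + known level.
vars <- list(Foo = c("1", "2", "3"), FooBar = c("4", "5", "6"))
safe <- raw_names
for (v in names(vars)) {
  exact <- paste0(v, vars[[v]])          # e.g. "Foo1", "Foo2", "Foo3"
  idx <- match(exact, raw_names)         # positions of the exact matches
  safe[idx] <- paste(v, vars[[v]], sep = ".")
}
print(safe)
# "Foo.1" "Foo.2" "Foo.3" "FooBar.4" "FooBar.5" "FooBar.6"
```

Matching on the full variable-plus-level string avoids the early split, though it can still be ambiguous if one variable's name plus level happens to equal another's.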
Minimal dataset:
Minimal, runnable code:
The same is true when using 'levelsOnly = TRUE' by the way. With this option, dummy variable names become 1, 2, 3, Bar4, Bar5, Bar6, 7, 8, 9.
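A reproduction along these lines might look as follows. This is a sketch, not the reporter's original data: the column names `Foo`, `FooBar` and `Baz` are assumptions, chosen only to be consistent with the levelsOnly names quoted above. It requires the caret package, and the names printed will depend on the installed version.

```r
library(caret)

# Hypothetical data: the column name "Foo" is a prefix of "FooBar".
df <- data.frame(
  Foo    = factor(c("1", "2", "3")),
  FooBar = factor(c("4", "5", "6")),
  Baz    = factor(c("7", "8", "9"))
)

# Default naming: variable name, separator, level.
dv <- dummyVars(~ ., data = df)
out <- predict(dv, newdata = df)
print(colnames(out))

# Level-only naming, the option mentioned in the report.
dv_levels <- dummyVars(~ ., data = df, levelsOnly = TRUE)
out_levels <- predict(dv_levels, newdata = df)
print(colnames(out_levels))

# The values themselves can be checked independently of the names:
# each row should have exactly one 1 per original factor column.
print(rowSums(out))
```

On an affected version, the columns derived from `FooBar` are the ones to inspect; on a version that includes the fix, each name should split cleanly between the variable name and the level.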
This is my first bug report on github. Please point out anything that is missing or should be done better. Thanks for all the effort that went into this fantastic and super helpful package!
Session Info:
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin14.0.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3 caret_6.0-64 ggplot2_2.0.0 lattice_0.20-33 plyr_1.8.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 magrittr_1.5 splines_3.2.3 MASS_7.3-45 munsell_0.4.2
[6] colorspace_1.2-6 R6_2.1.2 foreach_1.4.3 minqa_1.2.4 stringr_1.0.0
[11] car_2.1-1 tools_3.2.3 nnet_7.3-11 pbkrtest_0.4-6 parallel_3.2.3
[16] grid_3.2.3 gtable_0.1.2 nlme_3.1-124 mgcv_1.8-11 quantreg_5.19
[21] DBI_0.3.1 MatrixModels_0.4-1 iterators_1.0.8 assertthat_0.1 lme4_1.1-10
[26] Matrix_1.2-3 nloptr_1.0.4 reshape2_1.4.1 codetools_0.2-14 stringi_1.0-1
[31] scales_0.3.0 stats4_3.2.3 SparseM_1.7