First, my many, many thanks for your wonderful contributions to the R community. caret has saved me many hours over the years.
The issue I've found occurs only on the first call to createFolds() after a fresh R session (or a restart); the second and all subsequent calls behave consistently. The error has been reproduced on MRO 3.2.3 through 3.2.5 and GNU R 3.2.3 through 3.3.1, using caret 6.0-52, 6.0-64, and 6.0-70 (the current CRAN version). It does not occur when the package is loaded via library(), but it does occur after a fresh restart if the function is called as caret::createFolds(). Unfortunately, the latter convention has become standard practice for me. Again, the inconsistent result appears only on the very first call after a restart, and only with the :: calling convention.
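One way to check whether the first :: call disturbs the random number stream is to compare .Random.seed before and after the caret namespace is loaded. The sketch below uses a hypothetical helper, rng_consumed_by() (not part of caret or base R), that reports whether evaluating an expression advanced the global RNG state. The caret check only means something in a fresh session, before anything else has loaded caret; once the namespace is loaded, loadNamespace() returns immediately and the check will report FALSE.

```r
# Hypothetical helper (not from caret): TRUE if evaluating `expr`
# changed the global RNG state stored in .Random.seed.
rng_consumed_by <- function(expr) {
  # .Random.seed does not exist until the RNG is first used; create it if needed.
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    invisible(runif(1))
  }
  before <- get(".Random.seed", envir = globalenv())
  force(expr)  # `expr` is lazily evaluated, so it runs here, between the two snapshots
  after <- get(".Random.seed", envir = globalenv())
  !identical(before, after)
}

rng_consumed_by(runif(1))  # a random draw advances the RNG: TRUE
rng_consumed_by(1 + 1)     # plain arithmetic does not: FALSE

# In a FRESH session (caret not yet loaded), this checks the namespace load itself:
# set.seed(1234)
# rng_consumed_by(loadNamespace("caret"))
```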
I only found this error after banging my head against my desk while reviewing some model results from a rather nasty nested CV that I was trying to step back through without a full restart. Since I had not restarted, I got the normal/expected result on runs 2, 3, and so on, so I was totally baffled as to why I could not reproduce the results of the batch run I'd just completed. After another restart, I saw that my outer fold assignments differed on the first run only. Some colleagues who were still in the office reproduced the error using the toy example below, and each of us tried several variations (e.g., straight calls one after the other, apply(), for loops) and found the issue in every setup. It's late, so perhaps I'm missing something...
Minimal, runnable code:
# BAD: Restart and run this...
lapply(1:5, function(x) {
  set.seed(1234)
  head(caret::createFolds(mtcars$cyl, k = 3, list = FALSE))
})
# On the FIRST run this returns 2 1 2 3 3 3 for iteration 1, then 3 2 3 1 3 2 for iterations 2-5.
# All subsequent resubmissions of the code return the expected results.
# GOOD: Now restart and run this
library(caret)
lapply(1:5, function(x) {
  set.seed(1234)
  head(createFolds(mtcars$cyl, k = 3, list = FALSE))
})
# Returns 3 2 3 1 3 2 for all iterations and on every resubmission of the code.
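If the first-call difference comes from the caret namespace being loaded between set.seed() and the sampling inside createFolds() (an assumption, not something confirmed above), then loading the namespace up front should keep the :: convention while restoring reproducible folds. A sketch using base R's loadNamespace(), which loads caret without attaching it to the search path:

```r
# Assumption: the first caret:: call loads the namespace, and that load
# advances the RNG after set.seed() has already run. Loading it up front
# moves that step before any seeding happens.
invisible(loadNamespace("caret"))

res <- lapply(1:5, function(x) {
  set.seed(1234)
  head(caret::createFolds(mtcars$cyl, k = 3, list = FALSE))
})
res
```

requireNamespace("caret", quietly = TRUE) would work the same way here; it returns FALSE instead of raising an error when caret is not installed.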
Example run:
Restarting R session...
> # BAD: Restart and run this...
> lapply(1:5, function(x) {
+ set.seed(1234)
+ head(caret::createFolds(mtcars$cyl, k=3, list=FALSE))
+ })
[[1]]
[1] 2 1 2 3 3 3
[[2]]
[1] 3 2 3 1 3 2
[[3]]
[1] 3 2 3 1 3 2
[[4]]
[1] 3 2 3 1 3 2
[[5]]
[1] 3 2 3 1 3 2
Restarting R session...
> # GOOD: Now restart and run this
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> lapply(1:5, function(x) {
+ set.seed(1234)
+ head(createFolds(mtcars$cyl, k=3, list=FALSE))
+ })
[[1]]
[1] 3 2 3 1 3 2
[[2]]
[1] 3 2 3 1 3 2
[[3]]
[1] 3 2 3 1 3 2
[[4]]
[1] 3 2 3 1 3 2
[[5]]
[1] 3 2 3 1 3 2
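The pattern in this run, where only the first iteration differs and iterations 2 through 5 agree, is what one would expect if something drew from the RNG exactly once, between set.seed() and the sampling, and never again. The base-R sketch below reproduces that pattern without caret: fake_folds() is a hypothetical stand-in for createFolds(), and its one-time runif() call stands in for whatever happens on the first :: call. It illustrates the suspected mechanism only; it is not caret's code.

```r
# Hypothetical one-time side effect, standing in for a namespace load.
first_call <- TRUE

# Hypothetical stand-in for createFolds(..., list = FALSE): random fold labels.
fake_folds <- function(y, k) {
  if (first_call) {
    invisible(runif(1))    # extra draw on the first call only
    first_call <<- FALSE   # later calls skip it
  }
  sample(rep(seq_len(k), length.out = length(y)))
}

res <- lapply(1:5, function(x) {
  set.seed(1234)
  fake_folds(mtcars$cyl, k = 3)
})

# Iterations 2-5 start from the same RNG state, so they agree with each other;
# iteration 1 sampled from a stream shifted by one draw.
sapply(res[-1], identical, res[[2]])  # TRUE TRUE TRUE TRUE
```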
Session Info:
> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] RevoUtilsMath_3.2.5 Rfiglet_1.0 fortunes_1.5-2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.5 magrittr_1.5 splines_3.2.5 MASS_7.3-45 munsell_0.4.3
[6] colorspace_1.2-6 lattice_0.20-33 foreach_1.4.3 minqa_1.2.4 stringr_1.0.0
[11] car_2.1-1 plyr_1.8.4 tools_3.2.5 parallel_3.2.5 nnet_7.3-12
[16] pbkrtest_0.4-6 caret_6.0-70 grid_3.2.5 gtable_0.2.0 nlme_3.1-125
[21] mgcv_1.8-12 quantreg_5.21 MatrixModels_0.4-1 iterators_1.0.8 lme4_1.1-11
[26] Matrix_1.2-4 nloptr_1.0.4 reshape2_1.4.1 ggplot2_2.1.0 codetools_0.2-14
[31] stringi_1.1.1 scales_0.4.0 stats4_3.2.5 SparseM_1.7