Unreliable random numbers produced when using doFuture backend #377

mbac opened this issue May 2, 2021 · 9 comments · Fixed by #383

mbac commented May 2, 2021


I’m using doFuture as a foreach backend. This is an approximation of the code I’m trying to run:

Edit: forgot the doFuture code:


cores <- parallelly::availableCores()
# Only option available on Macs, as I understand it:

tr_te_split <- initial_split(cells %>% select(-case), prop = 3/4)
cell_train <- training(tr_te_split)
cell_test  <- testing(tr_te_split)

folds <- vfold_cv(cell_train, v = 10)

cell_rec <- recipe(
    class ~ .,
    data = cell_train
)

boost_forest_mod <- boost_tree(
    mtry = tune(),
    trees = tune(),
    min_n = tune(),
    learn_rate = tune(),
    tree_depth = tune(),
    loss_reduction = tune(),
    sample_size = tune(),
    stop_iter = tune()
) %>%
    set_engine("xgboost") %>%
    set_mode("classification")

workflow_cells <- workflow() %>%
    add_recipe(cell_rec) %>%
    add_model(boost_forest_mod)

workflow_cells_tuned <- workflow_cells %>%
    tune_grid(
        resamples = folds,
        grid = 20,
        metrics = metric_set(roc_auc, precision, recall)
    )

The tuning procedure seems to work, but I’m getting warnings for each iteration of the doFuture backend (I guess):

UNRELIABLE VALUE: One of the foreach() iterations (‘doFuture-7’) unexpectedly generated random numbers without
declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be
invalid. To fix this, use ‘%dorng%’ from the ‘doRNG’ package instead of ‘%dopar%’. This ensures that proper,
parallel-safe random numbers are produced via the L’Ecuyer-CMRG method. To disable this check, set option
‘future.rng.onMisuse’ to “ignore”.

My session info:

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xgboost_1.4.1.1          rcompanion_2.4.0         doFuture_0.12.0-9000    
 [4] future_1.21.0            foreach_1.5.1            multilevelmod_0.0.0.9000
 [7] REDCapR_0.11.1.9004      dtplyr_1.1.0             readxl_1.3.1            
[10] yardstick_0.0.8          workflowsets_0.0.2       workflows_0.2.2         
[13] tune_0.1.5               rsample_0.0.9            recipes_0.1.16          
[16] parsnip_0.1.5.9002       modeldata_0.1.0          infer_0.5.4             
[19] dials_0.0.9.9000         scales_1.1.1             broom_0.7.6             
[22] tidymodels_0.1.3.9000    forcats_0.5.1            stringr_1.4.0           
[25] dplyr_1.0.5              purrr_0.3.4              readr_1.4.0             
[28] tidyr_1.1.3              tibble_3.1.1             ggplot2_3.3.3           
[31] tidyverse_1.3.1          pacman_0.5.1             devtools_2.4.0          
[34] usethis_2.0.1           

loaded via a namespace (and not attached):
  [1] backports_1.2.1    plyr_1.8.6         splines_4.0.4      listenv_0.8.0     
  [5] TH.data_1.0-10     digest_0.6.27      fansi_0.4.2        checkmate_2.0.0   
  [9] magrittr_2.0.1     memoise_2.0.0      remotes_2.3.0      globals_0.14.0    
 [13] modelr_0.1.8       gower_0.2.2        matrixStats_0.58.0 sandwich_3.0-0    
 [17] hardhat_0.1.5      prettyunits_1.1.1  colorspace_2.0-0   rvest_1.0.0       
 [21] haven_2.4.0        xfun_0.22          callr_3.7.0        crayon_1.4.1      
 [25] jsonlite_1.7.2     libcoin_1.0-8      Exact_2.1          zoo_1.8-9         
 [29] survival_3.2-10    iterators_1.0.13   glue_1.4.2         gtable_0.3.0      
 [33] ipred_0.9-11       pkgbuild_1.2.0     mvtnorm_1.1-1      DBI_1.1.1         
 [37] Rcpp_1.0.6         GPfit_1.0-8        proxy_0.4-25       stats4_4.0.4      
 [41] lava_1.6.9         prodlim_2019.11.13 httr_1.4.2         modeltools_0.2-23 
 [45] ellipsis_0.3.2     farver_2.1.0       pkgconfig_2.0.3    multcompView_0.1-8
 [49] nnet_7.3-15        dbplyr_2.1.1       utf8_1.2.1         labeling_0.4.2    
 [53] tidyselect_1.1.1   rlang_0.4.11       DiceDesign_1.9     munsell_0.5.0     
 [57] cellranger_1.1.0   tools_4.0.4        cachem_1.0.4       cli_2.5.0         
 [61] generics_0.1.0     EMT_1.1            fastmap_1.1.0      processx_3.5.2    
 [65] knitr_1.33         fs_1.5.0           coin_1.4-1         rootSolve_1.8.2.1 
 [69] tictoc_1.0         xml2_1.3.2         compiler_4.0.4     rstudioapi_0.13   
 [73] curl_4.3.1         e1071_1.7-6        testthat_3.0.2     reprex_2.0.0      
 [77] lhs_1.1.1          DescTools_0.99.41  stringi_1.5.3      ps_1.6.0          
 [81] desc_1.3.0         lattice_0.20-41    Matrix_1.3-2       conflicted_1.0.4  
 [85] vctrs_0.3.8        pillar_1.6.0       lifecycle_1.0.0    furrr_0.2.2       
 [89] lmtest_0.9-38      data.table_1.14.0  lmom_2.8           R6_2.5.0          
 [93] parallelly_1.25.0  gld_2.6.2          sessioninfo_1.1.1  codetools_0.2-18  
 [97] boot_1.3-27        MASS_7.3-53.1      assertthat_0.2.1   pkgload_1.2.1     
[101] rprojroot_2.0.2    nortest_1.0-4      withr_2.4.2        multcomp_1.4-17   
[105] expm_0.999-6       parallel_4.0.4     hms_1.0.0          grid_4.0.4        
[109] rpart_4.1-15       timeDate_3043.102  class_7.3-18       pROC_1.17.0.1     
[113] lubridate_1.7.10

In #349 we switched to generating seeds with L'Ecuyer-CMRG, which is parallel safe. I am pretty sure this is a false positive warning here, but is there a way for us to not trigger this warning?

It looks like doFuture hard codes the seed argument of the future() call it makes to FALSE, which triggers the warning if any RNG manipulating code is run in the expression it runs in parallel.

If the user is using %dorng% through doRNG, it looks like an exception is made for that - and for BiocParallel?

Unfortunately it doesn't look like there is another way to say "hey we are already using parallel safe rng here", even though we are.

We generate L'Ecuyer-CMRG seeds in the main process here:

seeds <- generate_seeds(rng, n_resamples)

And assign them here:

assign(".Random.seed", seed, envir = globalenv())

@HenrikBengtsson do you have any thoughts on how we can avoid the warning? We can't use doRNG directly, because we use nested foreach loops and those aren't supported by doRNG. Instead, we do what is suggested by doRNG's vignette in section 5.1 here
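
The approach described above (generate seeds in the main process, assign them in the workers) can be sketched in base R as follows. This is only an illustration, not tune's actual implementation: the helper names `make_stream_seeds` and `run_with_seed` are hypothetical, and the only assumption is the base `parallel` package.

```r
# Hypothetical helper (not from tune): derive one independent
# L'Ecuyer-CMRG stream seed per task from a single master seed.
make_stream_seeds <- function(n, master_seed) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")  # switch generator, remember previous kinds
  on.exit(do.call(RNGkind, as.list(old_kind)), add = TRUE)
  set.seed(master_seed)
  seeds <- vector("list", n)
  seeds[[1]] <- get(".Random.seed", envir = globalenv())
  for (i in seq_len(n)[-1]) {
    # Each call jumps to the start of the next non-overlapping stream
    seeds[[i]] <- parallel::nextRNGStream(seeds[[i - 1]])
  }
  seeds
}

# Hypothetical helper: install a pre-generated seed, then run the task.
# In a parallel setting this would happen inside each worker.
run_with_seed <- function(seed, task) {
  assign(".Random.seed", seed, envir = globalenv())
  task()
}

seeds <- make_stream_seeds(3, master_seed = 123)
draws_a <- lapply(seeds, run_with_seed, task = function() runif(2))
# Running the tasks in a different order gives the same per-task draws
draws_b <- lapply(rev(seeds), run_with_seed, task = function() runif(2))
```

Because each task installs its own seed before drawing, the result for a given task does not depend on which worker runs it or in what order, which is why the warning is a false positive in this situation.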

topepo commented May 4, 2021

Could we run the foreach blocks within a with_options(list(future.rng.onMisuse = "ignore"))?

> Could we run the foreach blocks within a with_options(list(future.rng.onMisuse = "ignore"))?

Yes, that was going to be my suggestion. Or, slightly better, use doFuture.rng.onMisuse = "ignore", which is more specific about what it targets.

The long-term real solution for foreach and pRNG? It's in RevolutionAnalytics/foreach#6
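
The option-scoping idea discussed above can be sketched with base R alone. The helper `with_options_sketch` is hypothetical and simply mirrors what `withr::with_options()` does: set the option, evaluate the code, and restore the previous value afterwards, even if the code errors. The `getOption()` call stands in for the foreach block.

```r
# Hypothetical stand-in for withr::with_options(): temporarily set
# options while `code` is evaluated, then restore the old values.
with_options_sketch <- function(new, code) {
  old <- options(new)                 # returns the previous values
  on.exit(options(old), add = TRUE)   # restore even on error
  force(code)                         # `code` is lazy, so it runs with `new` set
}

before <- getOption("doFuture.rng.onMisuse")

# Placeholder body: a real call would wrap the foreach() loop here
seen <- with_options_sketch(
  list(doFuture.rng.onMisuse = "ignore"),
  getOption("doFuture.rng.onMisuse")
)

after <- getOption("doFuture.rng.onMisuse")
```

Scoping the option this way silences the check only around the code known to handle seeds itself, leaving the check active for any other foreach code the user runs.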

juliasilge added the "upkeep maintenance, infrastructure, and similar" label May 4, 2021

JalalAl-Tamimi commented May 21, 2021

Hi all, I am following up on this to report a possible bug. I ran the code below, and when I use "registerDoRNG(123456)" and set "parallel_over" to NULL, "resamples", or "everything", I get an error in the final step:

> wkfl_tidym_final <- last_fit(wkfl_tidym_best, split = tr_te_split)
Error in (function (obj, ex)  : 
  nested/conditional foreach loops are not supported yet.
See the package's vignette for a work around.

If instead, I use 
I do not get the error. However, this does not make the results reproducible across separate R sessions.

I thought one needs to use:

with the doRNG package, no?


Here is a working example:

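library(tidymodels)
library(doFuture)
registerDoFuture()
# The option must be set through options(); a bare assignment only creates a variable
options(doFuture.rng.onMisuse = "ignore")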
ncores <- availableCores()
cat(paste0("Number of cores available for model calculations set to ", ncores, "."))
cl <- parallelly::makeClusterPSOCK(ncores)
plan(cluster, workers = cl)


tr_te_split <- initial_split(iris, strata = "Species", prop = 3/4)
iris_train <- training(tr_te_split)
iris_test  <- testing(tr_te_split)

folds <- vfold_cv(iris_train, v = 10)

iris_rec <- iris_train %>%
  recipe(Species ~ .)

engine_tidym <- rand_forest(
  mode = "classification",
  mtry = 2,
  trees = 500,
  min_n = 1
) %>% 
  set_engine("ranger", importance = "permutation", sample.fraction = 0.632,
             replace = FALSE, write.forest = TRUE, splitrule = "extratrees",
             scale.permutation.importance = FALSE) # we add engine specific settings

workflow_iris <- workflow() %>%
  add_recipe(iris_rec) %>%
  add_model(engine_tidym)

workflow_iris_tuned <-
  tune_grid(
    workflow_iris,
    resamples = folds,
    grid = 2,
    metrics = metric_set(roc_auc, precision, recall),
    control = control_grid(save_pred = TRUE, parallel_over = "everything")
  )

grid_tidym_best <- select_best(workflow_iris_tuned, metric = "roc_auc")
wkfl_tidym_best <- finalize_workflow(workflow_iris, grid_tidym_best)
wkfl_tidym_final <- last_fit(wkfl_tidym_best, split = tr_te_split)

topepo commented May 28, 2021

You do not need to use doRNG; if you comment out the line calling registerDoRNG(), you will be fine.

The warning about unreliable values is a false positive; we manually set the seeds inside the workers so that we can get reproducible results.

And silence the warning with:

options(doFuture.rng.onMisuse = "ignore")

JalalAl-Tamimi commented May 28, 2021 via email

topepo added a commit that referenced this issue Jun 1, 2021
topepo added a commit that referenced this issue Jun 5, 2021
* Increment version number

* use future option for #377

* dev version bump

* news update

* one-line solution from Davis

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex) and link to this issue.

github-actions bot locked and limited conversation to collaborators Jun 20, 2021