-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XGBoost parallelization fails when twoClassSummary is the first summaryFunc used #1106
Comments
I also see a difference but my times aren't nearly so bad (at least on my setup, see below). I would stay away form benchmarking with r-devel unless it is very close to release. Perhaps @HenrikBengtsson might have an idea. library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
#> Loading required package: globals
#> Loading required package: future
#>
#> Attaching package: 'future'
#> The following object is masked from 'package:caret':
#>
#> cluster
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
#> Warning: [ONE-TIME WARNING] Forked processing ('multicore') is disabled
#> in future (>= 1.13.0) when running R from RStudio, because it is
#> considered unstable. Because of this, plan("multicore") will fall
#> back to plan("sequential"), and plan("multiprocess") will fall back to
#> plan("multisession") - not plan("multicore") as in the past. For more details,
#> how to control forked processing or not, and how to silence this warning in
#> future R sessions, see ?future::supportsMulticore
system.time({
set.seed(252)
mod <- train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 1,
nthread = 1,
trControl = trainControl(search = "random")
)
})
#> user system elapsed
#> 0.739 0.050 27.367
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> user system elapsed
#> 0.905 0.040 2.196 Created on 2020-01-01 by the reprex package (v0.3.0) Session infodevtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2020-01-01
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
#> callr 3.4.0 2019-12-09 [1] CRAN (R 3.6.0)
#> caret * 6.0-84.9000 2020-01-02 [1] Github (topepo/caret@1a2fcb2)
#> class 7.3-15 2019-01-01 [1] CRAN (R 3.6.0)
#> cli 2.0.0 2019-12-09 [1] CRAN (R 3.6.0)
#> codetools 0.2-16 2018-12-24 [1] CRAN (R 3.6.1)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> data.table 1.12.6 2019-10-18 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
#> dials 0.0.4 2019-12-02 [1] CRAN (R 3.6.1)
#> DiceDesign 1.8-1 2019-07-31 [1] CRAN (R 3.6.0)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.1)
#> doFuture * 0.8.1 2019-07-20 [1] CRAN (R 3.6.0)
#> dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
#> e1071 1.7-3 2019-11-26 [1] CRAN (R 3.6.0)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
#> foreach * 1.4.7 2019-07-27 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> future * 1.15.1 2019-11-25 [1] CRAN (R 3.6.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
#> ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.0)
#> globals * 0.12.5 2019-12-07 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> gower 0.2.1 2019-05-14 [1] CRAN (R 3.6.0)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 3.6.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
#> ipred 0.9-9 2019-04-28 [1] CRAN (R 3.6.0)
#> iterators * 1.0.12 2019-07-26 [1] CRAN (R 3.6.0)
#> knitr 1.26 2019-11-12 [1] CRAN (R 3.6.0)
#> lattice * 0.20-38 2018-11-04 [1] CRAN (R 3.6.1)
#> lava 1.6.6 2019-08-01 [1] CRAN (R 3.6.0)
#> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
#> lhs 1.0.1 2019-02-03 [1] CRAN (R 3.6.0)
#> lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.0)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 3.6.1)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> MASS 7.3-51.5 2019-12-20 [1] CRAN (R 3.6.0)
#> Matrix 1.2-18 2019-11-27 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> ModelMetrics 1.2.2 2018-11-03 [1] CRAN (R 3.6.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
#> nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.1)
#> nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.0)
#> parsnip 0.0.4.9000 2019-12-14 [1] local
#> pillar 1.4.3 2019-12-20 [1] Github (r-lib/pillar@e2e7926)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> plyr 1.8.5 2019-12-10 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> pROC 1.15.3 2019-07-21 [1] CRAN (R 3.6.0)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
#> prodlim 2019.11.13 2019-11-17 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> purrr 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.0)
#> recipes 0.1.8 2019-12-18 [1] local
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.6.0)
#> rlang 0.4.2 2019-11-23 [1] CRAN (R 3.6.0)
#> rmarkdown 2.0 2019-12-12 [1] CRAN (R 3.6.0)
#> rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> survival 2.40-1 2016-10-30 [1] CRAN (R 3.6.1)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
#> tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
#> tidyr 1.0.0 2019-09-11 [1] CRAN (R 3.6.0)
#> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
#> timeDate 3043.102 2018-02-21 [1] CRAN (R 3.6.0)
#> usethis 1.5.1.9000 2019-12-18 [1] Github (r-lib/usethis@b2e894e)
#> vctrs 0.2.1 2019-12-17 [1] CRAN (R 3.6.1)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> workflows 0.1.0 2019-12-30 [1] CRAN (R 3.6.1)
#> xfun 0.11 2019-11-12 [1] CRAN (R 3.6.0)
#> xgboost 0.90.0.2 2019-08-01 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#> zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library and library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
#> Loading required package: globals
#> Loading required package: future
#>
#> Attaching package: 'future'
#> The following object is masked from 'package:caret':
#>
#> cluster
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
#> Warning: [ONE-TIME WARNING] Forked processing ('multicore') is disabled
#> in future (>= 1.13.0) when running R from RStudio, because it is
#> considered unstable. Because of this, plan("multicore") will fall
#> back to plan("sequential"), and plan("multiprocess") will fall back to
#> plan("multisession") - not plan("multicore") as in the past. For more details,
#> how to control forked processing or not, and how to silence this warning in
#> future R sessions, see ?future::supportsMulticore
# system.time({
# set.seed(252)
# mod <- train(
# Class ~ .,
# data = test_dat,
# method = "xgbTree",
# tuneLength = 1,
# nthread = 1,
# trControl = trainControl(search = "random")
# )
# })
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
#> Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
#> There were missing values in resampled performance measures.
#> user system elapsed
#> 0.782 0.047 27.792 Created on 2020-01-01 by the reprex package (v0.3.0) Session infodevtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2020-01-01
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
#> callr 3.4.0 2019-12-09 [1] CRAN (R 3.6.0)
#> caret * 6.0-84.9000 2020-01-02 [1] Github (topepo/caret@1a2fcb2)
#> class 7.3-15 2019-01-01 [1] CRAN (R 3.6.0)
#> cli 2.0.0 2019-12-09 [1] CRAN (R 3.6.0)
#> codetools 0.2-16 2018-12-24 [1] CRAN (R 3.6.1)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> data.table 1.12.6 2019-10-18 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
#> dials 0.0.4 2019-12-02 [1] CRAN (R 3.6.1)
#> DiceDesign 1.8-1 2019-07-31 [1] CRAN (R 3.6.0)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.1)
#> doFuture * 0.8.1 2019-07-20 [1] CRAN (R 3.6.0)
#> dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
#> foreach * 1.4.7 2019-07-27 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> future * 1.15.1 2019-11-25 [1] CRAN (R 3.6.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
#> ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.0)
#> globals * 0.12.5 2019-12-07 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> gower 0.2.1 2019-05-14 [1] CRAN (R 3.6.0)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 3.6.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
#> ipred 0.9-9 2019-04-28 [1] CRAN (R 3.6.0)
#> iterators * 1.0.12 2019-07-26 [1] CRAN (R 3.6.0)
#> knitr 1.26 2019-11-12 [1] CRAN (R 3.6.0)
#> lattice * 0.20-38 2018-11-04 [1] CRAN (R 3.6.1)
#> lava 1.6.6 2019-08-01 [1] CRAN (R 3.6.0)
#> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
#> lhs 1.0.1 2019-02-03 [1] CRAN (R 3.6.0)
#> lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.0)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 3.6.1)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> MASS 7.3-51.5 2019-12-20 [1] CRAN (R 3.6.0)
#> Matrix 1.2-18 2019-11-27 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> ModelMetrics 1.2.2 2018-11-03 [1] CRAN (R 3.6.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
#> nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.1)
#> nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.0)
#> parsnip 0.0.4.9000 2019-12-14 [1] local
#> pillar 1.4.3 2019-12-20 [1] Github (r-lib/pillar@e2e7926)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> plyr 1.8.5 2019-12-10 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> pROC 1.15.3 2019-07-21 [1] CRAN (R 3.6.0)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
#> prodlim 2019.11.13 2019-11-17 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> purrr 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.0)
#> recipes 0.1.8 2019-12-18 [1] local
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.6.0)
#> rlang 0.4.2 2019-11-23 [1] CRAN (R 3.6.0)
#> rmarkdown 2.0 2019-12-12 [1] CRAN (R 3.6.0)
#> rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> survival 2.40-1 2016-10-30 [1] CRAN (R 3.6.1)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
#> tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
#> tidyr 1.0.0 2019-09-11 [1] CRAN (R 3.6.0)
#> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
#> timeDate 3043.102 2018-02-21 [1] CRAN (R 3.6.0)
#> usethis 1.5.1.9000 2019-12-18 [1] Github (r-lib/usethis@b2e894e)
#> vctrs 0.2.1 2019-12-17 [1] CRAN (R 3.6.1)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> workflows 0.1.0 2019-12-30 [1] CRAN (R 3.6.1)
#> xfun 0.11 2019-11-12 [1] CRAN (R 3.6.0)
#> xgboost 0.90.0.2 2019-08-01 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#> zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library |
Here they are when run in the terminal instead of the RStudio IDE (where I see no difference) > library(caret)
Loading required package: lattice
Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
system.time({
set.seed(252)
mod <- train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 1,
nthread = 1,
trControl = trainControl(search = "random")
)
})
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
> set.seed(1)
>
> # create fake data
> test_dat <- twoClassSim(20)
>
> library("doFuture")
Loading required package: globals
Loading required package: future
Attaching package: ‘future’
The following object is masked from ‘package:caret’:
cluster
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> registerDoFuture()
> plan(multiprocess, workers = parallel::detectCores() - 1)
>
> system.time({
+ set.seed(252)
+ mod <- train(
+ Class ~ .,
+ data = test_dat,
+ method = "xgbTree",
+ tuneLength = 1,
+ nthread = 1,
+ trControl = trainControl(search = "random")
+ )
+ })
user system elapsed
3.595 1.820 1.202
>
> train_control <- trainControl(search = 'random',
+ classProbs = TRUE,
+ summaryFunction = twoClassSummary)
>
> system.time({
+ set.seed(252)
+ mod <-
+ train(
+ Class ~ .,
+ data = test_dat,
+ method = "xgbTree",
+ tuneLength = 10,
+ nthreads = 1,
+ trControl = train_control,
+ metric = 'ROC'
+ )
+ })
user system elapsed
17.026 5.149 2.237
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures. and > library(caret)
Loading required package: lattice
Loading required package: ggplot2
set.seed(1)
# create fake data
test_dat <- twoClassSim(20)
library("doFuture")
registerDoFuture()
plan(multiprocess, workers = parallel::detectCores() - 1)
# system.time({
# set.seed(252)
# mod <- train(
# Class ~ .,
# data = test_dat,
# method = "xgbTree",
# tuneLength = 1,
# nthread = 1,
# trControl = trainControl(search = "random")
# )
# })
train_control <- trainControl(search = 'random',
classProbs = TRUE,
summaryFunction = twoClassSummary)
system.time({
set.seed(252)
mod <-
train(
Class ~ .,
data = test_dat,
method = "xgbTree",
tuneLength = 10,
nthreads = 1,
trControl = train_control,
metric = 'ROC'
)
})
> set.seed(1)
>
> # create fake data
> test_dat <- twoClassSim(20)
>
> library("doFuture")
Loading required package: globals
Loading required package: future
Attaching package: ‘future’
The following object is masked from ‘package:caret’:
cluster
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> registerDoFuture()
> plan(multiprocess, workers = parallel::detectCores() - 1)
>
> # system.time({
> # set.seed(252)
> # mod <- train(
> # Class ~ .,
> # data = test_dat,
> # method = "xgbTree",
> # tuneLength = 1,
> # nthread = 1,
> # trControl = trainControl(search = "random")
> # )
> # })
>
> train_control <- trainControl(search = 'random',
+ classProbs = TRUE,
+ summaryFunction = twoClassSummary)
>
> system.time({
+ set.seed(252)
+ mod <-
+ train(
+ Class ~ .,
+ data = test_dat,
+ method = "xgbTree",
+ tuneLength = 10,
+ nthreads = 1,
+ trControl = train_control,
+ metric = 'ROC'
+ )
+ })
user system elapsed
17.392 5.128 2.261
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures. |
Hi, there could be multiple problems going on here. First, because 'xgboost' uses multi-threading internally, I suspect it is not safe to use forked parallel processing. Second, it looks like there are some non-exportable object/globals needed in the parallel processing that corrupts the R state. For troubleshooting, it's useful to understand which foreach backends end up using forked processing (aka "multicore" processing). All of the following end up use the same forked processes framework of the 'parallel' package, which is also what
Comment: Note that For simplicity, let's use: doFuture::registerDoFuture()
future::plan("multicore", workers = 2L) Using this, I can reproduce the original problem - calling So, forked processing and multi-threading should be avoided in R, a first attempt is to disable (OpenMPI) multi-threading used by xgboost. This can be done by setting:
before R starts, e.g. by setting it in an RhpcBLASctl::omp_set_num_threads(threads = 1L) in R before calling The above solves the problem; the script no longer stalls. At this point, I thought the combination of forked processing and multi-threading was the sole reason. However, it looks like there's more to it because it also stalls when using PSOCK clusters; doFuture::registerDoFuture()
future::plan("multisession", workers = 2L) Since the workers in a PSOCK cluster, contrary to forked workers, are R processes independent of the main R session, I would not expect multi-threading to be an issue. In other words, there's something else going on too. BTW, note that all of the following creates PSOCK clusters:
What surprises me is that disabling multi-threading (as above) solves the problem. Thus, there is something that requires xgboost to run in single-threaded mode when used in parallel code. This could be specific to caret or could apply for all parallel setups - I don't know. Finally, by setting: options(future.globals.onReference = "error") before calling Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression which suggests that a non-exportable object is used by the workers and this could cause a race condition. The traceback for the above error is: 1: source("caret_issue_1106.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(ei, envir)
5: caret_issue_1106.R#81: system.time({
mod <- train(Class ~ ., data = test_dat, method = "xgbTree", tuneLength = 10, nthreads = 1,
6: caret_issue_1106.R#82: train(Class ~ ., data = test_dat, method = "xgbTree", tuneLength = 10, nthreads = 1, trControl = train_control
7: train.formula(Class ~ ., data = test_dat, method = "xgbTree", tuneLength = 10, nthreads = 1, trControl = train_control, metric = "ROC
8: train(x, y, weights = w, ...)
9: train.default(x, y, weights = w, ...)
10: nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, method = models, ppOpts = preProcess, ctrl = trControl, lev = cla
11: foreach(iter = seq(along = resampleIndex), .combine = "c", .verbose = FALSE, .export = export, .packages = "caret") %:% foreach(parm
12: e$fun(obj, substitute(ex), parent.frame(), e$data)
13: getGlobalsAndPackages_doFuture(expr, envir = envir, export = obj$export, noexport = c(obj$noexport, argnames), packages = obj$package
14: getGlobalsAndPackages(expr, envir = globals_envir, globals = TRUE)
15: system.time({
assert_no_references(globals, action = action)
}, gcFirst = FALSE)
16: assert_no_references(globals, action = action) DetailsIf one run with 11: foreach(iter = seq(along = resampleIndex), .combine = "c", .verbose = FALSE, .export = export, .packages = "car
12: e$fun(obj, substitute(ex), parent.frame(), e$data)
13: mclapply(argsList, FUN, mc.preschedule = preschedule, mc.set.seed = set.seed, mc.silent = silent, mc.cores = co
14: selectChildren(ac[!fin], -1) and if we use a PSOCK cluster, it is stuck at: 11: foreach(iter = seq(along = resampleIndex), .combine = "c", .verbose = FALSE, .export = export, .packages = "car
12: e$fun(obj, substitute(ex), parent.frame(), e$data)
13: clusterApplyLB(cl, argsList, evalWrapper)
14: dynamicClusterApply(cl, fun, length(x), argfun)
15: recvOneResult(cl)
16: recvOneData(cl)
17: recvOneData.SOCKcluster(cl)
18: socketSelect(socklist) Using
I'm sure someone can dig into all the code and show there's some low-level multi-threaded race conditions going on here. |
FWIW, it's been on my todo list for quite a while to automagically protect against using multi-threading and forked processing at the same time. A few days ago I decided to go ahead an implement this in the future framework. Most likely it'll automatically set |
Thanks @HenrikBengtsson ; this is extremely helpful and I would have never figured this out. I suspect that this would also apply to |
You're welcome - I'm happy to help. Yeah, I think this happens for other packages too, but I don't know enough to tell exactly when multi-threading works and doesn't work with forked processing. It's on my way-too-long wishlist to read up on this and dive into the details - so this would be less of a "guessing game" (at least for me). I'm sure there are folks in the R community who have a better sense of what's going on and would be able to put together a list of do's and don't's. |
Overview
When using caret to train an XGBoost model with parallelization at the resample level, execution continues indefinitely (or at least, for more than several hours in my tests), when the first
summaryFunc
used is of typetwoClassSummary
.For the same code, if a call is first made to
train()
on any arbitrary data, however, the executions proceeds as expected, and in the case of the example below, finishes in seconds.This effect does not appear to depend on the parallelization back-end used; tested with:
It also occurs if the
twoClassSummary
is used along-side of other summary functions as part of a yardstickmetric_set()
.Based on this behavior, I'm guessing there is some kind of initialization of the interface with the parallelization back-end that normally occurs during the first invocation to
train()
which is not taking place as-expected, in the case below.Minimal, reproducible example:
To see the behavior in action, first run the code below as-is, and (at least in the linux environment I'm testing it in), it will likely continue running for a long time.
Next, uncomment the commented first call to
train()
and re-run the script. It will execute in parallel and finish quickly as expected.Session Info:
Tested for two different versions of R/caret the same Linux machine:
R 3.6.1
6.0-84
.R nightly (2019-12-10)
The text was updated successfully, but these errors were encountered: