Avoid `purrr::partial()`, which captures the calling environment #396
Related to #384
While looking into #384, I learned that our usage of `purrr::partial()` in `tune_grid_loop()` is really hurting performance. The "partialized" function captures its surrounding environment, and that includes the `resamples` object. This gets fully serialized and sent out to each worker, meaning that each worker was being sent `n_resamples` extra copies of the data set (beyond what we were already doing by mistake, as #384 pointed out).

For example, with the example from #384 (comment) (tweaked to have 5000 rows), each of my workers was peaking at around 12 GB of memory, and with this PR they now peak at 6 GB. That is still way too much per worker, but this is the first step in the right direction.
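
To illustrate the mechanism, here is a minimal base R sketch (not the actual tune code). It assumes a hypothetical `make_iter_closure()` in which `big` stands in for `resamples`, and it uses a plain closure rather than `purrr::partial()` to show the same effect: serializing a function also serializes the environment it was created in.

```r
# Hypothetical stand-in: `big` plays the role of the `resamples` object.
make_iter_closure <- function() {
  big <- rnorm(1e6)      # roughly 8 MB of doubles stored in this call's environment
  function(i) i + 1      # never touches `big`, but its enclosing environment holds it
}

captured <- make_iter_closure()

# Same body, but attached to the global environment, which serialize()
# records by reference instead of by value. This mimics a function that
# lives at the top level of a package rather than inside another function.
iter_standalone <- function(i) i + 1
environment(iter_standalone) <- globalenv()

# Bytes that would be shipped to a worker for each version.
size_captured   <- length(serialize(captured, NULL))
size_standalone <- length(serialize(iter_standalone, NULL))

cat("closure with captured data:", size_captured, "bytes\n")
cat("standalone function:       ", size_standalone, "bytes\n")
```

Both functions compute the same thing, but only the first one drags the large object along every time it is serialized.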
Since the iteration function is no longer defined in the environment where `foreach()` is called, we have to extract it with `getFromNamespace()` to be able to call it. We could also use `tune:::`, but R CMD check gives us a warning about that.
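
As a rough sketch of the lookup pattern, the snippet below uses `utils::getFromNamespace()` on an unexported function from the utils package (`head.default`), purely as a stand-in. The actual tune helper is not named here, so the function and package in this example are illustrative assumptions only.

```r
# Look up an unexported function by name at run time. "head.default" in
# "utils" is only a stand-in for an internal iteration function.
fn <- utils::getFromNamespace("head.default", "utils")

# The function's environment is the package namespace, which is serialized
# by reference, so no caller data travels with it.
fn_env_name <- environmentName(environment(fn))

# Call it like any other function.
result <- fn(1:10, 3)
print(result)
```

Because the function is looked up by name, the worker only needs the package to be installed, and nothing from the calling environment is attached to it.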