Avoid purrr::partial(), which captures the calling environment #396




Related to #384

While looking into #384, I learned that our usage of purrr::partial() in tune_grid_loop() is really hurting performance. The "partialized" function captures its surrounding environment, and that includes the resamples object. That environment gets fully serialized and sent out to each worker, so each worker was receiving an extra n_resamples copies of the data set (on top of the copies we were already sending by mistake, as #384 pointed out).
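The mechanism can be sketched in base R without purrr. This is only an illustration, not the tune code: `make_iterator()`, `big_data`, and `lean_fn` are hypothetical names, and `big_data` stands in for an object like `resamples`. A function created inside another function keeps that function's frame as its environment, and `serialize()` writes the whole frame out along with the function. The global environment and package namespaces are serialized by reference instead, which is why a function that lives in a namespace is cheap to ship.

```r
# Hypothetical helper: returns a closure whose environment is the
# execution frame of make_iterator(), which also holds `big_data`.
make_iterator <- function() {
  big_data <- rnorm(1e6)  # stand-in for a large object such as `resamples`
  function(x) x + 1       # never touches `big_data`, but still carries it
}

heavy_fn <- make_iterator()

# Same function body, but with an environment that is serialized by
# reference, so nothing from the original frame travels with it.
lean_fn <- heavy_fn
environment(lean_fn) <- globalenv()

# Size in bytes of what would be sent to a worker.
size_heavy <- length(serialize(heavy_fn, NULL))
size_lean  <- length(serialize(lean_fn, NULL))

print(c(heavy = size_heavy, lean = size_lean))
```

The heavy closure serializes to more than 8 MB (one million doubles), while the lean one is a tiny fraction of that, even though both compute exactly the same thing.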

For example, with the example from #384 (comment) (tweaked to have 5000 rows), each of my workers was peaking at around 12 GB of memory, and with this PR they now peak at 6 GB. That is still way too much per worker, but this is the first step in the right direction.

Since the iteration function is no longer defined in the environment where foreach() is called, we have to extract it with getFromNamespace() to be able to call it. We could also use tune:::, but R CMD check gives us a warning about that.
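A minimal sketch of the lookup pattern, using `utils` and `head.default` purely as stand-ins because the name of the internal tune function is not given here. `worker()` is a hypothetical function, and `lapply()` stands in for the foreach() loop. The point is that the function is found by name in a namespace at call time, so no closure needs to be shipped along with it.

```r
# Look a function up by name in a package namespace.
head_default <- utils::getFromNamespace("head.default", ns = "utils")
first_two <- head_default(letters, n = 2L)

# Hypothetical worker body: the lookup happens inside the worker, so the
# worker function itself carries no large enclosing environment.
worker <- function(i) {
  fn <- utils::getFromNamespace("head.default", ns = "utils")
  fn(seq_len(10), n = i)
}
environment(worker) <- globalenv()

# lapply() is used here in place of a parallel foreach() loop.
results <- lapply(1:3, worker)
print(results)
```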

That calling environment can hold massive objects, like `resamples`, which will get needlessly sent to each worker
@DavisVaughan DavisVaughan merged commit 4a16ec9 into tidymodels:master Jul 23, 2021
@DavisVaughan DavisVaughan deleted the feature/purrr-partial-issues branch July 23, 2021 12:54

github-actions bot commented Aug 7, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 7, 2021