Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to set gcFirst to FALSE in the internal calls to system.time in form_form and xy_xy #611

Closed
fabrice-rossi opened this issue Nov 21, 2021 · 2 comments · Fixed by #614
Labels
feature a feature request or enhancement

Comments

@fabrice-rossi
Copy link

I am not sure if this is a bug or if this should be handled as a feature. In fit_helpers, both form_form and xy_xy call system.time with the default value of TRUE for gcFirst. This triggers unnecessary garbage collection which slows down significantly e.g. tune_grid for simple models such as decision trees. For instance the following code:

library(tidymodels)
library(mlbench)
data(PimaIndiansDiabetes)

my_grid <- expand.grid(min_n=2:50)
cv_folds <- vfold_cv(PimaIndiansDiabetes, v = 5, strata="diabetes")
my_model <- decision_tree(cost_complexity=0, min_n=tune()) %>% 
    set_engine("rpart",xval=0) %>% set_mode("classification")

tune_results <-  my_model %>% tune_grid(diabetes~.,
                                        resamples=cv_folds,
                                        grid=my_grid,
                                        metrics=metric_set(accuracy))

runs in roughly 50 seconds on my computer with gcFirst=TRUE but takes only 20 seconds with gcFirst=FALSE.

I'm not sure whether precise timing reporting is needed. If this is not the case, maybe gcFirst=FALSE should always be used. If some flexibility is needed, it could be a user visible option. In any case, the default to gcFirst=TRUE is adding a significant overhead to simple model fitting.

@topepo
Copy link
Member

topepo commented Nov 29, 2021

We'll turn off the garbage collection. We'll also tie the computing of the execution time to the control option for verbosity so that the new default will be to not compute the execution time.

@juliasilge juliasilge added the feature a feature request or enhancement label Nov 29, 2021
topepo added a commit that referenced this issue Nov 30, 2021
juliasilge added a commit that referenced this issue Nov 30, 2021
* system.time changes for #611

* Edits to documentation

Co-authored-by: Julia Silge <julia.silge@gmail.com>
@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants