Closed
Labels
feature: a feature request or enhancement
Description
I am not sure whether this is a bug or should be handled as a feature request. In fit_helpers, both form_form and xy_xy call system.time with gcFirst left at its default value of TRUE. This triggers a garbage collection before each timed call, which is unnecessary here and significantly slows down, for example, tune_grid for simple models such as decision trees. For instance, the following code:
library(tidymodels)
library(mlbench)
data(PimaIndiansDiabetes)

# Candidate values for the tuned min_n parameter
my_grid <- expand.grid(min_n = 2:50)
cv_folds <- vfold_cv(PimaIndiansDiabetes, v = 5, strata = "diabetes")

# Unpruned rpart tree; only min_n is tuned
my_model <- decision_tree(cost_complexity = 0, min_n = tune()) %>%
  set_engine("rpart", xval = 0) %>%
  set_mode("classification")

tune_results <- my_model %>%
  tune_grid(diabetes ~ .,
            resamples = cv_folds,
            grid = my_grid,
            metrics = metric_set(accuracy))
runs in roughly 50 seconds on my computer with gcFirst=TRUE but takes only 20 seconds with gcFirst=FALSE.
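The overhead can also be seen in isolation, without any modelling code. The following is a minimal sketch using only base R; the helper name time_many and the repetition count are my own choices for illustration, and the absolute timings will depend on the machine and on how much memory the session holds.

```r
# Time n cheap expressions, each wrapped in its own system.time() call,
# with or without a garbage collection before each one.
# time_many is an illustrative helper, not part of any package.
time_many <- function(n, gc_first) {
  system.time(
    for (i in seq_len(n)) {
      system.time(sqrt(i), gcFirst = gc_first)
    },
    gcFirst = FALSE  # do not add an extra collection to the outer measurement
  )[["elapsed"]]
}

n <- 200
elapsed_gc <- time_many(n, gc_first = TRUE)
elapsed_no_gc <- time_many(n, gc_first = FALSE)

cat("gcFirst = TRUE: ", elapsed_gc, "seconds\n")
cat("gcFirst = FALSE:", elapsed_no_gc, "seconds\n")
```

When the timed expression is as cheap as a small rpart fit, the cost of the collection itself can dominate the total run time.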
I'm not sure whether precise timing is needed. If it is not, gcFirst=FALSE could always be used. If some flexibility is needed, this could be exposed as a user-visible option. In any case, defaulting to gcFirst=TRUE adds significant overhead to simple model fits.
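To make the "user-visible option" idea concrete, here is a rough sketch of how such a switch might look. The option name mypkg.timing_gc_first and the helper timed_eval are hypothetical and only illustrate the pattern; they are not existing parsnip names, and the actual fit helpers may be structured differently.

```r
# Hypothetical helper: times an expression, reading a made-up global option
# to decide whether to garbage-collect first. Defaults to TRUE, which
# matches the default of system.time().
timed_eval <- function(expr) {
  gc_first <- getOption("mypkg.timing_gc_first", default = TRUE)
  # expr is evaluated lazily inside system.time(), so it is what gets timed
  elapsed <- system.time(value <- expr, gcFirst = gc_first)
  list(value = value, elapsed = elapsed[["elapsed"]])
}

# Usage: switch the (hypothetical) option off for a block of work,
# then restore the previous setting.
old <- options(mypkg.timing_gc_first = FALSE)
res <- timed_eval(sum(1:10))
options(old)
```

Defaulting the option to TRUE would keep the current behaviour for anyone who relies on the reported timings, while letting users who tune many small models opt out.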
juliasilge