
Figure out why step_dummy() is slow with many dummy variables #1305

@EmilHvitfeldt


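Originally posted in #1253. This might just be an artifact of us handling the predictors one by one, but the difference is stark: with 1000 two-level factor predictors, fitting the workflow that includes the recipe takes over 12 times as long as fitting the same model directly (5.5 s vs 0.44 s).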
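One way to make the "one by one" hypothesis concrete is a small base R benchmark that does not involve recipes at all. This is a hypothetical sketch: the helpers `one_by_one()` and `all_at_once()` are illustrative names, and `stats::model.matrix()` stands in for whatever recipes does internally. It only shows what per-predictor overhead can cost in general, not where the time actually goes in `step_dummy()`.

```r
# Hypothetical benchmark: base R only, NOT the recipes implementation
set.seed(1305)
n_pred <- 1000

# 1000 two-level factors with 100 observations each
cols <- lapply(seq_len(n_pred), function(i) {
  factor(sample(c("A", "B"), 100, TRUE), levels = c("A", "B"))
})
names(cols) <- paste0("x", seq_len(n_pred))
dat <- as.data.frame(cols)

# Strategy 1: one model.matrix() call per predictor, then bind the pieces
one_by_one <- function(dat) {
  pieces <- lapply(names(dat), function(nm) {
    mm <- model.matrix(~ ., data = dat[nm])
    mm[, -1, drop = FALSE]  # drop the intercept column
  })
  do.call(cbind, pieces)
}

# Strategy 2: a single model.matrix() call over all predictors at once
all_at_once <- function(dat) {
  model.matrix(~ ., data = dat)[, -1, drop = FALSE]
}

# Elapsed wall-clock seconds for each strategy
t_loop <- system.time(res_loop <- one_by_one(dat))[["elapsed"]]
t_once <- system.time(res_once <- all_at_once(dat))[["elapsed"]]
c(one_by_one = t_loop, all_at_once = t_once)
```

Both helpers return the same 100 x 1000 indicator matrix; only the number of `model.matrix()` calls differs (1000 versus 1). The original reprex, comparing a workflow fit with and without the recipe, follows.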

library(tidymodels)

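# Simulate one random two-level factor with 100 observations
# (the argument is unused; it only lets map() call the function)
make_factor <- function(x) {
  factor(sample(c("A", "B"), 100, TRUE), levels = c("A", "B"))
}

# An outcome column plus 1000 factor predictors
x <- map(1:1001, make_factor) %>%
  set_names(c("outcome", paste0("x", 1:1000))) %>%
  as_tibble()

# step_dummy() expands each two-level factor into one indicator column
rec <- recipe(outcome ~ ., data = x) %>%
  step_dummy(all_nominal_predictors())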

lr_mod <- logistic_reg()

lr_wf <- workflow() %>%
  add_model(lr_mod) %>%
  add_recipe(rec)

tictoc::tic("with recipes")
tmp <- lr_wf %>% fit(data = x)
tictoc::toc()
#> with recipes: 5.496 sec elapsed

tictoc::tic("without recipes")
tmp <- lr_mod %>% fit(outcome ~ ., data = x)
tictoc::toc()
#> without recipes: 0.437 sec elapsed
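The "with recipes" timing covers the whole workflow fit: estimating the recipe, applying it, and fitting the model. A possible next step is to time `prep()` (estimation) and `bake()` (application) on their own, to see whether `step_dummy()` itself accounts for most of the gap. The sketch below assumes a recent recipes version and regenerates data of the same shape as the reprex with base R instead of purrr; the seed and variable names are arbitrary choices, not part of the original report.

```r
library(recipes)

# Recreate data of the same shape as the reprex: an outcome plus 1000 factors
set.seed(1305)
n_pred <- 1000
cols <- lapply(seq_len(n_pred + 1), function(i) {
  factor(sample(c("A", "B"), 100, TRUE), levels = c("A", "B"))
})
names(cols) <- c("outcome", paste0("x", seq_len(n_pred)))
x <- as.data.frame(cols)

rec <- step_dummy(recipe(outcome ~ ., data = x), all_nominal_predictors())

# prep() estimates the step (records the factor levels);
# bake(new_data = NULL) returns the processed training data
prep_time <- system.time(prepped <- prep(rec, training = x))[["elapsed"]]
bake_time <- system.time(baked <- bake(prepped, new_data = NULL))[["elapsed"]]

c(prep = prep_time, bake = bake_time)
```

If one of the two dominates, profiling just that call narrows down which part of the per-predictor handling is expensive.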

Labels: bug (an unexpected problem or unintended behavior), feature (a feature request or enhancement)
