speed up tunable.model_spec() #921

Merged
merged 2 commits into from Mar 14, 2023
simonpcouch
library(tidymodels)

tunable.model_spec() is called once per resample fit, and several times when tuning hyperparameters.

lr <- linear_reg()

bm <- 
  bench::mark(
    total = fit_resamples(lr, mpg ~ ., bootstraps(mtcars, 100)),
    tunable = replicate(100, tunable(lr)),
    check = FALSE
  )
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.

bm
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 total         5.06s    5.06s     0.198   47.69MB     8.11
#> 2 tunable    488.02ms 488.49ms     2.05     3.32MB     8.19

As a percentage of total time, the check takes:

100 * as.numeric(bm$median[2]) / as.numeric(bm$median[1])
#> [1] 9.657983

The rewritten version of the check is as follows (for ease of reprex):

tunable2 <- function(x, ...) {
  mod_env <- get_model_env()
  
  if (is.null(x$engine)) {
    stop("Please declare an engine first using `set_engine()`.", call. = FALSE)
  }
  
  arg_name <- paste0(parsnip:::mod_type(x), "_args")
  if (!(any(arg_name == names(mod_env)))) {
    stop("The `parsnip` model database doesn't know about the arguments for ",
         "model `", parsnip:::mod_type(x), "`. Was it registered?",
         call. = FALSE)
  }
  
  arg_vals <- mod_env[[arg_name]]
  arg_vals <- arg_vals[arg_vals$engine == x$engine, c("parsnip", "func")]
  names(arg_vals)[names(arg_vals) == "parsnip"] <- "name"
  names(arg_vals)[names(arg_vals) == "func"] <- "call_info"
  
  extra_args <- c(names(x$args), names(x$eng_args))
  extra_args <- extra_args[!extra_args %in% arg_vals$name]
  
  extra_args_tbl <-
    tibble::new_tibble(
      list(name = extra_args, call_info = vector("list", vctrs::vec_size(extra_args))),
      nrow = vctrs::vec_size(extra_args)
    )
  
  res <- vctrs::vec_rbind(arg_vals, extra_args_tbl)
  
  res$source <- "model_spec"
  res$component <- parsnip:::mod_type(x)
  res$component_id <- "main"
  res$component_id[!res$name %in% names(x$args)] <- "engine"
  
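  if (nrow(res) > 0) {
    # Despite its name, `has_info` is TRUE where `call_info` is NULL
    has_info <- purrr::map_lgl(res$call_info, is.null)
    # `rm_list` flags the rows to keep: everything except main arguments
    # that have no `call_info`
    rm_list <- !(has_info & (res$component_id == "main"))
    
    res <- res[rm_list,]
  }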
  
  res[, c("name", "call_info", "source", "component", "component_id")]
}
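
To make the construction above easier to follow outside of parsnip, here is a minimal sketch of the same steps on toy data. The `arg_vals` table, the argument names (`undocumented`, `family`), and the `main_args`/`eng_args` vectors are hypothetical stand-ins for the model database and the `x$args`/`x$eng_args` fields; only the `tibble::new_tibble()` and `vctrs::vec_rbind()` calls mirror the function above.

```r
library(tibble)
library(vctrs)

# Hypothetical stand-in for the rows of the model database for one engine
arg_vals <- tibble::tibble(
  name = c("penalty", "mixture"),
  call_info = list(
    list(pkg = "dials", fun = "penalty"),
    list(pkg = "dials", fun = "mixture")
  )
)

# Hypothetical stand-ins for names(x$args) and names(x$eng_args)
main_args <- c("penalty", "mixture", "undocumented")
eng_args <- c("family")

# Arguments the database does not know about
extra_args <- c(main_args, eng_args)
extra_args <- extra_args[!extra_args %in% arg_vals$name]

# Build the extra rows directly, with NULL call_info entries
extra_args_tbl <- tibble::new_tibble(
  list(name = extra_args, call_info = vector("list", vctrs::vec_size(extra_args))),
  nrow = vctrs::vec_size(extra_args)
)

res <- vctrs::vec_rbind(arg_vals, extra_args_tbl)

res$component_id <- "main"
res$component_id[!res$name %in% main_args] <- "engine"

# Drop main arguments that have no call_info; engine arguments are kept
no_info <- vapply(res$call_info, is.null, logical(1))
res <- res[!(no_info & res$component_id == "main"), ]
```

In this sketch the `undocumented` main argument is dropped, while the `family` engine argument is kept with a `NULL` `call_info`.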

With benchmarks:

bm2 <- 
  bench::mark(
    old = tunable(lr),
    new = tunable2(lr),
    check = TRUE
  )

Note that with check = TRUE above, mark() checks that the outputs are equal. 🍄 The old check is roughly 19 times slower than the new one:

as.numeric(bm2$median[1]) / as.numeric(bm2$median[2])
#> [1] 19.12902
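
As an aside, the same kind of ratio can be read off a benchmark summary instead of dividing the medians by hand. The sketch below uses two hypothetical toy functions (`slow_sum()` and `fast_sum()`), not the parsnip code, and assumes the `relative` argument of the `summary()` method for `bench::mark()` results.

```r
library(bench)

# Hypothetical toy functions standing in for the old and new implementations
slow_sum <- function(x) {
  total <- 0
  for (v in x) total <- total + v
  total
}
fast_sum <- function(x) sum(x)

x <- as.numeric(1:1e4)

bm <- bench::mark(
  old = slow_sum(x),
  new = fast_sum(x),
  check = TRUE
)

# With relative = TRUE, timings are expressed as multiples of the fastest row
rel <- summary(bm, relative = TRUE)
rel[, c("expression", "median")]
```

The fastest expression gets a relative median of 1, and the other row shows how many times slower it is.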

The tests live in extratests, and they are fine. :)

Created on 2023-03-12 with reprex v2.0.2

@EmilHvitfeldt EmilHvitfeldt left a comment

looks good! a few notes

Co-authored-by: Emil Hvitfeldt <emilhhvitfeldt@gmail.com>
@simonpcouch simonpcouch merged commit 22c87a8 into main Mar 14, 2023
9 checks passed
@simonpcouch simonpcouch deleted the tunable-speedup branch March 14, 2023 16:59
@github-actions
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2023