speed up `get_model_spec()` helper #901

simonpcouch · 2023-03-06T14:29:25Z

This PR speeds up the get_model_spec() helper. I'm first just pushing some new tests so we can run checks and ensure that results don't change.

This helper is called once per model specification fit (or translation).

library(tidymodels)

bt <- bootstraps(mtcars, 100)

bm <- 
  bench::mark(
    total = 
      fit_resamples(linear_reg(), mpg ~ ., bootstraps(mtcars, 100)),
    get_model_spec = 
      replicate(100, parsnip:::get_model_spec("linear_reg", "regression", "lm")),
    check = FALSE
  )
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.

bm
#> # A tibble: 2 × 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 total              5.2s     5.2s     0.192   73.57MB    10.6 
#> 2 get_model_spec  258.6ms  342.9ms     2.92     2.42MB     7.29

As a percentage of total time, the helper takes a bit more than 6.5% of the evaluation time:

100 * as.numeric(bm$median[2]) / as.numeric(bm$median[1])
#> [1] 6.588154

The rewritten version of the helper is below (just for purposes of self-contained reprex):

get_model_spec2 <- function(model, mode, engine) {
  m_env <- get_model_env()
  env_obj <- rlang::env_names(m_env)
  env_obj <- grep(model, env_obj, value = TRUE)
  
  res <- list()
  
  libs <- rlang::env_get(m_env, paste0(model, "_pkgs"))
  libs <- vctrs::vec_slice(libs$pkg, libs$engine == engine)
  res$libs <- if (length(libs) > 0) {libs[[1]]} else {NULL}
  
  fits <- rlang::env_get(m_env, paste0(model, "_fit"))
  fits <- vctrs::vec_slice(fits$value, fits$mode == mode & fits$engine == engine)
  res$fit <- if (length(fits) > 0) {fits[[1]]} else {NULL}
  
  preds <- rlang::env_get(m_env, paste0(model, "_predict"))
  where <- preds$mode == mode & preds$engine == engine
  types <- vctrs::vec_slice(preds$type, where)
  values <- vctrs::vec_slice(preds$value, where)
  names(values) <- types
  res$pred <- values
  
  res
}

With benchmarks:

bm2 <- 
  bench::mark(
    old = parsnip:::get_model_spec("linear_reg", "regression", "lm"),
    new = get_model_spec2("linear_reg", "regression", "lm"),
    check = TRUE
  )

Note with check = TRUE in the above, mark() checks equality of the outputs. :) The new check is as.numeric(bm2$median[1]) / as.numeric(bm2$median[2]) times faster than the old one:

as.numeric(bm2$median[1]) / as.numeric(bm2$median[2])
#> [1] 24.86973

^{Created on 2023-03-06 with reprex v2.0.2}

cc @DavisVaughan--thanks for the rewrite.🏄 Note that this reprex was generated with dev dplyr--quite a big speedup already with their changes.

``` r boop <- list(a = 1) bench::mark( base = if (length(boop) > 0) {boop[[1]]} else {NULL}, purrr = purrr::pluck(boop, 1), dplyr = dplyr::first(boop), iterations = 100 ) #> # A tibble: 3 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> #> 1 base 122.94ns 163.91ns 3179808. 0B 0 #> 2 purrr 2.01µs 2.52µs 305755. 22.66KB 0 #> 3 dplyr 5.29µs 6.07µs 141894. 8.12MB 0 ``` <sup>Created on 2023-03-06 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

hfrick

Nice! 🏄 do we need a specific version of vctrs for this?

simonpcouch · 2023-03-06T16:53:07Z

That's a good thought! There are no NEWS updates re: vec_slice() after our minimum 0.4.1, and the helper runs fine with that version, so I believe we should be good to go.

github-actions · 2023-03-25T00:56:42Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

simonpcouch added 3 commits March 6, 2023 09:20

test get_model_spec() helper

e758cde

speed up get_model_spec() helper

8ab4cc2

simonpcouch requested a review from hfrick March 6, 2023 15:13

hfrick approved these changes Mar 6, 2023

View reviewed changes

simonpcouch merged commit 74439c6 into main Mar 7, 2023
9 checks passed

simonpcouch deleted the get-model-spec-speedup branch March 7, 2023 12:40

github-actions bot locked and limited conversation to collaborators Mar 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

speed up `get_model_spec()` helper #901

speed up `get_model_spec()` helper #901

simonpcouch commented Mar 6, 2023 •

edited

hfrick left a comment

simonpcouch commented Mar 6, 2023

github-actions bot commented Mar 25, 2023

speed up get_model_spec() helper #901

speed up get_model_spec() helper #901

Conversation

simonpcouch commented Mar 6, 2023 • edited

hfrick left a comment

Choose a reason for hiding this comment

simonpcouch commented Mar 6, 2023

github-actions bot commented Mar 25, 2023

speed up `get_model_spec()` helper #901

speed up `get_model_spec()` helper #901

simonpcouch commented Mar 6, 2023 •

edited