Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up get_model_spec() helper #901

Merged
merged 3 commits into from Mar 7, 2023
Merged

Conversation

simonpcouch
Copy link
Contributor

@simonpcouch simonpcouch commented Mar 6, 2023

This PR speeds up the get_model_spec() helper. I'm first just pushing some new tests so we can run checks and ensure that results don't change.

This helper is called once per model specification fit (or translation).

library(tidymodels)

bt <- bootstraps(mtcars, 100)

bm <- 
  bench::mark(
    total = 
      fit_resamples(linear_reg(), mpg ~ ., bootstraps(mtcars, 100)),
    get_model_spec = 
      replicate(100, parsnip:::get_model_spec("linear_reg", "regression", "lm")),
    check = FALSE
  )
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.

bm
#> # A tibble: 2 脳 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 total              5.2s     5.2s     0.192   73.57MB    10.6 
#> 2 get_model_spec  258.6ms  342.9ms     2.92     2.42MB     7.29

As a percentage of total time, the helper takes a bit more than 6.5% of the evaluation time:

100 * as.numeric(bm$median[2]) / as.numeric(bm$median[1])
#> [1] 6.588154

The rewritten version of the helper is below (just for purposes of self-contained reprex):

get_model_spec2 <- function(model, mode, engine) {
  m_env <- get_model_env()
  env_obj <- rlang::env_names(m_env)
  env_obj <- grep(model, env_obj, value = TRUE)
  
  res <- list()
  
  libs <- rlang::env_get(m_env, paste0(model, "_pkgs"))
  libs <- vctrs::vec_slice(libs$pkg, libs$engine == engine)
  res$libs <- if (length(libs) > 0) {libs[[1]]} else {NULL}
  
  fits <- rlang::env_get(m_env, paste0(model, "_fit"))
  fits <- vctrs::vec_slice(fits$value, fits$mode == mode & fits$engine == engine)
  res$fit <- if (length(fits) > 0) {fits[[1]]} else {NULL}
  
  preds <- rlang::env_get(m_env, paste0(model, "_predict"))
  where <- preds$mode == mode & preds$engine == engine
  types <- vctrs::vec_slice(preds$type, where)
  values <- vctrs::vec_slice(preds$value, where)
  names(values) <- types
  res$pred <- values
  
  res
}

With benchmarks:

bm2 <- 
  bench::mark(
    old = parsnip:::get_model_spec("linear_reg", "regression", "lm"),
    new = get_model_spec2("linear_reg", "regression", "lm"),
    check = TRUE
  )

Note with check = TRUE in the above, mark() checks equality of the outputs. :) The new check is as.numeric(bm2$median[1]) / as.numeric(bm2$median[2]) times faster than the old one:

as.numeric(bm2$median[1]) / as.numeric(bm2$median[2])
#> [1] 24.86973

Created on 2023-03-06 with reprex v2.0.2

cc @DavisVaughan--thanks for the rewrite.馃弰 Note that this reprex was generated with dev dplyr--quite a big speedup already with their changes.

``` r
boop <- list(a = 1)

bench::mark(
    base = if (length(boop) > 0) {boop[[1]]} else {NULL},
    purrr = purrr::pluck(boop, 1),
    dplyr = dplyr::first(boop),
    iterations = 100
)
#> # A tibble: 3 脳 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 base       122.94ns 163.91ns  3179808.        0B        0
#> 2 purrr        2.01碌s   2.52碌s   305755.   22.66KB        0
#> 3 dplyr        5.29碌s   6.07碌s   141894.    8.12MB        0
```

<sup>Created on 2023-03-06 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
@simonpcouch simonpcouch requested a review from hfrick March 6, 2023 15:13
Copy link
Member

@hfrick hfrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 馃弰 do we need a specific version of vctrs for this?

@simonpcouch
Copy link
Contributor Author

That's a good thought! There are no NEWS updates re: vec_slice() after our minimum 0.4.1, and the helper runs fine with that version, so I believe we should be good to go.

@simonpcouch simonpcouch merged commit 74439c6 into main Mar 7, 2023
9 checks passed
@simonpcouch simonpcouch deleted the get-model-spec-speedup branch March 7, 2023 12:40
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants