
simonpcouch
Contributor

My old helper was slow. 🦥 Instead of joining the datasets, we can filter each one separately, using vctrs instead of dplyr. For a quick model fit on a small dataset, this cuts fit time by ~22% (median of 8.04ms down to 6.28ms in the benchmarks below)!
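As a rough illustration of the idea (not the code in this PR), here is a minimal sketch of checking a spec against two lookup tables separately with `vctrs::vec_in()`, compared with binding and joining them first. The tables `installed` and `extensions`, the column names, and the helpers `spec_is_known_join()` and `spec_is_known_vctrs()` are all hypothetical, and base `merge()` stands in for a dplyr join to keep the example dependency-light.

```r
library(vctrs)

# Hypothetical lookup tables; these are NOT parsnip's internal objects.
installed <- data.frame(
  model  = c("linear_reg", "linear_reg", "rand_forest"),
  engine = c("lm", "glmnet", "ranger"),
  mode   = c("regression", "regression", "classification")
)
extensions <- data.frame(
  model  = c("linear_reg", "bag_tree"),
  engine = c("stan", "rpart"),
  mode   = c("regression", "regression")
)

# The combination we want to look up, as a one-row data frame.
spec <- data.frame(model = "linear_reg", engine = "lm", mode = "regression")

# Join-based baseline: stack the tables, then join on all shared columns.
spec_is_known_join <- function(spec, ...) {
  nrow(merge(spec, rbind(...))) > 0
}

# vctrs-based version: test row membership in each table separately and
# stop at the first hit, so no combined table is ever built.
spec_is_known_vctrs <- function(spec, ...) {
  for (tbl in list(...)) {
    # vec_in() treats each data frame row as one element.
    if (any(vec_in(spec, tbl))) {
      return(TRUE)
    }
  }
  FALSE
}

spec_is_known_join(spec, installed, extensions)
spec_is_known_vctrs(spec, installed, extensions)
```

Both helpers should agree on any input with these columns; the vctrs version simply avoids the cost of constructing and joining a combined table on every call.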

With main dev:

library(parsnip)

bench::mark(
  total = fit(linear_reg(), mpg ~ ., mtcars),
  iterations = 25
)
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 total        7.65ms   8.04ms      120.    7.19MB     67.6

With this PR:

#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 total        6.09ms   6.28ms      155.    7.12MB     60.3

Created on 2023-03-17 with reprex v2.0.2

There are additional tests for this infrastructure in extratests, and they are fine. :)

Member

EmilHvitfeldt left a comment

looking clean!

simonpcouch merged commit 095499d into main on Mar 20, 2023
simonpcouch deleted the spec-is-possible branch on March 20, 2023 12:29

github-actions bot commented Apr 5, 2023

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators on Apr 5, 2023