Speed of first() evaluated within grouped tibbles #6682

@dkutner

Using dplyr 1.1.0 and vctrs 0.5.2, I'm noticing that first() is slow when evaluated within grouped tibbles.

df <- dplyr::group_by(tibble::tibble(x = 1:10000, grp = rep(1:100, 100)), grp)
bench::mark(
  first = dplyr::summarize(
    df,
    f = dplyr::first(x)
  ),
  indexed = dplyr::summarize(
    df,
    f = x[1]
  )
)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 first        7.44ms   7.98ms      119.    1.57MB     19.1
#> 2 indexed      1.01ms   1.06ms      901.   55.62KB     12.7

On dplyr 1.0.10 with vctrs 0.5.0, the "first" expression in the same benchmark takes about 1.7 ms.
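
One way to check whether the slowdown comes from first() itself rather than from the grouped evaluation in summarize() might be to benchmark a single call outside of any tibble. This is only a sketch and was not part of the measurements above; `x_one_group` is a hypothetical stand-in for the 100 values that one group holds in the example.

```r
# Sketch: per-call cost of first() versus plain indexing on one
# group-sized vector, outside of summarize().
# `x_one_group` is a made-up name; it mimics one group of 100 integers.
x_one_group <- 1:100

bench::mark(
  first = dplyr::first(x_one_group),
  indexed = x_one_group[1]
)
```

If the per-call gap here is similar in proportion to the grouped benchmark, that would suggest the overhead sits in first() rather than in summarize(); running the same snippet under both dplyr versions would be needed to confirm it.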
