Mutate and summarize speed related to parentheses #6681

@dkutner

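Using dplyr 1.1.0 and vctrs 0.5.2, I'm seeing a speed difference in mutate and summarize that depends on whether the RHS is wrapped in parentheses. In the benchmarks below, the parenthesized expression (b1) runs about twice as fast as the bare one (b2).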

df <- tibble::tibble(x = 1:10000)
bench::mark(
  b1 = dplyr::summarize(
    df,
    res = (1 + 1)
  ),
  b2 = dplyr::summarize(
    df,
    res = 1 + 1
  )
)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 b1           1.47ms   1.53ms      625.    3.78MB     12.6
#> 2 b2           3.13ms   3.38ms      290.   546.9KB     15.6
bench::mark(
  b1 = dplyr::mutate(
    df,
    res = (1 + 1)
  ),
  b2 = dplyr::mutate(
    df,
    res = 1 + 1
  )
)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 b1           1.64ms   1.73ms      557.     363KB     14.9
#> 2 b2           3.23ms    3.5ms      279.     102KB     15.4

On dplyr 1.0.10 with vctrs 0.5.0, all four calls take about the same time, ~1.5 ms. I'm using tibble 3.1.8 for both tests. With more complicated aggregation expressions, I've seen 15x slowdowns.
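
For reference, the two RHS expressions differ only in their outermost call. The base R sketch below shows this without involving dplyr at all. It is an illustration of how the two expressions differ as language objects, not a diagnosis of the slowdown; the variable names `with_parens` and `without_parens` are made up for this example.

```r
# Base R only; dplyr is not involved here.
# Capture both right-hand sides as unevaluated calls.
with_parens <- quote((1 + 1))
without_parens <- quote(1 + 1)

# The outermost function of each call differs:
# the parenthesized version is a call to `(`, the bare one a call to `+`.
as.character(with_parens[[1]])     # "("
as.character(without_parens[[1]])  # "+"

# Both expressions evaluate to the same value.
identical(eval(with_parens), eval(without_parens))  # TRUE
```

So any code path that inspects the top-level call of the expression would see `(` in b1 and `+` in b2, even though the results are the same. Whether that is what drives the timing difference here is an assumption I have not verified.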
