Performance drop-off for arrange() #4962

@markfairbanks

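arrange() is noticeably slower in the dev version of dplyr than in 0.8.5. In the benchmark below, on a 1,000,000-row tibble, the median time goes from 372ms to 2.24s (roughly 6x slower) and memory allocation from 19.2MB to 48.3MB.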

library(dplyr, warn.conflicts = FALSE)
library(bench)

data_size <- 1000000
test_df <- tibble(a = sample(c("a","a","b","c","d"), data_size, TRUE),
                  b = sample(1:20, data_size, TRUE))

bench::mark(
  tidyverse = arrange(test_df, a, b),
  check = FALSE,
  iterations = 5)

# dplyr 0.8.5
#> # A tibble: 1 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 tidyverse     369ms    372ms      2.69    19.2MB     4.04

# dplyr 1.0.0 dev
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 tidyverse     2.18s    2.24s     0.446    48.3MB    0.804

Created on 2020-03-10 by the reprex package (v0.3.0)
