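The dev version of dplyr (1.0.0) has a performance regression in arrange(). In the reprex below, sorting a 1,000,000-row tibble by two columns takes a median of 2.24s, compared with 372ms on dplyr 0.8.5 (roughly 6x slower), and memory allocation rises from 19.2MB to 48.3MB.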
library(dplyr, warn.conflicts = FALSE)
library(bench)
data_size <- 1000000
test_df <- tibble(
  a = sample(c("a", "a", "b", "c", "d"), data_size, TRUE),
  b = sample(1:20, data_size, TRUE)
)
bench::mark(
  tidyverse = arrange(test_df, a, b),
  check = FALSE,
  iterations = 5
)
# dplyr 0.8.5
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 tidyverse 369ms 372ms 2.69 19.2MB 4.04
# dplyr 1.0.0 dev
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 tidyverse 2.18s 2.24s 0.446 48.3MB 0.804
Created on 2020-03-10 by the reprex package (v0.3.0)
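To help separate arrange() itself from machine-level noise, it may be useful to time a base R sort of the same data next to it. The sketch below is an addition for comparison only and was not part of the original report: the `base` expression and the `res` variable are illustrative names, and no timings are shown because they have not been measured here. `check = FALSE` is kept because the two expressions are not guaranteed to return identical objects.

```r
library(dplyr, warn.conflicts = FALSE)
library(bench)

data_size <- 1000000
test_df <- tibble(
  a = sample(c("a", "a", "b", "c", "d"), data_size, TRUE),
  b = sample(1:20, data_size, TRUE)
)

# Time arrange() against a base R baseline on the same columns.
# The baseline (order() followed by row subsetting) is an assumption added
# for comparison; it is not taken from the original report.
res <- bench::mark(
  tidyverse = arrange(test_df, a, b),
  base = test_df[order(test_df$a, test_df$b), ],
  check = FALSE,
  iterations = 5
)
res
```

Running this under both dplyr 0.8.5 and the dev version should show whether the `base` row stays roughly constant while the `tidyverse` row changes, which would point at arrange() rather than the environment.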