
group_by() + slice_max() quite slow #216

@mgirlich

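The dplyr version is much faster than the dtplyr one. In the benchmark below the median time is 97.1 ms vs 774.6 ms (roughly 8x), and dtplyr allocates 70.78 MB vs 4.45 MB: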

library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)
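# 1000 stacked copies of mtcars (32,000 rows); `id` numbers the copies
DT <- rbindlist(rep(list(mtcars), 1000), idcol = "id")

# Same data as a tibble, for the plain dplyr run
DF <- as_tibble(DT)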

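bench::mark(
  # plain dplyr on a tibble
  df = DF %>%
    group_by(id) %>%
    slice_max(mpg, n = 2),
  # dplyr verbs on a data.table go through dtplyr; collect() runs the query
  dt = DT %>%
    group_by(id) %>%
    slice_max(mpg, n = 2) %>%
    collect(),
  iterations = 2,
  check = FALSE
)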
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 df           90.7ms   97.1ms     10.3     4.45MB     30.9
#> 2 dt          755.2ms  774.6ms      1.29   70.78MB     18.1

Created on 2021-03-04 by the reprex package (v1.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.3 (2020-10-10)
#>  os       macOS Big Sur 10.16         
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2021-03-04                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                          
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.2)                  
#>  backports     1.2.1      2020-12-09 [1] CRAN (R 4.0.2)                  
#>  bench         1.1.1      2020-01-13 [1] CRAN (R 4.0.2)                  
#>  cli           2.3.1      2021-02-23 [1] CRAN (R 4.0.3)                  
#>  crayon        1.4.1      2021-02-08 [1] CRAN (R 4.0.3)                  
#>  data.table  * 1.14.0     2021-02-21 [1] CRAN (R 4.0.3)                  
#>  DBI           1.1.1      2021-01-15 [1] CRAN (R 4.0.3)                  
#>  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                  
#>  dplyr       * 1.0.5      2021-02-25 [1] Github (tidyverse/dplyr@7a96866)
#>  dtplyr      * 1.1.0.9000 2021-03-04 [1] local                           
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.2)                  
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.1)                  
#>  fansi         0.4.2      2021-01-15 [1] CRAN (R 4.0.2)                  
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                  
#>  generics      0.1.0      2020-10-31 [1] CRAN (R 4.0.2)                  
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 4.0.2)                  
#>  htmltools     0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)                  
#>  knitr         1.31       2021-01-27 [1] CRAN (R 4.0.3)                  
#>  lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.0.3)                  
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                  
#>  pillar        1.5.0      2021-02-22 [1] CRAN (R 4.0.3)                  
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.2)                  
#>  profmem       0.6.0      2020-12-13 [1] CRAN (R 4.0.2)                  
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.2)                  
#>  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                  
#>  reprex        1.0.0      2021-01-27 [1] CRAN (R 4.0.2)                  
#>  rlang         0.4.10     2020-12-30 [1] CRAN (R 4.0.2)                  
#>  rmarkdown     2.7        2021-02-19 [1] CRAN (R 4.0.3)                  
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.0.2)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.2)                  
#>  stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.2)                  
#>  styler        1.3.2      2020-02-23 [1] CRAN (R 4.0.2)                  
#>  tibble        3.1.0      2021-02-25 [1] CRAN (R 4.0.2)                  
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 4.0.2)                  
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 4.0.2)                  
#>  vctrs         0.3.6.9000 2021-02-17 [1] Github (r-lib/vctrs@9af59e9)    
#>  withr         2.4.1      2021-01-26 [1] CRAN (R 4.0.2)                  
#>  xfun          0.21       2021-02-10 [1] CRAN (R 4.0.3)                  
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.2)                  
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
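To narrow down where the time goes, one possible next step is to print dtplyr's translation with `show_query()` and time it against a hand-written data.table query. The sketch below is not from the issue: `top2_dt()` is a hypothetical helper, and the assumption that its `frank(..., ties.method = "min")` filter reproduces `slice_max()`'s default `with_ties = TRUE` behaviour should be checked against the dplyr result.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Same input as the reprex: 1000 stacked copies of mtcars, `id` marks the copy
DT <- rbindlist(rep(list(mtcars), 1000), idcol = "id")

# Print the data.table code that dtplyr generates for this pipeline,
# to see which generated step might be the expensive one
lazy_dt(DT) %>%
  group_by(id) %>%
  slice_max(mpg, n = 2) %>%
  show_query()

# Hypothetical hand-written baseline. Assumption: it mirrors slice_max()'s
# default with_ties = TRUE by keeping, per id, every row whose minimum rank
# of descending mpg is <= 2. Row order may differ from dplyr's output.
top2_dt <- function(dt) {
  # .I collects the row numbers of the qualifying rows within each group
  idx <- dt[, .I[frank(-mpg, ties.method = "min") <= 2], by = id]$V1
  dt[idx]
}

res_dt <- top2_dt(DT)

# Time the hand-written query against the dtplyr pipeline from the reprex
bench::mark(
  handwritten = top2_dt(DT),
  dt = DT %>%
    group_by(id) %>%
    slice_max(mpg, n = 2) %>%
    collect(),
  iterations = 2,
  check = FALSE
)
```

Because `top2_dt()` hard-codes the `id` and `mpg` columns, it is only meant as a timing reference for this reprex, not as a general replacement for `slice_max()`.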
