Skip to content

slice() unnecessarily evaluates ... twice for each group #377

@eutwt

Description

@eutwt

slice() could be made faster when the arguments in ... are expensive to compute by evaluating ... only once per group. In the example below where the argument to slice takes one second to compute and there are two groups, the code takes ~4 seconds to run instead of ~2.

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2
library(data.table, warn.conflicts = FALSE)

dt <- data.table(id = c(1, 2))

return_1 <- function() {Sys.sleep(1); 1}

library(tictoc)
tic()
dt %>% 
  lazy_dt() %>% 
  group_by(id) %>% 
  slice(return_1())
#> Source: local data table [2 x 1]
#> Groups: id
#> Call:   `_DT1`[`_DT1`[, .I[return_1()[between(return_1(), -.N, .N)]], 
#>     by = .(id)]$V1]
#> 
#>      id
#>   <dbl>
#> 1     1
#> 2     2
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
toc()
#> 4.074 sec elapsed

tic()
dt[dt[,
  {
    slice_ind <- return_1()
    .I[slice_ind[between(slice_ind, -.N, .N)]]
  },
  by = .(id)
]$V1]
#>    id
#> 1:  1
#> 2:  2
toc()
#> 2.014 sec elapsed

Created on 2022-07-19 by the reprex package (v2.0.1)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions