Plans on adding group_map/group_modify/group_walk support? #108

Closed
TobiRoby opened this issue Sep 27, 2019 · 4 comments
Labels
feature: a feature request or enhancement

Comments

@TobiRoby

Love dtplyr! I am using the newest version (lazy evaluation) as often as possible.
Sadly there is no support for group_map, group_modify, group_walk functionality.

Are there any plans on adding them in the future?

Here is a failing example:

library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
library(stringi)

# specify size 
n <- 1e5

# generate data.table with random data
data_dt <- data.table(id=stri_rand_strings(n, 1, pattern = "[A-Z]"),
                      product=stri_rand_strings(n, 3, pattern = "[A-Z]"),
                      date=sample(seq(as.Date('2019/01/01'), as.Date('2019/04/01'), by="day"), n, replace=TRUE),
                      amount=sample(1:10000,n,replace=TRUE),
                      price=rnorm(n, mean = 100, sd = 20))

data_dt_lazy <- lazy_dt(data_dt)

lag_amount_per_group <- function(group, ...){
  group %>% 
    lazy_dt() %>%
    mutate(amount_lag7 = lag(amount, n = 7)) %>% 
    as.data.table()
}

result_dtplyr <- data_dt_lazy %>%
  arrange(date) %>%
  group_by(id) %>%
  group_modify(lag_amount_per_group) %>%
  as.data.table()

Error in UseMethod("group_split") :
no applicable method for 'group_split' applied to an object of class "c('dtplyr_step_group', 'dtplyr_step')"


Also a working example with native data.table and dplyr:

result_dplyr <- data_dt %>%
  arrange(date) %>%
  group_by(id) %>%
  group_modify(lag_amount_per_group)

result_native_dt <- data_dt[order(date)][, lag_amount_per_group(.SD), keyby=.(id)]
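
Until a lazy method exists, one possible workaround is to force evaluation before the grouped step and let dplyr handle `group_modify()` on the collected result. The sketch below assumes a small made-up table (`small_dt`) and a simplified helper (`lag_amount`, lagging by 1 rather than 7); neither comes from the example above, and this gives up laziness for everything after `as_tibble()`.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data, small enough to check by hand
small_dt <- data.table(
  id     = c("A", "A", "B", "B"),
  date   = as.Date("2019-01-01") + c(1, 0, 1, 0),
  amount = 1:4
)

# Simplified per-group helper (assumption: lag by 1 instead of 7)
lag_amount <- function(group, ...) {
  mutate(group, amount_lag1 = lag(amount, n = 1))
}

result <- small_dt %>%
  lazy_dt() %>%
  arrange(date) %>%    # still translated lazily to data.table
  as_tibble() %>%      # forces evaluation; everything below is plain dplyr
  group_by(id) %>%
  group_modify(lag_amount)
```

The lazy part of the pipeline (here only `arrange()`) still runs through data.table; only the grouped modification falls back to dplyr.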
@hadley
Member

hadley commented Oct 4, 2019

group_walk() seems like it wouldn't be a good fit here, because side-effects and lazy computation are a weird match. I think the best it could do would be to force computation with a warning. I think the same is probably true for group_map(), since it has to return a list, and making the results lazy would require some new data structure.

I think group_modify() could return a lazy_dt though:

library(data.table)
library(dplyr, warn.conflicts = FALSE)

dt <- data.table(g = c(1, 1, 2), x = 1:3, y = 2:4, z = 3:5)

myfun <- function(df, key) {
  data.frame(xy = df$x + df$y, yz = df$y + df$z)
}
dt %>% 
  as_tibble() %>% 
  group_by(g) %>% 
  group_modify(myfun)
#> # A tibble: 3 x 3
#> # Groups:   g [2]
#>       g    xy    yz
#>   <dbl> <int> <int>
#> 1     1     3     5
#> 2     1     5     7
#> 3     2     7     9

dt[, as.data.table(myfun(.SD)), keyby = g]
#>    g xy yz
#> 1: 1  3  5
#> 2: 1  5  7
#> 3: 2  7  9

Created on 2019-10-04 by the reprex package (v0.3.0)

@hadley
Member

hadley commented Oct 4, 2019

Implementing this will require a group_split() method that collects (with a warning), and a new lazy step_modify().

@hadley
Member

hadley commented Oct 4, 2019

Need #110 before I implement group_split():

group_split.dtplyr_step <- function(x, ...) {
  warn("Forcing evaluation of lazy data table in group_split()")
  
  group_split(collect(x), ...)
}

@hadley
Member

hadley commented Oct 4, 2019

That approach doesn't work because group_map() also needs group_keys(), and implementing the methods individually will be inefficient (it would have to collect the lazy_dt twice). Unfortunately, I can't currently provide a group_map() method because it's not generic (tidyverse/dplyr#4576).

Once that issue is fixed, I think group_map() should look something like this:

#' @export
#' @importFrom dplyr group_split
group_map.dtplyr_step <- function(.tbl, .f, ..., keep = FALSE) {
  dt <- as.data.table(.tbl)
  dt[, list(list(.f(.SD, .BY, ...))), by = .tbl$groups]$V1
}
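
The `list(list(...))` idiom in that sketch can be tried with data.table alone, without any dtplyr method. The example below is only an illustration under stated assumptions: the toy table and the per-group function `summarise_group` are hypothetical, and the grouping column is written out directly rather than taken from `.tbl$groups`.

```r
library(data.table)

dt <- data.table(g = c(1, 1, 2), x = 1:3, y = 2:4)

# Hypothetical per-group function: `df` is the group's rows (.SD, without
# the grouping column) and `key` is the group's key (.BY, a named list).
summarise_group <- function(df, key) {
  list(g = key$g, n = nrow(df), total = sum(df$x + df$y))
}

# Wrapping the result in list(list(...)) makes data.table store one list
# element per group in the column V1; extracting V1 yields a plain list
# with one entry per group.
res <- dt[, list(list(summarise_group(.SD, .BY))), by = g]$V1

length(res)  # one element per group
```

One design point worth keeping in mind: data.table reuses the memory behind `.SD` from group to group, so a function that returns `.SD` itself (rather than values computed from it) would probably need `copy(.SD)` to avoid surprising results.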

hadley added a commit that referenced this issue Oct 4, 2019
hadley added the feature (a feature request or enhancement) label Jan 25, 2021
hadley added a commit to smingerson/dtplyr that referenced this issue Jan 28, 2021
hadley closed this as completed in 6482016 Jan 28, 2021