Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

group_by and grouped_df for classes extending tibble #5480

Closed
GiuliaPais opened this issue Aug 14, 2020 · 3 comments
Closed

group_by and grouped_df for classes extending tibble #5480

GiuliaPais opened this issue Aug 14, 2020 · 3 comments
Labels
feature a feature request or enhancement grouping 👨‍👩‍👧‍👦

Comments

@GiuliaPais
Copy link

GiuliaPais commented Aug 14, 2020

As per title, we're developing a package where we use a structure that inherits from tibble, defined as follows:

new_ISADataFrame <- function(x,
    mandVars = c("chr", "integration_locus", "strand"),
    meta = character(), ..., class = character()) {
    stopifnot(is.list(x))
    minLength <- min(vapply(x, length, FUN.VALUE = numeric(1)))

    tibble::new_tibble(x,
        mandatoryVars = mandVars,
        metadata = meta, ...,
        nrow = minLength, class = c(class, "ISADataFrame")
    )
}

Unfortunately when performing grouping on this structure, class isn't preserved and we've found no proper way to implement group_by generic, we've tried this:

new_grouped_ISAdf <- function(x, groups, ..., class = character()) {
    stopifnot(is.ISADataFrame(x))
    dplyr::new_grouped_df(x, groups, ..., class = c(class, "ISADataFrame"))
}

validate_grouped_ISAdf <- function(x, check_bounds = FALSE) {
    dplyr::validate_grouped_df(x, check_bounds)
}

grouped_ISAdf <- function(data, vars,
                          drop = dplyr::group_by_drop_default(data)) {
    if (!is.data.frame(data)) {
        rlang::abort("`data` must be a data frame.")
    }
    if (!is.character(vars)) {
        rlang::abort("`vars` must be a character vector.")
    }

    if (length(vars) == 0) {
        ISADataFrame(data)
    } else {
        groups <- compute_groups(data, vars, drop = drop) #This is not an exported function
        new_grouped_ISAdf(data, groups)
    }
}

group_by.ISADataFrame <- function(.data, ...,
                                  .add = FALSE,
                                  .drop = dplyr::group_by_drop_default(.data)) {
    groups <- dplyr::group_by_prepare(.data, ..., .add = .add)
    grouped_ISAdf(groups$data, groups$group_names, .drop)
}

But as you can see compute_groups is not an exported function from dplyr. Is there any proper way to do this or could this function be exported so we can call it in our implementation?

@DavisVaughan
Copy link
Member

I would probably use NextMethod() from your group_by() method to let dplyr internally call compute_groups(). Then manipulate the results. Something like this:

library(dplyr)

new_special_df <- function(x, ..., class = character()) {
  x <- tibble::as_tibble(x)
  tibble::new_tibble(x = x, ..., nrow = nrow(x), class = c(class, "special_df"))
}

new_grouped_special_df <- function(x, groups, ..., class = character()) {
  class <- c(class, c("grouped_special_df", "grouped_df"))
  new_special_df(x, groups = groups, ..., class = class)
}

group_by.special_df <- function(.data, 
                                ..., 
                                .add = FALSE, 
                                .drop = group_by_drop_default(.data)) {
  out <- NextMethod()
  groups <- group_data(out)
  new_grouped_special_df(out, groups)
}

df <- new_special_df(list(x = 1, y = 2))

class(df)
#> [1] "special_df" "tbl_df"     "tbl"        "data.frame"

gdf <- group_by(df, x)

class(gdf)
#> [1] "grouped_special_df" "grouped_df"         "special_df"        
#> [4] "tbl_df"             "tbl"                "data.frame"

Created on 2020-08-18 by the reprex package (v0.3.0.9001)


Two pain points I ran into here (for us to review internally):

  1. It is really hard to pass nrow to new_tibble() correctly. I think it should use new_data_frame() internally, and nrow should be optional again, like it is in new_data_frame().

  2. new_grouped_special_df() would ideally call new_grouped_df(), but this is tough because we have to insert the "special_df" class between "grouped_df" and "tbl_df". I'm not exactly sure how to make this easier, but there seems to be some need for "wrapping" a subclassed tbl_df with "grouped_df" without dropping the subclass.

@GiuliaPais
Copy link
Author

Proposed solution works indeed for group_by, thanks. Unfortunately we face the same issue when trying to implement summarise, since the constructor for grouped_df is directly called in the body of the function: classes are completely dropped and only grouped_df , tbl_df, tbl, data.frame are preserved

@hadley
Copy link
Member

hadley commented Aug 28, 2020

@GiuliaPais you'll need to supply your own summarise() method as well. Unfortunately dplyr is not really ready for extension in this way, so it's going to be an uphill battle 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement grouping 👨‍👩‍👧‍👦
Projects
None yet
Development

No branches or pull requests

3 participants