Simplify summarise to collapse data #4232
Comments
We are aware that this is a problem, and are running various experiments about it.
library(dplyr, warn.conflicts = FALSE)
library(splice)

mtcars %>%
  group_by(cyl) %>%
  summarise(
    !!!at_(vars(wt, qsec), sum),
    !!!at_(vars(disp, drat), first),
    !!!at_(vars(mpg, hp, vs), min, na.rm = TRUE),
    !!!at_(vars(am, gear, carb), mean)
  )
#> # A tibble: 3 x 11
#>     cyl    wt  qsec  disp  drat   mpg    hp    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4  25.1  211.   108  3.85  21.4    52     0 0.727  4.09  1.55
#> 2     6  21.8  126.   160  3.9   17.8   105     0 0.429  3.86  3.43
#> 3     8  56.0  235.   360  3.15  10.4   150     0 0.143  3.29  3.5

Here the …

Created on 2019-03-04 by the reprex package (v0.2.1.9000)
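The `!!!` in front of each `at_()` call splices a list of arguments into `summarise()`. A minimal sketch of how such a list could be built with rlang follows. `at_sketch()` is a hypothetical helper, not the splice package's implementation, and for simplicity it takes a character vector of column names instead of `vars()`.

```r
library(dplyr, warn.conflicts = FALSE)
library(rlang)

# Hypothetical helper (NOT splice's at_()): for each column name, build the
# call `fn(col, ...)` and name it after the column, so the summary keeps
# the original column names.
at_sketch <- function(cols, fn, ...) {
  fn_expr <- enexpr(fn)     # capture the function as written, e.g. `sum`
  extra_args <- list2(...)  # extra arguments, e.g. na.rm = TRUE
  calls <- lapply(cols, function(col) call2(fn_expr, sym(col), !!!extra_args))
  set_names(calls, cols)
}

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    !!!at_sketch(c("wt", "qsec"), sum),
    !!!at_sketch(c("mpg", "hp"), min, na.rm = TRUE)
  )
res
```

Building plain calls such as `min(mpg, na.rm = TRUE)` keeps the spliced arguments readable. The trade-off is that tidyselect helpers are not supported in this sketch.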
library(purrr)
library(dplyr, warn.conflicts = FALSE)
library(dance)

mtcars %>%
  group_by(cyl) %>%
  tango(
    swing(sum, wt, qsec),
    swing(first, disp, drat),
    swing(~min(., na.rm = TRUE), mpg, hp, vs),
    swing(mean, am, gear, carb)
  )
#> # A tibble: 3 x 11
#>     cyl    wt  qsec  disp  drat   mpg    hp    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     4  25.1  211.   108  3.85  21.4    52     0 0.727  4.09  1.55
#> 2     6  21.8  126.   160  3.9   17.8   105     0 0.429  3.86  3.43
#> 3     8  56.0  235.   360  3.15  10.4   150     0 0.143  3.29  3.5

The take is slightly different here, as …
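Thanks for your answer. Both approaches are really good! The nice thing with … It took me some time to understand how …

# Functions
library(tidyverse)

# summarise() variant: evaluates the unnamed arguments (the chunk() calls)
# and splices the quosures they return into the summary.
# Note: relies on unexported dplyr 0.8.x internals (dplyr:::summarise_impl,
# dplyr:::manip_at), which may not exist in other dplyr versions.
summarise2 <- function (.data, ...) {
  dots <- quos(...)
  # Evaluate each unnamed argument, e.g. chunk(mean, vars(wt, qsec)),
  # and flatten the resulting lists of quosures into a single list
  dots <- unlist(map_if(dots, names(dots) == "", ~rlang::eval_tidy(.)))
  dplyr:::summarise_impl(.data, dots, environment(), rlang::caller_env())
}

# Looks a hard-coded 7 frames up the call stack for the frame holding `.data`;
# this depends on the exact call depth and is fragile
eval_context <- function (...) {
  calls <- sys.calls()
  frames <- sys.frames()
  n <- length(frames)
  list(.data = frames[[n - 7]]$.data, .env = frames[[n - 7]], ...)
}

# Builds one quosure per column selected by `.vars`, each applying `.funs`
chunk <- function (.funs, .vars, ...) {
  context <- eval_context()
  dplyr:::manip_at(context$.data, .vars, .funs, enquo(.funs), context$.env, ...)
}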
# Example
mtcars %>%
  group_by(gear) %>%
  summarise2(
    vs = sum(vs),
    chunk(mean, vars(wt, qsec)),
    chunk(min, vars(matches("drat|cyl"))),
    chunk(first, vars(last_col()))
  )
#> # A tibble: 3 x 7
#>    gear    vs    wt  qsec   cyl  drat  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     3     3  3.89  17.7     4  2.76     1
#> 2     4    10  2.62  19.0     4  3.69     4
#> 3     5     1  2.63  15.6     4  3.54     2

Created on 2019-03-24 by the reprex package (v0.2.1)

It would be great to allow for …
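A similar result can be sketched without unexported internals or call-stack walking, at the cost of passing the data explicitly. The approach resolves each tidyselect specification with `tidyselect::eval_select()` and splices one named call per selected column into the regular `summarise()`. `chunk_exprs()` is a hypothetical helper, not part of dplyr or tidyselect.

```r
library(dplyr, warn.conflicts = FALSE)
library(rlang)

# Hypothetical helper: resolve a tidyselect specification against `data`,
# then build one `col = fn(col)` call per selected column.
chunk_exprs <- function(data, fn, cols) {
  fn_expr <- enexpr(fn)  # e.g. the symbol `mean`
  selected <- names(tidyselect::eval_select(enquo(cols), data))
  set_names(lapply(selected, function(col) call2(fn_expr, sym(col))), selected)
}

res <- mtcars %>%
  group_by(gear) %>%
  summarise(
    vs = sum(vs),
    !!!chunk_exprs(mtcars, mean, c(wt, qsec)),
    !!!chunk_exprs(mtcars, min, matches("drat|cyl")),
    !!!chunk_exprs(mtcars, first, last_col())
  )
res
```

Passing `mtcars` twice is the price of avoiding the hard-coded frame lookup in `eval_context()`.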
Duplicate of #2326
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
Hi,
dplyr is awesome, thank you for that. An issue I have come across multiple times when reviewing and writing code is collapsing data with dplyr. Users often want to apply several summary functions, such as min, max, mean and first, to different columns.
summarise_at works great for homogeneous tibbles where all columns should be summed up. But since it discards the original columns, it can't be piped into another summarise_at call that uses a different aggregation function. A common way to write this is:
Created on 2019-03-01 by the reprex package (v0.2.1)
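For illustration (a sketch, not the original reprex), a hand-written summary of this kind needs one `name = fun(name)` expression per column, so every column name is typed twice:

```r
library(dplyr, warn.conflicts = FALSE)

# Illustration only: each output column is spelled out by hand,
# repeating both the column name and the summary function.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    wt = sum(wt), qsec = sum(qsec),
    disp = first(disp), drat = first(drat),
    mpg = min(mpg, na.rm = TRUE), hp = min(hp, na.rm = TRUE),
    vs = min(vs, na.rm = TRUE),
    am = mean(am), gear = mean(gear), carb = mean(carb)
  )
res
```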
The main problem with the above approach is that it is tedious, especially if a tibble has more than 50 columns. It is also error-prone, since the user has to write each column name twice and repeat the function for every column. In short, it is hard to write efficient, dynamic code when collapsing data.
A solution would be to let users specify chunks of columns that should be summarised with the same function (keeping the original names). I made a simple attempt at how this could work.
Created on 2019-03-01 by the reprex package (v0.2.1)
Some things I have not solved yet: incorporating tidyselect, and it could probably work without creating a new summarise_* function. Preferred syntax would be:
Is this an issue that is worth digging into?
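For readers on dplyr 1.0.0 or later, `across()` covers this pattern directly. Each `across()` call applies one function to a tidyselect selection of columns, several calls can be combined in one `summarise()`, and with a single function the output columns keep their original names. A sketch reproducing the column groups from the splice and dance examples, assuming dplyr >= 1.0.0:

```r
library(dplyr, warn.conflicts = FALSE)  # assumes dplyr >= 1.0.0 for across()

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    across(c(wt, qsec), sum),
    across(c(disp, drat), first),
    # extra arguments go through a lambda; passing them via across()'s `...`
    # is deprecated in recent dplyr releases
    across(c(mpg, hp, vs), ~ min(.x, na.rm = TRUE)),
    across(c(am, gear, carb), mean)
  )
res
```

Tidyselect helpers such as `matches("drat|cyl")` or `last_col()` can be used in place of `c(...)` inside `across()`.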