Summarising verbs with variable-length outputs #2132
Comments
Could also have With that verb the lengths can be different across results but must be the same across groups. |
Why not simply summarize() into a data frame?
iris %>%
  group_by(Species) %>%
  summarize(data = list(data_frame(Sepal.Length)))
It's not the same output structure. I was thinking about these because we're deprecating the purrr df functions and they are more liberal with the kind of outputs they accept. It's true the alternative is not too verbose, but not very expressive either:
mtcars %>%
  summarize_all(function(x) list(summary(x))) %>%
  tidyr::unnest()

mtcars %>%
  condense_all(summary)
@hadley: Please advise. |
Ideally we wouldn't need a separate verb for this, and would instead make This is somewhat related to the idea of having |
For reference: #2149 (comment) |
Related to #2326 I typically use a
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
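condense <- function(.data, ...) {
  # Capture the expressions, evaluate them per group into a list-column
  # of tibbles, then unnest so each group contributes as many rows as it needs.
  dots <- quos(...)
  summarise(.data, ..nested.. = list(tibble(!!!dots))) %>%
    tidyr::unnest(..nested..)
}
mtcars %>%
  group_by(cyl) %>%
  condense(col = 1:5, other = 5:1)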
#> # A tibble: 15 x 3
#> cyl col other
#> <dbl> <int> <int>
#> 1 4. 1 5
#> 2 4. 2 4
#> 3 4. 3 3
#> 4 4. 4 2
#> 5 4. 5 1
#> 6 6. 1 5
#> 7 6. 2 4
#> 8 6. 3 3
#> 9 6. 4 2
#> 10 6. 5 1
#> 11 8. 1 5
#> 12 8. 2 4
#> 13 8. 3 3
#> 14 8. 4 2
#> 15 8. 5 1
grouped <- mtcars %>% group_by(am)
grouped %>%
  condense(
    col = rep(mean(cyl), times = round(mean(cyl))),
    other = rep(length(col), length(col))
  )
#> # A tibble: 12 x 3
#> am col other
#> <dbl> <dbl> <int>
#> 1 0. 6.95 7
#> 2 0. 6.95 7
#> 3 0. 6.95 7
#> 4 0. 6.95 7
#> 5 0. 6.95 7
#> 6 0. 6.95 7
#> 7 0. 6.95 7
#> 8 1. 5.08 5
#> 9 1. 5.08 5
#> 10 1. 5.08 5
#> 11 1. 5.08 5
#> 12 1. 5.08 5
Created on 2018-04-23 by the reprex package (v0.2.0).
library(dplyr)
mtcars %>%
  group_by(am) %>%
  group_map(~{
    mean_cyl <- mean(.x$cyl)
    tibble(
      col = rep(mean_cyl, times = round(mean_cyl)),
      other = rep(length(col), length(col))
    )
  })
#> # A tibble: 12 x 3
#> # Groups: am [2]
#> am col other
#> * <dbl> <dbl> <int>
#> 1 0 6.95 7
#> 2 0 6.95 7
#> 3 0 6.95 7
#> 4 0 6.95 7
#> 5 0 6.95 7
#> 6 0 6.95 7
#> 7 0 6.95 7
#> 8 1 5.08 5
#> 9 1 5.08 5
#> 10 1 5.08 5
#> 11 1 5.08 5
#> 12 1 5.08 5
Created on 2018-12-14 by the reprex package (v0.2.1.9000)
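For anyone running this on a later release: the output above comes from a development version in which group_map() returned a grouped tibble. A minimal sketch of the same computation, assuming dplyr 0.8.1 or later, where group_map() returns a list and group_modify() is the variant that row-binds the per-group results:

```r
library(dplyr)

# Assumes dplyr >= 0.8.1. group_modify() calls the function once per group
# (.x holds that group's rows) and row-binds the returned tibbles,
# re-attaching the grouping column `am`.
res <- mtcars %>%
  group_by(am) %>%
  group_modify(~ {
    mean_cyl <- mean(.x$cyl)
    tibble(
      col = rep(mean_cyl, times = round(mean_cyl)),
      other = rep(length(col), length(col))
    )
  })
```

Each group contributes a different number of rows here, which is the variable-length behaviour this thread is about.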
But it is syntactically far from Maybe there's room for a quosure like function between
This is obsolete now that |
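As a point of comparison on recent releases, here is a sketch that assumes dplyr 1.1.0 or later, where reframe() accepts results of any length per group and always returns an ungrouped data frame. It reproduces the condense(col = 1:5, other = 5:1) example from earlier in the thread:

```r
library(dplyr)

# Assumes dplyr >= 1.1.0, which provides reframe().
# Each of the three cyl groups contributes five rows.
out <- mtcars %>%
  group_by(cyl) %>%
  reframe(col = 1:5, other = 5:1)
```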
A new dplyr family of verbs for variable-length output may be useful.
Like summarise(), it would discard all input columns except for the grouping variables. This allows the output to have a different number of rows than the input.
Unlike summarise(), it would not require length 1 results and would only check for equal length within each group. Grouping columns would be recycled to these lengths.
It could be called condense(), though it's only condensing in the sense that it gets rid of non-grouping variables. May need a better name.
Ungrouped data frame: check the square constraint
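The square constraint for the ungrouped case can be illustrated with a small base R sketch. check_square() is a hypothetical helper, not part of dplyr; it only verifies that every result has the same length before binding the results into columns.

```r
# Hypothetical helper (not a dplyr function): verify that all results
# have the same length, then bind them as columns of a data frame.
check_square <- function(results) {
  lens <- lengths(results)
  if (length(unique(lens)) > 1) {
    stop("results must all have the same length, got: ",
         paste(lens, collapse = ", "))
  }
  as.data.frame(results)
}

# Two length-2 results: accepted, giving a 2-row data frame.
ok <- check_square(list(mpg = range(mtcars$mpg), hp = range(mtcars$hp)))

# A length-2 and a length-5 result: rejected with an error.
bad <- try(
  check_square(list(mpg = range(mtcars$mpg), hp = quantile(mtcars$hp))),
  silent = TRUE
)
```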
This gives us immediately:
For a grouped data frame, we'd check the square constraint within groups:
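A rough sketch of that per-group check, under stated assumptions: condense_sketch() is a hypothetical name, and the implementation leans on group_modify() (dplyr 0.8.1 or later) and rlang::eval_tidy() rather than on anything proposed in this issue. Lengths only need to agree within a group, and the grouping column is recycled to whatever length the group produces.

```r
library(dplyr)

# Hypothetical helper illustrating the proposed semantics; not a dplyr verb.
condense_sketch <- function(.data, ...) {
  dots <- rlang::enquos(..., .named = TRUE)
  group_modify(.data, function(.x, .y) {
    # Evaluate every expression against the rows of the current group.
    cols <- lapply(dots, rlang::eval_tidy, data = .x)
    lens <- lengths(cols)
    # Square constraint: all results in this group must share one length.
    if (length(unique(lens)) > 1) {
      stop("results within a group must have equal lengths, got: ",
           paste(lens, collapse = ", "))
    }
    tibble::as_tibble(cols)
  })
}

# range() returns two values, so each cyl group yields two rows.
out <- mtcars %>%
  group_by(cyl) %>%
  condense_sketch(mpg_range = range(mpg), hp_range = range(hp))

# Lengths 2 and 5 within the same group violate the constraint.
bad <- try(
  mtcars %>%
    group_by(cyl) %>%
    condense_sketch(a = range(mpg), b = quantile(hp)),
  silent = TRUE
)
```

Because group_modify() prepends the group keys to each group's result, the recycling of grouping columns falls out of the row-binding step and needs no explicit code.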
Relevant discussion: #154