Automatically unpack unnamed df-cols #2326
Now that we have
From this 🐦 thread: https://twitter.com/romain_francois/status/943399604065849344
I toyed with this syntax on the tie 📦 here: https://github.com/romainfrancois/tie
> iris %>%
+ dplyr::group_by(Species) %>%
+ bow( tie(min, max) := range(Sepal.Length) )
# A tibble: 3 x 3
Species min max
<fct> <dbl> <dbl>
1 setosa 4.30 5.80
2 versicolor 4.90 7.00
3 virginica 4.90 7.90
>
> x <- "min"
> iris %>%
+ dplyr::group_by(Species) %>%
+ bow( tie(!!x, max) := range(Sepal.Length) )
# A tibble: 3 x 3
Species min max
<fct> <dbl> <dbl>
1 setosa 4.30 5.80
2 versicolor 4.90 7.00
3 virginica 4.90 7.90
Now it just does a classic:
> iris %>%
+ group_by(Species) %>%
+ summarise( ..tmp.. = list(range(Sepal.Length)) ) %>%
+ mutate( min = map_dbl(..tmp.., 1), max = map_dbl(..tmp.., 2) ) %>%
+ select( -..tmp..)
# A tibble: 3 x 3
Species min max
<fct> <dbl> <dbl>
1 setosa 4.30 5.80
2 versicolor 4.90 7.00
3 virginica 4.90 7.90
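For comparison, the same min/max summary can be written with an expression that returns a tibble, which is the behaviour this issue asks for. This is only a minimal sketch, assuming dplyr 1.0.0 or later (where an unnamed data frame result is unpacked into columns); the column names `min` and `max` are chosen to match the output above.

```r
library(dplyr, warn.conflicts = FALSE)

# Assumption: dplyr >= 1.0.0, where an unnamed tibble returned inside
# summarise() is unpacked into one output column per tibble column.
res <- iris %>%
  group_by(Species) %>%
  summarise({
    r <- range(Sepal.Length)        # r[1] is the minimum, r[2] the maximum
    tibble(min = r[1], max = r[2])  # one row per group
  })

res
```

No temporary list column is needed here, because the tibble carries both values and their names in one step.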
Update on naming: I think it now seems reasonable that named tibbles would produce a df-col:
# Tibble is spliced into output, producing two new columns
df %>% summarise(tibble(mean = mean(x), sd = sd(x)))
# Produces a single df-col containing two variables.
df %>% summarise(summary = tibble(mean = mean(x), sd = sd(x)))
Update on sizes: I think it now seems reasonably obvious that columns in summarise must have a size of 1, and columns in mutate must have a size of
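A runnable version of the naming rule, as a sketch: `mtcars` and its `mpg` column stand in for the placeholder `df` and `x`, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr, warn.conflicts = FALSE)

# Unnamed tibble: spliced into the output as two separate columns.
spliced <- mtcars %>%
  summarise(tibble(mean = mean(mpg), sd = sd(mpg)))
names(spliced)

# Named tibble: kept as a single df-col called `summary`.
packed <- mtcars %>%
  summarise(summary = tibble(mean = mean(mpg), sd = sd(mpg)))
names(packed)
names(packed$summary)
```

The only difference between the two calls is whether the tibble argument is named, so the choice between splicing and packing stays visible at the call site.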
To handle the "quantile" problem, we'll need a
df %>% summarise(agg_quantile(x))
df %>% summarise(col_mean(starts_with("x")), col_min(ends_with("y")))
We'll have to think carefully about how these functions compose: what does colwise quantile look like? What if you want to summarise multiple variables with multiple functions?
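To make the idea concrete, here is one way an `agg_quantile()`-style helper could look. It is purely hypothetical: dplyr does not provide a function with this name, and the `probs` default and the `q25`-style column names are assumptions made for this sketch. It relies only on unnamed tibbles being unpacked (dplyr 1.0.0 or later).

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr): returns a one-row tibble with
# one column per requested quantile.
agg_quantile <- function(x, probs = c(0.25, 0.5, 0.75)) {
  q <- quantile(x, probs = probs, names = FALSE)
  names(q) <- paste0("q", probs * 100)  # e.g. "q25", "q50", "q75"
  as_tibble(as.list(q))
}

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(agg_quantile(mpg))

res
```

Because the helper always returns exactly one row, it satisfies the size-1 rule for `summarise()`; how such helpers should compose across several columns is the open question raised above.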
Is this implemented already? In the
library(tidyverse)
tibble(a = 1) %>% mutate(tibble(b = 2))
#> # A tibble: 1 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
tibble(a = 1) %>% mutate(tibble(b = 2), c = b)
#> Error in get(as.character(FUN), mode = "function", envir = envir): object '.f' of mode 'function' was not found
Created on 2019-10-24 by the reprex package (v0.3.0)
Do we unpack (auto-splice) at the end or right after processing an expression? This is relevant for tidyverse/tibble#581.
Was mistakenly using compat
library(tidyverse)
tibble(a = 1) %>% mutate(tibble(b = 2))
#> # A tibble: 1 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
tibble(a = 1) %>% mutate(tibble(b = 2), c = b)
#> # A tibble: 1 x 3
#> a b c
#> <dbl> <dbl> <dbl>
#> 1 1 2 2
but yeah, it's on:
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
group_by(cyl) %>%
summarise(as_tibble(as.list(quantile(mpg))))
#> # A tibble: 3 x 6
#> cyl `0%` `25%` `50%` `75%` `100%`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 21.4 22.8 26 30.4 33.9
#> 2 6 17.8 18.6 19.7 21 21.4
#> 3 8 10.4 14.4 15.2 16.2 19.2
Currently mutate() and summarise() only work with vectorised functions: functions that take a vector as input and return a vector (or "scalar") as output. I don't see any reason why summarise() and mutate() couldn't also accept tibbles. The existing restrictions would continue to apply so that in summarise() the tibble would have to have exactly one row, and in mutate() it would have to have either one row or n rows.
In other words, the following two lines of code should be equivalent:
This would allow you to extract that repeated pattern out into a function:
We'd need to work on documentation to help people develop effective functions of this nature, and develop tools so that you could easily specify input variables (using whatever the next iteration of lazyeval provides) and name the outputs. But that's largely a second-order concern: we can figure out those details later.
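The code snippets from the original post did not survive in this copy of the thread, so the following is an illustrative sketch of the pattern rather than a reconstruction. The helper name `mean_sd()` and the use of `mtcars` are assumptions, and the code assumes dplyr 1.0.0 or later, where the proposal is available.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: a summary function that returns a one-row tibble.
mean_sd <- function(x) {
  tibble(mean = mean(x), sd = sd(x))
}

by_cyl <- mtcars %>% group_by(cyl)

# Spelled out column by column ...
explicit <- by_cyl %>% summarise(mean = mean(mpg), sd = sd(mpg))

# ... and via the tibble-returning helper, which is unpacked into columns.
via_helper <- by_cyl %>% summarise(mean_sd(mpg))

all.equal(explicit, via_helper)
```

The helper names its own outputs, so the repeated `mean = ..., sd = ...` pattern only has to be written once.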
Supporting tibbles in this way would be particularly useful for tidyr as it would help to clarify the nature of functions like separate() and unite(), which are currently data frame wrappers around simple vector functions.
These ideas are most important for summarise() and mutate() but I think we should apply the same principles to filter() and arrange() as well.
cc @lionel- @jennybc @krlmlr