New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scoped mutate/summarize with multiple columns AND functions #4125
Conversation
closes #4119 |
Our policy is to not create non-syntactic names, so |
R/colwise-mutate.R
Outdated
@@ -309,6 +309,11 @@ manip_apply_syms <- function(funs, syms, tbl) { | |||
names(out) <- names(funs) | |||
} else { | |||
syms_names <- map_chr(syms, as_string) | |||
names(funs) <- if_else( | |||
names(funs) == "<fn>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does <fn>
get set as a name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 65 in cf0e282
temp <- exprs_auto_name(temp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually we should add name repairing arguments to exprs_auto_name()
(or its successor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we fix it higher up? i.e. if the list is unnamed, we manually name it fn1
, fn2
etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, and I guess as_fun_list()
would be the right spot.
But it's all part of this complicated set of cases that we try to give nice defaults for, e.g. if there's only one function, named and unnamed do something different and are both legit:
library(dplyr, warn.conflicts = FALSE)
summarise_at(iris, vars(starts_with("Sepal")), list(mean))
#> Sepal.Length Sepal.Width
#> 1 5.843333 3.057333
summarise_at(iris, vars(starts_with("Sepal")), list(mean = mean))
#> Sepal.Length_mean Sepal.Width_mean
#> 1 5.843333 3.057333
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romainfrancois Good point. We should only auto-name when there's more than one function.
I see that tibble creates ..n
suffix to disambiguate names. Is that to draw user attention on these names? I think I'd prefer an n
suffix in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually do need to fix this higher up, otherwise this happens (on the second path):
> summarise_at(iris, vars(Sepal.Width), list(mean, median))
<fn>
1 3
@hadley This is connected to the labelling of literals, i.e. in transmute(mtcars, !!seq_len(nrow(mtcars))) Currently the pillar type label is taken in that case. |
There's also the problem from #2912 when we use list of lambdas calling the same function: library(dplyr, warn.conflicts = FALSE)
iris %>%
summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#> Sepal.Length_quantile Sepal.Width_quantile
#> 1 6.4 3.3
iris %>%
summarise_at(vars(starts_with("Sepal")), list(q25 = ~quantile(., .25), q75 = ~quantile(., .75)))
#> Sepal.Length_q25 Sepal.Width_q25 Sepal.Length_q75 Sepal.Width_q75
#> 1 5.1 2.8 6.4 3.3 Created on 2019-01-25 by the reprex package (v0.2.1.9000) |
Can we use |
The name repairing should move to rlang I think, because of all the auto-naming in captured arguments. What is the motivation in using Edit: If yes, do we need to make a difference between names repair and names generation? |
I went the route of library(dplyr, warn.conflicts = FALSE)
# both gets called "quantile"
iris %>%
summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#> Sepal.Length_quantile Sepal.Width_quantile
#> 1 6.4 3.3
# both unnamed, so called fn1, fn2
iris %>%
summarise_at(vars(starts_with("Sepal")), list(mean, median))
#> Sepal.Length_fn1 Sepal.Width_fn1 Sepal.Length_fn2 Sepal.Width_fn2
#> 1 5.843333 3.057333 5.8 3
# what if we really wanted a fn1 ?
iris %>%
summarise_at(vars(starts_with("Sepal")), list(mean, median, fn1 = sum))
#> Sepal.Length_fn1 Sepal.Width_fn1 Sepal.Length_fn2 Sepal.Width_fn2
#> 1 876.5 458.6 5.8 3 This looks tangent to name repairing, or name generation as @lionel- hints. Do we need a package specific to naming things ? @jennybc |
Borrowing library(dplyr, warn.conflicts = FALSE)
iris %>%
summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#> Sepal.Length_quantile..1 Sepal.Width_quantile..1
#> 1 5.1 2.8
#> Sepal.Length_quantile..2 Sepal.Width_quantile..2
#> 1 6.4 3.3
iris %>%
summarise_at(vars(starts_with("Sepal")), list(mean, median))
#> Sepal.Length_fn..1 Sepal.Width_fn..1 Sepal.Length_fn..2
#> 1 5.843333 3.057333 5.8
#> Sepal.Width_fn..2
#> 1 3
iris %>%
summarise_at(vars(starts_with("Sepal")), list(mean, median, fn..1 = mean))
#> Sepal.Length_fn..1 Sepal.Width_fn..1 Sepal.Length_fn..2
#> 1 5.843333 3.057333 5.8
#> Sepal.Width_fn..2 Sepal.Length_fn..3 Sepal.Width_fn..3
#> 1 3 5.843333 3.057333 |
CC @jennybc. |
That makes sense. In the case of repairing, we need some peppery typography to draw the attention to potential problems. In the case of name generation, using other rules might be ok? Producing readable labels might be a more important goal in that case. |
I'd still think in that case |
Another random though is that perhaps when the function is S3 generic, maybe it could just be named after its name, i.e. what's inside |
We could support other name-generation rules, but for now it feels like |
Interesting idea, but it might make it more unpredictable for users to figure out when they need to supply names. |
Do we want teach users to use |
We don't have to scan all namespaces, only the topenv of the function. But I think we shouldn't go down this road because of performance implications and because this will cause different behaviours for exported and sourced functions. Also the user might have imported the function locally with another name. |
Yeah this would only buy some more time but then make a bigger monster. Using some name repair function on generated names, or some name generation gives us ugly enough names for users to want to change them. We could |
sort of related to #3769 |
a2db1d6
to
d10f01b
Compare
R/colwise-mutate.R
Outdated
# nms[dup] <- paste(nms[dup], which(dup), sep = ".") | ||
# } | ||
|
||
# append ..<position> to duplicates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind copying the relevant parts of name repair from tibble to here, to make sure we're consistent? The current code will misbehave in some corner cases.
I can move the code to a compat-name-repair.R
file in tibble so that you just need to copy that file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Essentially, this needs universal_names(quiet = TRUE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krlmlr aren't we switching to use ...
because ..n
is non-syntactic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With universal names we add another dot if necessary.
library(tibble)
as_tibble(list(1, a = 2, a = 3), .name_repair = "unique")
#> New names:
#> * `` -> `..1`
#> * a -> a..2
#> * a -> a..3
#> # A tibble: 1 x 3
#> ..1 a..2 a..3
#> <dbl> <dbl> <dbl>
#> 1 1 2 3
as_tibble(list(1, a = 2, a = 3), .name_repair = "universal")
#> New names:
#> * `` -> ...1
#> * a -> a..2
#> * a -> a..3
#> # A tibble: 1 x 3
#> ...1 a..2 a..3
#> <dbl> <dbl> <dbl>
#> 1 1 2 3
Created on 2019-01-29 by the reprex package (v0.2.1.9000)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romainfrancois: Can you please pick up from here, let me know if anything's missing: https://github.com/tidyverse/tibble/blob/r-2.0/R/compat-name-repair.R
…pend ..<position> in that case
08d6f1b
to
b173332
Compare
I think this is ready now, getting this. Prior to calling library(dplyr, warn.conflicts = FALSE)
storms %>%
summarise_at(vars(wind, pressure), list(mean, median))
#> # A tibble: 1 x 4
#> wind_fn..1 pressure_fn..1 wind_fn..2 pressure_fn..2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 53.5 992. 45 999
iris %>%
summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#> Sepal.Length_quantile..1 Sepal.Width_quantile..1
#> 1 5.1 2.8
#> Sepal.Length_quantile..2 Sepal.Width_quantile..2
#> 1 6.4 3.3 |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Created on 2019-01-24 by the reprex package (v0.2.1.9000)