summarise_each_q naming consistency #442
It is consistent: it always uses the minimum needed to disambiguate between columns.
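The two naming schemes discussed in this thread can be made concrete with a minimal sketch in base R that only computes output column names. The helpers `minimal_names()` and `consistent_names()` are hypothetical and are not part of dplyr. The first encodes the rule as stated in the reply above: with one function, keep the variable names; with one variable, use the function names; otherwise combine them. The second always uses `<var>_<fun>`, as suggested in the original report. The column order produced here is an illustrative choice, not necessarily the order dplyr uses.

```r
# Hypothetical helpers (not dplyr functions): they only compute output column names.

# Scheme proposed in the original report: always <var>_<fun>.
consistent_names <- function(vars, funs) {
  # Pair every variable with every function, grouped by variable.
  paste(rep(vars, each = length(funs)), rep(funs, times = length(vars)), sep = "_")
}

# Scheme described in the reply: use the minimum needed to disambiguate.
minimal_names <- function(vars, funs) {
  if (length(funs) == 1) {
    vars                           # one function: columns keep their variable names
  } else if (length(vars) == 1) {
    funs                           # one variable, several functions: name after the functions
  } else {
    consistent_names(vars, funs)   # both vary: fall back to <var>_<fun>
  }
}

# Under the minimal scheme, the shape of the names depends on the inputs,
# so downstream code has to special-case the number of vars and funs.
print(minimal_names(c("x", "y"), "mean"))      # "x" "y"
print(minimal_names("x", c("mean", "sd")))     # "mean" "sd"
print(consistent_names(c("x", "y"), "mean"))   # "x_mean" "y_mean"
```

The trade-off is visible in the last three lines: the minimal scheme gives shorter names, while the consistent scheme lets a caller predict every output column from `vars` and `funs` alone.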
I understand the underlying logic, but if one wants to use the results of summarise_each_q downstream (i.e. programmatically in a larger piece of code), it requires quite a lot of special-casing, e.g. to pick the right columns depending on the number of funs and vars that were specified.
I'd prefer to solve this conundrum by somehow describing how the outputs should be named. Any ideas for an interface?
Thank you, Hadley. Allowing the outputs to be named explicitly is a good solution. Below is a first idea for an interface. If outputs is NULL (the default), the current system could be applied when there is a single var (vars of length one):
This way we'd get a tidy data frame.
The issue here is not just the naming inconsistency. More problematic is that dplyr adds variables when there are two summarizing functions but replaces the existing variables when there is only one (see #1259). I would suggest that dplyr add variables and rename them when the user gives a named function. This way, there is no need for an extra argument, and the behavior is consistent with the case where multiple summarizing functions are specified.
@LaDilettante I like that idea!
The behaviour of summarise_each changed so that summarised variables get a suffix (see tidyverse/dplyr#442).
Currently the naming of output variables does not seem consistent:
Maybe `<var>_<fun>` could be used in each of the cases.