implicit arrange in group_by #1026

Closed
rubenarslan opened this issue Mar 17, 2015 · 3 comments

Comments

@rubenarslan

In some cases I think it would make sense if group_by %>% summarise automatically sorted by the grouping variables.

At the moment I write

swed.1 %>% group_by(paternalage.factor) %>% arrange(paternalage.factor) %>% 
summarise(ever_married = mean(ever_married))

Without the arrange, dplyr keeps the data in some order, probably that of the first occurrence of each level. I think this is rarely the desired behaviour. I'm not sure what the heuristic should be for when an implicit arrange would be nice, but I'm pretty sure it always makes sense when you group by one variable and then summarise.

@hadley
Member

hadley commented Mar 17, 2015

Sorting is expensive, so we can't do it automatically on large data. And in your case it would be faster to arrange after the summary.
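
A minimal sketch of arranging after the summary, so that only the one-row-per-group result is sorted rather than the full data. The data frame swed_example and its values are hypothetical stand-ins for swed.1, which is not shown in this thread:

```r
library(dplyr)

# Hypothetical stand-in for swed.1; the values are made up for illustration.
swed_example <- data.frame(
  paternalage.factor = factor(c("35-40", "25-30", "30-35", "25-30", "35-40", "30-35")),
  ever_married = c(1, 0, 1, 1, 0, 1)
)

# Summarise first, then sort the small summary table by the grouping variable.
result <- swed_example %>%
  group_by(paternalage.factor) %>%
  summarise(ever_married = mean(ever_married)) %>%
  arrange(paternalage.factor)

print(result)
```

The explicit arrange() at the end makes the row order independent of whatever order summarise() happens to return in a given dplyr version.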

@hadley hadley closed this as completed Mar 17, 2015
@rubenarslan
Author

Oh, you're of course right about sorting after the summary. I wanted to suggest doing it automatically only if a summary leads to small data, not in every case.

@krlmlr
Member

krlmlr commented Sep 9, 2015

I think this somehow became the default now, which was rather unexpected to me. I'm not sure this is desired, because a NEWS entry for dplyr 0.4.0 reads:

group_by() on a data table preserves original order of the rows (#623)

and there seem to be no relevant news items that suggest the opposite in more recent versions.

Test with CRAN version (0.4.3):

> data_frame(a = rev(letters[1:3])) %>% group_by(a) %>% ungroup
Source: local data frame [3 x 1]

      a
  (chr)
1     c
2     b
3     a
> data_frame(a = rev(letters[1:3])) %>% group_by(a) %>% summarize() %>% ungroup
Source: local data frame [3 x 1]

      a
  (chr)
1     a
2     b
3     c
> data_frame(a = rev(letters[1:3])) %>% group_by(a) %>% do(data_frame(b=1)) %>% ungroup
Source: local data frame [3 x 2]

      a     b
  (chr) (dbl)
1     a     1
2     b     1
3     c     1

krlmlr pushed a commit to krlmlr/import.gen that referenced this issue Sep 9, 2015
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018