implicit arrange in group_by #1026

rubenarslan · 2015-03-17T11:15:48Z

In some cases I think it would make sense if group_by %>% summarise automatically sorted by the grouping variables.

At the moment I write

swed.1 %>% group_by(paternalage.factor) %>% arrange(paternalage.factor) %>% 
summarise(ever_married = mean(ever_married))

Without the arrange, dplyr keeps the data in some order, probably that of the first occurrence of each level. I think this is rarely the desired behaviour. I'm not sure what the heuristic would be for when an implicit arrange would be nice, but I'm pretty sure it always makes sense when you group by one variable and then summarise.

hadley · 2015-03-17T11:30:22Z

Sorting is expensive, so we can't do it automatically on large data. And in your case it would be faster to arrange after the summary.
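
A minimal sketch of that suggestion: summarise first, then sort the (much smaller) result. The `swed` tibble below is a made-up stand-in for `swed.1`, whose real contents are not shown in this thread, and `tibble()` assumes a reasonably recent dplyr.

```r
library(dplyr)

# Hypothetical stand-in for swed.1; the values are invented for illustration.
swed <- tibble(
  paternalage.factor = c("30-35", "20-25", "30-35", "25-30", "20-25"),
  ever_married       = c(1, 0, 1, 1, 1)
)

result <- swed %>%
  group_by(paternalage.factor) %>%
  summarise(ever_married = mean(ever_married)) %>%
  # Sort one row per group instead of every input row.
  arrange(paternalage.factor)

print(result)
```

Sorting after the summary touches one row per group rather than the full input, and the explicit `arrange()` means the code does not depend on whatever order `summarise()` happens to return.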

rubenarslan · 2015-03-17T12:25:13Z

Oh, you're of course right about sorting after the summary. I wanted to suggest doing it automatically only if a summary leads to small data, not in every case.

krlmlr · 2015-09-09T00:17:38Z

I think this somehow became the default now, which was rather unexpected to me. I'm not sure this is desired, because a NEWS entry for dplyr 0.4.0 reads:

group_by() on a data table preserves original order of the rows (#623)

and there seem to be no relevant news items that suggest the opposite in more recent versions.

Test with CRAN version (0.4.3):

> data_frame(a = rev(letters[1:3])) %>% group_by(a) %>% ungroup
Source: local data frame [3 x 1]

      a
  (chr)
1     c
2     b
3     a
> data_frame(a = rev(letters[1:3])) %>% group_by(a) %>% summarize() %>% ungroup
Source: local data frame [3 x 1]

      a
  (chr)
1     a
2     b
3     c
> data_frame(a = rev(letters[1:3])) %>% group_by(a) %>% do(data_frame(b=1)) %>% ungroup
Source: local data frame [3 x 2]

      a     b
  (chr) (dbl)
1     a     1
2     b     1
3     c     1

hadley closed this as completed Mar 17, 2015

krlmlr pushed a commit to krlmlr/import.gen that referenced this issue Sep 9, 2015

need to use factor to preserve sort order

65baefc
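
The commit message above suggests a factor-based workaround. The following is only a sketch of one way that could look, not the code from that commit: it assumes a recent dplyr, in which grouped summaries come back ordered by the grouping keys and factor keys are ordered by their levels. The column `n` and the object names are illustrative.

```r
library(dplyr)

# Same toy input as in the transcript above, built with tibble().
df <- tibble(a = rev(letters[1:3]))

kept <- df %>%
  # Fix the level order to the order of first appearance (c, b, a).
  mutate(a = factor(a, levels = unique(a))) %>%
  group_by(a) %>%
  summarise(n = n())

# Group keys as plain strings, in the order summarise() returned them.
as.character(kept$a)
```

Because the levels are set from `unique(a)`, the group order follows first appearance in the data rather than alphabetical order.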

lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
