Ordering result of group_by #242

Closed
hadley opened this issue Feb 5, 2014 · 3 comments

Comments

Member

@hadley hadley commented Feb 5, 2014

It would be nice to have an option to order the result of group_by so that (e.g.)

mtcars %.% group_by(cyl) %.% summarise(mpg = mean(mpg))

is ordered by increasing cyl. I think this should be the default for relatively small numbers of groups.

It doesn't make any difference to an analysis, but I find it surprisingly distracting.

Member

@romainfrancois romainfrancois commented Feb 5, 2014

There would be some extra cost: essentially a sort, but one whose size is on the order of the number of groups, not the number of rows. So it is relatively small compared to building the original hash map.

But the upside obviously is that we get predictable results and it simplifies reading the output. So I'd vote for making this the default.
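To make the cost argument concrete, here is a rough sketch in base R. It is only an illustration: `unique()` stands in for the hashing pass, and the row count and number of keys are made-up values, not measurements from dplyr.

```r
# Sketch only: base R stand-ins, not dplyr's internal hash map.
set.seed(42)
n_rows <- 1e5
keys <- sample.int(500L, n_rows, replace = TRUE)  # at most 500 distinct group keys

# Grouping has to look at every row to find the distinct keys
# (unique() returns them in order of first appearance).
groups <- unique(keys)

# The extra step under discussion: sort only the distinct keys.
sorted_groups <- sort(groups)

length(keys)    # values touched while grouping
length(groups)  # values touched by the extra sort
```

The sort works on `length(groups)` values while the grouping pass works on `length(keys)` values, which is why the overhead should stay small as long as there are far fewer groups than rows.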

Member Author

@hadley hadley commented Feb 5, 2014

I think it could be the default if n_groups < 1e4 - then the cost will be minimal, and if you have more groups than that, you're less likely to care about the order.
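The cutoff rule could look roughly like the sketch below. `order_groups()` and its `max_sorted` argument are hypothetical names used for illustration, not anything in dplyr; only the 1e4 threshold comes from the comment above.

```r
# Hypothetical helper, not part of dplyr: sort group keys only below a cutoff.
order_groups <- function(groups, max_sorted = 1e4) {
  if (length(groups) < max_sorted) {
    sort(groups)   # few groups: sorting is cheap, so return them ordered
  } else {
    groups         # many groups: skip the sort and keep the existing order
  }
}

small <- order_groups(c(8, 4, 6))      # fewer than 1e4 groups, so sorted
big_in <- rev(seq_len(2e4))            # 20000 groups in descending order
big_out <- order_groups(big_in)        # at or above the cutoff, left untouched
```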

Contributor

@mnel mnel commented Feb 14, 2014

In reference to closed issue #263: group_by should retain the order of appearance of the grouped data by default (which it currently does not).

df <- data.frame(users = c(1, 2, 3), items = 1:3) %.% group_by(users)
df
# Source: local data frame [3 x 2]
# Groups: users
#
#   users items
# 1     1     1
# 2     2     2
# 3     3     3
df %.% group_by(users) %.% summarise(identity(items))
# Source: local data frame [3 x 2]
#
#   users identity(items)
# 1     3               3
# 2     1               1
# 3     2               2
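For comparison, this is a small base R sketch of what "order of appearance" would mean for the groups. It does not use dplyr at all, and the data is made up for illustration.

```r
# Sketch only: group keys kept in the order they first appear in the data.
users <- c(3, 1, 3, 2, 1)
items <- 1:5

appearance <- unique(users)           # distinct keys, first-appearance order
group_id <- match(users, appearance)  # group index of each row in that order

# Summarise per group; split() orders by group_id, i.e. by appearance.
totals <- vapply(split(items, group_id), sum, integer(1))
result <- data.frame(users = appearance, items = unname(totals))
```

Here `result` lists the users in the order they first occur in the input, not in sorted order, which is the behaviour being asked for as the alternative default.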

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018