Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

group_by should replace existing grouping #385

Closed
holgerbrandl opened this issue Apr 14, 2014 · 5 comments
Closed

group_by should replace existing grouping #385

holgerbrandl opened this issue Apr 14, 2014 · 5 comments
Milestone

Comments

@holgerbrandl
Copy link

@holgerbrandl holgerbrandl commented Apr 14, 2014

I would love to regroup my data using a chained dplyr command:

group_by(diamonds, color) %.% filter(mean(carat)>0.24) %.% group_by(cut) %.% filter(mean(depth)>60)

The result is:

Source: local data frame [53,940 x 10]
Groups: color, cut

   carat       cut color clarity depth table price    x    y    z
1   0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
2   0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
...

The second group_by is not replacing the first grouping, but it iw just added to it. It took me a while to realize that it needs to be group_by(cut, add=F).

For sure this is a more a usability preference, but especially when it comes to grouped operations, it's easy to miss an incorrect grouping, so it would be imho more clear (especially for new users) to change the default of add to FALSE. Then above code could would behave as written.

In a chained operation users might think about adding add=F. However, when using unchained (line by line) scripting, many will tend to forget that they work with a group table and will get unexpected results when applying another group_by to it.

This seems related to #121 which was tagged as fixed, but I can't see how the fix works.

@hadley
Copy link
Member

@hadley hadley commented Apr 14, 2014

I think you're probably right, and group_by() should default to add = FALSE. I'm a little worried about breaking existing code, but dplyr is still so young that this is probably ok.

@hadley hadley added this to the v0.2 milestone Apr 14, 2014
@hadley hadley closed this in f9bcca0 Apr 14, 2014
@holgerbrandl
Copy link
Author

@holgerbrandl holgerbrandl commented Apr 14, 2014

Thanks a lot for the fix.

Looking through the diff, it seems that you've made a minor mistake when fixing the documentation: Lines 30+31 in group-by.r should be "To instead add to the existing groups use \code{add = TRUE}".

@statsandwich
Copy link

@statsandwich statsandwich commented Apr 15, 2014

I like this change!

Actually, I somehow was not aware of the 'add' argument, so had been wrapping in as.data.frame() as a hack for chaining.

@hadley
Copy link
Member

@hadley hadley commented Apr 15, 2014

@statsandwich there's also ungroup().

@statsandwich
Copy link

@statsandwich statsandwich commented Apr 15, 2014

The reason I didn't use ungroup() was that I frankly did not understand what was going on with the Groups: when I was chaining operations. The behavior was unpredictable to me, so I sought the intermediary comfort of a non-dplyr-class object.

Only saying this as an aside to understand how a less-sophisticated programmer may think. Not dplyr's problem.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants