preserve order of original dataset when using group_by()? #2159
Thanks, we should preserve order when grouping. Would you like to contribute a testthat test for the desired behavior?
@krlmlr I'll have to look into that package; I've never done anything like that before. Would wrapping the above into a test script work? If this is meant to serve as a …

Edit: I also don't know what is intended from …
I may have misread your post. I'm not sure we want to maintain the original order when grouping after all; #2192 is related. To solve your problem, you could turn your character column into a factor whose levels are in the order you need for plotting. If the order of the values in the data frame is already correct, you can use forcats::fct_inorder().
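A minimal sketch of the factor workaround, assuming dplyr is installed. The month data is made up for illustration, and `factor(month, levels = unique(month))` is used as a base-R stand-in for the same idea as forcats::fct_inorder() (levels in order of first appearance):

```r
library(dplyr)

# Hypothetical data, already in the desired (non-alphabetical) order
df <- tibble(month = c("Mar", "Jan", "Feb", "Jan", "Mar"),
             value = c(1, 2, 3, 4, 5))

# Character key: group_by() sorts the groups alphabetically (Feb, Jan, Mar)
by_char <- df %>%
  group_by(month) %>%
  summarise(total = sum(value))

# Factor key with levels in order of first appearance (Mar, Jan, Feb);
# groups then follow the level order instead of the alphabet
by_fct <- df %>%
  mutate(month = factor(month, levels = unique(month))) %>%
  group_by(month) %>%
  summarise(total = sum(value))
```

Because ggplot2 also uses factor level order for discrete axes, the same factor keeps the order through plotting.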
True, as cited above, I can also go to factors after … If the … For whatever reason, in my mind I sort of imagine it grouping by order of appearance (first variable is …
You can also go to factors before grouping: `tibble(a = factor(letters, levels = rev(letters))) %>% count(a)`

@hadley: Is there a specific reason why group_by() also sorts?
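To make the contrast in that one-liner explicit, here is a small sketch (assuming dplyr is installed) that counts the same letters once as a character column and once as a factor with reversed levels:

```r
library(dplyr)

# Character column: count() groups by sorted values, so rows come out a..z
chr_counts <- tibble(a = rev(letters)) %>% count(a)

# Factor column: groups follow the level order, so rows come out z..a
fct_counts <- tibble(a = factor(letters, levels = rev(letters))) %>% count(a)
```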
It seems that dplyr's group_by() does sort, at least for character, integer and numeric columns. For factors, the groups follow the level order, which here is the order of first appearance. Tested with dplyr 0.7.4:

```r
library(dplyr)

set.seed(4)
char <- sample(LETTERS[1:20], 40, replace = TRUE)
int <- sample(1L:20L, 40, replace = TRUE)
double <- sample(runif(20), 40, replace = TRUE)
# The factor's levels are in order of first appearance
x <- tibble(char, int, double, fact = factor(char, levels = unique(char)))

# All group_by() results are sorted except the factor, which follows its level order
group_by(x, char) %>% do(.[1, 'char'])
group_by(x, int) %>% do(.[1, 'int'])
group_by(x, double) %>% do(.[1, 'double'])
group_by(x, fact) %>% do(.[1, 'fact'])

# If group_by() did not sort, the first group's indices would contain the first row (0, zero-based)
# This is only true for the factor
# Note: 'indices' is an internal attribute of dplyr 0.7.x; later versions expose group_rows() instead
g <- group_by(x, char); attr(g, 'indices')[[1]]
g <- group_by(x, int); attr(g, 'indices')[[1]]
g <- group_by(x, double); attr(g, 'indices')[[1]]
g <- group_by(x, fact); attr(g, 'indices')[[1]]
```

Not sure why group_by() is sorting. It seems unnecessary, and it adds computational effort. Not sorting would make the behavior more like the base function … Sometimes sorting is nice, so perhaps it could be an option. If the behavior remains as is, perhaps we can add a note about sorting to the group_by() documentation.
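The `indices` check above depends on a dplyr 0.7.x internal. A sketch of the same check for dplyr 0.8.0 or later, assuming group_keys() and group_rows() are available, using deterministic toy data (the "c", "b", "a" order from the original post) instead of random sampling:

```r
library(dplyr)

# Toy data whose appearance order is c, b, a; the factor keeps that order
x <- tibble(char = c("c", "b", "a", "b", "c"),
            fact = factor(char, levels = unique(char)))

# group_keys() returns one row per group, in group order
char_keys <- group_keys(group_by(x, char))$char
fact_keys <- as.character(group_keys(group_by(x, fact))$fact)

# group_rows() lists the (1-based) row indices belonging to each group
char_first <- group_rows(group_by(x, char))[[1]]  # rows of "a"
fact_first <- group_rows(group_by(x, fact))[[1]]  # rows of "c"
```

Only the factor's first group contains row 1, matching the observation above.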
Thanks. Would you mind filing a new issue? This comment is likely to get lost otherwise. Please make sure to add references to existing discussions. If the argument is computational effort, we'd need to assess the actual impact on run time. The new gprofiler package is an option (work in progress, documentation coming very soon, currently Linux only).
Feel free to let me know that this is not a bug. I was just puzzled by the re-ordering of my grouped column. I specifically converted to character in my input df so that I would not have to constantly re-order the columns for plotting in ggplot2. This isn't the original data, but here's a reproducible example:

Result, even without factors:

My actual data contains month names, so consider that "c", "b", "a" is already the correct monthly order. What I expected/hoped for was that the groups would be fed into summarise() in the order they appear in the data frame. Otherwise, to get it to play nicely with ggplot2, I've just been going back to factors, since I can easily specify an order. This:

It would be nice not to have to re-specify, on the results of dplyr's group_by(), the ordering I already have in the data source every time I use it. Is there a de facto way to avoid this? Or what is the suggested route to get back to the ordering that appears in the original data set for, say, plotting discrete variables in the correct order?
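One way to get back to the original order without converting to a factor is to re-sort the summary by each key's first position in the source data. A minimal sketch, assuming dplyr is installed; `df`, `first_seen` and `out` are illustrative names, and the data mirrors the "c", "b", "a" example:

```r
library(dplyr)

# Hypothetical data where "c", "b", "a" is already the correct order
df <- tibble(month = c("c", "b", "a", "b", "c"), n = 1:5)

# Keys in order of first appearance in the source data
first_seen <- unique(df$month)

out <- df %>%
  group_by(month) %>%                 # groups come out sorted: a, b, c
  summarise(total = sum(n)) %>%
  arrange(match(month, first_seen))   # restore appearance order: c, b, a
```

This keeps the column as character, but ggplot2 will still sort a character axis alphabetically, so for plotting the factor approach (e.g. forcats::fct_inorder()) is usually the simpler route.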