Is slice() meant to give the same results as filter with row_number() on grouped data frames? #2192
Confirmed. @hadley: Do we guarantee any particular row order after group_by()?
I think we should preserve the existing order of the rows (i.e. it's a "stable" split).
@alexfun: Would you like to contribute a testthat test?
@krlmlr I would be happy to. Let me know if you have any specific requests.
@hadley @alexfun: I was confused. The following two give identical results (ordered by the grouping variable):
tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))
tmp_df2 %>%
  group_by(a) %>%
  slice(1) %>%
  ungroup()
tmp_df2 %>%
  group_by(a) %>%
  summarize(b = b[[1]]) %>%
  ungroup()
The filter example returns the data in order of appearance:
tmp_df2 %>%
  group_by(a) %>%
  filter(row_number() == 1)
The question is: is slice() a summarize-like or a filter-like operation? I tend to think it's more like a filter operation, in which case the observed behavior is a bug.
I would argue that summarise() should also return the groups in the order in which they appear in the original data. I had no idea that summarise() behaves this way as well. This is a problem for my workflow in particular, because I like to cbind() summary data with data frames whose rows are sorted in a particular order. To work around the fact that summarise() reorders rows, I will need key columns in both the summarised data frames and the other data frames, and a join instead of a straight cbind().
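A minimal sketch of the join-based workaround described above, assuming a recent dplyr (1.x) rather than the 0.5.0 release discussed in this issue. The names `summary_df`, `misaligned`, and `joined` are hypothetical and only for illustration. It contrasts a straight `cbind()`, which pairs rows by position, with `left_join()` on the key column `a`:

```r
library(dplyr, warn.conflicts = FALSE)

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

# One row per group; summarise() returns the groups sorted by the key `a`,
# not in the order they appear in tmp_df2. (summary_df is a hypothetical name.)
summary_df <- tmp_df2 %>%
  group_by(a) %>%
  summarise(first_b = first(b)) %>%
  ungroup()

# Positional pairing: silently misaligned because tmp_df2 is not sorted by `a`.
misaligned <- cbind(tmp_df2, first_b = summary_df$first_b)

# Key-based pairing: left_join() keeps the rows of tmp_df2 in their original
# order and matches summary values on `a`, so row order does not matter.
joined <- tmp_df2 %>%
  left_join(summary_df, by = "a")

print(misaligned)
print(joined)
```

Because each group in this data has a single row, `first_b` should equal `b` on every row; the `cbind()` version breaks that wherever the input is not already sorted by `a`, while the join does not.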
Minimal reprex:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(a = c(2, 1), b = c("x", "y")) %>% group_by(a)
df %>% slice(1)
#> Source: local data frame [2 x 2]
#> Groups: a [2]
#>
#> a b
#> <dbl> <chr>
#> 1 1 y
#> 2 2 x
df %>% filter(row_number() == 1)
#> Source: local data frame [2 x 2]
#> Groups: a [2]
#>
#> a b
#> <dbl> <chr>
#> 1 2 x
#> 2 1 y
Thinking on it more, I don't think there's any guarantee that the row order should be the same. Relying on row order (instead of doing a join) is extremely dangerous because it will fail silently. So I don't think this is likely to get high enough up on my priority list to be fixed.
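Following the point about not relying on implicit row order, one way to make the order explicit is to record each row's original position before grouping and sort on it afterwards. A minimal sketch, assuming a recent dplyr (1.x); the `row_id` column and the `first_rows` name are hypothetical, added only for illustration:

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(a = c(2, 1), b = c("x", "y"))

first_rows <- df %>%
  # Hypothetical helper column: position of each row before grouping.
  mutate(row_id = row_number()) %>%
  group_by(a) %>%
  slice(1) %>%
  ungroup() %>%
  # Restore order of appearance explicitly instead of trusting slice().
  arrange(row_id) %>%
  select(-row_id)

print(first_rows)
```

This returns the first row of each group in order of appearance, as `filter(row_number() == 1)` does in the reprex, whatever order `slice()` itself happens to return.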
From:
http://stackoverflow.com/questions/40187530/in-dplyr-0-5-0-why-does-slice1-not-give-the-same-results-as-filterrow-number?noredirect=1#comment67643041_40187530
Using the example data
tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))
The operations
produce a different row order from