Is slice() meant to give the same results as filter with row_number() on grouped data frames? #2192

alexfun · 2016-10-22T02:59:15Z

From:

http://stackoverflow.com/questions/40187530/in-dplyr-0-5-0-why-does-slice1-not-give-the-same-results-as-filterrow-number?noredirect=1#comment67643041_40187530

Using example data

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

The operation

tmp_df2 %>%
    group_by(a) %>%
    slice(1)

produces a different row order from

tmp_df2 %>%
    group_by(a) %>%
    filter(row_number() == 1)


krlmlr · 2016-11-07T15:35:50Z

Confirmed. @hadley: Do we guarantee any particular row order after group_by()?

hadley · 2016-11-07T16:03:41Z

I think we should preserve the existing order of the rows (i.e. it's a "stable" split)

krlmlr · 2016-11-07T16:12:14Z

@alexfun: Would you like to contribute a testthat test?

alexfun · 2016-11-07T22:15:03Z

@krlmlr I would be happy to. Let me know if you have any specific requests.

krlmlr · 2016-11-07T22:25:25Z

@hadley @alexfun: I was confused.

The following two give identical results (ordered by the grouping variable):

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

tmp_df2 %>%
  group_by(a) %>%
  slice(1) %>%
  ungroup()

tmp_df2 %>%
  group_by(a) %>%
  summarize(b = b[[1]]) %>%
  ungroup()

The filter example returns the data in order of appearance:

tmp_df2 %>%
    group_by(a) %>%
    filter(row_number() == 1)

The question is: Is slice() a summarize-like or a filter-like operation? I tend to think it's more like a filter operation, in which case the observed behavior is a bug.

alexfun · 2016-11-07T22:35:20Z

I would argue that summarise should also return the grouping variables in the original order presented. I had no idea that summarise exhibits this behaviour as well. This in particular is not good for my workflow, as I tend to like to cbind summary data with data frames with rows sorted in a particular order. To get around the fact that summarise rearranges rows, I will need to have key columns in both summarised data frames and other data frames, and do a join instead of a straight cbind.

hadley · 2017-02-22T12:27:22Z

Minimal reprex:

library(dplyr, warn.conflicts = FALSE)
df <- tibble(a = c(2, 1), b = c("x", "y")) %>% group_by(a)

df %>% slice(1)
#> Source: local data frame [2 x 2]
#> Groups: a [2]
#> 
#>       a     b
#>   <dbl> <chr>
#> 1     1     y
#> 2     2     x
df %>% filter(row_number() == 1)
#> Source: local data frame [2 x 2]
#> Groups: a [2]
#> 
#>       a     b
#>   <dbl> <chr>
#> 1     2     x
#> 2     1     y

Thinking on it more, I don't think there's any guarantee that the row order should be the same. Relying on row order (instead of doing a join) is extremely dangerous because it will silently fail. So I don't think this is likely to get high enough up on my priority list to be fixed.
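For readers who hit this: a minimal sketch of the join-based approach mentioned above, using the example data from this thread. The `other` data frame and the `first_b` and `label` column names are hypothetical, made up only for illustration. The point is that rows are matched on the key column `a`, so the result does not depend on the order in which the grouped operation returns its rows.

```r
library(dplyr, warn.conflicts = FALSE)

tmp_df2 <- data.frame(a = c(1, 3, 2, 4), b = c(1, 2, 3, 4))

# Per-group summary; its row order is deliberately not relied upon below.
first_b <- tmp_df2 %>%
  group_by(a) %>%
  summarise(first_b = b[[1]])

# Hypothetical second table whose rows are in a different order.
other <- data.frame(a = c(4, 2, 3, 1), label = c("d", "c", "b", "a"))

# Match rows by the key column `a` instead of cbind(), which would
# pair rows purely by position.
combined <- left_join(other, first_b, by = "a")
```

Here `left_join()` keeps the row order of `other`, and each `first_b` value is attached to the row with the matching `a`, whatever order `summarise()` produced.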

krlmlr added the data frame label Nov 7, 2016

alexfun added the bug (an unexpected problem or unintended behavior) label Nov 7, 2016

krlmlr mentioned this issue Nov 10, 2016

preserve order of original dataset when using group_by()? #2159

Closed

hadley closed this as completed Feb 22, 2017

lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018
