Create a group_indices as a new variable #1185

matthieugomez · 2015-05-30T17:43:23Z

Some packages like ggplot2 act on groups defined by one variable only (as opposed to groups defined by several variables). It would be nice to have a function, say group(), that creates a new integer variable from groups defined by multiple variables:

Batting %>% mutate(group = group(teamID, yearID))
Batting %>% group_by(teamID, yearID) %>% mutate(group = group())

This function could also have an na.rm argument. The default should return a missing value for the observation if some grouping variable for this observation is missing.

group_indices is not suited for that, since (i) it requires df as an argument and (ii) it does not work inside mutate:

df <- data_frame(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3,3, 4))
df %>% mutate(g = group_indices(df, v1))
# Error: cannot handle

A workaround for now:

group <- function(..., na.rm = FALSE){
  df <- data.frame(list(...))
  if (na.rm){
    out <- rep(NA, nrow(df))
    complete <- complete.cases(df)
    indices <- df %>% filter(complete) %>% group_indices_(.dots = names(df))
    out[complete] <- indices
  } else{
    out <- group_indices_(df, .dots = names(df))
  }
  out
}

romainfrancois · 2015-07-08T15:32:07Z

You use group_indices like this:

> df <- data_frame(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3,3, 4))
> df %>% group_by(v1) %>% group_indices
[1] 3 3 1 1 2

@hadley perhaps we could support something like:

df %>% group_by(v1) %>% mutate( g = group_indices() )
# or 
df %>% group_by(v1) %>% mutate( g = group() )

matthieugomez · 2015-07-08T17:05:34Z

In all the cases where I used this function (for instance before using ggplot2 or plm), I wanted the group index to be missing for observations where any of the grouping columns was missing. This is different from the behavior of group_indices. Maybe dplyr is not the best place to implement such a function?

romainfrancois · 2015-07-08T18:39:57Z

In that case, perhaps you can close the issue ?

voxnonecho · 2015-07-08T20:45:06Z

I think the initial idea is great: using group_indices() the way we use rleid() in data.table.

jarkub · 2016-05-19T19:47:14Z

@romainfrancois

Why was the issue closed? While @matthieugomez found another solution for his particular use case, the original request would be an extremely useful feature. It would be great to be able to do what you suggested above:

df %>% group_by(v1) %>% mutate( g = group_indices() )
# or 
df %>% group_by(v1) %>% mutate( g = group() )

stephlocke · 2017-03-28T10:49:26Z

I would expect to be able to use this inside a mutate. I use data.table a lot, like @voxnonecho, and the fact that there is no easy way to do this in dplyr is a bit of a pain.

krlmlr · 2017-03-28T20:31:17Z

I can offer:

library(dplyr, warn.conflicts = FALSE)
df <- data_frame(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3,3, 4))
df %>% group_by(v1) %>% { mutate(ungroup(.), g = group_indices(.)) }
#> # A tibble: 5 × 3
#>      v1    v2     g
#>   <dbl> <dbl> <int>
#> 1    NA    NA     3
#> 2    NA    NA     3
#> 3     2     3     1
#> 4     2     3     1
#> 5     3     4     2

@hadley: This would be much simpler if we had a hybrid handler for group_indices() that just does the right thing for grouped data frames.

hadley · 2017-03-28T21:30:03Z

I want to shelve this for now, and come back to it when we broadly reconsider what other pronouns would be useful inside dplyr verbs.

ericgtaylor · 2017-10-03T16:20:20Z

howdy, thread. any updates on using group_indices inside of mutate? :)

df %>% group_by(v1) %>% mutate( g = group_indices() )

dfrail24 · 2018-02-05T21:57:56Z

I see that another option is doing
df$g <- df %>% group_by(v1) %>% group_indices
but it would be great if we could do this in a seamless pipe like others have voiced above.

Zedseayou · 2018-02-14T20:51:03Z

I'll add that if you want to use group_indices inside a pipe, you can always do this:
df %>% bind_cols(g = group_indices(., group_var1, group_var2))
which does work seamlessly in a pipe but does not feel very neat.

krlmlr · 2018-02-28T17:48:36Z

Happy to reconsider.

krlmlr · 2018-02-28T17:49:48Z

@hadley: Do we have a better idea what other pronouns are useful inside dplyr verbs?

```r
> df <- data_frame(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3,3, 4)) %>% group_by(v1)
> df %>% mutate( g = group_indices() )
# A tibble: 5 x 3
# Groups:   v1 [3]
     v1    v2     g
  <dbl> <dbl> <int>
1    NA    NA     3
2    NA    NA     3
3    2.    3.     1
4    2.    3.     1
5    3.    4.     2
```

romainfrancois · 2018-04-09T11:54:17Z

I pushed some code in the feature-1185-group branch so that group_indices() is hybrid-interpreted.

df <- data_frame(v1 = c(3, 3, 2, 2, 3, 1), v2 = 1:6) %>% group_by(v1)
mutate(df, g = group_indices())
#> # A tibble: 6 x 3
#> # Groups:   v1 [3]
#>      v1    v2     g
#>   <dbl> <int> <int>
#> 1    3.     1     3
#> 2    3.     2     3
#> 3    2.     3     2
#> 4    2.     4     2
#> 5    3.     5     3
#> 6    1.     6     1

Because of the internal implementation, this gives 0 as the group for every row when the data frame is not grouped. Is that OK?

df <- data_frame(v1 = c(3, 3, 2, 2, 3, 1), v2 = 1:6)
mutate(df, g = group_indices())
#> # A tibble: 6 x 3
#>      v1    v2     g
#>   <dbl> <int> <int>
#> 1    3.     1     0
#> 2    3.     2     0
#> 3    2.     3     0
#> 4    2.     4     0
#> 5    3.     5     0
#> 6    1.     6     0

krlmlr · 2018-04-09T23:34:53Z

This sounds good, but I think group_indices() should return an all-1 vector for an ungrouped data frame for consistency:

dplyr::group_indices(mtcars)
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Created on 2018-04-10 by the reprex package (v0.2.0).

romainfrancois · 2018-04-10T07:02:35Z

I made the change in the branch. Why is group() returning -1 in the first place?

krlmlr · 2018-04-10T07:16:10Z

Thanks. What is group()?

krlmlr · 2018-04-10T07:17:58Z

Oh, I see -- it's in the C++ code.

krlmlr · 2018-04-10T07:21:09Z

I have no idea. Do the tests still pass if we change that -1 to 1 ?

romainfrancois · 2018-04-10T07:25:45Z

This would have to be 0, I'll check

- `group_indices()` can be used without argument in expressions in verbs (#1185).

lock · 2018-11-01T15:46:14Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

matthieugomez changed the title ~~Using group_indices within mutate~~ Create a group_indices as a new variable Jun 6, 2015

matthieugomez closed this as completed Jul 8, 2015

voxnonecho mentioned this issue Nov 14, 2015

data.table::rleid for dplyr #1534

Closed

krlmlr mentioned this issue Feb 5, 2018

group_id pronoun #3341

Closed

krlmlr added the feature (a feature request or enhancement) and data frame labels Feb 28, 2018

krlmlr reopened this Feb 28, 2018

romainfrancois self-assigned this Mar 27, 2018

romainfrancois added a commit that referenced this issue Apr 9, 2018

test for #1185. closes #1185

a2b2510

romainfrancois added a commit that referenced this issue Apr 9, 2018

additional #1185 test for rowwise

ea6db02

romainfrancois mentioned this issue Apr 9, 2018

Do we need FullDataFrame at all ? #3486

Closed

romainfrancois added a commit that referenced this issue Apr 10, 2018

test for #1185. closes #1185

edb08c9

romainfrancois added a commit that referenced this issue Apr 10, 2018

additional #1185 test for rowwise

416fe38

romainfrancois mentioned this issue Apr 10, 2018

Feature 1185 group #3490

Merged

romainfrancois added a commit that referenced this issue May 3, 2018

test for #1185. closes #1185

5870af0

romainfrancois added a commit that referenced this issue May 3, 2018

additional #1185 test for rowwise

fab2fb8

krlmlr closed this as completed in #3490 May 5, 2018

krlmlr added a commit that referenced this issue May 5, 2018

Merge pull request #3490 from tidyverse/feature-1185-group

dc007a6

tobadia mentioned this issue Jul 24, 2018

group_indices() cannot be used inside a dplyr expression #3727

Closed

lock bot locked and limited conversation to collaborators Nov 1, 2018
