Create a group_indices as a new variable #1185
Comments
You use
@hadley perhaps we could support something like:
|
Whenever I used this function (for instance before using ggplot2 or plm), I wanted the group index to be missing for observations where any of the grouping columns was missing. This is different from the behavior of group_indices. Maybe dplyr is not the best place to implement such a function? |
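The behavior described in this comment can be sketched in base R. The helper name `group_id()` is hypothetical and is not part of dplyr; this is only an illustration of the requested semantics, assuming that ids are numbered in order of first appearance (unlike `group_indices`, which follows the sorted grouping keys) and that the grouping values never contain the `"\r"` separator.

```r
# Hypothetical helper, not a dplyr function: one integer id per distinct
# combination of the supplied vectors, NA where any vector is missing.
group_id <- function(...) {
  cols <- list(...)
  # TRUE for rows where no grouping vector is missing
  complete <- !Reduce(`|`, lapply(cols, is.na))
  # Build one string key per row; "\r" is assumed not to occur in the data
  key <- do.call(paste, c(cols, sep = "\r"))
  id <- rep(NA_integer_, length(key))
  # Number the complete rows by first appearance of their key
  id[complete] <- match(key[complete], unique(key[complete]))
  id
}

v1 <- c(NA, NA, 2, 2, 3)
v2 <- c(NA, NA, 3, 3, 4)
group_id(v1, v2)
#> [1] NA NA  1  1  2
```

Because the function takes plain vectors rather than a data frame, it can be called on columns directly, for example inside `mutate()`.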
In that case, perhaps you can close the issue? |
I think the initial idea is great. Using |
Why was the issue closed? While @matthieugomez found another solution for his particular use case, the original request would be an extremely useful feature. It would be great to be able to do what you suggested above:
|
I would expect to be able to use this inside a mutate. Like @voxnonecho, I use data.table a lot, and the lack of an easy way to do this in dplyr is a bit of a pain |
I can offer:
library(dplyr, warn.conflicts = FALSE)
df <- data_frame(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3, 3, 4))
df %>% group_by(v1) %>% { mutate(ungroup(.), g = group_indices(.)) }
#> # A tibble: 5 × 3
#> v1 v2 g
#> <dbl> <dbl> <int>
#> 1 NA NA 3
#> 2 NA NA 3
#> 3 2 3 1
#> 4 2 3 1
#> 5 3 4 2
@hadley: This would be much simpler if we had a hybrid handler for |
I want to shelve this for now, and come back to it when we broadly reconsider what other pronouns would be useful inside dplyr verbs. |
howdy, thread. any updates on using group_indices inside of mutate? :)
|
I see that another option is doing |
I'll add that if you want to use |
Happy to reconsider. |
@hadley: Do we have a better idea what other pronouns are useful inside dplyr verbs? |
```r
> df <- data_frame(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3, 3, 4)) %>% group_by(v1)
> df %>% mutate(g = group_indices())
# A tibble: 5 x 3
# Groups:   v1 [3]
     v1    v2     g
  <dbl> <dbl> <int>
1   NA    NA      3
2   NA    NA      3
3    2.    3.     1
4    2.    3.     1
5    3.    4.     2
```
I pushed some code in the
df <- data_frame(v1 = c(3, 3, 2, 2, 3, 1), v2 = 1:6) %>% group_by(v1)
mutate(df, g = group_indices())
#> # A tibble: 6 x 3
#> # Groups: v1 [3]
#> v1 v2 g
#> <dbl> <int> <int>
#> 1 3. 1 3
#> 2 3. 2 3
#> 3 2. 3 2
#> 4 2. 4 2
#> 5 3. 5 3
#> 6 1. 6 1
Because of internal implementation, this gives
df <- data_frame(v1 = c(3, 3, 2, 2, 3, 1), v2 = 1:6)
mutate(df, g = group_indices())
#> # A tibble: 6 x 3
#> v1 v2 g
#> <dbl> <int> <int>
#> 1 3. 1 0
#> 2 3. 2 0
#> 3 2. 3 0
#> 4 2. 4 0
#> 5 3. 5 0
#> 6 1. 6 0 |
This sounds good, but I think
dplyr::group_indices(mtcars)
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
|
I made the change in the branch, why are |
Thanks. What is |
Oh, I see -- it's in the C++ code. |
I have no idea. Do the tests still pass if we change that |
This would have to be 0, I'll check |
- `group_indices()` can be used without argument in expressions in verbs (#1185).
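As a hedged illustration of the feature in this NEWS entry: in later dplyr releases (1.0.0 and newer, which is an assumption about the installed version), the argument-free call inside a verb is written as `cur_group_id()`. The sketch below reuses the data from the example earlier in the thread, with `tibble()` in place of `data_frame()`, and should reproduce the same `g` column.

```r
library(dplyr, warn.conflicts = FALSE)

# Same data as the example earlier in the thread
df <- tibble(v1 = c(3, 3, 2, 2, 3, 1), v2 = 1:6)

out <- df %>%
  group_by(v1) %>%
  # cur_group_id() returns the index of the current group,
  # numbered in the order of the sorted grouping keys
  mutate(g = cur_group_id()) %>%
  ungroup()

out$g
#> [1] 3 3 2 2 3 1
```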
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Some packages like `ggplot2` act on groups defined by one variable only (as opposed to groups defined by several variables). It would be nice to have a function, say `group()`, that creates a new integer variable from groups defined by multiple variables:
This function could also have a `na.rm` argument. The default should return a missing value for the observation if some grouping variable for this observation is missing.
`group_indices` is not suited for that since (i) it requires `df` as an argument (ii) `group_indices` does not work inside mutate
A work around for now