Create a group_indices as a new variable #1185

@matthieugomez

Some packages like ggplot2 act on groups defined by one variable only (as opposed to groups defined by several variables). It would be nice to have a function, say group(), that creates a new integer variable from groups defined by multiple variables:

Batting %>% mutate(group = group(teamID, yearID))
Batting %>% group_by(teamID, yearID) %>% mutate(group = group())

This function could also have an na.rm argument. By default, it should return a missing value for an observation if any of its grouping variables is missing.
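To make the intended default concrete, here is a minimal base R sketch of the behaviour described above. The name `group_id` is hypothetical (it is not a dplyr function), and it numbers groups in order of first appearance, which may differ from the ordering dplyr would use.

```r
# Hypothetical helper, for illustration only: integer group ids from
# several vectors, with NA for rows where any grouping value is missing.
group_id <- function(...) {
  cols <- data.frame(..., stringsAsFactors = FALSE)
  out <- rep(NA_integer_, nrow(cols))
  ok <- complete.cases(cols)
  # Build one key per complete row by pasting the grouping values together.
  parts <- unname(as.list(cols[ok, , drop = FALSE]))
  keys <- do.call(paste, c(parts, sep = "\r"))
  # Assumption: ids follow order of first appearance, not sorted order.
  out[ok] <- match(keys, unique(keys))
  out
}

group_id(c(NA, NA, 2, 2, 3), c(NA, NA, 3, 3, 4))
# NA NA 1 1 2
```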

group_indices is not suited for this, since (i) it requires df as an argument and (ii) it does not work inside mutate:

df <- data_frame(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3,3, 4))
df %>% mutate(g = group_indices(df, v1))
# Error: cannot handle

A workaround for now:

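library(dplyr)

group <- function(..., na.rm = FALSE) {
  # Collect the grouping vectors as columns of a single data frame
  df <- data.frame(list(...))
  if (na.rm) {
    # Rows with a missing grouping value keep NA; only complete rows are indexed
    out <- rep(NA_integer_, nrow(df))
    complete <- complete.cases(df)
    indices <- df %>% filter(complete) %>% group_indices_(.dots = names(df))
    out[complete] <- indices
  } else {
    # Index every row, including rows with missing grouping values
    out <- group_indices_(df, .dots = names(df))
  }
  out
}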
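For comparison, a sketch of the same idea using `cur_group_id()`, which is available in dplyr 1.0.0 and later (an assumption about the installed version; it postdates the code above). It assigns an id per group inside `mutate()` without passing the data frame again. Rows with missing grouping values form their own group, so they are set to NA in a separate step.

```r
library(dplyr)

df <- tibble(v1 = c(NA, NA, 2, 2, 3), v2 = c(NA, NA, 3, 3, 4))

out <- df %>%
  group_by(v1, v2) %>%
  mutate(g = cur_group_id()) %>%   # one id per (v1, v2) group
  ungroup() %>%
  # Blank out the id wherever a grouping variable is missing
  mutate(g = replace(g, is.na(v1) | is.na(v2), NA))
```

Here `out$g` is missing for the first two rows, shared by rows three and four, and different for row five.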

Labels: feature (a feature request or enhancement)
