Create an interaction variable from two or more factors #136

ewenharrison opened this issue Jul 4, 2018 · 2 comments

@ewenharrison ewenharrison commented Jul 4, 2018

I do this a lot but don't see a function in forcats?

It's straightforward, but base R interaction is pretty ugly.

gss_cat %>% 
  dplyr::mutate(marital__race = interaction(marital, race, sep = ":"))

I don't know if all forcats functions have to take a factor as their first argument? But here is an alternative to the above.

#' Combine levels from two or more factors to create new factor
#'
#' Computes a factor which represents the interaction of the given factors.
#' @param .data Dataframe or tibble.
#' @param ... Unquoted names of two or more factors from \code{.data}
#' @param sep A character string to separate the levels
#' @param collapse A character string to separate the factor names in new factor name
#' @return \code{.data} with the new factor 
#' @examples
#' gss_cat %>% 
#'   fct_combine(marital, race)

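fct_combine = function(.data, ..., sep = ":", collapse = "__"){
  # Capture the unquoted factor names as quosures
  .f <- rlang::quos(...)
  # Build the new column name, e.g. "marital__race"
  .n <- purrr::map_chr(.f, rlang::quo_name)
  .n <- paste(.n, collapse = collapse)
  # Paste the values together and convert the result back to a factor
  dplyr::mutate(.data, !! .n := factor(paste(!!! .f, sep = sep)))
}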

@hadley hadley commented Jan 4, 2019

It would be inconsistent with the rest of forcats to take a data frame as first argument, but I think you could easily eliminate that.

I'm not sure about the name; I'd prefer something that implied that you were getting the Cartesian product of the factors. I think you also need a drop argument that would control whether the levels of the new factor were the set of all possible combinations, or only the combinations that occur in the data.
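
To make the two suggestions concrete, here is a rough sketch of a version that takes factors directly rather than a data frame, with a drop argument. The name fct_product is hypothetical and not part of forcats, and it simply wraps base R interaction(), so treat it as an illustration of the idea rather than a proposed implementation.

```r
# Hypothetical helper (not a forcats function): crosses two or more factors.
fct_product <- function(..., sep = ":", drop = TRUE) {
  fs <- unname(lapply(list(...), as.factor))
  stopifnot(length(fs) >= 2)
  # interaction() builds every combination of levels;
  # drop = TRUE keeps only the combinations that occur in the data
  do.call(interaction, c(fs, list(sep = sep, drop = drop)))
}

a <- factor(c("x", "x", "y"))
b <- factor(c("p", "q", "q"))

nlevels(fct_product(a, b, drop = FALSE))  # 4: full Cartesian product of levels
nlevels(fct_product(a, b, drop = TRUE))   # 3: "y:p" never occurs, so it is dropped
```

Because the first argument is a factor, it can be used inside mutate() like other forcats functions, e.g. gss_cat %>% dplyr::mutate(marital__race = fct_product(marital, race)).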

@tslumley tslumley commented Jan 19, 2019

Working on this
