Create an interaction variable from two or more factors #136

Closed
ewenharrison opened this Issue Jul 4, 2018 · 2 comments

ewenharrison commented Jul 4, 2018

I do this a lot but don't see a function for it in forcats.

It's straightforward with base R interaction(), but the result is pretty ugly.

gss_cat %>% 
	dplyr::mutate(
		marital__race = interaction(marital, race, sep = ":")
	)

I don't know whether all forcats functions have to take a factor as their first argument, but here is an alternative to the above.

#' Combine levels from two or more factors to create new factor
#'
#' Computes a factor which represents the interaction of the given factors. 
#'
#' @param .data Dataframe or tibble.
#' @param ... Unquoted names of two or more factors from \code{.data}
#' @param sep A character string to separate the levels
#' @param collapse A character string to separate the factor names in the name of the new column
#' @return \code{.data} with the new factor added as a column
#'
#' @examples
#' gss_cat %>% 
#'   fct_combine(marital, race)

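fct_combine <- function(.data, ..., sep = ":", collapse = "__") {
  # Capture the unquoted factor names as quosures
  .f <- rlang::quos(...)
  # Build the new column name, e.g. "marital__race"
  .n <- paste(purrr::map_chr(.f, rlang::quo_name), collapse = collapse)
  dplyr::mutate(.data,
    # Wrap in factor() so the new column is a factor, as documented, not a character vector
    !!.n := factor(paste(!!!.f, sep = sep))
  )
}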

hadley commented Jan 4, 2019

It would be inconsistent with the rest of forcats to take a data frame as the first argument, but I think you could easily eliminate that.

I'm not sure about the name; I'd prefer something that implied you were getting the Cartesian product of the factors. I think you also need a drop argument that would control whether the levels of the new factor were the set of all possible combinations, or only the combinations that occur in the data.
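
To illustrate the two points above, here is a minimal sketch of a factor-first version with a drop argument. The name fct_product() is hypothetical and not part of forcats; the sketch assumes it is acceptable to delegate to base R interaction(), which already takes sep and drop arguments.

```r
# Hypothetical helper (not a forcats function): cross two or more factors.
fct_product <- function(..., sep = ":", drop = TRUE) {
  # Take factors directly, rather than a data frame, as the inputs
  fs <- lapply(list(...), as.factor)
  # interaction() builds levels from the Cartesian product of the inputs;
  # drop = TRUE keeps only the combinations that occur in the data.
  interaction(fs, sep = sep, drop = drop)
}

f1 <- factor(c("a", "b", "a"))
f2 <- factor(c("x", "x", "y"))

levels(fct_product(f1, f2))                # observed combinations only
levels(fct_product(f1, f2, drop = FALSE))  # every possible combination
```

Because the helper takes factors rather than a data frame, it can still be used inside dplyr::mutate(), for example mutate(gss_cat, marital_race = fct_product(marital, race)).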

tslumley commented Jan 19, 2019

Working on this

tslumley added a commit to tslumley/forcats that referenced this issue Jan 19, 2019

@hadley hadley closed this in 2bd5f4c Jan 19, 2019
