Create an interaction variable from two or more factors #136

Closed
ewenharrison opened this issue Jul 4, 2018 · 2 comments
Labels
feature a feature request or enhancement tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day

Comments

@ewenharrison

I do this a lot, but I don't see a function for it in forcats.

It's straightforward, but base R's interaction() is pretty ugly:

gss_cat %>% 
	dplyr::mutate(
		marital__race = interaction(marital, race, sep = ":")
	)

I don't know if all forcats functions have to take a factor as their first argument? But here is an alternative to the above.

#' Combine levels from two or more factors to create a new factor
#'
#' Computes a factor which represents the interaction of the given factors.
#'
#' @param .data Data frame or tibble.
#' @param ... Unquoted names of two or more factors from \code{.data}.
#' @param sep A character string to separate the levels.
#' @param collapse A character string to separate the factor names in the new factor's name.
#' @return \code{.data} with the new factor added as a column.
#'
#' @examples
#' gss_cat %>%
#'   fct_combine(marital, race)
fct_combine <- function(.data, ..., sep = ":", collapse = "__") {
  # Capture the unquoted factor names as quosures
  .f <- rlang::quos(...)
  # Build the new column name, e.g. "marital__race"
  .n <- purrr::map_chr(.f, rlang::quo_name)
  .n <- paste(.n, collapse = collapse)
  dplyr::mutate(.data,
    # paste() returns a character vector, so wrap it in factor()
    # to return a factor as documented (levels sorted alphabetically)
    !!.n := factor(paste(!!!.f, sep = sep))
  )
}
@hadley
Member

hadley commented Jan 4, 2019

It would be inconsistent with the rest of forcats to take a data frame as the first argument, but I think you could easily eliminate that.

I'm not sure about the name; I'd prefer something that implied you were getting the Cartesian product of the factors. I think you also need a drop argument that would control whether the levels of the new factor were the set of all possible combinations, or only the combinations that occur in the data.
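
To make the two suggestions concrete, here is a minimal sketch of a version that takes the factors themselves rather than a data frame and exposes a drop argument. The name cross_factors and its signature are hypothetical, chosen only for illustration; it simply delegates to base R's interaction() and is not the forcats implementation.

```r
# Hypothetical helper: the name and arguments are assumptions, not forcats API.
# It takes factors directly (no data frame) and exposes `drop`.
cross_factors <- function(..., sep = ":", drop = TRUE) {
  fs <- list(...)
  stopifnot(length(fs) >= 2)
  # Coerce inputs so that character vectors are accepted as well
  fs <- lapply(fs, as.factor)
  # base::interaction() builds the Cartesian product of the levels;
  # drop = TRUE keeps only the combinations that occur in the data
  do.call(interaction, c(fs, list(sep = sep, drop = drop)))
}

marital <- factor(c("Married", "Single", "Married"))
race    <- factor(c("White", "Black", "White"))

# All 2 x 2 possible combinations
length(levels(cross_factors(marital, race, drop = FALSE)))  # 4
# Only the combinations observed in the data
length(levels(cross_factors(marital, race, drop = TRUE)))   # 2
```

Because the function works on vectors, it can be used inside dplyr::mutate() like the other forcats functions, for example mutate(gss_cat, marital_race = cross_factors(marital, race)).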

@hadley hadley added feature a feature request or enhancement tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day labels Jan 4, 2019
@tslumley
Contributor

Working on this
