Implement case_match() and vec_case_match() #6328

Merged
merged 16 commits into from
Aug 18, 2022

@DavisVaughan DavisVaughan commented Jul 12, 2022

case_match() is a variant of case_when() that takes a primary input, .x, and then a series of formulas where the LHSs of each formula are values to match against .x rather than logical vectors. The LHSs get turned into logical conditions by vec_in(), and then the results are passed on to vec_case_when().
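As a rough illustration of the description above (a sketch of the idea, not the internal implementation), each LHS can be converted by hand into a logical condition with vctrs::vec_in() and passed to case_when(). The input `x` and the labels are made up for the example, and it assumes a dplyr version that exports case_match().

```r
library(dplyr)
library(vctrs)

# Illustrative input, not taken from the PR
x <- c(1, 2, 3, 4, NA)

# Values on the left-hand side are matched against `x`
matched <- case_match(
  x,
  c(1, 3) ~ "odd",
  c(2, 4) ~ "even",
  NA ~ "missing"
)

# Approximately the same thing spelled out by hand: each LHS becomes a
# logical condition via `vec_in()`, which `case_when()` then consumes.
# `vec_in()` treats missing values as equal by default, so `NA` matches `NA`.
by_hand <- case_when(
  vec_in(x, c(1, 3)) ~ "odd",
  vec_in(x, c(2, 4)) ~ "even",
  vec_in(x, NA) ~ "missing"
)

identical(matched, by_hand)
```

The hand-written version shows why case_match() is convenient: the primary input is named once instead of being repeated in every condition.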

It technically closes tidyverse/funs#60

This would function as a direct successor to recode(), which is already in the questioning lifecycle stage and has an awkward interface for anything except character vectors (and even there it can be odd).

char_vec <- sample(c("a", "b", "c"), 10, replace = TRUE)

recode(char_vec, a = "Apple", b = "Banana")
#>  [1] "Banana" "Banana" "c"      "Banana" "c"      "Banana"
#>  [7] "Apple"  "Banana" "c"      "Banana"

case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana",
  .default = char_vec
)
#>  [1] "Banana" "Banana" "c"      "Banana" "c"      "Banana"
#>  [7] "Apple"  "Banana" "c"      "Banana"


recode(char_vec, a = "Apple", b = "Banana", .default = NA_character_)
#>  [1] "Banana" "Banana" NA       "Banana" NA       "Banana"
#>  [7] "Apple"  "Banana" NA       "Banana"

case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana"
)
#>  [1] "Banana" "Banana" NA       "Banana" NA       "Banana"
#>  [7] "Apple"  "Banana" NA       "Banana"


# `case_match()` is more general and works elegantly
# with more than just character vectors
num_vec <- c(1:4, NA)

recode(num_vec, `1` = "o", `2` = "e", `3` = "o", `4` = "e", .missing = "m")
#> [1] "o" "e" "o" "e" "m"

case_match(
  num_vec,
  c(1, 3) ~ "o",
  c(2, 4) ~ "e",
  NA ~ "m"
)
#> [1] "o" "e" "o" "e" "m"


# More of a programmatic usage
level_key <- c(a = "apple", b = "banana", c = "carrot")
recode(char_vec, !!!level_key)
#>  [1] "banana" "banana" "carrot" "banana" "carrot" "banana"
#>  [7] "apple"  "banana" "carrot" "banana"

vec_case_match(
  needles = char_vec,
  haystacks = as.list(names(level_key)),
  values = as.list(level_key),
  default = char_vec
)
#>  [1] "banana" "banana" "carrot" "banana" "carrot" "banana"
#>  [7] "apple"  "banana" "carrot" "banana"

I still think a replace_match() would be useful here, like:

# type stable replacement wrapper around case_match()
replace_match <- function(.x, ...) {
  ptype <- vec_ptype(.x)
  ptype <- vec_ptype_finalise(ptype)
  case_match(.x = .x, ..., .default = .x, .ptype = ptype)
}

# very close to compactness of recode()
replace_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana"
)

# instead of 
case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana",
  .default = char_vec
)

replace_match() could also be used instead of a match-like version of na_if()

x <- c("a", "NA", "NaN", "no")
replace_match(x, c("NA", "NaN", "no") ~ NA)

In forcats, we could have fct_case_match() as a successor to recode_factor(), but its interface would probably be the other way around, like:

fct_case_match(
  .x,
  odd = c(1, 3),
  even = c(2, 4),
  ordered = FALSE
)

fct_case_when(
  odd = .x %in% c(1, 3),
  even = .x %in% c(2, 4),
  ordered = FALSE
)

@DavisVaughan DavisVaughan changed the title Draft case_match() and vec_case_match() Draft case_switch() and vec_case_switch() Aug 10, 2022
@DavisVaughan DavisVaughan changed the title Draft case_switch() and vec_case_switch() Draft case_match() and vec_case_match() Aug 11, 2022
@DavisVaughan DavisVaughan marked this pull request as ready for review August 11, 2022 14:35
@DavisVaughan DavisVaughan changed the title Draft case_match() and vec_case_match() Implement case_match() and vec_case_match() Aug 11, 2022
R/case-match.R Outdated
#' # `.default` is allowed to be vectorized, and you can supply a `.ptype` to
#' # force a particular output type. Combining these features together allows
#' # you to create a type stable "replace match" helper.
#' replace_match <- function(.x, ...) {
Member

Are you sure we don't want to provide this function in dplyr?

Member Author

I haven't been able to decide if we should or not because I think it would imply we also need replace_when()

Member Author

Having replace_when() and replace_match() would give us nice alternatives to replace() though, especially if we don't do mutate(.when = ) right now.

R/case_when.R Outdated
@@ -200,30 +207,23 @@ case_when_formula_evaluate <- function(args,

   for (i in seq_len(n_args)) {
     pair <- quos_pairs[[i]]
-    conditions[[i]] <- eval_tidy(pair$lhs, env = default_env)
-    values[[i]] <- eval_tidy(pair$rhs, env = default_env)
+    lhs[[i]] <- eval_tidy(pair$lhs, env = default_env)
Member

Should we use error wrapping here?

Member Author

Do you have an example of why it would be useful? Since this is just evaluating the LHS/RHS of the formula, I feel like the only error we could hit would be:

dplyr::case_when(foo ~ 1)
#> Error in eval_tidy(pair$lhs, env = default_env): object 'foo' not found

But I'm not sure we can improve on that anyways

Member

We could say which case caused the problem:

dplyr::case_match(letters, "z" ~ stop("!"))
#' Error in case_match():
#'   Failed to compute right hand side of match 1
#' Caused by error:
#'  !

But maybe that's not worth it?

Member Author

I've updated to throw a chained error like this
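For readers unfamiliar with chained errors, here is a minimal sketch of the general technique using rlang. The helper `eval_formula_side()` and its message wording are hypothetical and are not the code merged in this PR; the sketch only shows how an evaluation failure can be rethrown with the original error attached as its parent.

```r
library(rlang)

# Hypothetical helper for illustration only; not dplyr's implementation.
# Evaluates one side of a formula (captured as a quosure) and, on failure,
# rethrows with a message that names the offending case.
eval_formula_side <- function(quo, i, side) {
  try_fetch(
    eval_tidy(quo),
    error = function(cnd) {
      abort(
        sprintf("Failed to evaluate the %s-hand side of formula %d.", side, i),
        parent = cnd  # keeps the original error available as the cause
      )
    }
  )
}

# Capture the chained error instead of letting it stop the script
err <- tryCatch(
  eval_formula_side(quo(stop("!")), i = 1, side = "right"),
  error = identity
)

err$message                   # the wrapper message naming the case
conditionMessage(err$parent)  # the message of the underlying error
```

The wrapper message tells the user which formula failed, while the parent error preserves the original cause.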

@hadley hadley left a comment

This needs a check and a better error message for a NULL LHS or RHS:

case_match(letters, "x" ~ NULL)
case_match(letters, NULL ~ "x")

Otherwise, looks great!

Do we want to supersede recode() in this PR or a separate one?

@DavisVaughan
Member Author

> Do we want to supersede recode() in this PR or a separate one?

I'll leave that for another PR; I want to get this one in.

@DavisVaughan DavisVaughan merged commit 978d7e3 into tidyverse:main Aug 18, 2022
Successfully merging this pull request may close these issues.

Consider case_when() variant that uses values