Implement `case_match()` and `vec_case_match()` #6328

DavisVaughan · 2022-07-12T19:18:05Z

case_match() is a variant of case_when() that takes a primary input, .x, and then a series of formulas where the LHSs of each formula are values to match against .x rather than logical vectors. The LHSs get turned into logical conditions by vec_in(), and then the results are passed on to vec_case_when().
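The mechanism described above can be sketched using exported functions only. This is a minimal illustration, assuming a dplyr release that exports case_match() (1.1.0 or later) and using case_when() as a stand-in for vec_case_when(); the input vector and labels are made up for the example.

```r
library(dplyr)
library(vctrs)

x <- c("a", "b", "c", NA)

# Values on the LHS are matched against `x`
matched <- case_match(
  x,
  c("a", "b") ~ "early",
  "c" ~ "late"
)

# Roughly the same thing spelled out by hand: each LHS becomes a
# logical condition through vec_in()
by_hand <- case_when(
  vec_in(x, c("a", "b")) ~ "early",
  vec_in(x, "c") ~ "late"
)

identical(matched, by_hand)
```

Elements that match none of the LHS values (here "NA") fall through to the default, which is a missing value unless .default is supplied.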

It technically closes tidyverse/funs#60

This would function as a direct successor to recode(), which is already in the questioning lifecycle stage and has an awkward interface for anything except character vectors (and even there it can be odd).

char_vec <- sample(c("a", "b", "c"), 10, replace = TRUE)

recode(char_vec, a = "Apple", b = "Banana")
#>  [1] "Banana" "Banana" "c"      "Banana" "c"      "Banana"
#>  [7] "Apple"  "Banana" "c"      "Banana"

case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana",
  .default = char_vec
)
#>  [1] "Banana" "Banana" "c"      "Banana" "c"      "Banana"
#>  [7] "Apple"  "Banana" "c"      "Banana"


recode(char_vec, a = "Apple", b = "Banana", .default = NA_character_)
#>  [1] "Banana" "Banana" NA       "Banana" NA       "Banana"
#>  [7] "Apple"  "Banana" NA       "Banana"

case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana"
)
#>  [1] "Banana" "Banana" NA       "Banana" NA       "Banana"
#>  [7] "Apple"  "Banana" NA       "Banana"


# `case_match()` is more general and works elegantly 
# with more than just character
num_vec <- c(1:4, NA)

recode(num_vec, `1` = "o", `2` = "e", `3` = "o", `4` = "e", .missing = "m")
#> [1] "o" "e" "o" "e" "m"

case_match(
  num_vec,
  c(1, 3) ~ "o",
  c(2, 4) ~ "e",
  NA ~ "m"
)
#> [1] "o" "e" "o" "e" "m"


# More of a programmatic usage
level_key <- c(a = "apple", b = "banana", c = "carrot")
recode(char_vec, !!!level_key)
#>  [1] "banana" "banana" "carrot" "banana" "carrot" "banana"
#>  [7] "apple"  "banana" "carrot" "banana"

vec_case_match(
  needles = char_vec,
  haystacks = as.list(names(level_key)),
  values = as.list(level_key),
  default = char_vec
)
#>  [1] "banana" "banana" "carrot" "banana" "carrot" "banana"
#>  [7] "apple"  "banana" "carrot" "banana"
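Another way to approach the programmatic case, sketched here under the assumption that case_match() accepts spliced formulas through dynamic dots the way case_when() does, is to build the formulas from the key with rlang::new_formula(). The fixed char_vec below is made up so the result is reproducible.

```r
library(dplyr)
library(rlang)

char_vec <- c("b", "a", "c", "b")
level_key <- c(a = "apple", b = "banana", c = "carrot")

# Build one `old ~ new` formula per entry of the key, dropping the
# names that Map() attaches to its result
formulas <- unname(Map(
  function(old, new) new_formula(old, new),
  names(level_key),
  unname(level_key)
))

# Splice the formulas into the call
recoded <- case_match(char_vec, !!!formulas, .default = char_vec)
recoded
```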

I still think a replace_match() would be useful here, like:

# type stable replacement wrapper around case_match()
replace_match <- function(.x, ...) {
  ptype <- vec_ptype(.x)
  ptype <- vec_ptype_finalise(ptype)
  case_match(.x = .x, ..., .default = .x, .ptype = ptype)
}

# very close to compactness of recode()
replace_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana"
)

# instead of 
case_match(
  char_vec,
  "a" ~ "Apple",
  "b" ~ "Banana",
  .default = char_vec
)

replace_match() could also be used instead of a match-like version of na_if():

x <- c("a", "NA", "NaN", "no")
replace_match(x, c("NA", "NaN", "no") ~ NA)
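To make the "type stable" claim concrete, here is a small self-contained check. It repeats the replace_match() definition from above and assumes a dplyr release that exports case_match(); the integer example is an illustration added here, not part of the original proposal.

```r
library(dplyr)
library(vctrs)

# Same wrapper as above: the output type is pinned to the type of `.x`
replace_match <- function(.x, ...) {
  ptype <- vec_ptype_finalise(vec_ptype(.x))
  case_match(.x = .x, ..., .default = .x, .ptype = ptype)
}

# A logical NA on the RHS is cast to character
x <- c("a", "NA", "NaN", "no")
cleaned <- replace_match(x, c("NA", "NaN", "no") ~ NA)

# A double on the RHS is cast back to integer, so the input type survives
int_vec <- c(1L, 2L, 3L)
swapped <- replace_match(int_vec, 2L ~ 0)
typeof(swapped)
```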

In forcats, we could have fct_case_match() as a successor to recode_factor(), but its interface would probably be the other way around, like:

fct_case_match(
  .x,
  odd = c(1, 3),
  even = c(2, 4),
  ordered = FALSE
)

fct_case_when(
  odd = .x %in% c(1, 3),
  even = .x %in% c(2, 4),
  ordered = FALSE
)
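The "other way around" interface can be prototyped in base R. The function below is purely hypothetical (the name fct_case_match_sketch and its behavior are assumptions for illustration, not forcats code): argument names become the new levels and the values are the haystacks to match against.

```r
# Hypothetical sketch, not a forcats function
fct_case_match_sketch <- function(.x, ..., ordered = FALSE) {
  haystacks <- list(...)
  out <- rep(NA_character_, length(.x))

  for (lvl in names(haystacks)) {
    # Only fill positions that no earlier level has claimed
    hit <- is.na(out) & .x %in% haystacks[[lvl]]
    out[hit] <- lvl
  }

  factor(out, levels = names(haystacks), ordered = ordered)
}

res <- fct_case_match_sketch(c(1, 2, 3, 4, 5), odd = c(1, 3), even = c(2, 4))
res
```

Unmatched values (5 here) come out as NA, and the level order follows the order of the arguments.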

NEWS.md

R/case-match.R

hadley · 2022-08-15T14:23:13Z

R/case-match.R

+#' # `.default` is allowed to be vectorized, and you can supply a `.ptype` to
+#' # force a particular output type. Combining these features together allows
+#' # you to create a type stable "replace match" helper.
+#' replace_match <- function(.x, ...) {


Are you sure we don't want to provide this function in dplyr?

I haven't been able to decide if we should or not, because I think it would imply we also need replace_when().

Having replace_when() and replace_match() would be nice alternatives to replace() though, especially if we don't do mutate(.when = ) right now.
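For reference, this is what the base R replace() spelling of the earlier na_if()-style example looks like (the input is the same made-up vector); the proposed helpers would express the same intent without building the logical index by hand.

```r
x <- c("a", "NA", "NaN", "no")

# replace() takes an index (here a logical vector) and the new values
cleaned <- replace(x, x %in% c("NA", "NaN", "no"), NA)
cleaned
```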

R/case-match.R

hadley · 2022-08-15T14:29:16Z

R/case_when.R

@@ -200,30 +207,23 @@ case_when_formula_evaluate <- function(args,

  for (i in seq_len(n_args)) {
    pair <- quos_pairs[[i]]
-    conditions[[i]] <- eval_tidy(pair$lhs, env = default_env)
-    values[[i]] <- eval_tidy(pair$rhs, env = default_env)
+    lhs[[i]] <- eval_tidy(pair$lhs, env = default_env)


Should we use error wrapping here?

Do you have an example of why it would be useful? Since this is just evaluating the LHS/RHS of the formula, I feel like the only error we could hit would be:

dplyr::case_when(foo ~ 1)
#> Error in eval_tidy(pair$lhs, env = default_env): object 'foo' not found

But I'm not sure we can improve on that anyway.

We could say which case caused the problem,

dplyr::case_match(letters, "z" ~ stop("!"))
#' Error in case_match():
#' Failed to compute right hand side of match 1
#' Caused by error:
#' !

But maybe that's not worth it?

I've updated to throw a chained error like this
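One way to produce a chained error of this kind is rlang's try_fetch() together with abort(parent = ). The sketch below shows the general technique only; eval_rhs() is a hypothetical helper, and the message wording is borrowed from the suggestion above rather than from the merged code.

```r
library(rlang)

# Hypothetical helper: evaluate the RHS of formula `i`, rethrowing any
# failure as a chained error that records which formula was involved
eval_rhs <- function(quo, i) {
  try_fetch(
    eval_tidy(quo),
    error = function(cnd) {
      abort(
        sprintf("Failed to compute right hand side of match %d.", i),
        parent = cnd
      )
    }
  )
}

# Capture the error instead of letting it propagate
err <- tryCatch(eval_rhs(quo(stop("!")), 1), error = identity)

# The original error is kept as the parent of the new one
conditionMessage(err$parent)
```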

_pkgdown.yml

hadley

Need check/better error message for NULL lhs and rhs:

case_match(letters, "x" ~ NULL)
case_match(letters, NULL ~ "x")

Otherwise, looks great!

Do we want to supersede recode() in this PR or a separate one?

hadley · 2022-08-17T20:43:04Z

R/case_when.R

@@ -200,30 +207,23 @@ case_when_formula_evaluate <- function(args,

  for (i in seq_len(n_args)) {
    pair <- quos_pairs[[i]]
-    conditions[[i]] <- eval_tidy(pair$lhs, env = default_env)
-    values[[i]] <- eval_tidy(pair$rhs, env = default_env)
+    lhs[[i]] <- eval_tidy(pair$lhs, env = default_env)


We could say which case caused the problem,

dplyr::case_match(letters, "z" ~ stop("!"))
#' Error in case_match():
#' Failed to compute right hand side of match 1
#' Caused by error:
#' !

But maybe that's not worth it?

R/case-match.R

tests/testthat/test-case-match.R

DavisVaughan · 2022-08-18T17:50:37Z

Do we want to supersede recode() in this PR or a separate one?

I'll leave that for another PR, I want to get this one in

After thinking about this more, I think this more accurately captures the intention here, and is a more applicable name in other scenarios:

- `vec_case_match(needles, haystacks)` makes more sense
- `fct_case_match(new_lvl = haystack)` would make more sense since the LHS here is the resulting value, not the thing you switch on
- I seem to use "match" very frequently in the docs and the test descriptions, making me think that is the better name

DavisVaughan mentioned this pull request Jul 12, 2022

Rewrite na_if() using vctrs #6329

Merged

DavisVaughan force-pushed the feature/case-match branch from 9058e02 to 9db0073 Compare August 10, 2022 19:19

DavisVaughan changed the title ~~Draft case_match() and vec_case_match()~~ Draft case_switch() and vec_case_switch() Aug 10, 2022

DavisVaughan changed the title ~~Draft case_switch() and vec_case_switch()~~ Draft case_match() and vec_case_match() Aug 11, 2022

DavisVaughan marked this pull request as ready for review August 11, 2022 14:35

DavisVaughan changed the title ~~Draft case_match() and vec_case_match()~~ Implement case_match() and vec_case_match() Aug 11, 2022

DavisVaughan requested a review from hadley August 11, 2022 14:47

hadley reviewed Aug 15, 2022

DavisVaughan force-pushed the feature/case-match branch from 2bd4ec0 to 9dda617 Compare August 17, 2022 17:27

hadley approved these changes Aug 17, 2022

DavisVaughan and others added 16 commits August 18, 2022 13:51

Draft case_match() and vec_case_match()

6dd858f

Rename to case_switch() and vec_case_switch()

b1d90d0

Document, export, and test case_switch()

e69ac39

NEWS bullet

b8b3ac8

Add case_match() to _pkgdown.yml

80b8f1d

Use alphabetical order in pkgdown reference

096c41c

Tweak case_match() docs based on code review

bb1204b

Update snapshot test with latest dev version of vctrs

5ca04cb

Tweak docs

0544b74

Add a few extra tests

c9b3e27

Typo fix and slight clarification of return value

4490b6a

Tweak leading paragraph one more time

3a6f004

One last consistency read through the docs

ed72329

Better handle NULL formula elements

8be5e0c

Throw nicer chained errors when formula evaluation fails

46ee6ba

DavisVaughan force-pushed the feature/case-match branch from a4c1039 to 46ee6ba Compare August 18, 2022 17:53

DavisVaughan merged commit 978d7e3 into tidyverse:main Aug 18, 2022

DavisVaughan deleted the feature/case-match branch August 18, 2022 18:08

twest820 mentioned this pull request Feb 21, 2023

recode() is superseded by case_match() but case_match() isn't exported from dplyr 1.0.10 #6749

Closed
