Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

case_when() needs to throw a better error when conditions are matrices #6862

Closed
DarwinAwardWinner opened this issue May 30, 2023 · 3 comments · Fixed by #7069
Closed

case_when() needs to throw a better error when conditions are matrices #6862

DarwinAwardWinner opened this issue May 30, 2023 · 3 comments · Fixed by #7069
Labels
bug an unexpected problem or unintended behavior tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day

Comments

@DarwinAwardWinner
Copy link

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.


I had some code that was passing matrices as the conditions to a case_when statement. Previously, case_when would implicitly convert these matrices to vectors, but at some point it stopped doing that. However, I don't think this was an intentional behavior change, because depending on how exactly one calls case_when on matrix conditions, it can produce one of several different error messages or return a value of the wrong length (see reprex below demonstrating 3 different behaviors). None of the error messages hint at the underlying problem, which made debugging this issue after a package update very difficult.

If passing matrices as the conditions to case_when is not an intended usage, I would expect it to produce an interpretable error message to that effect.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(assertthat)

x <- matrix(rnorm(36), ncol = 6)

## Each of the following generates a different error
{
  y1 <- case_when(
    !is.finite(x) ~ "Invalid",
    .default = "Default"
  )
  assert_that(length(y1) == length(x))
}
#> Error: length(y1) not equal to length(x)

{
  y2 <- case_when(
    !is.finite(x) ~ "Invalid",
    x < 0 ~ "Negative",
    .default = "Default"
  )
  assert_that(length(y2) == length(x))
}
#> Error in `vec_slice()`:
#> ! Can't subset elements past the end.
#> ℹ Locations 8, 10, 13, …, 35, and 36 don't exist.
#> ℹ There are only 6 elements.
#> Backtrace:
#>     ▆
#>  1. ├─dplyr::case_when(...)
#>  2. │ └─dplyr:::vec_case_when(...)
#>  3. │   └─vctrs::vec_slice(value, loc)
#>  4. └─vctrs (local) `<fn>`()
#>  5.   └─vctrs:::stop_subscript_oob(...)
#>  6.     └─vctrs:::stop_subscript(...)
#>  7.       └─rlang::abort(...)

{
  y3 <- case_when(
    !is.finite(x) ~ "Invalid",
    x < 0 ~ "Negative",
    x > 0 ~ "Positive",
    x == 0 ~ "Zero"
  )
  assert_that(length(y3) == length(x))
}
#> Error in if (!any(are_unused)) {: missing value where TRUE/FALSE needed

## Everything works fine if you convert x to a vector
{
  xvec <- as.vector(x)
  yvec <- case_when(
    !is.finite(xvec) ~ "Invalid",
    xvec < 0 ~ "Negative",
    xvec > 0 ~ "Positive",
    xvec == 0 ~ "Zero"
  )
  assert_that(length(yvec) == length(xvec))
}
#> [1] TRUE

Created on 2023-05-30 with reprex v2.0.2

@DarwinAwardWinner
Copy link
Author

Session info:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] assertthat_0.2.1 dplyr_1.1.2      colorout_1.2-2

loaded via a namespace (and not attached):
 [1] fansi_1.0.4      utf8_1.2.3       crayon_1.5.2     R6_2.5.1
 [5] lifecycle_1.0.3  magrittr_2.0.3   pillar_1.9.0     rlang_1.1.1
 [9] cli_3.6.0        vctrs_0.6.2      generics_0.1.3   glue_1.6.2
[13] compiler_4.2.0   pkgconfig_2.0.3  tidyselect_1.2.0 tibble_3.2.1

Session info from other machine that also exhibits the issue:

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-apple-darwin22.4.0 (64-bit)
Running under: macOS Ventura 13.3.1

Matrix products: default
BLAS:   /usr/local/Cellar/openblas/0.3.23/lib/libopenblasp-r0.3.23.dylib 
LAPACK: /usr/local/Cellar/r/4.3.0_1/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] assertthat_0.2.1 reprex_2.0.2     tidyr_1.3.0      future_1.32.0   
 [5] devtools_2.4.5   usethis_2.1.6    openxlsx_4.2.5.2 magrittr_2.0.3  
 [9] dplyr_1.1.2      rex_1.2.1        glue_1.6.2       stringr_1.5.0   
[13] ggplot2_3.4.2    colorout_1.2-2  

loaded via a namespace (and not attached):
 [1] gtable_0.3.3      xfun_0.39         htmlwidgets_1.6.2 remotes_2.4.2    
 [5] processx_3.8.1    callr_3.7.3       vctrs_0.6.2       tools_4.3.0      
 [9] ps_1.7.5          generics_0.1.3    parallel_4.3.0    tibble_3.2.1     
[13] fansi_1.0.4       pkgconfig_2.0.3   lifecycle_1.0.3   compiler_4.3.0   
[17] munsell_0.5.0     codetools_0.2-19  httpuv_1.6.10     htmltools_0.5.5  
[21] yaml_2.3.7        later_1.3.1       pillar_1.9.0      crayon_1.5.2     
[25] urlchecker_1.0.1  ellipsis_0.3.2    cachem_1.0.8      sessioninfo_1.2.2
[29] mime_0.12         parallelly_1.35.0 tidyselect_1.2.0  zip_2.3.0        
[33] digest_0.6.31     stringi_1.7.12    purrr_1.0.1       listenv_0.9.0    
[37] fastmap_1.1.1     grid_4.3.0        colorspace_2.1-0  cli_3.6.1        
[41] pkgbuild_1.4.0    utf8_1.2.3        clipr_0.8.0       withr_2.5.0      
[45] prettyunits_1.1.1 scales_1.2.1      promises_1.2.0.1  rmarkdown_2.21   
[49] globals_0.16.2    evaluate_0.21     memoise_2.0.1     shiny_1.7.4      
[53] knitr_1.42        miniUI_0.1.1.1    profvis_0.3.8     rlang_1.1.1      
[57] Rcpp_1.0.10       xtable_1.8-4      pkgload_1.3.2     rstudioapi_0.14  
[61] R6_2.5.1          fs_1.6.2    

@DavisVaughan
Copy link
Member

DavisVaughan commented May 31, 2023

Yea we don't intend for matrices to be passed through as conditions. We probably need an additional check here for dim

check_logical(condition, arg = condition_arg, call = call)

@DarwinAwardWinner

This comment was marked as off-topic.

@DavisVaughan DavisVaughan added the bug an unexpected problem or unintended behavior label Nov 3, 2023
@DavisVaughan DavisVaughan changed the title case_when gives confusing and inconsistent error messages when conditions are matrices case_when() needs to throw a better error when conditions are matrices Nov 3, 2023
@DavisVaughan DavisVaughan added the tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day label Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants