Closed
Description
As discussed in #118, the default translations for pmin()
and pmax()
are LEAST
and GREATEST
, respectively.
However, for Snowflake (akin to Oracle), LEAST and GREATEST do the following (in contrast to many other backends, discussed with Snowflake here and a community-identified workaround here):
If any of the argument values is NULL, the result is NULL.
Accordingly:
- when providing
na.rm = TRUE
topmin()
andpmax()
, it's silently ignored currently (related to dbplyr translation ofpmax
/pmin
throws an error if it receives the na.rm = TRUE argument #479) - The default, periodic messaging for na.rm = FALSE in check_na_rm()) is not accurate.
I'd expect either:
- [preferred]
na.rm = TRUE
is supported OR - [interim]
na.rm = TRUE
errors when called on Snowflake (and does not warn whenna.rm = FALSE
)
Illustrative Code (masking company Snowflake instance):
conn_snowflake <- DBI::dbConnect(odbc::odbc(), .connection_string = "<company>")
id_temp <- DBI::Id(schema = "TEST", table = "TEST")
test_tbl <- tibble::tibble(
a = c(1, NA, 3),
b = c(NA, 2, 4)
)
dbplyr::copy_to(
conn_snowflake,
test_tbl,
name = id_temp,
overwrite = TRUE
)
tbl_sf <- conn_snowflake %>%
tbl(id_temp)
# expected behavior: correctly ignores NA for `min()` / `max()`
tbl_sf %>%
dplyr::summarise(
min_a = min(a, na.rm = TRUE),
min_b = min(b, na.rm = TRUE),
max_a = max(a, na.rm = TRUE),
max_b = max(b, na.rm = TRUE)
)
# Source: SQL [1 x 2]
# Database: Snowflake 7.22.1[michael.thomson@Snowflake/TEST]
min_a min_b max_a max_b
<dbl> <dbl> <dbl> <dbl>
1 1 2 3 4
na_rm <- TRUE # toggle TRUE or FALSE
# unexpected behavior: both return NA for each na.rm = TRUE and FALSE
tbl_sf %>%
dplyr::mutate(
pmin_a_b = pmin(a, b, na.rm = na_rm),
pmax_a_b = pmax(a, b, na.rm = na_rm)
)
# Source: SQL [3 x 4]
# Database: Snowflake 7.22.1[michael.thomson@Snowflake/TEST]
a b pmin_a_b pmax_a_b
<dbl> <dbl> <dbl> <dbl>
1 1 NA NA NA
2 3 4 3 4
3 NA 2 NA NA
Metadata
Metadata
Assignees
Labels
No labels