Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pivot_wider handles NAs inconsistently with tidyr #1238

Closed
carlganz opened this issue Apr 6, 2023 · 1 comment · Fixed by #1246
Closed

pivot_wider handles NAs inconsistently with tidyr #1238

carlganz opened this issue Apr 6, 2023 · 1 comment · Fixed by #1246

Comments

@carlganz
Copy link

carlganz commented Apr 6, 2023

When there are NA's in the names_from variable dbplyr translates pivot_wider inconsistently.

library(dplyr)
library(dbplyr)
library(tidyr)

### need to setup your own SQL Server connection
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
x <- tibble(
  id = "id",
  x = 1:3,
  y = c("A", NA, "B")
)
copy_to(con, x, overwrite = T)
x2 <- tbl(con, "x")

# NA's work properly in-memory this returns 2
x %>%
  pivot_wider(names_from = y, values_from = x) %>%
  pull(`NA`)

# with dbplyr this returns NA
x2 %>%
  pivot_wider(names_from = y, values_from = x) %>%
  pull(`NA`)
@ejneer
Copy link
Contributor

ejneer commented Apr 11, 2023

this stems from the keys_cond in build_pivot_wider_exprs where expressions are made to ultimately put values in the names_from column.

with your example, the pivot expressions are:

pivot_exprs
$A
max(ifelse(y == "A", x, NULL), na.rm = TRUE)

$`NA`
max(ifelse(y == NA_character_, x, NULL), na.rm = TRUE)

$B
max(ifelse(y == "B", x, NULL), na.rm = TRUE)

I think the issue is y == NA_character_, where y in that case is NA. So NA == NA_character_ is NA, so the ifelse results in NA.

Updating build_pivot_wider_exprs to test for NA seems to fix it but haven't tested if that breaks anything else

build_pivot_wider_exprs <- function(row_id, spec, values_fill, values_fn, data, call) {
  values_col <- spec[[".value"]][row_id]
  fill_value <- values_fill[[values_col]]

  keys <- vctrs::vec_slice(spec[, -(1:2)], row_id)
  keys_cond <- purrr::imap(
    keys,
     function(value, name) {
-      expr(!!sym(name) == !!value)
+      expr(!!sym(name) == !!value || is.na(!!sym(name)) && is.na(!!value))
     }
) %>%
    purrr::reduce(~ expr(!!.x & !!.y))

  case_expr <- expr(ifelse(!!keys_cond, !!sym(values_col), !!fill_value))

  agg_fn <- values_fn[[values_col]]
  resolve_fun(agg_fn, case_expr, data = data, call = call)
}
x2 %>%
  pivot_wider(names_from = y, values_from = x) %>%
  pull(`NA`)
# [1] 2

I can test it some more and open up a PR soon

ejneer added a commit to ejneer/dbplyr that referenced this issue Apr 18, 2023
mgirlich pushed a commit that referenced this issue Apr 24, 2023
* pivot_wider test for na column name handling

* handle `NA` column names in pivot_wider

* update news for #1238
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants