`pivot_wider` handles NAs inconsistently with `tidyr` #1238

carlganz · 2023-04-06T14:57:54Z

When there are NA's in the names_from variable dbplyr translates pivot_wider inconsistently.

library(dplyr)
library(dbplyr)
library(tidyr)

### need to setup your own SQL Server connection
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
x <- tibble(
  id = "id",
  x = 1:3,
  y = c("A", NA, "B")
)
copy_to(con, x, overwrite = T)
x2 <- tbl(con, "x")

# NA's work properly in-memory this returns 2
x %>%
  pivot_wider(names_from = y, values_from = x) %>%
  pull(`NA`)

# with dbplyr this returns NA
x2 %>%
  pivot_wider(names_from = y, values_from = x) %>%
  pull(`NA`)

The text was updated successfully, but these errors were encountered:

ejneer · 2023-04-11T19:16:48Z

this stems from the keys_cond in build_pivot_wider_exprs where expressions are made to ultimately put values in the names_from column.

with your example, the pivot expressions are:

pivot_exprs
$A
max(ifelse(y == "A", x, NULL), na.rm = TRUE)

$`NA`
max(ifelse(y == NA_character_, x, NULL), na.rm = TRUE)

$B
max(ifelse(y == "B", x, NULL), na.rm = TRUE)

I think the issue is y == NA_character_, where y in that case is NA. So NA == NA_character_ is NA, so the ifelse results in NA.

Updating build_pivot_wider_exprs to test for NA seems to fix it but haven't tested if that breaks anything else

build_pivot_wider_exprs <- function(row_id, spec, values_fill, values_fn, data, call) {
  values_col <- spec[[".value"]][row_id]
  fill_value <- values_fill[[values_col]]

  keys <- vctrs::vec_slice(spec[, -(1:2)], row_id)
  keys_cond <- purrr::imap(
    keys,
     function(value, name) {
-      expr(!!sym(name) == !!value)
+      expr(!!sym(name) == !!value || is.na(!!sym(name)) && is.na(!!value))
     }
) %>%
    purrr::reduce(~ expr(!!.x & !!.y))

  case_expr <- expr(ifelse(!!keys_cond, !!sym(values_col), !!fill_value))

  agg_fn <- values_fn[[values_col]]
  resolve_fun(agg_fn, case_expr, data = data, call = call)
}

x2 %>%
  pivot_wider(names_from = y, values_from = x) %>%
  pull(`NA`)
# [1] 2

I can test it some more and open up a PR soon

* pivot_wider test for na column name handling * handle `NA` column names in pivot_wider * update news for #1238

ejneer mentioned this issue Apr 13, 2023

Make pivot_wider NA column name handling match tidyr #1246

Merged

ejneer added a commit to ejneer/dbplyr that referenced this issue Apr 18, 2023

update news for tidyverse#1238

d9df895

mgirlich closed this as completed in #1246 Apr 24, 2023

mgirlich pushed a commit that referenced this issue Apr 24, 2023

Make pivot_wider NA column name handling match tidyr (#1246)

48aebb6

* pivot_wider test for na column name handling * handle `NA` column names in pivot_wider * update news for #1238

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`pivot_wider` handles NAs inconsistently with `tidyr` #1238

`pivot_wider` handles NAs inconsistently with `tidyr` #1238

carlganz commented Apr 6, 2023

ejneer commented Apr 11, 2023

pivot_wider handles NAs inconsistently with tidyr #1238

pivot_wider handles NAs inconsistently with tidyr #1238

Comments

carlganz commented Apr 6, 2023

ejneer commented Apr 11, 2023

`pivot_wider` handles NAs inconsistently with `tidyr` #1238

`pivot_wider` handles NAs inconsistently with `tidyr` #1238