Revisit `dplyr::coalesce` with `across` #54

tmastny · 2020-04-14T14:35:21Z

With dplyr 1.0.0 introducing c_across and across I was wondering if it was possible to revisit tidyverse/dplyr#3548, by allowing dplyr::coalesce to work more naturally with the new across or c_across functions.

After reading the row-wise article, I expected dplyr::coalesce to behave like rowSums, since it naturally operates across rows, or at worst like rowwise() followed by sum().

However, coalesce doesn't seem to work with the across family at all, as you can see in the code below.

Would it be possible to make coalesce compatible with the new across workflow?

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

## Does coalesce work like rowSums, because
## they both naturally work across rows?
df %>%
  mutate(a = rowSums(across(-id), na.rm = TRUE))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    54
#> 4     4    NA    23    33    43    99
#> 5     5    14    NA    NA    44    58

# No: coalesce doesn't work like rowSums
df %>%
  mutate(a = coalesce(across(-id)))
#> # A tibble: 5 x 6
#>      id     w     x     y     z   a$w    $x    $y    $z
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10    NA    NA    NA
#> 2     2    NA    21    NA    NA    NA    21    NA    NA
#> 3     3    NA    22    32    NA    NA    22    32    NA
#> 4     4    NA    23    33    43    NA    23    33    43
#> 5     5    14    NA    NA    44    14    NA    NA    44



## Maybe it works like sum, since coalesce's argument is `...`
df %>%
  rowwise() %>%
  mutate(a = sum(c_across(-id), na.rm = TRUE))
#> # A tibble: 5 x 6
#> # Rowwise: 
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    54
#> 4     4    NA    23    33    43    99
#> 5     5    14    NA    NA    44    58

# No: coalesce doesn't work with rowwise
df %>%
  rowwise() %>%
  mutate(a = coalesce(c_across(-id)))
#> Error: `mutate()` argument `a` must be recyclable.
#> ℹ `a` is `coalesce(c_across(-id))`.
#> ℹ The error occured in row 1.
#> x `a` can't be recycled to size 1.
#> ℹ `a` must be size 1, not 4.
#> ℹ Did you mean: `a = list(coalesce(c_across(-id)))` ?



## coalesce works if you write out each by hand,
## but that goes against the spirit of the new `across` family
df %>%
  mutate(a = coalesce(w, x, y, z))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

# there is a workaround suggested in tidyverse/dplyr#3548, but it's not very user-friendly
# and requires a different package
library(tidyselect)
df %>%
  mutate(a = coalesce(!!!syms(vars_select(names(.), -id))))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

^{Created on 2020-04-14 by the reprex package (v0.3.0)}
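
One way to read the two failures above: in both cases coalesce() receives a single argument (the data frame returned by across(), or the length-4 vector returned by c_across()), so there is nothing to fall back on and the input comes back unchanged. A minimal sketch outside of mutate(), using a small made-up tibble named vals (not from the thread):

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up data for illustration; `vals` is not from the thread
vals <- tibble(
  w = c(10, NA, NA),
  x = c(NA, 21, 22),
  y = c(NA, NA, 32)
)

# One argument: the whole data frame is treated as a single vector,
# so there is no second input to take replacement values from
one_arg <- coalesce(vals)
is.data.frame(one_arg)

# One argument per column: the first non-missing value in each row
per_col <- coalesce(vals$w, vals$x, vals$y)
per_col
#> [1] 10 21 22
```

The nested a$w, $x, $y, $z columns in the output above are consistent with this: mutate() stores the data frame returned by coalesce() as a single data frame column.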


hadley · 2020-04-14T17:39:04Z

This should work, but I can't immediately understand why it doesn't:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

df %>%
  mutate(a = coalesce(!!!across(-id)))
#> Error in .subset2(chunks, self$get_current_group()): attempt to select less than one element in integerOneIndex

^{Created on 2020-04-14 by the reprex package (v0.3.0)}

romainfrancois · 2020-04-15T08:01:14Z

Splicing happens "too early", but this works:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

coacross <- function(...) {
  coalesce(!!!across(...))
}

df %>%
  mutate(a = coacross(-id))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

^{Created on 2020-04-15 by the reprex package (v0.3.0)}
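
A possible reading of "too early", offered as an assumption rather than a statement of dplyr internals: when !!! is written directly inside mutate(), it is processed while mutate() captures the expression, before across() has a data context to work with. Inside coacross(), the !!! is instead handled by coalesce() itself, whose ... accepts spliced lists. The sketch below only illustrates that last point, with made-up vectors:

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up inputs for illustration; not from the thread
cols <- list(
  c(NA, 1, NA),
  c(2, 3, NA),
  c(4, 5, 6)
)

# coalesce() collects its arguments as dynamic dots, so a list can be
# spliced into separate arguments with !!!
spliced <- coalesce(!!!cols)

# Equivalent call with the arguments written out by hand
by_hand <- coalesce(cols[[1]], cols[[2]], cols[[3]])

identical(spliced, by_hand)
#> [1] TRUE
```

Wrapping the call in a function such as coacross() defers the splice until the function body runs, which is after mutate() has set up its context.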

romainfrancois · 2021-05-06T08:43:31Z

Feature request: coalesce working backwards, i.e. returning the last non-missing column. coalesce() returns the first non-missing value among the columns/vectors passed to it. However, there are use cases where the opposite would be helpful, i.e. returning the last non-missing value from several columns/vectors.

eutwt · 2021-08-04T05:38:03Z

In case anyone comes across this issue after googling, another workaround is to use do.call(coalesce, across(-id)), which is a little less typing than coalesce(!!!syms(vars_select(names(.), -id))) and needs no extra package.

If you want to do it in reverse you could just rev the input to coalesce, although that's probably inefficient.

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5,
  w = c(10, NA, NA, NA, 14),
  x = c(NA, 21, 22, 23, NA),
  y = c(NA, NA, 32, 33, NA),
  z = c(NA, NA, NA, 43, 44)
)

df %>%
  mutate(a = do.call(coalesce, across(-id)))
#> # A tibble: 5 × 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14


df %>%
  mutate(a = do.call(coalesce, rev(across(-id))))
#> # A tibble: 5 × 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    32
#> 4     4    NA    23    33    43    43
#> 5     5    14    NA    NA    44    44

^{Created on 2021-08-04 by the reprex package (v2.0.0)}

ericemc3 · 2022-04-12T09:34:47Z

What about:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

df %>%
  mutate(a = coalesce(!!!select(., -id)))

#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

moodymudskipper · 2022-11-22T12:01:42Z

Since we're revisiting coalesce() and I see some feature requests gathered here, what about overriding values other than NA?

The use case is data where missing or special values are encoded as 0, -1, Inf, NaN, "non available" etc.

We have na_if() but we need to use it on all coalesced columns, and might need to turn the NAs back to their special values afterwards. It would be handy if coalesce() handled it.
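
Until something like this is built in, the na_if() round trip described above can be wrapped in a small helper. The sketch below is an assumption about how that might look; coalesce_sentinel() and its sentinel argument are hypothetical names, not part of dplyr, and the data is made up:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr): return, element-wise, the first
# value that is neither NA nor equal to `sentinel`.
coalesce_sentinel <- function(..., sentinel) {
  cols <- list(...)
  # Map the sentinel to NA in every input so coalesce() skips it
  cleaned <- lapply(cols, function(col) na_if(col, sentinel))
  do.call(coalesce, cleaned)
}

# Made-up vectors where -1 encodes "missing"
x <- c(-1, 2, -1)
y <- c(5, -1, -1)
z <- c(7, 8, 9)

result <- coalesce_sentinel(x, y, z, sentinel = -1)
result
#> [1] 5 2 9
```

Positions where every input is the sentinel come back as NA. Restoring the sentinel afterwards (for example with coalesce(result, -1)) would also overwrite positions that were genuinely NA in every input, so that step is left out here.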

jdonland · 2024-01-31T17:24:49Z

Wailing, gnashing my teeth, rending my clothing in the streets because coalesce(across(...)) still doesn't work.

hadley transferred this issue from tidyverse/dplyr May 11, 2020

romainfrancois mentioned this issue May 6, 2021

Feature request: coalesce working backwards, i.e. returning the last non-missing column tidyverse/dplyr#5873

Closed

eutwt mentioned this issue Oct 12, 2021

more about across(.fns = NULL) tidyverse/dplyr#6027

Merged

eutwt mentioned this issue Dec 11, 2021

Feature request: allow tidyselect in coalesce() tidyverse/dplyr#5170

Closed
