Revisit dplyr::coalesce with across #54

Open
tmastny opened this issue Apr 14, 2020 · 7 comments

@tmastny

tmastny commented Apr 14, 2020

With dplyr 1.0.0 introducing c_across and across, I was wondering whether we could revisit tidyverse/dplyr#3548 by allowing dplyr::coalesce to work more naturally with the new across and c_across functions.

After reading the row-wise article, I expected dplyr::coalesce to work like rowSums, since it naturally operates across rows, or at worst to work like rowwise() followed by sum().

However, coalesce doesn't seem to work with the across family at all, as you can see in the code below.

Would it be possible to make coalesce compatible with the new across workflow?

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

## Does coalesce work like rowSums, because
## they both naturally work across rows?
df %>%
  mutate(a = rowSums(across(-id), na.rm = TRUE))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    54
#> 4     4    NA    23    33    43    99
#> 5     5    14    NA    NA    44    58

# No: coalesce doesn't work like rowSums
df %>%
  mutate(a = coalesce(across(-id)))
#> # A tibble: 5 x 6
#>      id     w     x     y     z   a$w    $x    $y    $z
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10    NA    NA    NA
#> 2     2    NA    21    NA    NA    NA    21    NA    NA
#> 3     3    NA    22    32    NA    NA    22    32    NA
#> 4     4    NA    23    33    43    NA    23    33    43
#> 5     5    14    NA    NA    44    14    NA    NA    44



## Maybe it works like sum, since coalesce's argument is `...`
df %>%
  rowwise() %>%
  mutate(a = sum(c_across(-id), na.rm = TRUE))
#> # A tibble: 5 x 6
#> # Rowwise: 
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    54
#> 4     4    NA    23    33    43    99
#> 5     5    14    NA    NA    44    58

# No: coalesce doesn't work with rowwise
df %>%
  rowwise() %>%
  mutate(a = coalesce(c_across(-id)))
#> Error: `mutate()` argument `a` must be recyclable.
#> ℹ `a` is `coalesce(c_across(-id))`.
#> ℹ The error occured in row 1.
#> x `a` can't be recycled to size 1.
#> ℹ `a` must be size 1, not 4.
#> ℹ Did you mean: `a = list(coalesce(c_across(-id)))` ?



## coalesce works if you write out each by hand,
## but that goes against the spirit of the new `across` family
df %>%
  mutate(a = coalesce(w, x, y, z))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

# There is a workaround suggested in tidyverse/dplyr#3548, but it's not very user-friendly
# and requires a different package
library(tidyselect)
df %>%
  mutate(a = coalesce(!!!syms(vars_select(names(.), -id))))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

Created on 2020-04-14 by the reprex package (v0.3.0)

@hadley
Member

hadley commented Apr 14, 2020

This should work, but I can't immediately understand why it doesn't:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

df %>%
  mutate(a = coalesce(!!!across(-id)))
#> Error in .subset2(chunks, self$get_current_group()): attempt to select less than one element in integerOneIndex

Created on 2020-04-14 by the reprex package (v0.3.0)

@romainfrancois
Member

Splicing happens "too early", but this works:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

coacross <- function(...) {
  coalesce(!!!across(...))
}

df %>%
  mutate(a = coacross(-id))
#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

Created on 2020-04-15 by the reprex package (v0.3.0)
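
For readers wondering why the wrapper helps, here is a minimal sketch outside of mutate(). coalesce() accepts dynamic dots, so a list of vectors can be spliced into it with !!!, and do.call() is a base R way to do the same thing. The list `cols` is made-up illustration data standing in for what across(-id) returns; it is not part of the reprex above. Inside coacross(), the splice is handled when coalesce() itself runs, by which point across() can see the current data. Written directly in mutate(), it appears to be handled before that point.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the columns that across(-id) would return
cols <- list(
  c(10, NA, NA),
  c(NA, 21, NA),
  c(NA, NA, 32)
)

# Splice the list elements in as separate arguments to coalesce()
spliced <- coalesce(!!!cols)

# Base R equivalent: do.call() passes each list element as one argument
called <- do.call(coalesce, cols)

# Both spellings should give the same vector
identical(spliced, called)
```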

@romainfrancois
Member

Feature request: coalesce working backwards, i.e. returning the last non-missing value. coalesce() returns the first non-missing value among the columns/vectors passed to it. However, there are use cases where the opposite would be helpful, i.e. returning the last non-missing value from several columns/vectors.
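
Until something like that exists, one possible sketch is a small wrapper that hands the inputs to coalesce() in reverse order. The name `coalesce_last` and the vectors below are hypothetical, chosen only for illustration; this is not a dplyr function.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a dplyr function): return the last non-missing
# value by handing the inputs to coalesce() in reverse order
coalesce_last <- function(...) {
  do.call(coalesce, rev(list(...)))
}

w <- c(10, NA, NA)
x <- c(NA, 21, 22)
y <- c(NA, NA, 32)

# First non-missing value per position
coalesce(w, x, y)

# Last non-missing value per position
coalesce_last(w, x, y)
```

Reversing the argument list keeps all of coalesce()'s own type and size checks, at the cost of building one extra list.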

@eutwt

eutwt commented Aug 4, 2021

In case anyone comes across this issue after googling, another workaround is to use do.call(coalesce, across(-id)), which is a little less typing than coalesce(!!!syms(vars_select(names(.), -id))) and needs no extra package.

If you want to do it in reverse, you could just rev the input to coalesce, although that's probably inefficient.

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5,
  w = c(10, NA, NA, NA, 14),
  x = c(NA, 21, 22, 23, NA),
  y = c(NA, NA, 32, 33, NA),
  z = c(NA, NA, NA, 43, 44)
)

df %>%
  mutate(a = do.call(coalesce, across(-id)))
#> # A tibble: 5 × 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14


df %>%
  mutate(a = do.call(coalesce, rev(across(-id))))
#> # A tibble: 5 × 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    32
#> 4     4    NA    23    33    43    43
#> 5     5    14    NA    NA    44    44

Created on 2021-08-04 by the reprex package (v2.0.0)

@ericemc3

What about:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(
  id = 1:5, 
  w = c(10, NA, NA, NA, 14), 
  x = c(NA, 21, 22, 23, NA), 
  y = c(NA, NA, 32, 33, NA), 
  z = c(NA, NA, NA, 43, 44)
)

df %>%
  mutate(a = coalesce(!!!select(., -id)))

#> # A tibble: 5 x 6
#>      id     w     x     y     z     a
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1    10    NA    NA    NA    10
#> 2     2    NA    21    NA    NA    21
#> 3     3    NA    22    32    NA    22
#> 4     4    NA    23    33    43    23
#> 5     5    14    NA    NA    44    14

@moodymudskipper

Since we're revisiting coalesce() and I see some feature requests gathered here, what about overriding values other than NA?

The use case is data where missing or special values are encoded as 0, -1, Inf, NaN, "non available" etc.

We have na_if(), but we would need to apply it to every coalesced column, and might need to turn the NAs back into their special values afterwards. It would be handy if coalesce() handled this.
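
As a rough sketch of the idea, assuming the special values can be listed up front: the helper below masks them on temporary copies of the inputs and then coalesces. `coalesce_special` and its `special` argument are hypothetical names, not part of dplyr, and the vectors are made-up illustration data.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a dplyr function): treat every value in `special`
# as missing while coalescing, without modifying the inputs themselves
coalesce_special <- function(..., special) {
  masked <- lapply(list(...), function(x) replace(x, x %in% special, NA))
  do.call(coalesce, masked)
}

a <- c(0, 5, -1)
b <- c(7, 0, -1)
fallback <- c(9, 9, 9)

# 0 and -1 are skipped as if they were NA
coalesce_special(a, b, fallback, special = c(0, -1))
```

One limitation of this sketch: when every input holds a special value at some position, the result there is NA rather than the original special value.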

@jdonland

Wailing, gnashing my teeth, rending my clothing in the streets because coalesce(across(...)) still doesn't work.
