Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Can pivot_wider() have an option to preserve non-id_cols column by chopping into list columns? #990

Closed
yutannihilation opened this issue Jul 5, 2020 · 0 comments · Fixed by #1280
Labels
feature pivoting ♻️

Comments

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Jul 5, 2020

(I asked this on RStudio Community first, but couldn't get a nice answer. So I thought this is worth a feature request.)

At first, honestly I didn't feel it's useful that pivot_wider() creates list columns when there are any duplicated values. But, now I find it very useful, in that it allows us to delay the decision of how to handle the duplication. This request is basically about extending it to non-id_cols column.

Suppose I have this data:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

d <- data.frame(
  id = rep(1:5, each = 3),
  update = c(as.Date("2020-07-01") + 1:15),
  key = rep(c("a", "b", "c"), 5),
  val = 1:15
)

d
#>    id     update key val
#> 1   1 2020-07-02   a   1
#> 2   1 2020-07-03   b   2
#> 3   1 2020-07-04   c   3
#> 4   2 2020-07-05   a   4
#> 5   2 2020-07-06   b   5
#> 6   2 2020-07-07   c   6
#> 7   3 2020-07-08   a   7
#> 8   3 2020-07-09   b   8
#> 9   3 2020-07-10   c   9
#> 10  4 2020-07-11   a  10
#> 11  4 2020-07-12   b  11
#> 12  4 2020-07-13   c  12
#> 13  5 2020-07-14   a  13
#> 14  5 2020-07-15   b  14
#> 15  5 2020-07-16   c  15

and want to

  • pivot key to columns and
  • get the latest update of each id

at the same time. Currently, I can achieve this by summarising update with mutate() beforehand:

d %>% 
  # summarise `update` by using `mutate()`
  group_by(id) %>% 
  mutate(last_update = max(update)) %>% 
  ungroup() %>% 
  # choose `id` and `last_update` as key
  pivot_wider(c(id, last_update),
    names_from = key,
    values_from = val
  )
#> # A tibble: 5 x 5
#>      id last_update     a     b     c
#>   <int> <date>      <int> <int> <int>
#> 1     1 2020-07-04      1     2     3
#> 2     2 2020-07-07      4     5     6
#> 3     3 2020-07-10      7     8     9
#> 4     4 2020-07-13     10    11    12
#> 5     5 2020-07-16     13    14    15

This is fine. But, I wander if it's possible that pivot_wider() chops non-id_cols column into a list column instead of just dropping. If it does, I can take max()s of them. Here's a pseudo-code; update isn't specified either on id_cols, names_from, or values_from), so it gets dropped, but I want to preserve the column.

d %>% 
  pivot_wider(id,
    names_from = key,
    values_from = val
  ) %>%
  rowwise() %>%
  mutate(last_update = max(update), .keep = "unused")

Or, more simply, this:

d %>% 
  pivot_wider(id,
    names_from = key,
    values_from = val,
    values_fn = max
  )
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature pivoting ♻️
Projects
None yet
2 participants