Add drop to pivot_wider #770
Comments
Needs r-lib/vctrs#686. Not sure what to call this argument; it could be |
What about |
Note that the
library(tidyr)
d = tibble(day_int = c(4,5,1,2),
day_fac = factor(day_int, levels=1:5,
labels=c("Mon","Tue", "Wed","Thu","Fri")))
d
#> # A tibble: 4 x 2
#> day_int day_fac
#> <dbl> <fct>
#> 1 4 Thu
#> 2 5 Fri
#> 3 1 Mon
#> 4 2 Tue
levels(d$day_fac)
#> [1] "Mon" "Tue" "Wed" "Thu" "Fri"
# spread() automatically creates a `Wed` column
spread(d, key = "day_fac", value = "day_int", drop = FALSE)
#> # A tibble: 1 x 5
#> Mon Tue Wed Thu Fri
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 NA 4 5
# ... but pivot_wider() does not (and does not respect
# the level ordering, but that is issue #839)
pivot_wider(d, names_from = day_fac, values_from = day_int,
values_fill = list(day_fac = NA))
#> # A tibble: 1 x 4
#> Thu Fri Mon Tue
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4 5 1 2
For naming the argument, I like the consistency of |
Thinking about this more, I think it's unlikely the full spectrum of needs can be resolved with just a couple of new arguments to |
Hi, I think it's a shame that spread/pivot_wider can't be told which columns to expect when given an empty input. In long pipes, empty data frames/tibbles do happen, and missing columns can break the whole chain, e.g. a mutate after pivot_wider. I'd rather get a result, even an empty data frame, than a broken pipe. Thanks, |
@smarc it can be. That's the purpose of the spec. |
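For anyone landing here with the empty-input problem, a minimal sketch of what using a spec could look like. The input `empty`, its columns (`id`, `day`, `value`) and the expected names `Mon`/`Tue`/`Wed` are made up for illustration; only `pivot_wider_spec()` and the `.name`/`.value` spec columns come from tidyr. I'd expect this to return a zero-row tibble that still has the columns named in the spec, but I haven't checked it against every tidyr version.

```r
library(tidyr)

# Hypothetical empty input, e.g. what is left after an upstream filter()
empty <- tibble(
  id = integer(),
  day = character(),
  value = double()
)

# Hand-written spec listing the columns we expect to get.
# `.name` and `.value` are the special spec columns; `day` matches the
# column that would otherwise be passed to names_from.
spec <- tibble(
  .name = c("Mon", "Tue", "Wed"),
  .value = "value",
  day = c("Mon", "Tue", "Wed")
)

# The output columns are taken from the spec rather than from the data
wide <- pivot_wider_spec(empty, spec)
names(wide)
```

Since the column names come from the spec, a later `mutate()` that refers to `Mon`, `Tue` or `Wed` should still find them.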
Here’s another use case, a bit different from my earlier example, but similar to data I work with. Here, each level of a factor (
library(tidyverse)
# Example data
# Note: In 2019, only males responded
d = tibble(
year = c(2018, 2018, 2019, 2020, 2020),
gender = factor(c("female", "male", "male", "female", "male")),
percentage = seq(30, 70, 10)
)
pivot_wider(d, names_from = c(year, gender), values_from = percentage)
#> # A tibble: 1 x 5
#> `2018_female` `2018_male` `2019_male` `2020_female` `2020_male`
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 30 40 50 60 70
Expected syntax and results:
pivot_wider(d, names_from = c(year, gender), values_from = percentage,
names_fill = TRUE)
#> # A tibble: 1 x 6
#> `2018_female` `2018_male` `2019_female` `2019_male` `2020_female` `2020_male`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 30 40 NA 50 60 70
(I am aware that the |
names_fill = TRUE would be a useful feature! |
Any progress on this feature? It would be very useful and would save the manual filling in of the names and values. |
I think there are generally two classes of problems that
Problem 1 - Missing levels in
|
Here is an example exhibiting both problem 1 and problem 2 from above, which is solved by a call to
library(tidyr)
df <- tibble(
id = factor(c(2, 1, 1, 2, 1), levels = c(1, 2, 3)),
year = c(2018, 2018, 2019, 2020, 2020),
gender = factor(c("F", "M", "M", "F", "M"), levels = c("F", "M")),
percentage = seq(30, 70, 10)
)
df
#> # A tibble: 5 × 4
#> id year gender percentage
#> <fct> <dbl> <fct> <dbl>
#> 1 2 2018 F 30
#> 2 1 2018 M 40
#> 3 1 2019 M 50
#> 4 2 2020 F 60
#> 5 1 2020 M 70
# - rows are in the wrong order (2 then 1)
# - rows are missing level 3
# - cols are missing combination `2019_F`
pivot_wider(df, id_cols = id, names_from = c(year, gender), values_from = percentage)
#> # A tibble: 2 × 6
#> id `2018_F` `2018_M` `2019_M` `2020_F` `2020_M`
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2 30 NA NA 60 NA
#> 2 1 NA 40 50 NA 70
df <- complete(df, id, year, gender)
df
#> # A tibble: 18 × 4
#> id year gender percentage
#> <fct> <dbl> <fct> <dbl>
#> 1 1 2018 F NA
#> 2 1 2018 M 40
#> 3 1 2019 F NA
#> 4 1 2019 M 50
#> 5 1 2020 F NA
#> 6 1 2020 M 70
#> 7 2 2018 F 30
#> 8 2 2018 M NA
#> 9 2 2019 F NA
#> 10 2 2019 M NA
#> 11 2 2020 F 60
#> 12 2 2020 M NA
#> 13 3 2018 F NA
#> 14 3 2018 M NA
#> 15 3 2019 F NA
#> 16 3 2019 M NA
#> 17 3 2020 F NA
#> 18 3 2020 M NA
pivot_wider(df, id_cols = id, names_from = c(year, gender), values_from = percentage)
#> # A tibble: 3 × 7
#> id `2018_F` `2018_M` `2019_F` `2019_M` `2020_F` `2020_M`
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 NA 40 NA 50 NA 70
#> 2 2 30 NA NA NA 60 NA
#> 3 3 NA NA NA NA NA NA
Created on 2021-12-14 by the reprex package (v2.0.1) |
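As a follow-up sketch: as far as I know, later tidyr releases (1.2.0 and up, I believe) added `names_expand` and `id_expand` arguments to `pivot_wider()` that cover the same two cases without a separate `complete()` step. This assumes a tidyr version that has those arguments; on older versions the call will fail with an unused-argument style error. The data is the same `df` as in the example above.

```r
library(tidyr)

df <- tibble(
  id = factor(c(2, 1, 1, 2, 1), levels = c(1, 2, 3)),
  year = c(2018, 2018, 2019, 2020, 2020),
  gender = factor(c("F", "M", "M", "F", "M"), levels = c("F", "M")),
  percentage = seq(30, 70, 10)
)

wide <- pivot_wider(
  df,
  id_cols = id,
  names_from = c(year, gender),
  values_from = percentage,
  names_expand = TRUE,  # make the implicit `2019_F` combination explicit
  id_expand = TRUE      # make the unused factor level `3` an explicit row
)
wide
```

The result should have the same shape as the `complete()` version: three rows, ordered by the levels of `id`, and one column per year/gender combination.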
pivot_wider is missing the drop option that was present in spread, which is quite useful to fill a matrix with empty rows.