-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should pivot_wider(values_fill =)
fill existing missing values?
#1270
Comments
I'm pretty sure this was deliberate (I have a vague recollection that reshape2 filled all missings). We could revisit the decision, but it makes me a little uncomfortable. |
You could always library(tidyverse)
mtcars$hp[1] <- NA
d <- head(mtcars) %>%
group_by(gear, cyl) %>%
summarise(mean_hp = mean(hp)) %>%
ungroup()
#> `summarise()` has grouped output by 'gear'. You can override using the `.groups` argument.
d
#> # A tibble: 4 × 3
#> gear cyl mean_hp
#> <dbl> <dbl> <dbl>
#> 1 3 6 108.
#> 2 3 8 175
#> 3 4 4 93
#> 4 4 6 NA
# - doesn't fill the explicit NA (right)
# - doesn't have desired column order (wrong)
pivot_wider(d, names_from = cyl, values_from = mean_hp, values_fill = 0)
#> # A tibble: 2 × 4
#> gear `6` `8` `4`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 3 108. 175 0
#> 2 4 NA 0 93
# fills in both explicit and implicit NAs
d <- complete(d, gear, cyl, fill = list(mean_hp = 0))
d
#> # A tibble: 6 × 3
#> gear cyl mean_hp
#> <dbl> <dbl> <dbl>
#> 1 3 4 0
#> 2 3 6 108.
#> 3 3 8 175
#> 4 4 4 93
#> 5 4 6 0
#> 6 4 8 0
# - explicit NA is filled (wrong)
# - has desired column order (right)
pivot_wider(d, names_from = cyl, values_from = mean_hp)
#> # A tibble: 2 × 4
#> gear `4` `6` `8`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 3 0 108. 175
#> 2 4 93 0 0 Created on 2021-12-14 by the reprex package (v2.0.1) |
Possibly worth considering adding an option to 2015 Hadley seemed to think this was a good idea #127 (comment) complete2 <- function(data,
...,
fill = list(),
explicit = TRUE) {
full <- expand(data, ...)
names <- names(full)
full <- dplyr::full_join(full, data, by = names)
if (explicit) {
full <- replace_na(full, replace = fill)
} else {
new <- !vec_in(full[names], data[names])
slice <- vec_slice(full, new)
slice <- replace_na(slice, replace = fill)
full <- vec_assign(full, new, slice)
}
reconstruct_tibble(data, full)
}
complete2(d, gear, cyl, fill = list(mean_hp = 0), explicit = FALSE)
#> # A tibble: 6 × 3
#> gear cyl mean_hp
#> <dbl> <dbl> <dbl>
#> 1 3 4 0
#> 2 3 6 108.
#> 3 3 8 175
#> 4 4 4 93
#> 5 4 6 NA
#> 6 4 8 0 |
2021 Hadley still thinks this is a good idea. |
Currently it only fills "new" missing values.
spread()
filled both new and pre-existing missing values.If it also filled existing missing values, that would help with the
complete() + pivot_wider()
storyCreated on 2021-12-14 by the reprex package (v2.0.1)
The text was updated successfully, but these errors were encountered: