Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should pivot_wider(values_fill =) fill existing missing values? #1270

Closed
DavisVaughan opened this issue Dec 14, 2021 · 4 comments · Fixed by #1274
Closed

Should pivot_wider(values_fill =) fill existing missing values? #1270

DavisVaughan opened this issue Dec 14, 2021 · 4 comments · Fixed by #1274
Labels
ask :bowtie: feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes"

Comments

@DavisVaughan
Copy link
Member

DavisVaughan commented Dec 14, 2021

Currently it only fills "new" missing values. spread() filled both new and pre-existing missing values.

If it also filled existing missing values, that would help with the complete() + pivot_wider() story

library(tidyverse)

d <- head(mtcars) %>% 
  group_by(gear, cyl) %>% 
  summarise(mean_hp = mean(hp)) %>%
  ungroup()
#> `summarise()` has grouped output by 'gear'. You can override using the `.groups` argument.

d
#> # A tibble: 4 × 3
#>    gear   cyl mean_hp
#>   <dbl> <dbl>   <dbl>
#> 1     3     6    108.
#> 2     3     8    175 
#> 3     4     4     93 
#> 4     4     6    110

pivot_wider(d, names_from = cyl, values_from = mean_hp, values_fill = 0)
#> # A tibble: 2 × 4
#>    gear   `6`   `8`   `4`
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3  108.   175     0
#> 2     4  110      0    93

spread(d, cyl, mean_hp, fill = 0)
#> # A tibble: 2 × 4
#>    gear   `4`   `6`   `8`
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3     0  108.   175
#> 2     4    93  110      0

# but maybe we want gear and cyl in numeric ordering
d <- complete(d, gear, cyl)
d
#> # A tibble: 6 × 3
#>    gear   cyl mean_hp
#>   <dbl> <dbl>   <dbl>
#> 1     3     4     NA 
#> 2     3     6    108.
#> 3     3     8    175 
#> 4     4     4     93 
#> 5     4     6    110 
#> 6     4     8     NA

# it won't fill existing NAs
pivot_wider(d, names_from = cyl, values_from = mean_hp, values_fill = 0)
#> # A tibble: 2 × 4
#>    gear   `4`   `6`   `8`
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3    NA  108.   175
#> 2     4    93  110     NA

spread(d, cyl, mean_hp, fill = 0)
#> # A tibble: 2 × 4
#>    gear   `4`   `6`   `8`
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3     0  108.   175
#> 2     4    93  110      0

Created on 2021-12-14 by the reprex package (v2.0.1)

@DavisVaughan DavisVaughan added ask :bowtie: feature a feature request or enhancement labels Dec 14, 2021
@hadley
Copy link
Member

hadley commented Dec 14, 2021

I'm pretty sure this was deliberate (I have a vague recollection that reshape2 filled all missings). We could revisit the decision, but it makes me a little uncomfortable.

@DavisVaughan
Copy link
Member Author

You could always fill at the complete() step, but that does fill both implicit and explicit missings, so you can't ever get it exactly right

library(tidyverse)

mtcars$hp[1] <- NA

d <- head(mtcars) %>% 
  group_by(gear, cyl) %>% 
  summarise(mean_hp = mean(hp)) %>%
  ungroup()
#> `summarise()` has grouped output by 'gear'. You can override using the `.groups` argument.

d
#> # A tibble: 4 × 3
#>    gear   cyl mean_hp
#>   <dbl> <dbl>   <dbl>
#> 1     3     6    108.
#> 2     3     8    175 
#> 3     4     4     93 
#> 4     4     6     NA

# - doesn't fill the explicit NA (right)
# - doesn't have desired column order (wrong)
pivot_wider(d, names_from = cyl, values_from = mean_hp, values_fill = 0)
#> # A tibble: 2 × 4
#>    gear   `6`   `8`   `4`
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3  108.   175     0
#> 2     4   NA      0    93

# fills in both explicit and implicit NAs
d <- complete(d, gear, cyl, fill = list(mean_hp = 0))
d
#> # A tibble: 6 × 3
#>    gear   cyl mean_hp
#>   <dbl> <dbl>   <dbl>
#> 1     3     4      0 
#> 2     3     6    108.
#> 3     3     8    175 
#> 4     4     4     93 
#> 5     4     6      0 
#> 6     4     8      0

# - explicit NA is filled (wrong)
# - has desired column order (right)
pivot_wider(d, names_from = cyl, values_from = mean_hp)
#> # A tibble: 2 × 4
#>    gear   `4`   `6`   `8`
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     3     0  108.   175
#> 2     4    93    0      0

Created on 2021-12-14 by the reprex package (v2.0.1)

@DavisVaughan DavisVaughan added the pivoting ♻️ pivot rectangular data to different "shapes" label Dec 14, 2021
@DavisVaughan
Copy link
Member Author

DavisVaughan commented Dec 14, 2021

Possibly worth considering adding an option to complete() to support not filling "explicit" missing values.

2015 Hadley seemed to think this was a good idea #127 (comment)

complete2 <- function(data, 
                      ..., 
                      fill = list(), 
                      explicit = TRUE) {
  full <- expand(data, ...)
  names <- names(full)
  
  full <- dplyr::full_join(full, data, by = names)
  
  if (explicit) {
    full <- replace_na(full, replace = fill)
  } else {
    new <- !vec_in(full[names], data[names])
    slice <- vec_slice(full, new)
    slice <- replace_na(slice, replace = fill)
    full <- vec_assign(full, new, slice)
  }
  
  reconstruct_tibble(data, full)
}

complete2(d, gear, cyl, fill = list(mean_hp = 0), explicit = FALSE)
#> # A tibble: 6 × 3
#>    gear   cyl mean_hp
#>   <dbl> <dbl>   <dbl>
#> 1     3     4      0 
#> 2     3     6    108.
#> 3     3     8    175 
#> 4     4     4     93 
#> 5     4     6     NA 
#> 6     4     8      0

@hadley
Copy link
Member

hadley commented Dec 15, 2021

2021 Hadley still thinks this is a good idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ask :bowtie: feature a feature request or enhancement pivoting ♻️ pivot rectangular data to different "shapes"
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants