-
Notifications
You must be signed in to change notification settings - Fork 420
Description
With #1289 we added a grouped-df method for complete(). Originally I made the decision to pass cur_data_all() through to each ungrouped call to complete() because it seemed like people were (nonsensically) completing on grouping variables. This actually causes issues!
Lines 99 to 108 in c73cf21
| dplyr::summarise( | |
| data, | |
| complete( | |
| data = dplyr::cur_data_all(), | |
| ..., | |
| fill = fill, | |
| explicit = explicit | |
| ), | |
| .groups = "keep" | |
| ) |
I am thinking we should just pass cur_data() through instead, which would avoid the issue seen below.
This would result in a breaking change that people would no longer be able to complete on a grouping variable (You'd get an error saying that the grouping variable isn't found). But I think that completing on a grouping variable is a bit useless, as the whole point is to complete within that variable, so you shouldn't have access to it.
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
df <- tibble(
item_id = c(1:2, 2, 3),
value1 = c(1, NA, 3, 4),
item_name = c("a", "a", "b", "b"),
value2 = 4:7,
group = c(1:2, 1, 2)
)
gdf <- group_by(df, group)
gdf
#> # A tibble: 4 × 5
#> # Groups: group [2]
#> item_id value1 item_name value2 group
#> <dbl> <dbl> <chr> <int> <dbl>
#> 1 1 1 a 4 1
#> 2 2 NA a 5 2
#> 3 2 3 b 6 1
#> 4 3 4 b 7 2
# expand "within" each group. so the group value should be repeated.
expand(gdf, item_id, item_name)
#> # A tibble: 8 × 3
#> # Groups: group [2]
#> group item_id item_name
#> <dbl> <dbl> <chr>
#> 1 1 1 a
#> 2 1 1 b
#> 3 1 2 a
#> 4 1 2 b
#> 5 2 2 a
#> 6 2 2 b
#> 7 2 3 a
#> 8 2 3 b
# (the NAs in the 1 `group` should be 1, and should be 2 in the 2 `group`)
# we don't get that here because we passed the "full" data including grouping
# cols through to `complete.data.frame()`, which did a full-join resulting in
# missing values for those columns. So then `summarize()` thought it didn't
# need to add the grouping columns back on.
complete(gdf, item_id, item_name)
#> # A tibble: 8 × 5
#> # Groups: group [3]
#> group item_id item_name value1 value2
#> <dbl> <dbl> <chr> <dbl> <int>
#> 1 1 1 a 1 4
#> 2 NA 1 b NA NA
#> 3 NA 2 a NA NA
#> 4 1 2 b 3 6
#> 5 2 2 a NA 5
#> 6 NA 2 b NA NA
#> 7 NA 3 a NA NA
#> 8 2 3 b 4 7Created on 2022-01-10 by the reprex package (v2.0.1)