summarise() edge case recycling bug #6509

Closed
DavisVaughan opened this issue Oct 18, 2022 · 3 comments · Fixed by #6527

@DavisVaughan
Member

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- tibble(x = 1:3)

# Why doesn't this recycle to size 0?
summarise(df, sum = sum(x), empty = tibble())
#> Error in `dplyr_col_modify()`:
#> ! Can't recycle `empty` (size 0) to size 1.

#> Backtrace:
#>     ▆
#>  1. ├─dplyr::summarise(df, sum = sum(x), empty = tibble())
#>  2. ├─dplyr:::summarise.data.frame(df, sum = sum(x), empty = tibble())
#>  3. │ └─dplyr:::summarise_build(.data, cols)
#>  4. │   ├─dplyr::dplyr_col_modify(out, cols$new)
#>  5. │   └─dplyr:::dplyr_col_modify.data.frame(out, cols$new)
#>  6. │     └─vctrs::vec_recycle_common(!!!cols, .size = nrow(data))
#>  7. └─vctrs:::stop_recycle_incompatible_size(...)
#>  8.   └─vctrs:::stop_vctrs(...)
#>  9.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))

# This does
summarise(df, sum = sum(x), empty = tibble(a = integer()))
#> # A tibble: 0 × 2
#> # … with 2 variables: sum <int>, empty <tibble[,1]>

# And these work
summarise(df, sum = sum(x), tibble())
#> # A tibble: 1 × 1
#>     sum
#>   <int>
#> 1     6

summarise(df, sum = sum(x), tibble(a = integer()))
#> # A tibble: 0 × 2
#> # … with 2 variables: sum <int>, a <int>

Created on 2022-10-18 with reprex v2.0.2.9000
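The recycling rule that summarise() applies to its results (via vctrs, as the vec_recycle_common() frame in the backtrace shows) can be sketched in base R. Under tidyverse recycling rules, a size-1 value recycles to any size, and all other sizes must match exactly. The helper name common_size() is hypothetical and only illustrates the rule; it is not a dplyr or vctrs function.

```r
# Hypothetical helper: the common size under tidyverse recycling rules.
# Size-1 inputs recycle to any size; all other sizes must agree exactly.
common_size <- function(sizes) {
  other <- unique(sizes[sizes != 1L])
  if (length(other) == 0L) {
    return(1L)  # everything is size 1 (or there are no inputs)
  }
  if (length(other) > 1L) {
    stop("Can't recycle sizes ", paste(other, collapse = " and "),
         " to a common size.")
  }
  other
}

# sum(x) has size 1; a 0-column tibble() has 0 rows, so its size is 0.
common_size(c(1L, 0L))
common_size(c(1L, 1L))
```

By this rule, sum = sum(x) (size 1) and empty = tibble() (size 0) should combine to size 0, the same outcome as the tibble(a = integer()) case.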

@DavisVaughan
Member Author

DavisVaughan commented Oct 18, 2022

Something seems wrong with the summarise() recycling implementation: it treats data frames with 0 columns as "not useful" chunks, when they definitely are useful. A 0-column data frame still has a size (its number of rows) that should take part in recycling.

return !Rf_inherits(ptype, "data.frame") || XLENGTH(ptype) > 0;
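A rough R transliteration of that C check shows why tibble() gets dropped. A data frame is a list of columns, so XLENGTH() (like length() in R) counts columns, not rows. The function name is_useful_chunk() is hypothetical, and base data.frame() stands in for tibble() here.

```r
# Hypothetical R transliteration of the C predicate quoted above.
# length() on a data frame counts columns, as XLENGTH() does in C.
is_useful_chunk <- function(ptype) {
  !inherits(ptype, "data.frame") || length(ptype) > 0
}

empty_cols <- data.frame()               # 0 columns, 0 rows
empty_rows <- data.frame(a = integer())  # 1 column, 0 rows

is_useful_chunk(empty_cols)  # FALSE: chunk skipped, its size 0 is ignored
is_useful_chunk(empty_rows)  # TRUE: chunk kept, recycling sees size 0
nrow(empty_cols)             # 0, the size that should have counted
```

Presumably, with the 0-column chunk skipped, the output size is computed as 1 from sum alone, and the later attempt to recycle empty (size 0) to size 1 fails, matching the error in the first reprex.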

@kcarnold

Probably related bug:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
starwars %>% summarize(sum_height = sum(height, na.rm = TRUE))
#> # A tibble: 1 × 1
#>   sum_height
#>        <int>
#> 1      14123
starwars %>% summarize(sum_height = sum(height, na.rm = TRUE), mean_height = sum(height, na.rm = TRUE) / nrow(height))
#> # A tibble: 0 × 2
#> # … with 2 variables: sum_height <int>, mean_height <dbl>

Created on 2022-11-14 by the reprex package (v2.0.1)

I would have expected the second data frame either to have one row, containing the sum and some sort of missing value, or to throw an error.

@DavisVaughan
Member Author

nrow(height) returns NULL because height is a plain vector, not a data frame. NULL has size 0, and <dbl> / NULL gives numeric(0), also size 0, so the whole summarise() result is recycled to size 0.
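The same behaviour can be reproduced in base R without dplyr. The vector below is a made-up stand-in for starwars$height, and the last line is one possible rewrite, assuming the intent was the mean of the non-missing heights; it is not taken from the thread.

```r
# Made-up stand-in for starwars$height: a numeric vector with a missing value.
height <- c(172, 167, 96, NA)

nrow(height)                              # NULL: a vector has no dim attribute
length(NULL)                              # 0
sum(height, na.rm = TRUE) / nrow(height)  # numeric(0), i.e. size 0

# Assumed intent: divide by the count of non-missing values instead,
# which keeps a size-1 result equal to mean(height, na.rm = TRUE).
sum(height, na.rm = TRUE) / sum(!is.na(height))
```

Inside summarise(), mean(height, na.rm = TRUE) would express that intent directly.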

This is more closely related to #6382, where some people disagree with summarise() being able to return multi-row (or, in this case, zero-row) results.
