Printing error involving group_nest, rowwise, and across #6264

Closed
TylerGrantSmith opened this issue May 11, 2022 · 5 comments · Fixed by #6393
Labels
bug an unexpected problem or unintended behavior each-row ↕️ vctrs ↗️


@TylerGrantSmith

I came upon the following error when attempting to debug some code. I think some combination of group_nest/rowwise/across is constructing a tibble that breaks the tbl formatter.

suppressWarnings(library(tidyverse))
tibble(x = 1:2, y = 1) %>% 
  group_nest(y) %>%
  rowwise() %>% 
  summarise(str = print(across()))
#> Error in `summarise()`:
#> ! Problem while computing `str = print(across())`.
#> i The error occurred in row 1.
#> Caused by error in `[<-.data.frame`:
#> ! replacement element 2 is a matrix/data frame of 2 rows, need 1
@hadley
Member

hadley commented Jul 21, 2022

Possibly related error:

library(dplyr, warn.conflicts = FALSE)

tibble(x = 1, y = list(tibble(x = 1:2))) %>% 
  rowwise() %>% 
  summarise(x = across())
#> Error in `summarise()`:
#> ! Problem while computing `x = across()`.
#> ℹ The error occurred in row 1.
#> Caused by error in `vec_c()` at dplyr/R/summarise.R:335:2:
#> ! `value` should have been recycled to fit `x`.
#> ℹ In file 'slice-assign.c' at line 238.
#> ℹ This is an internal error in the vctrs package, please report it to the package authors.

Created on 2022-07-21 by the reprex package (v2.0.1)

@hadley hadley added bug an unexpected problem or unintended behavior vctrs ↗️ each-row ↕️ labels Jul 21, 2022
@DavisVaughan
Member

Two slightly more minimal examples

library(tidyverse)

tibble(x = list(1:2)) %>%
  rowwise() %>%
  mutate(y = across())
#> Error in `vec_slice()`:
#> ! Column `x` (size 2) must match the data frame (size 1).
#> ℹ In file 'slice.c' at line 191.
#> ℹ This is an internal error in the vctrs package, please report it to the package authors.

tibble(x = list(1:2)) %>%
  rowwise() %>%
  summarise(y = across())
#> Error in `summarise()`:
#> ! Problem while computing `y = across()`.
#> ℹ The error occurred in row 1.
#> Caused by error in `vec_c()`:
#> ! `value` should have been recycled to fit `x`.
#> ℹ In file 'slice-assign.c' at line 238.
#> ℹ This is an internal error in the vctrs package, please report it to the package authors.

I think both of these are variants of this scenario:

library(tidyverse)

tibble(x = list(1:2)) %>%
  rowwise() %>%
  mutate(y = x)
#> Error in `mutate()`:
#> ! Problem while computing `y = x`.
#> ✖ `y` must be size 1, not 2.
#> ℹ Did you mean: `y = list(x)` ?
#> ℹ The error occurred in row 1.

I'm fairly sure the mutate() one should error, since you are trying to change the number of rows.

Maybe the summarise() one should result in a df-col for y containing tibble(x = 1:2) (hard to say).
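
For comparison, the `y = list(x)` form that the error message suggests can be sketched as follows. This is only a minimal illustration of the hint, not a fix for the `across()` problem; the object name `res` is made up here.

```r
library(dplyr, warn.conflicts = FALSE)

# Inside a rowwise data frame, `x` refers to the element of the list-column
# for the current row (here the integer vector 1:2), so it has size 2.
# Wrapping it in list() gives a size-1 value, which fits a single row.
res <- tibble(x = list(1:2)) %>%
  rowwise() %>%
  mutate(y = list(x))

# `y` is a list-column with one element per row
str(res$y)
```

The row count stays at 1 because the size-2 vector is stored as a single list element rather than spread across rows.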

@DavisVaughan
Member

DavisVaughan commented Jul 21, 2022

library(tidyverse)

tibble(x = list(1:2)) %>%
  rowwise() %>%
  mutate(y = {assign("foo", across(), envir = .GlobalEnv); 1})
#> # A tibble: 1 × 2
#> # Rowwise: 
#>   x             y
#>   <list>    <dbl>
#> 1 <int [2]>     1

# This is what `across()` is giving us
foo
#> Error:
#> ! Assigned data `map(.subset(x, unname), vectbl_set_names, NULL)` must be compatible with existing data.
#> ✖ Existing data has 1 row.
#> ✖ Assigned data has 2 rows.
#> ℹ Row updates require a list value. Do you need `list()` or `as.list()`?

unclass(foo)
#> $x
#> [1] 1 2
#> 
#> attr(,"row.names")
#> [1] 1

# Probably from something like this,
# since we "knew" the original number of rows
new_tibble(list(x = 1:2), nrow = 1L)
#> Error:
#> ! Assigned data `map(.subset(x, unname), vectbl_set_names, NULL)` must be compatible with existing data.
#> ✖ Existing data has 1 row.
#> ✖ Assigned data has 2 rows.
#> ℹ Row updates require a list value. Do you need `list()` or `as.list()`?

Created on 2022-07-21 by the reprex package (v2.0.1)
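
A base R sketch of how this kind of mismatch could be detected, assuming the problem is simply that the stored row count disagrees with the column sizes. `df_sizes_consistent()` is a hypothetical helper written for this illustration; it is not part of dplyr, tibble, or vctrs.

```r
# Hypothetical helper: TRUE when every column has as many rows as the
# data frame claims to have according to its row.names attribute.
df_sizes_consistent <- function(df) {
  n_rows <- .row_names_info(df, 2L)                   # row count from row.names
  col_sizes <- vapply(unclass(df), NROW, integer(1))  # size of each column
  all(col_sizes == n_rows)
}

# Built by hand with structure(), which does no validation: one declared row,
# but a column of length 2, mirroring the `unclass(foo)` output above.
bad <- structure(
  list(x = 1:2),
  row.names = c(NA_integer_, -1L),
  class = "data.frame"
)

# Same column, with a matching row count.
good <- structure(
  list(x = 1:2),
  row.names = c(NA_integer_, -2L),
  class = "data.frame"
)

df_sizes_consistent(bad)
df_sizes_consistent(good)
```

`NROW()` is used rather than `length()` so that a data frame or matrix column is measured by its rows.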

@hadley
Member

hadley commented Jul 30, 2022

So the root cause seems to be that across() is generating a corrupt data frame because it's unchopping x somewhere?

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = list(1:2, 3:5)) %>%
  rowwise() %>%
  mutate(across = list(across()))

lapply(df$across, unclass)
#> [[1]]
#> [[1]]$x
#> [1] 1 2
#> 
#> attr(,"row.names")
#> [1] 1
#> 
#> [[2]]
#> [[2]]$x
#> [1] 3 4 5
#> 
#> attr(,"row.names")
#> [1] 1

Created on 2022-07-30 by the reprex package (v2.0.1)
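
To make the "unchopping" distinction concrete, here is a base R sketch of the two ways to get at row 1 of a list-column. It does not use dplyr internals, and the names `df`, `row1`, and `elt` are made up for this illustration.

```r
# A plain data frame with a list-column, similar in shape to the example above
df <- data.frame(id = 1:2)
df$x <- list(1:2, 3:5)

# Slicing the row keeps `x` as a list-column: one row, one list element
row1 <- df[1, , drop = FALSE]
str(row1$x)

# Extracting the element instead "unchops" it: the result is the bare
# integer vector, whose size no longer matches a one-row data frame
elt <- df$x[[1]]
length(elt)
```

A one-row data frame built from `elt` directly, with the row count forced to 1, would have the same shape as the corrupt objects shown in `lapply(df$across, unclass)`.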

@DavisVaughan
Member

Note to ourselves:

I mentioned in an earlier comment that I thought these were variants of:

library(tidyverse)

tibble(x = list(1:2)) %>%
  rowwise() %>%
  mutate(y = x)
#> Error in `mutate()`:
#> ! Problem while computing `y = x`.
#> ✖ `y` must be size 1, not 2.
#> ℹ Did you mean: `y = list(x)` ?
#> ℹ The error occurred in row 1.

But I no longer believe that is correct.

across(cols, .fns = NULL) should actually be semantically equivalent to the proposed pick(cols) (#6204), so for this special case of .fns = NULL we would preserve the list-column "as is" rather than trying to access/unchop its elements.

.fns = NULL would be deprecated in favor of pick() to avoid this kind of confusion going forward.
