multidplyr doesn't like unnest_longer() #133

Open
xabriel opened this issue Mar 2, 2022 · 2 comments
Labels
feature: a feature request or enhancement

Comments

@xabriel

xabriel commented Mar 2, 2022

tidyr's unnest_longer() fails on a partitioned data frame, and the error message looks like a red herring.

library(tidyverse)
library(multidplyr)

data <-
  tibble(
    list_col = list(list("a", "b"), list("c", "d")),
    int = c(1, 1)
  )

# unnest_longer works as expected w/o multidplyr
data %>%
  unnest_longer(col = list_col, values_to = "unlisted")
#> # A tibble: 4 × 2
#>   unlisted   int
#>   <chr>    <dbl>
#> 1 a            1
#> 2 b            1
#> 3 c            1
#> 4 d            1

cluster <- new_cluster(2)
data_partitioned <- data %>%
  partition(cluster)

# multidplyr fails
data_partitioned %>%
  unnest_longer(col = list_col, values_to = "unlisted")
#> Error: object 'list_col' not found

rlang::last_trace()
#> <error/rlang_error>
#> object 'list_col' not found
#> Backtrace:
#>     █
#>  1. ├─data_partitioned %>% unnest_longer(col = list_col, values_to = "unlisted")
#>  2. ├─tidyr::unnest_longer(., col = list_col, values_to = "unlisted")
#>  3. │ └─tidyselect::vars_pull(names(data), !!enquo(col))
#>  4. │   ├─tidyselect:::instrument_base_errors(...)
#>  5. │   │ └─base::withCallingHandlers(...)
#>  6. │   └─rlang::eval_tidy(enquo(var), set_names(seq_along(vars), vars))
#>  7. └─base::.handleSimpleError(...)
#>  8.   └─tidyselect:::h(simpleError(msg, call))
#> <error/simpleError>
#> object 'list_col' not found

Created on 2022-03-02 by the reprex package (v2.0.1)

@xabriel
Author

xabriel commented Mar 2, 2022

I work around this by calling collect() before unnest_longer(), then calling partition(cluster) again when I need the parallelism.

Here is a repro from my project using multidplyr that shows the workaround for this issue as well as #132:

https://github.com/xabriel/wikipedia-vandalism/blob/e60c1925ffe8ebac98919e809046e4a77fd85f5c/wikipedia-vandalism.R#L378-L425
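For readers who do not want to follow the link, the collect-then-repartition workaround can be sketched against the reprex data from the original report. This is a minimal sketch and not the code from the linked project. The names `unnested` and `unnested_partitioned` are made up for illustration, and the row order after collect() is not assumed to match the input order.

```r
library(dplyr)
library(tidyr)
library(multidplyr)

data <- tibble(
  list_col = list(list("a", "b"), list("c", "d")),
  int = c(1, 1)
)

cluster <- new_cluster(2)
data_partitioned <- data %>%
  partition(cluster)

# Workaround: pull the rows back into the main session, unnest there,
# because unnest_longer() fails on the partitioned data frame.
unnested <- data_partitioned %>%
  collect() %>%
  unnest_longer(col = list_col, values_to = "unlisted")

# Re-partition only if later steps need to run on the workers again.
unnested_partitioned <- unnested %>%
  partition(cluster)
```

The cost is a round trip of the data through the main session, so this only makes sense when the unnesting step is cheap compared with the parallel work around it.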

@hadley added the "feature" (a feature request or enhancement) label Oct 31, 2023
@hadley
Member

hadley commented Oct 31, 2023

Thanks for the suggestion! Will definitely consider it when I'm next working on multidplyr.
