Make partition work with invoke_rows #51

Closed
stanstrup opened this issue Mar 30, 2017 · 6 comments

Comments

@stanstrup stanstrup commented Mar 30, 2017

It seems invoke_rows() doesn't accept a party_df object. Supporting that would be useful:

cluster <- c(detectCores(), length(unique(mtcars$carb))/2) %>% min %>% create_cluster()
mtcars %>% partition(carb, cluster=cluster) %>% invoke_rows(.f = sum)

-->
Error: .d must be a data frame

@jepusto jepusto commented Apr 1, 2017

Wrapping in do() makes the above example work:

cars_serial <- 
  mtcars %>% 
  invoke_rows(.f = sum) %>%
  unnest()

cars_parallel <- 
  mtcars %>% 
  partition(carb, cluster=cluster) %>% 
  do(invoke_rows(.f = sum, .d = .)) %>%
  collect() %>%
  unnest()

setdiff(cars_serial, cars_parallel) %>% nrow()

@stanstrup stanstrup commented Apr 20, 2017

Thanks!

@stanstrup stanstrup commented May 16, 2017

The workaround now gives me:

Warning message:
group_indices_.grouped_df ignores extra arguments 

I don't understand what is going wrong here...

R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252    LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C                   
[5] LC_TIME=Danish_Denmark.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_0.6.2.9000      purrrlyr_0.0.1.9000   multidplyr_0.0.0.9000 dplyr_0.5.0.9005     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10      digest_0.6.12     withr_1.0.2       assertthat_0.2.0  R6_2.2.1          git2r_0.18.0      magrittr_1.5     
 [8] httr_1.2.1        rlang_0.1.9000    lazyeval_0.2.0    curl_2.6          devtools_1.13.0   tools_3.3.3       glue_1.0.0       
[15] memoise_1.1.0     knitr_1.15.1      tibble_1.3.0.9006

@Ax3man Ax3man commented May 16, 2017

Most likely because you have updated dplyr to the latest dev version, but multidplyr isn't up to date.

@derekpowell derekpowell commented Nov 3, 2017

Sorry to resurrect this issue, but I'm getting the same "group_indices_.grouped_df ignores extra arguments" warning. As far as I can tell it isn't causing any real problems, but I'm concerned I'm missing something. Should I be worried?

Here's a minimal example:

library(tidyverse)
library(multidplyr)

df <- data.frame(A = c(1, 2, 3, 4, 5, 6),
                 B = c(4, 5, 5, 6, 8, 4),
                 group = c(1, 1, 1, 2, 2, 2))

cluster <- create_cluster(2)
byGroup <- partition(df, group, cluster=cluster)

The resulting byGroup is a party_df that looks correct to me:

> byGroup
Source: party_df [6 x 3]
Groups: group
Shards: 2 [3--3 rows]

# S3: party_df
      A     B group
  <dbl> <dbl> <dbl>
1     1     4     1
2     2     5     1
3     3     5     1
4     4     6     2
5     5     8     2
6     6     4     2

Here are the relevant parts of my sessionInfo():

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.6

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] multidplyr_0.0.0.9000 modelr_0.1.1          dplyr_0.7.4           purrr_0.2.4          
 [5] readr_1.1.1           tidyr_0.7.2           tibble_1.3.4          ggplot2_2.2.1        
 [9] tidyverse_1.1.1       bnlearn_4.2          

@hadley hadley (Member) commented Jul 11, 2019

This will eventually be fixed by an implementation of group_map()/group_modify(); I don't currently have plans to add support for purrr/purrrlyr.
