New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better dispatching strategy in bind_rows() #2457

Open
hadley opened this Issue Feb 21, 2017 · 10 comments

Comments

Projects
None yet
9 participants
@hadley
Member

hadley commented Feb 21, 2017

bind_rows() currently only works with data frames (or things very similar to dataframes), but it should work for more data structures:

Getting this to work is challenging because bind_rows() takes ...: how can we find the right method? One approach would be to use reduce() to reduce the problem to finding the right method for a pair of tables. However, this will lose many of the performance benefits of bind_rows() because it will have to grow a data frame.

I think the best compromise will be to find the GCD of the classes of all elements of .... If this is not null, call bind_rows_n.common_class(dots). If it is NULL, fall back to reduce(dots, bind_rows_2).

@tiernanmartin

This comment has been minimized.

Show comment
Hide comment
@tiernanmartin

tiernanmartin Feb 21, 2017

This might be implicit but bind_rows also fails with sf objects (#2459)

tiernanmartin commented Feb 21, 2017

This might be implicit but bind_rows also fails with sf objects (#2459)

@kevinykuo

This comment has been minimized.

Show comment
Hide comment
@kevinykuo

kevinykuo Feb 22, 2017

@hadley I'm working on a bind_rows implementation for sparklyr. What needs to happen in dplyr for me to expose bind_rows.tbl_spark?

kevinykuo commented Feb 22, 2017

@hadley I'm working on a bind_rows implementation for sparklyr. What needs to happen in dplyr for me to expose bind_rows.tbl_spark?

@hadley

This comment has been minimized.

Show comment
Hide comment
@hadley

hadley Feb 22, 2017

Member

@kevinykuo nothing, because it's not currently possible :(

Member

hadley commented Feb 22, 2017

@kevinykuo nothing, because it's not currently possible :(

@lionel-

This comment has been minimized.

Show comment
Hide comment
@lionel-

lionel- Apr 6, 2017

Member

I just realised that the problem is now even harder since we support vectors. Some vectors have classes (e.g. factors and dates), and we'll dispatch on more vector classes in the future as we sort out c-level vector dispatch.

Though one easy improvement would be to call collect() on objects inheriting from tbl_lazy. And we could still deal with objects inheriting from data frames if we find a reasonable solution for propagating attributes. However I don't think we'll be able to deal with objects like sp.

Edit: actually we could consider all S4 objects and S3 lists as potential data frames. Or we could have a is_recursive_vector() generic in rlang or S3.

Member

lionel- commented Apr 6, 2017

I just realised that the problem is now even harder since we support vectors. Some vectors have classes (e.g. factors and dates), and we'll dispatch on more vector classes in the future as we sort out c-level vector dispatch.

Though one easy improvement would be to call collect() on objects inheriting from tbl_lazy. And we could still deal with objects inheriting from data frames if we find a reasonable solution for propagating attributes. However I don't think we'll be able to deal with objects like sp.

Edit: actually we could consider all S4 objects and S3 lists as potential data frames. Or we could have a is_recursive_vector() generic in rlang or S3.

@kevinykuo

This comment has been minimized.

Show comment
Hide comment
@kevinykuo

kevinykuo Apr 6, 2017

Would it make sense to have a version of bind_rows() that works just within each backend?

EDIT: OK I misunderstood before so my previous question didn't make sense. Now to clarify, are you thinking bind_rows_2 would attempt some sort of class conversion, @hadley ?

kevinykuo commented Apr 6, 2017

Would it make sense to have a version of bind_rows() that works just within each backend?

EDIT: OK I misunderstood before so my previous question didn't make sense. Now to clarify, are you thinking bind_rows_2 would attempt some sort of class conversion, @hadley ?

@bramtayl

This comment has been minimized.

Show comment
Hide comment
@bramtayl

bramtayl May 15, 2017

Units get lost in bind_rows as well

bramtayl commented May 15, 2017

Units get lost in bind_rows as well

@phaverty

This comment has been minimized.

Show comment
Hide comment
@phaverty

phaverty Apr 4, 2018

I'd love to see a fix for this. I've got a bunch of data.frames with survival:Surv columns and mismatched columns that need rbind-ing. If there is a plan for dealing with this already, I might have time to put together an implementation.

phaverty commented Apr 4, 2018

I'd love to see a fix for this. I've got a bunch of data.frames with survival:Surv columns and mismatched columns that need rbind-ing. If there is a plan for dealing with this already, I might have time to put together an implementation.

@tgrushka

This comment has been minimized.

Show comment
Hide comment
@tgrushka

tgrushka Aug 24, 2018

Still no fix for sf objects?

tgrushka commented Aug 24, 2018

Still no fix for sf objects?

@sdanielzafar

This comment has been minimized.

Show comment
Hide comment
@sdanielzafar

sdanielzafar Sep 21, 2018

I'm building a package that has a special column variable and I'm facing this issue. Another thread mentioned that I can write a bind_rows method to fix this on my side? I looked at the code and it's all Rcpp, how could I add this functionality? What method(s) would I create?

sdanielzafar commented Sep 21, 2018

I'm building a package that has a special column variable and I'm facing this issue. Another thread mentioned that I can write a bind_rows method to fix this on my side? I looked at the code and it's all Rcpp, how could I add this functionality? What method(s) would I create?

@rituk

This comment has been minimized.

Show comment
Hide comment
@rituk

rituk Oct 5, 2018

I think I'm getting a similar error. If anyone already as a solution please let me know.
Here is the code snippet.

samp2 %>%
rowwise() %>%
mutate (OddsR =oddsRatio(matrix(c(PRE_CORRECT_NUMBER, PRE_INCORRECT, POST_CORRECT_NUMBER, POST_INCORRECT),2,2) ) )

Warning messages:
1: In mutate_impl(.data, dots) :
Vectorizing 'oddsRatio' elements may not preserve their attributes
2: In mutate_impl(.data, dots) :
Vectorizing 'oddsRatio' elements may not preserve their attributes

rituk commented Oct 5, 2018

I think I'm getting a similar error. If anyone already as a solution please let me know.
Here is the code snippet.

samp2 %>%
rowwise() %>%
mutate (OddsR =oddsRatio(matrix(c(PRE_CORRECT_NUMBER, PRE_INCORRECT, POST_CORRECT_NUMBER, POST_INCORRECT),2,2) ) )

Warning messages:
1: In mutate_impl(.data, dots) :
Vectorizing 'oddsRatio' elements may not preserve their attributes
2: In mutate_impl(.data, dots) :
Vectorizing 'oddsRatio' elements may not preserve their attributes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment