rbind_all vs rbind #279

Closed
geoffjentry opened this issue Feb 22, 2014 · 11 comments

@geoffjentry geoffjentry commented Feb 22, 2014

Hi ...

I have a list of data.frame objects at http://geoffjentry.hexdump.org/example.rda - it contains the list as 'example'.

The following works for me:
do.call(rbind, example)

As suggested by dplyr I wanted to look at replacing that pattern with rbind_all(), but am running into this:

rbind_all(example)
Error: incompatible type (data index: 4, column: 'replyToSN', was collecting: logical (dplyr::Collecter_Impl<10>), incompatible with data of type: STRSXP

Note that replyToSN is NA in the first 3 data.frames and a string in the fourth, I'm assuming that's what is causing the issue here.

Thanks!
-J
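
A minimal base-R sketch of the situation described above. The two data frames are hypothetical stand-ins for the contents of example.rda, which is not reproduced here; only the column name replyToSN comes from the report.

```r
# Hypothetical stand-in for the 'example' list (the real data is in example.rda).
example_like <- list(
  data.frame(text = "a", replyToSN = NA, stringsAsFactors = FALSE),
  data.frame(text = "b", replyToSN = "someone", stringsAsFactors = FALSE)
)

# A bare NA is a logical value, so the first replyToSN column is logical.
first_type  <- typeof(example_like[[1]]$replyToSN)
second_type <- typeof(example_like[[2]]$replyToSN)

# Base rbind() quietly promotes the logical NA column to character.
combined      <- do.call(rbind, example_like)
combined_type <- typeof(combined$replyToSN)
```

The point of the sketch is only that the column starts out logical and that base rbind() performs the promotion without complaint.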

@romainfrancois romainfrancois (Member) commented Feb 22, 2014

Not sure we can deal with this. What happens is that replyToSN is a logical vector on the first 3, and then character.

Promotion from logical to character is not handled yet. Maybe it should be. @hadley ?

> sapply( example, function(.) sapply(., typeof) )
              [,1]        [,2]        [,3]        [,4]
text          "character" "character" "character" "character"
favorited     "logical"   "logical"   "logical"   "logical"
favoriteCount "double"    "double"    "double"    "double"
replyToSN     "logical"   "logical"   "logical"   "character"
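
The check above can be wrapped into a small helper that lists only the columns whose type differs between data frames. This is a sketch, not part of dplyr: the name mixed_type_columns is made up for illustration, and it assumes every data frame in the list has the same column names.

```r
# Hypothetical helper: names of the columns whose storage type differs
# across a list of data frames. Assumes identical column names throughout.
mixed_type_columns <- function(dfs) {
  # One column of type names per data frame, one row per variable.
  types <- do.call(cbind, lapply(dfs, function(df) vapply(df, typeof, character(1))))
  differs <- apply(types, 1, function(row) length(unique(row)) > 1)
  rownames(types)[differs]
}

# Small made-up input mirroring the shape of the table above.
dfs <- list(
  data.frame(text = "a", favorited = FALSE, replyToSN = NA, stringsAsFactors = FALSE),
  data.frame(text = "b", favorited = TRUE, replyToSN = "someone", stringsAsFactors = FALSE)
)
mixed <- mixed_type_columns(dfs)
```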

@geoffjentry geoffjentry (Author) commented Feb 23, 2014

Right - wasn't sure if this was something that just wasn't supported at the moment and/or a known hole in the functionality. Makes sense that it isn't currently.

For my particular use case I'm going to have the same issue with numeric columns as well as the character columns. Not sure how things are done under the hood and if that muddies the waters even further.

Thanks Romain.

@hadley hadley (Member) commented Feb 23, 2014

I think the current behavior is correct. Converting logical to character is not something that should be done automatically.

@kevinushey kevinushey (Contributor) commented Feb 24, 2014

It would make sense to coerce 'up' the SEXP types, but maybe with warnings for columns that get coerced IMO. Similar to what happens in melt when the value variables are of different types.
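
A rough base-R sketch of what "coerce up, with warnings" could look like. It is not how dplyr does or would implement this. The function name rbind_promote and the type ordering are assumptions made for illustration, and the sketch only handles plain logical, integer, double and character columns (no factors, dates or list columns).

```r
# Hypothetical helper, not dplyr's implementation: row-bind data frames,
# promoting each mixed-type column to the "highest" type and warning about it.
rbind_promote <- function(dfs) {
  # Assumed promotion order: logical < integer < double < character.
  rank <- c(logical = 1, integer = 2, double = 3, character = 4)
  for (col in names(dfs[[1]])) {
    types <- vapply(dfs, function(df) typeof(df[[col]]), character(1))
    stopifnot(all(types %in% names(rank)))
    if (length(unique(types)) > 1) {
      target <- names(rank)[max(rank[types])]
      warning("column '", col, "' coerced to ", target, call. = FALSE)
      dfs <- lapply(dfs, function(df) {
        df[[col]] <- as.vector(df[[col]], mode = target)
        df
      })
    }
  }
  do.call(rbind, dfs)
}

# Made-up input: x mixes logical NA with character, y mixes integer with double.
dfs <- list(
  data.frame(x = NA, y = 1L),
  data.frame(x = "a", y = 2.5, stringsAsFactors = FALSE)
)
res <- suppressWarnings(rbind_promote(dfs))
```

Note that this sketch would also turn TRUE into "TRUE" when a real logical column meets a character one, which is a conversion that arguably should not happen silently.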

@hadley hadley (Member) commented Feb 24, 2014

Warnings would be a minimum. But it's not clear to me how you get in this situation in the first place - to me it seems like a problem that should be fixed earlier in the data pipeline. (This is something that fastread should make easier)

@geoffjentry geoffjentry (Author) commented Feb 24, 2014

On my end what triggered it was coercing a list of reference objects to a data.frame. I was converting each object into a 1 row DF and then rbind()ing them together and was looking to use this instead. Looking at it just now there are a few places where I should be able to clean this up as part of the conversion.

I was mainly curious if rbind_all() was intended to be a full drop-in replacement for the do.call(rbind, ...) pattern and this was a corner case, or if this was intentionally not being handled (which looks to be the case).

@geoffjentry geoffjentry (Author) commented Feb 26, 2014

I thought perhaps what would work would be to do something like as.character(NA) (or as.numeric(NA), etc.), which should then give a character vector (STRSXP) rather than a logical one. But that causes a segfault:

xx = as.data.frame(list(a=as.numeric(NA), b="c", c="d"))
zz = as.data.frame(list(a=1, b=as.character(NA), c="b"))
rbind_all(list(zz, xx))

*** caught segfault ***
address 0x506066b00, cause 'memory not mapped'

Traceback:
1: .Call("dplyr_rbind_all", PACKAGE = "dplyr", dots)
2: rbind_all(list(zz, xx))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Since what I'm trying to do is convert a list of reference class objects to a data.frame and thus dealing with the issue of converting empty vectors to NA, I'm happy to just chalk this up as a goofy edge case that's just not going to work here. In practical terms this conversion tends to only happen once per object and not all the time so it's not the end of the world if it's slow and pokey from sticking with the old fashioned do.call/rbind pair.
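
One way the conversion step could give every one-row data frame the same column types is to swap empty fields for typed NA constants. This is a sketch under assumptions: plain lists stand in for the reference class objects, the helpers or_na and to_row are hypothetical, and the field names are borrowed from the earlier typeof table.

```r
# Hypothetical helper: replace an empty field with an NA of the intended type.
or_na <- function(x, na) if (length(x) == 0) na else x

# Hypothetical converter from one object (here a plain list) to a one-row data frame.
to_row <- function(obj) {
  data.frame(
    text          = or_na(obj$text, NA_character_),
    favoriteCount = or_na(obj$favoriteCount, NA_real_),
    replyToSN     = or_na(obj$replyToSN, NA_character_),
    stringsAsFactors = FALSE
  )
}

# Made-up objects: each one has a different empty field.
objs <- list(
  list(text = "first", favoriteCount = 0, replyToSN = character(0)),
  list(text = "second", favoriteCount = numeric(0), replyToSN = "someone")
)
rows      <- lapply(objs, to_row)
col_types <- sapply(rows, function(df) sapply(df, typeof))
combined  <- do.call(rbind, rows)
```

With typed NAs, each column reports the same type in every row's data frame, so nothing has to be promoted when the rows are bound together.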

@hadley hadley (Member) commented Feb 26, 2014

That's definitely a bug. Can you please take a look @romainfrancois ?

@romainfrancois romainfrancois (Member) commented Feb 26, 2014

I'll pick it up tomorrow morning.

@romainfrancois romainfrancois (Member) commented Feb 27, 2014

There was a problem with collecting factors that contain NA.
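
Factors probably enter the picture because the reproduction built its data frames without stringsAsFactors = FALSE, and in R versions before 4.0.0 string columns became factors by default. The sketch below sets stringsAsFactors = TRUE explicitly to mirror that old default, and uses data.frame() rather than as.data.frame(list(...)) for brevity. It uses base R only and does not call rbind_all().

```r
# stringsAsFactors = TRUE is set explicitly to mirror the pre-4.0.0 default.
xx <- data.frame(a = as.numeric(NA), b = "c", c = "d", stringsAsFactors = TRUE)
zz <- data.frame(a = 1, b = as.character(NA), c = "b", stringsAsFactors = TRUE)

# Column classes: a is numeric, b and c are factors.
xx_classes <- sapply(xx, class)
zz_classes <- sapply(zz, class)

# zz$b is a factor whose only value is NA, so it has no levels at all.
zz_b_levels <- levels(zz$b)
```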

@geoffjentry geoffjentry (Author) commented Feb 27, 2014

Thanks!

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018