Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbind functions are too strict about types #321

Closed
otsaw opened this issue Mar 14, 2014 · 8 comments
Closed

rbind functions are too strict about types #321

otsaw opened this issue Mar 14, 2014 · 8 comments
Assignees
Labels
Milestone

Comments

@otsaw
Copy link

@otsaw otsaw commented Mar 14, 2014

> rbind_list(data.frame(x=1:3), data.frame(x=NA))
Error: incompatible type (data index: 2, column: 'x', was collecting: integer (dplyr::Collecter_Impl<13>), incompatible with data of type: LGLSXP

This works the other way around.

> rbind_list(data.frame(x=NA), data.frame(x=1:3))
   x
1 NA
2  1
3  2
4  3

Integer promotion to numeric doesn't work either way.

I understand if some strictness is a necessary consequence of a speecy C-implementation. However, if possible, there are at least two harmless promotions that I'd expect to happen automatically.

  1. Logical due to all NAs should get promoted to whatever type another piece with actual data has.
  2. Integers should get promoted to numeric.
@otsaw
Copy link
Author

@otsaw otsaw commented Mar 14, 2014

Sorry, I checked the integer promotion to numeric wrong. It does work one way too.

> rbind_list(data.frame(x=0.5), data.frame(x=1:3))
Error: incompatible type (data index: 2, column: 'x', was collecting: numeric (dplyr::Collecter_Impl<14>), incompatible with data of type: INTSXP
> rbind_list(data.frame(x=1:3), data.frame(x=3.5))
    x
1 1.0
2 2.0
3 3.0
4 3.5

@hadley
Copy link
Member

@hadley hadley commented Mar 14, 2014

Related to #228.

@hadley hadley added this to the v0.2 milestone Mar 17, 2014
@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Mar 24, 2014

I now get:

> rbind_list(data.frame(x=0.5), data.frame(x=1:3))
    x
1 0.5
2 1.0
3 2.0
4 3.0

and

> rbind_list(data.frame(x=1:3), data.frame(x=NA))
   x
1  1
2  2
3  3
4 NA

romainfrancois added a commit that referenced this issue Mar 24, 2014
@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Mar 31, 2014

I think this is now done.

@otsaw
Copy link
Author

@otsaw otsaw commented Apr 1, 2014

Thank you kindly, but not so fast mister. There's still problems with factor and character. I wrote down some test cases that pass with with base::rbind and that I'd expect all to work.

https://gist.github.com/otsaw/9911389

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Apr 1, 2014

I get :

> .rbind = function(x, y)
+     tryCatch({
+         dplyr::rbind_list(x, y)
+         message("OK")
+     }, error=function(e) print(e))
>
> message("logical --> integer")
logical --> integer
> .rbind(data.frame(x=TRUE), data.frame(x=1L))
OK
> .rbind(data.frame(x=1L), data.frame(x=TRUE))
OK
>
> message("logical --> numeric")
logical --> numeric
> .rbind(data.frame(x=TRUE), data.frame(x=1.5))
.rbind(data.frame(x=1.5), data.frame(x=TRUE))

message("integer --> numeric")
.rbind(data.frame(x=1L), data.frame(x=1.5))
.rbind(data.frame(x=1.5), data.frame(x=1L))

# Not sure if factor should be promoted to character or character to factor.
# base::rbind seems to keep type of the first argument.
message("factor <--> character")
.rbind(data.frame(x="foo"), data.frame(x="foo", stringsAsFactors=FALSE))
.rbind(data.frame(x="foo", stringsAsFactors=FALSE), data.frame(x="foo"))

message("NA --> integer")
.rbind(data.frame(x=NA), data.frame(x=1L))
.rbind(data.frame(x=1L), data.frame(x=NA))

message("NA --> factor")
.rbind(data.frame(x=NA), data.frame(x="foo"))
.rbind(data.frame(x="foo"), data.frame(x=NA))

message("NA --> numeric")
.rbind(data.frame(x=NA), data.frame(x=1.5))
.rbind(data.frame(x=1.5), data.frame(x=NA))

message("NA --> character")
.rbind(data.frame(x=NA), data.frame(x="foo", stringsAsFactors=FALSE))
.rbind(data.frame(x="foo", stringsAsFactors=FALSE), data.frame(x=NA)OK
> .rbind(data.frame(x=1.5), data.frame(x=TRUE))
OK
>
> message("integer --> numeric")
integer --> numeric
> .rbind(data.frame(x=1L), data.frame(x=1.5))
OK
> .rbind(data.frame(x=1.5), data.frame(x=1L))
OK
>
> # Not sure if factor should be promoted to character or character to factor.
> # base::rbind seems to keep type of the first argument.
> message("factor <--> character")
factor <--> character
> .rbind(data.frame(x="foo"), data.frame(x="foo", stringsAsFactors=FALSE))
OK
> .rbind(data.frame(x="foo", stringsAsFactors=FALSE), data.frame(x="foo"))
OK
>
> message("NA --> integer")
NA --> integer
> .rbind(data.frame(x=NA), data.frame(x=1L))
OK
> .rbind(data.frame(x=1L), data.frame(x=NA))
OK
>
> message("NA --> factor")
NA --> factor
> .rbind(data.frame(x=NA), data.frame(x="foo"))
<Rcpp::exception: incompatible type (data index: 2, column: 'x', was collecting: logical (dplyr::Collecter_Impl<10>), incompatible with data of type: factor>
> .rbind(data.frame(x="foo"), data.frame(x=NA))
<Rcpp::exception: incompatible type (data index: 2, column: 'x', was collecting: factor (dplyr::FactorCollecter), incompatible with data of type: logical>
>
> message("NA --> numeric")
NA --> numeric
> .rbind(data.frame(x=NA), data.frame(x=1.5))
OK
> .rbind(data.frame(x=1.5), data.frame(x=NA))
OK
>
> message("NA --> character")
NA --> character
> .rbind(data.frame(x=NA), data.frame(x="foo", stringsAsFactors=FALSE))
<Rcpp::exception: incompatible type (data index: 2, column: 'x', was collecting: logical (dplyr::Collecter_Impl<10>), incompatible with data of type: character>
> .rbind(data.frame(x="foo", stringsAsFactors=FALSE), data.frame(x=NA))
<Rcpp::exception: incompatible type (data index: 2, column: 'x', was collecting: integer (dplyr::Collecter_Impl<16>), incompatible with data of type: logical>

which feels right.

@hadley
Copy link
Member

@hadley hadley commented Apr 1, 2014

That looks good to me. I'd say that factor + character should always become character, regardless of the order.

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Apr 1, 2014

Yes. that's what's happening now.

  • If we were collecting a factor and we now have to collect a character, then the collecter is promoted to a character collecter.
  • If we are collecting a character, then it knows how to collect a factor.

Perhaps I can turn @otsaw's code into additional test cases.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants