error in right_join when both by variables have label attributes #1636

Closed
rkingdc opened this issue Jan 22, 2016 · 12 comments

@rkingdc rkingdc commented Jan 22, 2016

I'm getting the error:

Error: cannot join on columns 'i' x 'i': Can't join on 'i' x 'i' because of incompatible types (numeric / numeric)

This happens when the by variable i in both datasets has a label attribute pulled in by haven::read_sas.

Reproducible example via attached .sas7bdat files:

options(stringsAsFactors=FALSE)
library(haven)
library(dplyr)

tbl_right <- read_sas('right.sas7bdat')
tbl_left  <- read_sas('left.sas7bdat')

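# Errors while both 'i' columns carry the 'label' attribute from read_sas()
right_join(tbl_right, tbl_left, by='i')

# Drop the 'label' attribute from both by variables
attr(tbl_right$i, 'label') <- NULL
attr(tbl_left$i, 'label') <- NULL

# The same join, now without attributes on 'i'
right_join(tbl_right, tbl_left, by='i')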
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3 haven_0.2.0

loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1      magrittr_1.5   parallel_3.1.2 R6_2.1.0       Rcpp_0.11.6   
[7] tools_3.1.2

example_files.zip

Member

@romainfrancois romainfrancois commented Feb 4, 2016

Can you make this reproducible? We don't have your .sas7bdat files.

Author

@rkingdc rkingdc commented Feb 4, 2016

I attached a .zip file. Are they not in there?

Member

@romainfrancois romainfrancois commented Feb 8, 2016

Ok thanks. I did not see the attachment before.

Member

@romainfrancois romainfrancois commented Feb 8, 2016

OK, so what happens is that we don't know how to join columns that carry attributes, so the join fails. We could drop the attributes instead, but that might lead to other issues. Ping @hadley?

@huftis huftis commented Feb 15, 2016

When the attributes are identical (which they typically are if the two data frames come from the same source), this could be handled gracefully, by just including the attributes on the resulting data frame.

If they are not identical, keep the first set of attributes (i.e. the ones from the source data frame) and issue a warning that the second ones have been dropped?
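
A minimal base R sketch of this strategy, for illustration only. reconcile_key_attributes() is a hypothetical helper, not a dplyr function, and the warning text is an assumption:

```r
# Hypothetical helper (not part of dplyr): choose the attributes for a joined
# key column. Identical attributes are kept silently; otherwise the first
# column's attributes win and a warning is issued.
reconcile_key_attributes <- function(x, y) {
  # Compare attribute lists regardless of the order they were set in
  sort_by_name <- function(a) if (is.null(a)) a else a[order(names(a))]
  ax <- attributes(x)
  ay <- attributes(y)
  if (!identical(sort_by_name(ax), sort_by_name(ay))) {
    warning("join columns have different attributes; keeping those of the first table",
            call. = FALSE)
  }
  ax
}

left_key  <- structure(c(1, 2, 3), label = "iterator")
right_key <- structure(c(1, 2, 3), label = "iterator")

# Carry the reconciled attributes over to a (stand-in) joined key column
merged_key <- c(1, 2, 3)
attributes(merged_key) <- reconcile_key_attributes(left_key, right_key)
attr(merged_key, "label")
```

With identical labels, as in the example, no warning is raised and the label survives on the result.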

Member

@hadley hadley commented Mar 1, 2016

@ikkyle can you please make an inline reproducible example? You shouldn't need to use haven or external data

Author

@rkingdc rkingdc commented Mar 1, 2016

library(dplyr)

# I had to make the "by" variables numeric rather than integer--it works fine when they're integers
tbl_left <- tbl_df(data.frame(i = rep(c(1.0, 2.0, 3.0), each = 2),
                              x1 = letters[1:6]))
tbl_right <- tbl_df(data.frame(i = c(1.0, 2.0, 3.0),
                               x2 = letters[1:3]))

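# Works: 'i' has no attributes yet
left_join(tbl_left, tbl_right, by = 'i')

# Give both by variables the same 'label' attribute
attr(tbl_left$i, 'label') <- 'iterator'
attr(tbl_right$i, 'label') <- 'iterator'

# Errors with "incompatible types (numeric / numeric)"
left_join(tbl_left, tbl_right, by = 'i')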

Member

@hadley hadley commented Mar 1, 2016

  • For future reference, please don't include session info unless explicitly asked
  • Can you please use data_frame() instead of tbl_df() + data.frame()

Author

@rkingdc rkingdc commented Mar 1, 2016

tbl_left <- data_frame(
  i = rep(c(1.0, 2.0, 3.0), each = 2),
  x1 = letters[1:6]
)
tbl_right <- data_frame(
  i = c(1.0, 2.0, 3.0),
  x2 = letters[1:3]
)

left_join(tbl_left, tbl_right, by = 'i')

attr(tbl_left$i, 'label') <- 'iterator'
attr(tbl_right$i, 'label') <- 'iterator'

left_join(tbl_left, tbl_right, by = 'i')

Member

@hadley hadley commented Mar 1, 2016

@romainfrancois this at least needs a better error message.

I think @huftis has the right strategy if the attributes are identical, but if they're not identical, I think we might need more than a warning.

@hadley hadley added the feature label Mar 1, 2016
@hadley hadley added this to the 0.5 milestone Mar 1, 2016
Member

@romainfrancois romainfrancois commented Mar 21, 2016

This might need some extra testing, or better error messages in the failure cases, but basically I implemented @huftis's suggestion: the join goes through when the attributes are identical and errors when they are not.
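
The behaviour described here can be sketched in base R. This is only an illustration under assumptions: check_key_attributes() is a hypothetical name, the error wording is made up, and it is not the code that went into dplyr.

```r
# Hypothetical check (not dplyr's implementation): let the join proceed when
# both key columns have identical attributes, and stop otherwise.
check_key_attributes <- function(x, y, name) {
  if (!identical(attributes(x), attributes(y))) {
    stop(sprintf("cannot join on columns '%s' x '%s': attributes differ", name, name),
         call. = FALSE)
  }
  invisible(TRUE)
}

same_left   <- structure(c(1, 2, 3), label = "iterator")
same_right  <- structure(c(1, 2, 3), label = "iterator")
other_right <- structure(c(1, 2, 3), label = "index")

# Identical attributes: passes silently
check_key_attributes(same_left, same_right, "i")

# Differing attributes: caught here so the script keeps running
result <- tryCatch(
  check_key_attributes(same_left, other_right, "i"),
  error = function(e) conditionMessage(e)
)
```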

@anhqle anhqle commented Apr 5, 2016

I would suggest an error message indicating that the failure is due to differing attributes. Currently nothing in the message points towards attributes.
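
One way such a message could be built, sketched in base R. describe_attribute_mismatch() is a hypothetical helper and the wording is only a suggestion, not what dplyr emits:

```r
# Hypothetical helper: return a message naming the attributes that differ
# between two join columns, or NULL when the attributes all match.
describe_attribute_mismatch <- function(x, y, name) {
  ax <- attributes(x)
  ay <- attributes(y)
  all_names <- union(names(ax), names(ay))
  # An attribute missing on one side shows up as NULL and counts as differing
  differs <- vapply(all_names,
                    function(n) !identical(ax[[n]], ay[[n]]),
                    logical(1))
  if (!any(differs)) {
    return(NULL)
  }
  sprintf("Can't join on '%s' x '%s': attributes differ (%s)",
          name, name, paste(all_names[differs], collapse = ", "))
}

labelled_a <- structure(c(1, 2, 3), label = "iterator")
labelled_b <- structure(c(1, 2, 3), label = "index")

mismatch_msg <- describe_attribute_mismatch(labelled_a, labelled_b, "i")
```

Naming the offending attribute (here, label) tells the user what to remove or align before joining.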

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018