inner_join not working as expected #450

Closed
rickyars opened this issue Jun 3, 2014 · 2 comments

@rickyars rickyars commented Jun 3, 2014

I'm having problems with inner_join not working as expected. For some reason the data.frame ordering matters.

This does not work (tmp1 only has 1 row):

library(dplyr)

foo <- data.frame(x = "a", y = as.numeric(-10:10))
foo$var1 <- runif(nrow(foo))

bar <- data.frame(x = c(rep("b", 20), rep("a", 20)), y = as.integer(-10:9))
bar$var2 <- runif(nrow(bar))

tmp1 <- inner_join(foo, bar, by=c("x", "y"))
tmp2 <- inner_join(bar, foo, by=c("x", "y"))

This does work (the only difference is how bar is sorted):

foo <- data.frame(x = "a", y = as.numeric(-10:10))
foo$var1 <- runif(nrow(foo))

bar <- data.frame(x = c(rep("a", 20), rep("b", 20)), y = as.integer(-10:9))
bar$var2 <- runif(nrow(bar))

tmp1 <- inner_join(foo, bar, by=c("x", "y"))
tmp2 <- inner_join(bar, foo, by=c("x", "y"))

I apologize for double posting (I originally wrote this under issue #326). I realized after posting that this probably should be its own issue and it's presumptuous to assume that the feature is the cause of the behavior.

@rickyars rickyars commented Jun 6, 2014

I found a smaller working example:

foo <- data.frame(id = 1:10, var1 = "foo")
bar <- data.frame(id = as.numeric(rep(1:10, 5)), var2 = "bar")

tmp1 <- inner_join(foo, bar, by="id")
tmp2 <- inner_join(bar, foo, by="id")

When the smaller data.frame is on the left, the answer is as we expect. When the smaller data.frame is on the right, we get the wrong answer. Perhaps this points to issue #326 as the culprit.

Note: the error only reproduces when the by variable "id" has a different type in each data.frame (here foo$id is integer and bar$id is numeric). If both columns have the same type, the join gives the expected result.
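To make the type mismatch visible, here is a minimal sketch in base R that inspects the storage type of the join key in each data.frame and then coerces bar$id to integer. The names key_types, bar_int and same_type are only illustrative, and coercing before the join is a possible workaround implied by the note above, not a fix confirmed in this thread.

```r
foo <- data.frame(id = 1:10, var1 = "foo")
bar <- data.frame(id = as.numeric(rep(1:10, 5)), var2 = "bar")

# The join key is stored as integer in foo and as double in bar.
key_types <- c(foo = typeof(foo$id), bar = typeof(bar$id))
print(key_types)

# Possible workaround (an assumption, not a confirmed fix):
# give both key columns the same type before joining.
bar_int <- transform(bar, id = as.integer(id))
same_type <- identical(typeof(foo$id), typeof(bar_int$id))
print(same_type)
```

With the keys aligned, inner_join(foo, bar_int, by = "id") and inner_join(bar_int, foo, by = "id") fall into the same-type case that the note describes as working.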

@hadley hadley commented Sep 12, 2014

Here's a test:

test_that("inner_join is symmetric (even when type of join var is different)", {
  foo <- tbl_df(data.frame(id = 1:10, var1 = "foo"))
  bar <- tbl_df(data.frame(id = as.numeric(rep(1:10, 5)), var2 = "bar"))

  tmp1 <- inner_join(foo, bar, by="id")
  tmp2 <- inner_join(bar, foo, by="id")

  expect_equal(names(tmp1), c("id", "var1", "var2"))
  expect_equal(names(tmp2), c("id", "var2", "var1"))

  expect_equal(tmp1, tmp2)
})

If I remove the as.numeric() in bar, the test passes. That suggests the problem lies in column coercion and is likely related to #455.
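As an independent reference for what "symmetric" means here, the sketch below uses base R merge() instead of dplyr. It is not the dplyr code path and says nothing about where the bug is. It only shows the result that both argument orders should agree on, once column and row order are ignored. The helper normalize() is hypothetical and exists just for this comparison.

```r
foo <- data.frame(id = 1:10, var1 = "foo")
bar <- data.frame(id = as.numeric(rep(1:10, 5)), var2 = "bar")

# Base R inner joins in both argument orders.
ref1 <- merge(foo, bar, by = "id")
ref2 <- merge(bar, foo, by = "id")

# Hypothetical helper: put columns in a fixed order, sort rows by key,
# and reset row names so only the data itself is compared.
normalize <- function(df) {
  df <- df[, sort(names(df)), drop = FALSE]
  df <- df[order(df$id), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# all.equal() treats integer and double keys with equal values as equal.
symmetric <- isTRUE(all.equal(normalize(ref1), normalize(ref2)))
print(c(rows_ref1 = nrow(ref1), rows_ref2 = nrow(ref2)))
print(symmetric)
```

Each of the 10 ids in foo matches 5 rows in bar, so a correct inner join should return one row per matching pair regardless of which data.frame comes first.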
