problem when anti_join()ing with a factor #1571

Closed
jennybc opened this issue Dec 4, 2015 · 9 comments

@jennybc (Member) commented Dec 4, 2015

anti_join() isn't working right when by is a factor. The wrong rows are being retained and, despite what the warning says, the factor-ness of the by variable persists. I discovered this in a much larger example and the results, while always wrong, are not deterministic.

library(dplyr)
(big <- data.frame(letter = rep(c('a', 'b'), each = 2),
                   number = 1:2))
#>   letter number
#> 1      a      1
#> 2      a      2
#> 3      b      1
#> 4      b      2
(small <- data.frame(letter = 'b'))
#>   letter
#> 1      b
(aj_result <- big %>%
  anti_join(small))
#> Joining by: "letter"
#> Warning in anti_join_impl(x, y, by$x, by$y): joining factors with different
#> levels, coercing to character vector
#>   letter number
#> 1      b      2
#> 2      b      1
#> 3      a      1
str(aj_result)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ letter: Factor w/ 2 levels "a","b": 2 2 1
#>  $ number: int  2 1 1

I'm running the current development version.

@coolbutuseless (Contributor) commented Dec 13, 2015

Confirming that I see this behaviour as well.
If you force stringsAsFactors = FALSE, the error disappears. (Not that that's the solution; just confirming that it's a factors thing.)
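
A minimal sketch of that check, reusing the `big` and `small` data frames from the original report. The explicit `by = "letter"` is an addition here, used only to silence the "Joining by" message. With character columns there are no factor levels to reconcile, so the join should keep only the `a` rows:

```r
library(dplyr)

# Same data as in the report, but with character columns instead of factors
big <- data.frame(letter = rep(c("a", "b"), each = 2),
                  number = 1:2,
                  stringsAsFactors = FALSE)
small <- data.frame(letter = "b", stringsAsFactors = FALSE)

# Keep the rows of `big` whose `letter` has no match in `small`
aj_result <- anti_join(big, small, by = "letter")
str(aj_result)
```

This only sidesteps the problem; it does not address the factor handling inside anti_join() itself.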

@hadley added the bug label Mar 1, 2016
@hadley added this to the 0.5 milestone Mar 1, 2016
@hadley (Member) commented Mar 1, 2016

@romainfrancois can you take a look please?

@romainfrancois (Member) commented Apr 30, 2016

Pretty sure the underlying issue was fixed while tackling #1712.

@romainfrancois (Member) commented Apr 30, 2016

@hadley @jennybc let me know what you think, but I kept the column letter as a factor because my understanding of anti_join(x,y) is that it produces a subset of x.

I eliminated the warning though.

> big <- data.frame(letter = rep(c('a', 'b'), each = 2), number = 1:2)
> small <- data.frame(letter = 'b')
>
> anti_join(big, small) %>% str
Joining by: "letter"
'data.frame':   2 obs. of  2 variables:
 $ letter: Factor w/ 2 levels "a","b": 1 1
 $ number: int  1 2

Also, I wanted to test the absence of that warning, but I can't use expect_silent() because of the "Joining by" message. Is there a way to test for "this does not produce a warning"?

@jennybc (Member, Author) commented Apr 30, 2016

It seems fixed for me 🙂 and I certainly like that letter is still a factor! But it does feel inconsistent with the other joins, which always take unequal factor levels as an excuse to convert factor to character.

@hadley (Member) commented May 1, 2016

@jennybc semi_join() and anti_join() are more like filter() than mutate() so here I think it makes sense to preserve the original types.
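
As a rough illustration of the filter() analogy (a sketch, not how dplyr implements it), the anti-join from the report can be compared with an equivalent filter() call on the key column. The names `via_join` and `via_filter` are made up for this example, and it assumes a dplyr version that includes the fix:

```r
library(dplyr)

# Factor key columns with different level sets, as in the original report
big <- data.frame(letter = factor(rep(c("a", "b"), each = 2)), number = 1:2)
small <- data.frame(letter = factor("b"))

# anti_join() keeps the rows of `big` that have no match in `small` ...
via_join <- anti_join(big, small, by = "letter")

# ... which selects the same rows as filtering `big` on the key column
via_filter <- filter(big, !(letter %in% small$letter))

# In both cases `letter` keeps the type it had in `big`
str(via_join)
str(via_filter)
```

Because only rows are dropped and no columns from `small` are added, there is nothing that forces the key column to change type.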

@romainfrancois two options: expect_warning(..., NA) or use by = "letter" to suppress the message
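
Both options might look like this in a testthat test. This is a sketch that assumes a dplyr version where the warning has been removed; the test description is made up for illustration:

```r
library(dplyr)
library(testthat)

# Factor key columns with different level sets, as in the original report
big <- data.frame(letter = factor(rep(c("a", "b"), each = 2)), number = 1:2)
small <- data.frame(letter = factor("b"))

test_that("anti_join() on a factor key does not warn", {
  # Option 1: passing NA as the pattern asserts that no warning is raised
  expect_warning(anti_join(big, small, by = "letter"), NA)

  # Option 2: an explicit `by` suppresses the "Joining by" message,
  # so expect_silent() can be used as well
  expect_silent(anti_join(big, small, by = "letter"))
})
```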

@romainfrancois (Member) commented May 1, 2016

Thanks, and sorry I missed what was right there in the documentation of expect_warning.
Would it be overkill to have:

expect_no_warning <- function( object, ...) expect_warning( object, NA, ...)

for expressiveness?

@hadley (Member) commented May 1, 2016

@romainfrancois I think I decided against that because asserting that warnings are "missing" seemed relatively clear to me (and it starts to get a bit grammatically complicated, because I think it would need to be expect_no_warnings()).

@krlmlr (Member) commented May 9, 2016

The test left_join handles mix of encodings in column names (#1571) fails on my system (Ubuntu 15.10, 3074cf7).

Perhaps related: #1507 (comment)

The lock bot locked this issue as resolved and limited conversation to collaborators on Jun 9, 2018.
5 participants