full_join generates <NA> entries when joining character vectors with different encodings #2271

Closed
jarauh opened this issue Nov 29, 2016 · 2 comments

jarauh commented Nov 29, 2016

This is another facet of the well-known encoding problem (e.g. #1885). Sorry if the example is more complicated than necessary.

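library(dplyr)

# "\xE7" is the latin1 byte for "ç". The literal is marked with the native ("unknown")
# encoding, so x == xx is TRUE only in a latin1 locale; elsewhere, first run Encoding(x) <- "latin1".
x <- "fa\xE7ile"
xx <- iconv(x, "latin1", "UTF-8")  # same text, re-encoded as UTF-8

x == xx  # TRUE: R translates both strings to a common encoding before comparing

# Two data frames whose join column c1 holds the same text in different encodings
left <- matrix(c(x, "facile", "1", "2"), ncol = 2)
colnames(left) <- c("c1", "c2")
left <- data.frame(left, stringsAsFactors = FALSE)
right <- matrix(c(xx, "facile", "A", "B"), ncol = 2)
colnames(right) <- c("c1", "c3")
right <- data.frame(right, stringsAsFactors = FALSE)

full_join(left, right, by = "c1")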

Output:

      c1   c2   c3
1 façile    1 <NA>
2 facile    2    B
3   <NA> <NA>    A

Note the last row, which has <NA> in c1, the column used for joining, even though neither input contains a missing value in that column.

krlmlr (Member) commented Nov 29, 2016

Thanks. I think you should use only UTF-8 for character column data; dplyr will be more careful about this in the future.
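A minimal sketch of that advice, assuming base R only: re-encode every character column to UTF-8 with `enc2utf8()` before joining, so both key columns share one encoding. The helper `to_utf8()` is hypothetical and not part of dplyr. It also assumes each string's encoding is declared correctly (here via `Encoding(x) <- "latin1"`); `enc2utf8()` cannot repair bytes whose encoding is marked wrongly.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): re-encode every character column to UTF-8
to_utf8 <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], enc2utf8)
  df
}

# Same key text, one copy declared latin1 and one converted to UTF-8
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- enc2utf8(x)

left  <- data.frame(c1 = c(x, "facile"),  c2 = c("1", "2"), stringsAsFactors = FALSE)
right <- data.frame(c1 = c(xx, "facile"), c3 = c("A", "B"), stringsAsFactors = FALSE)

# After normalisation both "façile" keys carry the UTF-8 mark, so they should match
joined <- full_join(to_utf8(left), to_utf8(right), by = "c1")
print(joined)
```

Normalising on input keeps the fix independent of how a particular dplyr version compares strings internally.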

jarauh (Author) commented Nov 29, 2016

@krlmlr I agree. I can work around this, but I wanted to report it: even granting that comparing strings with different encodings is problematic, having new entries appear in the by column of a full_join seems like a separate bug.

krlmlr added this to the data frame 1 milestone Feb 20, 2017
krlmlr closed this in #2451 Feb 20, 2017
lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018