Strings with and without encoding are not matched when joining #1885
Comments
Maybe tibble should force the encoding to UTF-8? dplyr would still need to warn, but that would mitigate some of the hassle.
In my case, the data came from a CSV file read using read.csv(); readr already takes care of the encoding. If this is done by tibble, perhaps readr doesn't have to do it anymore. The match() used in the join is Rcpp::match(), and Rcpp seems to respect the declared encoding (RcppCore/Rcpp#189, RcppCore/Rcpp#466). To me, r_match() looks like a safe, if perhaps slower, alternative. Or we fix it upstream.
It would be good to fix this upstream.
It doesn't look like an upstream fix will become available soon. We should just make sure that column data is always UTF-8.
Related: #1950, column names with non-native encoding.
Expected: b == 1 in the result.
This can be mitigated by using r_match() instead of match() in the JoinStringStringVisitor, but I wonder if dplyr should warn instead.
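To illustrate the mismatch, here is a minimal standalone C++ sketch. It is not dplyr or Rcpp code: latin1_to_utf8() and bytes_equal() are hypothetical helpers, and the sketch assumes the matcher compares strings without translating their encodings. Under that assumption, the same text stored as Latin-1 and as UTF-8 does not match until both sides are converted to UTF-8.

```cpp
#include <string>

// Hypothetical helper (not part of dplyr or Rcpp): re-encode a Latin-1
// byte string as UTF-8. Every Latin-1 byte maps to the Unicode code point
// with the same value, so bytes >= 0x80 become two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: identical in both encodings.
            utf8.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80..0xFF: lead byte 110xxxxx, then 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}

// Hypothetical stand-in for a matcher that ignores the declared encoding
// and compares the raw bytes only.
bool bytes_equal(const std::string& a, const std::string& b) {
    return a == b;
}
```

With a key such as "caf\xE9" (Latin-1) on one side and "caf\xC3\xA9" (UTF-8) on the other, bytes_equal() reports no match for the raw keys, but it does match once the Latin-1 key has been passed through latin1_to_utf8(). Pure ASCII keys are unaffected, which is why the problem only shows up for some rows.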