Error in reading Unicode text from Excel file to R #125

leminhson · 2015-09-03T01:12:45Z

read_excel() reads a Unicode string from Excel into R but returns it with some non-ASCII characters replaced by escape codes.

Example: the string "Sét lẫn laterite" is converted to "Sét l<U+1EAB>n laterite".

nortonle · 2015-12-18T17:13:07Z

Hi,

I am Vietnamese too, and I ran into exactly the same issue. Surprisingly, the solution is simple: I changed the locale as below.
Sys.setlocale("LC_ALL", 'en_US.UTF-8')

After that, read_excel reads Vietnamese characters correctly. Please note that the characters display correctly when printed in the console, but the View command may not display them properly. Please check whether this solution works for you.

leminhson · 2015-12-19T02:21:11Z

Hi nortonle,
Thank you for your solution. However, Sys.setlocale("LC_ALL", 'en_US.UTF-8') does not work on Windows.

nortonle · 2015-12-19T02:49:48Z

Hi,

That's odd. I am running on Windows too. I executed the command on my work laptop (Windows 7) and it worked. I just tried executing it on my personal laptop (Windows 10); although it gave a warning, it worked perfectly fine. Would you please share your error message? Here is the warning I get:
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored

leminhson · 2015-12-19T12:08:11Z

Hi nortonle,

Here is a warning message (like yours):

Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF8") :
OS reports request to set locale to "en_US.UTF8" cannot be honored

When I read data into an object aaa using this command: aaa=read_excel("C:/Book1.xls")

The result is "Há»‘ khoan" instead of "Hố khoan".

I am running RStudio 0.99.467 and R version 3.2.2 on Windows 7 32-bit.
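
The garbled "Há»‘ khoan" above has the typical shape of UTF-8 bytes being decoded as Windows-1252. A minimal sketch of that mechanism in base R, assuming the platform's iconv recognizes the encoding name "CP1252" (the variable names are illustrative and not from the thread):

```r
# "Hố khoan", written with a \u escape so the source stays ASCII
original <- "H\u1ed1 khoan"

# UTF-8 bytes of the string: "ố" (U+1ED1) is stored as e1 bb 91
utf8_bytes <- charToRaw(enc2utf8(original))

# Misread the same bytes as Windows-1252 and convert to UTF-8:
# each byte of "ố" turns into a separate character
garbled <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")

# Reversing the wrong step recovers the original bytes
recovered <- iconv(garbled, from = "UTF-8", to = "CP1252")
Encoding(recovered) <- "UTF-8"

identical(charToRaw(recovered), utf8_bytes)
```

If this is what happened, the stored data would be intact and only the interpretation of the bytes would be wrong.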

adpgithub · 2015-12-29T13:37:39Z

Hi,
I am also facing the same issue. My Excel file contains a name
O’Donnell
which changes to
Oâ€™Donnell
when loaded using read_excel() (readxl Version: 0.1.0)
Does anyone know any workaround or fix for this issue?
Thanks.

kyotin · 2016-06-27T06:37:46Z

I faced a similar problem. I think it comes from viewing the whole data frame/data table directly. Try viewing only one column.

>df[1:2,]
name price
Ford EcoSport 2014 621.000.000
camry nh<U+1EAD>p kh<U+1EA9>u 2.0e 770.000.000
>df[1:2,"name"]
[1] "Ford EcoSport 2014" "camry nhập khẩu 2.0e"
You can also use '$', e.g. df$name.
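
A small way to reproduce that comparison without an Excel file, assuming a hand-built data frame (the values mirror the output above; whether the first print shows escapes depends on the platform, locale, and R version):

```r
# Build a data frame with UTF-8 strings, using \u escapes
df <- data.frame(
  name  = c("Ford EcoSport 2014", "camry nh\u1eadp kh\u1ea9u 2.0e"),
  price = c("621.000.000", "770.000.000"),
  stringsAsFactors = FALSE
)

print(df)        # data frame printing; may show <U+xxxx> escapes
print(df$name)   # character vector printing
df[["name"]]     # equivalent to df$name

# Either way the stored strings are unchanged
Encoding(df$name)
```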

jennybc · 2017-02-05T00:21:16Z

It would be helpful to get some example sheets from the people in this thread. Also: please clarify how you are inspecting the imported data frame (for example, printing the data frame in the Console versus using View() in RStudio).

How to provide a readxl reprex

We're in a much better position to address your issue if you can provide a reprex (reproducible example). Provide as much of this as you can:

An actual xls or xlsx file. Pick one:
- Your personal xls or xlsx: try to strip it down to the minimal size and complexity to demonstrate your point. And, obviously, remove any sensitive data.
- A publicly available xls or xlsx: provide URL and the code you used to download.
A small bit of R code that uses readxl on the provided xls or xlsx file and demonstrates your point.
- Consider using the reprex package to prepare this. In addition to nice formatting, this ensures your reprex is self-contained.
Any details about your environment that seem clearly relevant, such as operating system.
- reprex(..., si = TRUE) will append a standard summary, folded neatly away, at the bottom of your reprex.

How to provide your own xls/xlsx file? In order of preference:

Attach the file directly to your issue. Instructions are always at the bottom of the issue or comment box.
Share via DropBox or Google Drive and provide the link in your issue.
Explain you absolutely cannot provide a relevant file via github.com and offer to provide privately.
Don't share a file and realize you're hoping for, e.g., a bug fix with no concrete example to go on.

leminhson · 2017-02-25T03:58:37Z

Hi jennybc,
I followed your instructions for using the reprex package.

The data file "samples.xlsx" is attached directly to this comment.
The code used to read the data from Excel into R:
library(readxl)
read_excel("D:/samples.xlsx")
I ran reprex(si=TRUE) in RStudio. Here is the result (not what I expected):

library(readxl)
read_excel("D:/samples.xlsx")
#>                                     MÃ´ t<U+1EA3>
#> 1                                        BÃ™N SÃ‰T
#> 2             SÃ‰T l<U+1EAB>n s<U+1EA1>n laterite
#> 3 CÃ�T THÃ” l<U+1EAB>n s<U+1EA1>n th<U+1EA1>ch anh
#> 4                                 CÃ�T M<U+1ECA>N
#> 5                                      CÃ�T TRUNG
#> 6            SÃ‰T l<U+1EAB>n k<U+1EBF>t vÃ³n silic

Session info

sessionInfo()
#> R version 3.3.2 (2016-10-31)
#> Platform: i386-w64-mingw32/i386 (32-bit)
#> Running under: Windows 7 (build 7601) Service Pack 1
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] readxl_0.1.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
#>  [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.9     stringi_1.1.2  
#>  [9] rmarkdown_1.3   knitr_1.15.1    stringr_1.2.0   digest_0.6.12  
#> [13] evaluate_0.10

The correct result should be like this:

samples.xlsx

hadley · 2017-03-09T17:42:57Z

@leminhson can you please confirm that you're on windows?

@jennybc to diagnose on the mac, you'll need

Sys.setlocale(, "en_US.ISO8859-1")
read_excel("~/Desktop/samples.xlsx")

This is almost certainly caused by assigning std::string into a CharacterVector somewhere, because that loses the UTF-8 encoding information that should be applied. I'm reasonably certain that RapidXml always returns UTF-8 encoded strings (although you should double-check that) - the problem is not that the data stored in the string is incorrect, it's that R isn't correctly informed about the encoding
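
At the R level, the difference between the bytes of a string and the encoding R has been told about can be illustrated with Encoding(). The following is only a sketch of the idea in base R, not readxl's C++ code; the variable names are made up for the example.

```r
# A string whose bytes are UTF-8 and which R knows to be UTF-8
marked <- "l\u1eabn"
Encoding(marked)             # "UTF-8"

# Same bytes, but with the encoding declaration dropped, which is
# roughly the situation described for an unmarked string
unmarked <- marked
Encoding(unmarked) <- "unknown"
Encoding(unmarked)           # "unknown"

# The stored bytes are identical in both cases
identical(charToRaw(marked), charToRaw(unmarked))   # TRUE

# R interprets "unknown" strings in the native encoding, so in a
# non-UTF-8 locale only the marked copy is interpreted as UTF-8
```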

leminhson · 2017-03-10T03:02:12Z

@hadley
Yes, I am using Windows 7.
After applying Sys.setlocale(, "English_United States.1252"), the result is still incorrect.
I think the problem is that code page 1252 does not include Vietnamese characters. We need to use UTF-8 to display Vietnamese characters properly.

hadley · 2017-03-10T03:07:29Z

@leminhson changing locales will not fix the problem because you are on windows.

jennybc · 2017-03-21T23:57:10Z

@hadley

Here's what I see on my Mac in a branch where I am (print) debugging:

devtools::load_all(".")
#> Loading readxl
#> Re-compiling readxl
#> <output suppressed>

df <- read_excel(test_sheet("vietnamese-characters.xlsx"))
#> storing a column name:  Mô tả
#> storing cell contents:  BÙN SÉT
#> storing cell contents:  SÉT lẫn sạn laterite
#> storing cell contents:  CÁT THÔ lẫn sạn thạch anh
#> storing cell contents:  CÁT MỊN
#> storing cell contents:  CÁT TRUNG
#> storing cell contents:  SÉT lẫn kết vón silic

df
#> # A tibble: 6 × 1
#>                     `Mô tả`
#>                       <chr>
#> 1                   BÙN SÉT
#> 2      SÉT lẫn sạn laterite
#> 3 CÁT THÔ lẫn sạn thạch anh
#> 4                   CÁT MỊN
#> 5                 CÁT TRUNG
#> 6     SÉT lẫn kết vón silic

Encoding(df[[1]])
#> [1] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
df[[1]][1]
#> [1] "BÙN SÉT"
as.matrix(df)
#>      Mô tả                      
#> [1,] "BÙN SÉT"                  
#> [2,] "SÉT lẫn sạn laterite"     
#> [3,] "CÁT THÔ lẫn sạn thạch anh"
#> [4,] "CÁT MỊN"                  
#> [5,] "CÁT TRUNG"                
#> [6,] "SÉT lẫn kết vón silic"

Then I do as you say and change the locale:

Sys.setlocale(locale = "en_US.ISO8859-1")
#> [1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_CA.UTF-8"

df <- read_excel(test_sheet("vietnamese-characters.xlsx"))
#> storing a column name:  Mô tả
#> storing cell contents:  BÙN SÉT
#> storing cell contents:  SÉT lẫn sạn laterite
#> storing cell contents:  CÁT THÔ lẫn sạn thạch anh
#> storing cell contents:  CÁT MỊN
#> storing cell contents:  CÁT TRUNG
#> storing cell contents:  SÉT lẫn kết vón silic

df
#> # A tibble: 6 � 1
#>                                      `M� t<U+1EA3>`
#>                                                  <chr>
#> 1                                        B�N S�T
#> 2                S�T l<U+1EAB>n s<U+1EA1>n laterite
#> 3 C�T TH� l<U+1EAB>n s<U+1EA1>n th<U+1EA1>ch anh
#> 4                                    C�T M<U+1ECA>N
#> 5                                         C�T TRUNG
#> 6            S�T l<U+1EAB>n k<U+1EBF>t v�n silic

Encoding(df[[1]])
#> [1] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8"

df[[1]][1]
#> [1] "B�N S�T"

as.matrix(df)
#>      M� t<U+1EA3>                                
#> [1,] "B�N S�T"                           
#> [2,] "S�T l<U+1EAB>n s<U+1EA1>n laterite"   
#> [3,] "C�T TH� l<U+1EAB>n s<U+1EA1>n th<U+1EA1>ch anh"
#> [4,] "C�T M<U+1ECA>N"                       
#> [5,] "C�T TRUNG"                            
#> [6,] "S�T l<U+1EAB>n k<U+1EBF>t v�n silic"

On the C++ side, it seems like the encoding is correctly specified. And, if I'm interpreting the above correctly, the character vector arrives in R with UTF-8 encoding. And yet something is clearly not right.

I can work on other things for now and we could discuss on Friday.

hadley · 2017-03-22T12:14:14Z

How are you doing the print debugging? In C++ or R? C++ doesn't have a notion of string encoding.

I think this suggests rapidxml isn't converting to UTF-8. You'll probably need to print the binary representation as hex to debug.
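
One way to do that kind of byte-level check from the R side is charToRaw() together with validUTF8() (in base R since 3.3.0). A small sketch; show_bytes() is a hypothetical helper, not part of readxl:

```r
# Print the bytes of a single string as hex and report whether
# they form valid UTF-8. Returns the bytes invisibly.
show_bytes <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  bytes <- charToRaw(x)
  cat("n bytes:", length(bytes), "\n")
  cat("hex:    ", paste(as.character(bytes), collapse = " "), "\n")
  cat("valid UTF-8:", validUTF8(x), "\n")
  invisible(bytes)
}

# "BÙN SÉT" written with escapes; Ù is c3 99 and É is c3 89 in UTF-8
b <- show_bytes("B\u00d9N S\u00c9T")
```

For a correctly stored UTF-8 string, the hex bytes should match what a UTF-8 table gives for each character.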

hadley · 2017-03-23T19:46:29Z

@leminhson what do you see if you run this code?

x <- "BÙN SÉT"
x

lionel- · 2017-03-23T21:34:48Z

> changing locales will not fix the problem because you are on windows.

I think changing the locale is the only way to fix the problem ;)

@leminhson should set his locale to a Windows codepage with support for Vietnamese characters, then everything will display properly.

lionel- · 2017-03-23T21:43:14Z

@leminhson Does this solve the problem?

Sys.setlocale("LC_CTYPE", "English_United States.1258")

jennybc · 2017-03-23T22:13:14Z

Here's some output from a branch that prints details on strings. Suggests that the strings are read and stored correctly and encoded as UTF-8. So, as conversation above indicates, this appears to be a printing problem and presumably one that is not specific to readxl and the tibbles it produces.

I hope that @lionel-'s suggestion to switch locales is fruitful.

> devtools::load_all()
Loading readxl
> Sys.getlocale()
[1] "en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8"
> Sys.setlocale(locale = "en_US.ISO8859-1")
[1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_CA.UTF-8"
> Sys.getlocale()
[1] "en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_CA.UTF-8"

> df <- read_excel(test_sheet("vietnamese-characters.xlsx"))
Mô tả
  str.size() = 8
  M: 4d
  �: c3
  �: b4
   : 20
  t: 74
  �: e1
  �: ba
  �: a3
BÙN SÉT
  str.size() = 9
  B: 42
  �: c3
  �: 99
  N: 4e
   : 20
  S: 53
  �: c3
  �: 89
  T: 54
SÉT lẫn sạn laterite
  str.size() = 25
  S: 53
  �: c3
  �: 89
  T: 54
   : 20
  l: 6c
  �: e1
  �: ba
  �: ab
  n: 6e
   : 20
  s: 73
  �: e1
  �: ba
  �: a1
  n: 6e
   : 20
  l: 6c
  a: 61
  t: 74
  e: 65
  r: 72
  i: 69
  t: 74
  e: 65
CÁT THÔ lẫn sạn thạch anh
  str.size() = 33
  C: 43
  �: c3
  �: 81
  T: 54
   : 20
  T: 54
  H: 48
  �: c3
  �: 94
   : 20
  l: 6c
  �: e1
  �: ba
  �: ab
  n: 6e
   : 20
  s: 73
  �: e1
  �: ba
  �: a1
  n: 6e
   : 20
  t: 74
  h: 68
  �: e1
  �: ba
  �: a1
  c: 63
  h: 68
   : 20
  a: 61
  n: 6e
  h: 68
CÁT MỊN
  str.size() = 10
  C: 43
  �: c3
  �: 81
  T: 54
   : 20
  M: 4d
  �: e1
  �: bb
  �: 8a
  N: 4e
CÁT TRUNG
  str.size() = 10
  C: 43
  �: c3
  �: 81
  T: 54
   : 20
  T: 54
  R: 52
  U: 55
  N: 4e
  G: 47
SÉT lẫn kết vón silic
  str.size() = 27
  S: 53
  �: c3
  �: 89
  T: 54
   : 20
  l: 6c
  �: e1
  �: ba
  �: ab
  n: 6e
   : 20
  k: 6b
  �: e1
  �: ba
  �: bf
  t: 74
   : 20
  v: 76
  �: c3
  �: b3
  n: 6e
   : 20
  s: 73
  i: 69
  l: 6c
  i: 69
  c: 63

> df
# A tibble: 6 � 1
                                     `M� t<U+1EA3>`
                                                 <chr>
1                                        B�N S�T
2                S�T l<U+1EAB>n s<U+1EA1>n laterite
3 C�T TH� l<U+1EAB>n s<U+1EA1>n th<U+1EA1>ch anh
4                                    C�T M<U+1ECA>N
5                                         C�T TRUNG
6            S�T l<U+1EAB>n k<U+1EBF>t v�n silic

## these are the correct bytes for all of these characters
> (z <- names(df)[1])
[1] "M� t<U+1EA3>"

> charToRaw(z)
[1] 4d c3 b4 20 74 e1 ba a3

> (z <- df[[1]][1])
[1] "B�N S�T"

> charToRaw(z)
[1] 42 c3 99 4e 20 53 c3 89 54

leminhson · 2017-03-24T00:36:41Z

@hadley Yes. If I type directly in R command window, Vietnamese characters are displayed correctly

x <- 'BÙN SÉT'
x
[1] "BÙN SÉT"

jennybc · 2017-03-24T00:40:14Z

@leminhson What does Sys.getlocale() report? Did you try @lionel-'s suggestion to change your locale?

Sys.setlocale("LC_CTYPE", "English_United States.1258")

leminhson · 2017-03-24T00:45:41Z

@lionel- Sys.setlocale("LC_CTYPE", "English_United States.1258") could not solve this problem.

@jennybc Sys.getlocale( ) answers:
[1] "LC_COLLATE=English_United States.1258;LC_CTYPE=English_United States.1258;LC_MONETARY=English_United States.1258;LC_NUMERIC=C;LC_TIME=English_United States.1258"

lionel- · 2017-03-24T00:56:19Z

After switching the locale to 1258, what do you get when you do this on your sample data frame:

enc2native(df[[1]])

jennybc · 2017-03-24T00:57:53Z

@leminhson

Your current default locale

Note I asked to see output of Sys.getlocale() for your default locale: "get" not "set".

In your default locale, where the string below prints correctly, what encoding is reported? I.e. run this

Sys.getlocale()
#> [1] "en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8"
x <- 'BÙN SÉT'
x
#> [1] "BÙN SÉT"
Encoding(x)
#> [1] "UTF-8"

Re: changing your locale:

You may not be able to change the locale in a running RStudio session on Windows, so let's not give up on @lionel- 's suggestion just yet. You might need to put Sys.setlocale("LC_CTYPE", "English_United States.1258") in a startup file such as ~/.Rprofile and restart. Then run Sys.getlocale() to confirm the change took and try readxl::read_excel("samples.xlsx") again.

Also, will you run these tests (reading your example sheet and the above) in R in the Console, i.e. not in RStudio, just to make sure that has nothing to do with it?

lionel- · 2017-03-24T01:05:13Z

might be relevant: https://stat.ethz.ch/pipermail/r-devel/2015-May/071250.html

leminhson · 2017-03-24T01:13:29Z

@jennybc Here is the result from your instruction:

Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
x <- 'BÙN SÉT'
x
[1] "BÙN SÉT"
Encoding(x)
[1] "latin1"

jennybc · 2017-03-24T01:13:36Z

@lionel- Yeah, that is why I included as.matrix(df) in my investigation far above, but it's true we've never had @leminhson do same. And clearly I'm not really emulating this user's problem very well.

jennybc · 2017-03-24T01:19:48Z

Also relevant 😐: Improve UTF-8 support on Windows, RConsortium/wishlist#2 by @kevinushey

lionel- · 2017-03-24T01:20:01Z

so @leminhson, does df[[1]] display any better than df?

If not, what happens when you do this:

Sys.setlocale("LC_CTYPE", "English_United States.1258")

df[[1]]
enc2native(df[[1]])

leminhson · 2017-03-24T01:52:14Z

@jennybc Here is the result when running from R console (not in RStudio):

Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1258;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

library(readxl)

x <- read_excel("D:/Samples.xlsx")

x
Mô t<U+1EA3>
1 BÙN SÉT
2 SÉT l<U+1EAB>n s<U+1EA1>n laterite
3 CÁT THÔ l<U+1EAB>n s<U+1EA1>n th<U+1EA1>ch anh
4 CÁT M<U+1ECA>N
5 CÁT TRUNG
6 SÉT l<U+1EAB>n k<U+1EBF>t vón silic

These Unicode character codes are correct. However, the characters themselves are not displayed; only their codes are shown.

hadley · 2017-03-24T12:48:50Z

@leminhson are you running the latest GitHub version of readxl?

leminhson · 2017-03-24T13:28:19Z

@hadley Yes. The version of readxl is 0.1.1

lionel- · 2017-03-24T13:29:02Z

@leminhson did you miss my comment? #125 (comment)

hadley · 2017-03-24T13:36:15Z

@leminhson that's the current CRAN version, not the current GitHub version. Please run devtools::install_github("tidyverse/readxl")

leminhson · 2017-03-24T14:22:41Z

@hadley The current github version of readxl is 0.1.1.9000
@lionel- The result of enc2native(df[[1]]) after changing the locale to English_United States.1258 is not fruitful:

enc2native(df[[1]])
[1] "BÙN SÉT"
[2] "SÉT l<U+1EAB>n s<U+1EA1>n laterite"
[3] "CÁT THÔ l<U+1EAB>n s<U+1EA1>n th<U+1EA1>ch anh"
[4] "CÁT M<U+1ECA>N"
[5] "CÁT TRUNG"
[6] "SÉT l<U+1EAB>n k<U+1EBF>t vón silic"

However, if I follow the steps from @jennybc without using the reprex package, the result in the R console is perfect. But if I view the data frame df, the result shows only Unicode character codes.

library(readxl)
df <- read_excel("D:/samples.xlsx")
Encoding(df[[1]])
[1] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8"

as.matrix(df)
Mô tả
[1,] "BÙN SÉT"
[2,] "SÉT lẫn sạn laterite"
[3,] "CÁT THÔ lẫn sạn thạch anh"
[4,] "CÁT MỊN"
[5,] "CÁT TRUNG"
[6,] "SÉT lẫn kết vón silic"

View(df)
1 BÙN SÉT
2 SÉT l<U+1EAB>n s<U+1EA1>n laterite
3 CÁT THÔ l<U+1EAB>n s<U+1EA1>n th<U+1EA1>ch anh
4 CÁT M<U+1ECA>N
5 CÁT TRUNG
6 SÉT l<U+1EAB>n k<U+1EBF>t vón silic

hadley · 2017-03-24T14:42:36Z

Ok, I'm happy that the problem is not on the readxl end, but instead lies somewhere else.

leminhson · 2017-03-24T15:00:14Z

@hadley Yes. But we do not know yet which part is the cause. If I type Vietnamese characters directly in R or in RStudio, the result is always correct. That suggests the problem is caused neither by the locale setting nor by the display function...

hadley · 2017-03-24T15:09:08Z

Collectively, we've now spent a lot of time on this issue. It is getting very close to the point where I don't think we can afford to spend more. It's a bummer that we might not be able to fully resolve your issue, but we don't have unlimited resources and this problem is clearly only affecting a very small number of people, and it's highly likely that readxl is already doing all that it can.

lionel- · 2017-03-24T15:15:10Z

And it now seems clear that it's the data frame printing bug in R, I don't think this is related to readxl.

@leminhson df[[1]] prints just as well as as.matrix(df) right?

kevinushey · 2017-03-24T16:16:46Z

The ultimate problem is likely just that R's print() method on data.frames tries to round-trip characters through the active encoding, which is obviously lossy when converting UTF-8-encoded characters.

This implies that, if you have UTF-8 characters that are not representable in the current locale, you're hosed.
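
A rough way to check which strings would survive such a round trip is to convert them to the native encoding and back, then compare. This is a sketch under the assumption that enc2native() followed by enc2utf8() approximates what printing does; is_representable() is a name invented here.

```r
# TRUE for strings that survive UTF-8 -> native encoding -> UTF-8 unchanged.
# In a UTF-8 locale every valid string passes; in e.g. a Windows-1252
# locale, characters outside that code page come back as <U+xxxx> escapes.
is_representable <- function(x) {
  x <- enc2utf8(x)
  round_trip <- enc2utf8(enc2native(x))
  !is.na(x) & round_trip == x
}

desc <- c("B\u00d9N S\u00c9T", "S\u00c9T l\u1eabn s\u1ea1n laterite")
is_representable(desc)
l10n_info()   # reports, among other things, whether the locale is UTF-8
```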

jennybc · 2017-03-24T16:24:11Z

I'm going to close this. We've established it's not a readxl-specific issue, but an example of general printing difficulty with Windows + R data frames + Unicode characters. Thanks for all the help everyone! This thread will still be a useful reference going forward.

jennybc · 2017-03-24T16:37:43Z

@leminhson I might also add: as we've said, these strings are being read and stored just fine, this is "only" a printing problem. So if you can tolerate the ugly, you can work with the data frame as it is. But if you really want nice printing, you might explicitly convert these strings from UTF-8 to Latin-1 by using iconv() on the affected variables. But then you will have lost the UTF-8 encoding, which is superior in the long run.
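
A sketch of that conversion with iconv(), assuming a data frame df with a character column desc (both names are placeholders). One caveat to verify on your own data: characters with no Latin-1 equivalent, such as ẫ, cannot be converted, and iconv() typically returns NA for such strings unless sub is supplied.

```r
df <- data.frame(
  desc = c("B\u00d9N S\u00c9T", "C\u00c1T TRUNG"),
  stringsAsFactors = FALSE
)

# Convert the affected column from UTF-8 to Latin-1
df$desc_latin1 <- iconv(df$desc, from = "UTF-8", to = "latin1")

Encoding(df$desc_latin1)        # "latin1" "latin1"
charToRaw(df$desc_latin1[1])    # 42 d9 4e 20 53 c9 54
```

Keeping the original UTF-8 column alongside the converted one avoids losing the UTF-8 data.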

cjens · 2018-08-14T06:35:19Z

The first suggestion at the very top, typing Sys.setlocale("LC_ALL", 'en_US.UTF-8') in the console, immediately solved all my issues with reading Polish text strings into R!

leminhson · 2018-08-14T08:56:35Z

The readxl package can now read Vietnamese characters without any error. I do not know the reason: is it the new version of R (3.5.0) or the new version of readxl (1.1.0)?

cjens · 2018-08-14T09:14:41Z

I had updated R and all packages but still needed to apply the fix to see the names of cities in Poland as proper text rather than strange codes that did not make sense.

bpbraun · 2019-08-13T18:32:00Z

I had the same issue, years later. I just saved the excel file as a csv, didn't have any problems after that.
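
For anyone trying this route, a sketch of reading such a CSV in base R while declaring its encoding. It assumes the file was saved as UTF-8 without a byte order mark; the temporary file and the column name below are stand-ins for a real export.

```r
# Stand-in for a CSV exported from Excel as UTF-8
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(enc2utf8(c("desc", "B\u00d9N S\u00c9T", "C\u00c1T M\u1ecaN")),
           con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings read in as UTF-8
# instead of re-encoding them to the native encoding
df <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

Encoding(df$desc)
charToRaw(df$desc[2])
```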

leminhson · 2019-08-14T12:37:59Z

Thank you for your solution.

KaticaR · 2019-09-23T05:28:28Z

# Set the locale for the Serbian language
Sys.getlocale("LC_ALL")
Sys.setlocale(locale = "Serbian (Cyrillic)")

# Load the packages
library("readxl")
library("dplyr")

# Load the dataset
dositej <- read_excel("slobodna_radna_mesta-17.05.2019.xls")

# Look at the dataset
head(dositej)

# Keep only the columns needed
dositej <- dositej[, c(1, 2, 3, 8, 10, 16)]

# See what we got
View(dositej)

# Load a library for transliterating Cyrillic to Latin
library(stringi)

# Take the string columns of dositej
i <- 1:5
# and transliterate them to Latin
dositej[, i] <- apply(dositej[, i], 2, function(x) stri_trans_general(x, "Serbian-Latin/BGN"))

# Now view it again to see what comes out :)
Sys.setlocale(locale = "Serbian (Latin)")
View(dositej)

# Calculate the total free math norm (norma_slobodno) in Novi Sad
math_norma <- dositej %>%
  filter(opstina == "Novi Sad", predmet == "Matematika") %>%
  select(norma_slobodno) %>%
  sum()

Note:
The data looks fine in View(), in both Cyrillic and Latin, but filtering it with base R gave me NAs for all data types in the data frame. Using dplyr solved the problem, so math_norma at the end is a number, not NA.

Thanks R for solving the issues! :)

leminhson closed this as completed Dec 19, 2015

leminhson reopened this Dec 19, 2015

jennybc added this to TODO in jennybc Feb 26, 2017

hadley mentioned this issue Mar 9, 2017

Hang while reading a xls #113

Closed

jennybc added the "bug" label (an unexpected problem or unintended behavior) Mar 21, 2017

jennybc mentioned this issue Mar 24, 2017

Add more tests re: encoding #297

Merged

jennybc added a commit that referenced this issue Mar 24, 2017

Add more tests re: encoding (#297)

b666c96

Inspired by investigations re: #125

jennybc closed this as completed Mar 24, 2017

jennybc moved this from TODO to Done in jennybc Mar 29, 2017

LienReyserhove mentioned this issue Oct 25, 2017

Importing and exporting to UTF-8 trias-project/alien-plants-belgium#41

Open

ghost mentioned this issue Jan 15, 2018

Encoding issue after using ms_simplify() ateucher/rmapshaper#67

Closed

nacnudus mentioned this issue Nov 28, 2020

Encoding issue when using cell references on Windows 10 nacnudus/tidyxl#64

Closed
