Encoding problems: reading Hebrew characters with `read_sheet` #26

Closed
adisarid opened this issue May 31, 2019 · 9 comments

@adisarid adisarid commented May 31, 2019

`read_sheet()` returns garbled text when reading a Google Sheet that contains non-ASCII characters (e.g., Hebrew text).

For comparison, I read a "web published" CSV version of the same sheet with `read_csv()`, which behaves nicely.

The googlesheets4 version:

```r
library(googlesheets4)

bad_encoding <- read_sheet(ss = "1ia41UMZVqOeTzjpElJmx9BpC1aEcYmK_8MLdJn-Jka0")
bad_encoding
```

produces this gibberish:

```
> bad_encoding
# A tibble: 7 x 1
  hebrew_text                     
  <chr>                           
1 "׳©׳\u009c׳•׳\u009d"            
2 "׳\u009c׳›׳•׳\u009c׳\u009d"     
3 "׳”׳\u0090׳\u009d"              
4 "׳ ׳™׳×׳\u009f"                 
5 "׳\u009c׳§׳¨׳•׳\u0090 ׳\u0090׳×"
6 ׳”׳˜׳§׳¡׳˜                      
7 ׳”׳–׳”?  
```

whereas the web-published CSV version, read like this:

```r
library(readr)

good_encoding <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRhhpSeSY34D5-3bYZz64M_Z5FNaT_xS_QoVh_ehRvw7vT_ONPuI87TajPRHhfHBzxb_2u5mod-ysMQ/pub?gid=0&single=true&output=csv")
good_encoding
```

gives the correct text:

```
> good_encoding
# A tibble: 7 x 1
  hebrew_text
  <chr>      
1 שלום       
2 לכולם      
3 האם        
4 ניתן       
5 לקרוא את   
6 הטקסט      
7 הזה? 
```

Any suggestions about how to fix this?

@abubelinha abubelinha commented Jul 18, 2019

Exactly the same thing is happening to me, now with Spanish accents.

```r
library(googledrive)

myfileid <- "a-googlesheet-file-id"
mysheetname <- "a-sheet-in-that-file"

# Download the sheet as an Excel file and read it with readxl:
library(readxl)
myxls <- drive_download(as_id(myfileid), overwrite = TRUE)
path <- as.character(myxls[1, "local_path"])
myxlsdata <- read_excel(path, sheet = mysheetname)
as.character(myxlsdata[10, 9])

# Read the Google Sheet directly with googlesheets4:
library(googlesheets4)
mygs <- sheets_get(myfileid)
mygsdata <- sheets_read(myfileid, sheet = mysheetname)
as.character(mygsdata[10, 9])
```

The first output is correct, but the second is not:

```
[1] "España cañí y olé, puturrú defuá"

[1] "España cañí y olé, puturrú defuá"
```

Great library, but useless until this basic thing gets fixed.

@jonthegeek jonthegeek commented Oct 12, 2019

Someone submitted curly apostrophes in a form response to me, so I'm seeing similar issues:

```r
googlesheets4::sheets_read("1TeKrcSlsWJlPQlZegZGJgFFubRsjXeiVlShNZ6oqmCY")
```

This only happens on my Windows machine. It's fine on RStudio Server on CentOS 7.

Windows result:

```
# A tibble: 1 x 1
  bad                        
  <chr>                      
1 I’ve made a huge mistake.
```

CentOS 7 result:

```
# A tibble: 1 x 1
  bad                      
  <chr>                    
1 I’ve made a huge mistake.
```
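
The Windows-only behaviour is consistent with UTF-8 bytes from the API being decoded in R's native single-byte code page instead of as UTF-8. Below is a minimal base-R illustration; it is not gargle's code, and `original`, `body`, `misread` and `decoded` are illustrative names. It converts the same UTF-8 bytes once as if they were Windows-1252 (CP1252, the code page of locales such as Spanish_Mexico.1252) and once with the UTF-8 encoding declared explicitly. The Hebrew output in the first report fits the same pattern with Windows-1255: every Hebrew letter is encoded in UTF-8 with the lead byte 0xD7, which Windows-1255 maps to ׳.

```r
# Illustration only (not gargle's code): decode the same UTF-8 bytes two ways.
original <- "I\u2019ve made a huge mistake. Espa\u00f1a"

# The bytes an HTTP API sends back: the text encoded as UTF-8.
body <- charToRaw(enc2utf8(original))

# Wrong: interpret those bytes as Windows-1252 (CP1252), the native code
# page of locales such as Spanish_Mexico.1252, then convert to UTF-8.
misread <- iconv(rawToChar(body), from = "CP1252", to = "UTF-8")

# Right: keep the bytes as they are and declare them to be UTF-8.
decoded <- rawToChar(body)
Encoding(decoded) <- "UTF-8"

print(misread)
print(decoded)
```

Here `misread` contains mojibake such as `Ã±` in place of `ñ` and `â€™` in place of the curly apostrophe, while `decoded` is identical to `original`.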

@jennybc jennybc (Member) commented Oct 17, 2019

This has to be fixed in gargle. I could replicate all of this on my Windows VM and the gargle fix resolves it for me.

It would be great to hear confirmation from others here.

Note that, until that version of gargle goes to CRAN, you'll need to install the dev version from GitHub with `devtools::install_github("r-lib/gargle")`. I'll release gargle soon-ish, but perhaps not before the initial googlesheets4 release: a package can only be updated on CRAN with a certain frequency, and I wouldn't be surprised if increasing usage of gargle, via all of these packages, uncovers some more gargle TODOs in the next few weeks.

@adisarid adisarid (Author) commented Oct 18, 2019

@jennybc Works like a charm in the gargle development version. Thanks!

@francisbarton francisbarton commented Nov 14, 2019

I've updated gargle to the dev version (gargle_0.4.0.9002), but I'm still getting these encoding problems with googlesheets4. I'm pretty sure Google Sheets and RStudio both use UTF-8 as their default encoding, so it's strange.
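
One likely reason the problem appears only on Windows is R's own native encoding, which on Windows follows the locale's code page (the `sessionInfo()` posted further down shows `LC_CTYPE=Spanish_Mexico.1252`) rather than the encoding used by Google Sheets or by RStudio's editor. A minimal sketch of a base-R check; `info` and `native_is_utf8` are illustrative names:

```r
# Sketch: is this R session's native encoding UTF-8?
# RStudio's default text encoding applies to files opened in the editor;
# it does not change the native encoding R uses for strings.
info <- l10n_info()
native_is_utf8 <- isTRUE(info[["UTF-8"]])

cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
cat("Native encoding is UTF-8:", native_is_utf8, "\n")
```

In a session with a `.1252` locale like the one shown below, this would be expected to report `FALSE`, which is when text that is not explicitly declared as UTF-8 gets misread.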

@silviaegt silviaegt commented Jan 15, 2020

@adisarid, would you mind sharing your code? I updated gargle (like @francisbarton did), but I keep getting the same encoding problems:

```r
devtools::install_github("r-lib/gargle")
library(gargle)
```

I'm also posting my `sessionInfo()` output in case it's useful:

```
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
locale:
[1] LC_COLLATE=Spanish_Mexico.1252  LC_CTYPE=Spanish_Mexico.1252    LC_MONETARY=Spanish_Mexico.1252
[4] LC_NUMERIC=C                    LC_TIME=Spanish_Mexico.1252    
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] gargle_0.4.0        googlesheets4_0.1.0 googledrive_1.0.0 
```

@jennybc jennybc (Member) commented Jan 15, 2020

@silviaegt Did you restart R? Above, I am seeing the released version of gargle, not dev.
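
The `sessionInfo()` above lists the released gargle_0.4.0, so that session was still using the old version even after the dev version had been installed. A sketch of a check for this situation using only base R and utils; `needs_restart()` is a hypothetical helper, not part of gargle or googlesheets4:

```r
# Hypothetical helper (not part of gargle or googlesheets4): does the version
# of `pkg` loaded in this session differ from the version installed on disk?
needs_restart <- function(pkg) {
  if (!isNamespaceLoaded(pkg)) {
    # Not loaded yet: the next library() or pkg:: call loads what is installed.
    return(FALSE)
  }
  loaded <- package_version(getNamespaceVersion(pkg))
  installed <- utils::packageVersion(pkg)
  isTRUE(loaded != installed)
}

needs_restart("gargle")
```

If it returns `TRUE` for gargle, restarting R and then loading googlesheets4 as usual picks up the newly installed version.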

@silviaegt silviaegt commented Jan 15, 2020

Yaaay, that was what was missing! Also, I guess I wasn't supposed to call `library(gargle)`. Thank you so much @jennybc!

@jennybc jennybc (Member) commented Jan 15, 2020

Great!

Yes, it is also true that you do not need to call `library(gargle)`. gargle is used internally by googlesheets4 (and it's where the encoding fix was needed), but in general users shouldn't need to work with it directly.
