Encoding problems: reading Hebrew characters with `read_sheet` #26

Closed
adisarid opened this issue May 31, 2019 · 5 comments · Fixed by r-lib/gargle#114

Comments

@adisarid adisarid commented May 31, 2019

read_sheet returns garbled text when reading a Google Sheet that contains non-ASCII characters (e.g., Hebrew text).

I'm comparing it to read_csv on a web-published version of the same sheet, which reads the text correctly.

The googlesheets4 version:

library(googlesheets4)
bad_encoding <- read_sheet(ss = "1ia41UMZVqOeTzjpElJmx9BpC1aEcYmK_8MLdJn-Jka0")
bad_encoding

produces this gibberish:

> bad_encoding
# A tibble: 7 x 1
  hebrew_text                     
  <chr>                           
1 "׳©׳\u009c׳•׳\u009d"            
2 "׳\u009c׳›׳•׳\u009c׳\u009d"     
3 "׳”׳\u0090׳\u009d"              
4 "׳ ׳™׳×׳\u009f"                 
5 "׳\u009c׳§׳¨׳•׳\u0090 ׳\u0090׳×"
6 ׳”׳˜׳§׳¡׳˜                      
7 ׳”׳–׳”?  

whereas the web-published CSV version is read like this:

library(readr)
good_encoding <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRhhpSeSY34D5-3bYZz64M_Z5FNaT_xS_QoVh_ehRvw7vT_ONPuI87TajPRHhfHBzxb_2u5mod-ysMQ/pub?gid=0&single=true&output=csv")
good_encoding

and comes out correctly:

> good_encoding
# A tibble: 7 x 1
  hebrew_text
  <chr>      
1 שלום       
2 לכולם      
3 האם        
4 ניתן       
5 לקרוא את   
6 הטקסט      
7 הזה? 

Any suggestions about how to fix this?


@abubelinha abubelinha commented Jul 18, 2019

Exactly the same thing is happening to me, now with Spanish accents.

library(googledrive)

myfileid <- "a-googlesheet-file-id"
mysheetname <- "a-sheet-in-that-file"

# download as an excel file and read it:
library(readxl)
myxls <- drive_download(as_id(myfileid), overwrite = TRUE)
path <- myxls$local_path
myxlsdata <- read_excel(path, sheet = mysheetname)
as.character(myxlsdata[10,9])

# read the google sheet directly:
library(googlesheets4)
mygs <- sheets_get(myfileid)
mygsdata <- sheets_read(myfileid, sheet = mysheetname)
as.character(mygsdata[10,9])

The first output is correct, but the second is not:

[1] "España cañí y olé, puturrú defuá"

[1] "España cañí y olé, puturrú defuá"

Great library, but useless to me until this basic thing gets fixed.
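The garbling reported here looks like classic mojibake: UTF-8 bytes decoded with a single-byte encoding. A minimal R sketch of the effect and of a manual repair, assuming the text was misread as Latin-1 (an illustration only; it may not match the exact encoding on any machine in this thread):

```r
# "España cañí", stored as UTF-8 (ñ and í each take two bytes)
x <- "Espa\u00f1a ca\u00f1\u00ed"

# Misread: treat the UTF-8 bytes as Latin-1 and convert them to UTF-8,
# so each two-byte character becomes two separate characters
garbled <- iconv(x, from = "latin1", to = "UTF-8")
print(garbled)

# Repair: convert the garbled text back to Latin-1 to recover the
# original bytes, then declare that those bytes are really UTF-8
fixed <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(fixed) <- "UTF-8"
print(fixed)
```

This round trip only works when every misread byte maps to a character and back; with code pages that leave some bytes undefined (note the `\u009c` escapes in the Hebrew output above) it can fail, so fixing the decoding where the data is read is the more reliable route.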


@jonthegeek jonthegeek commented Oct 12, 2019

Someone submitted curly apostrophes in a form response to me, so I'm seeing similar issues.
googlesheets4::sheets_read("1TeKrcSlsWJlPQlZegZGJgFFubRsjXeiVlShNZ6oqmCY")

This only happens on my Windows machine. It's fine on RStudio Server on CentOS 7.

Windows result:

# A tibble: 1 x 1
  bad                        
  <chr>                      
1 I’ve made a huge mistake.

CentOS 7 result:

# A tibble: 1 x 1
  bad                      
  <chr>                    
1 I’ve made a huge mistake.
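One plausible reason the result differs by platform: strings whose encoding R does not know are interpreted in the session's native encoding, which on Windows at the time was typically a code page such as CP1252, while Linux locales are usually UTF-8. A small diagnostic sketch; the CP1252 conversion only simulates the misreading and is not what googlesheets4 does internally:

```r
# Does this R session use UTF-8 as its native encoding?
info <- l10n_info()
utf8_native <- isTRUE(info[["UTF-8"]])
cat("Native encoding is UTF-8:", utf8_native, "\n")
cat("LC_CTYPE locale:", Sys.getlocale("LC_CTYPE"), "\n")

# Simulate a CP1252 session reading UTF-8 bytes: the curly apostrophe
# (three bytes in UTF-8) turns into three separate characters
x <- "I\u2019ve made a huge mistake."
misread <- iconv(x, from = "CP1252", to = "UTF-8")
print(misread)
```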
@jennybc jennybc (Member) commented Oct 17, 2019

This has to be fixed in gargle. I could replicate all of this on my Windows VM and the gargle fix resolves it for me.

It would be great to hear confirmation from others here.

Note that, until that version of gargle goes to CRAN, you'll need to install the dev version from GitHub: devtools::install_github("r-lib/gargle"). I'll release gargle soon-ish, but perhaps not before the initial googlesheets4 release: CRAN limits how often a package can be updated, and I wouldn't be surprised if increasing usage of gargle, via all of these packages, uncovers some more gargle TODOs in the next few weeks.
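Fixing this in gargle means the HTTP layer should declare that the response body is UTF-8 rather than letting R fall back to the native encoding. A minimal base-R sketch of that idea, using hypothetical response bytes; this is not gargle's actual patch (with httr, the analogous step is passing `encoding = "UTF-8"` to `httr::content()`):

```r
# Hypothetical response body: the UTF-8 bytes an API would send for "שלום"
body <- charToRaw(enc2utf8("\u05e9\u05dc\u05d5\u05dd"))

# rawToChar() leaves the encoding unmarked ("unknown"), so R would
# interpret these bytes in the native encoding (wrong on a CP1252 system)
txt <- rawToChar(body)

# Declare the bytes as UTF-8 so they display correctly on every platform
Encoding(txt) <- "UTF-8"
print(txt)
```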

@adisarid adisarid (Author) commented Oct 18, 2019

@jennybc Works like a charm in the gargle development version. Thanks!


@francisbarton francisbarton commented Nov 14, 2019

I've updated gargle to the dev version, gargle_0.4.0.9002, but I'm still getting these encoding problems with googlesheets4. I'm pretty sure Google Sheets and RStudio both use UTF-8 as their default encoding, so this is strange.
