
Encoding problems: reading Hebrew characters with `read_sheet` #26

adisarid opened this issue May 31, 2019 · 5 comments · Fixed by r-lib/gargle#114



@adisarid adisarid commented May 31, 2019

read_sheet returns garbled text when reading a Google Sheet that contains non-ASCII characters (e.g., Hebrew text).

I'm comparing it to read_csv on a "web published" version of the same sheet, which behaves nicely.

The googlesheets4 version:

bad_encoding <- read_sheet(ss = "1ia41UMZVqOeTzjpElJmx9BpC1aEcYmK_8MLdJn-Jka0")

comes up with this gibberish:

> bad_encoding
# A tibble: 7 x 1
1 "׳©׳\u009c׳•׳\u009d"            
2 "׳\u009c׳›׳•׳\u009c׳\u009d"     
3 "׳”׳\u0090׳\u009d"              
4 "׳ ׳™׳×׳\u009f"                 
5 "׳\u009c׳§׳¨׳•׳\u0090 ׳\u0090׳×"
6 ׳”׳˜׳§׳¡׳˜                      
7 ׳”׳–׳”?  

whereas the web-published CSV version is read like this:

good_encoding <- read_csv("")

and provides the right version:

> good_encoding
# A tibble: 7 x 1
1 שלום       
2 לכולם      
3 האם        
4 ניתן       
5 לקרוא את   
6 הטקסט      
7 הזה? 

Any suggestions about how to fix this?



@abubelinha abubelinha commented Jul 18, 2019

Exactly the same thing is happening to me, now with Spanish accents.


library(googledrive)
library(readxl)
library(googlesheets4)

myfileid <- "a-googlesheet-file-id"
mysheetname <- "a-sheet-in-that-file"

# download as an Excel file and read it:
myxls <- drive_download(as_id(myfileid), overwrite = TRUE)
path <- as.character(myxls[1, "local_path"])
myxlsdata <- read_excel(path, sheet = mysheetname)

# read the Google Sheet directly:
mygs <- sheets_get(myfileid)
mygsdata <- sheets_read(myfileid, sheet = mysheetname)

The first output is correct, but the second is not:

[1] "España cañí y olé, puturrú defuá"

[1] "España cañí y olé, puturrú defuá"

Great library, but useless until this basic thing gets fixed
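
For readers unfamiliar with this kind of garbling, here is a minimal base-R sketch of one common way it arises: UTF-8 bytes being interpreted as a single-byte encoding. The thread does not say which wrong encoding was involved in this report, so Latin-1 is an assumption chosen purely for illustration, and the variable names are hypothetical.

```r
# "España" written with a \u escape so this script itself is plain ASCII.
original <- "Espa\u00f1a"

# In UTF-8 the letter "ñ" is stored as the two bytes c3 b1.
utf8_bytes <- charToRaw(enc2utf8(original))
print(utf8_bytes)

# Simulate the misreading: declare those same bytes to be Latin-1 (an assumption),
# then convert to UTF-8. Each byte of "ñ" now becomes its own character.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"
garbled <- enc2utf8(misread)

# Undo it: recover the Latin-1 bytes and declare them UTF-8 again.
repaired <- rawToChar(charToRaw(iconv(garbled, from = "UTF-8", to = "latin1")))
Encoding(repaired) <- "UTF-8"

print(nchar(original))               # number of characters in the original
print(nchar(garbled))                # one more, because "ñ" turned into two characters
print(identical(repaired, original))
```

A round trip like this only works when every byte survives the wrong interpretation, so it is a diagnostic aid rather than a general fix.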



@jonthegeek jonthegeek commented Oct 12, 2019

Someone submitted curly apostrophes in a form response to me, so I'm seeing similar issues.

This is only bad on my Windows machine. It's fine on RStudio Server on CentOS 7.

Windows result:

# A tibble: 1 x 1
1 I’ve made a huge mistake.

CentOS 7 result:

# A tibble: 1 x 1
1 I’ve made a huge mistake.
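
A difference between Windows and Linux suggests that the session's native encoding may play a role. The sketch below, using only base R, shows one way to inspect it. `encoding_report` is a hypothetical helper name, not part of googlesheets4 or gargle, and the sample string simply reuses the text from this comment.

```r
# Hypothetical helper: summarise how this R session handles text encoding.
encoding_report <- function(x) {
  list(
    native_is_utf8 = isTRUE(l10n_info()[["UTF-8"]]),  # is the native encoding UTF-8?
    ctype_locale   = Sys.getlocale("LC_CTYPE"),       # locale used for character handling
    declared       = Encoding(x),                     # encoding mark on each string
    valid_utf8     = validUTF8(x)                     # are the bytes well-formed UTF-8?
  )
}

# The curly apostrophe is written as \u2019 so the script stays ASCII.
report <- encoding_report("I\u2019ve made a huge mistake.")
str(report)
```

Running this on both machines and comparing `native_is_utf8` and `ctype_locale` would show whether the two sessions differ in their native encoding.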


@jennybc jennybc commented Oct 17, 2019

This has to be fixed in gargle. I could replicate all of this on my Windows VM and the gargle fix resolves it for me.

It would be great to hear confirmation from others here.

Note that, until that version of gargle goes to CRAN, you'll need to install the dev version from GitHub: `devtools::install_github("r-lib/gargle")`. I'll release gargle soon-ish, but perhaps not before the initial googlesheets4 release, because you can only update on CRAN with a certain frequency and I wouldn't be surprised if increasing usage of gargle, via all of these packages, uncovers some more gargle TODOs in the next few weeks.
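
After installing from GitHub, it can help to confirm which gargle version the session actually sees. The sketch below assumes the common convention that development versions carry a fourth version component (as in 0.4.0.9002, mentioned later in this thread). `is_dev_version` is a hypothetical helper, and the thread does not state which exact version contains the fix, so this only distinguishes development from release numbering.

```r
# Hypothetical helper: TRUE when a version string has four or more components,
# which is how development versions such as 0.4.0.9002 are usually numbered.
is_dev_version <- function(version) {
  parts <- unclass(package_version(version))[[1]]
  length(parts) >= 4
}

if (requireNamespace("gargle", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("gargle"))
  kind <- if (is_dev_version(installed)) "development" else "release"
  message("gargle ", installed, " (", kind, " numbering)")
} else {
  message("gargle is not installed")
}
```

Restarting R after installation, before running the check, avoids reporting a previously loaded copy of the package.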



@adisarid adisarid commented Oct 18, 2019

@jennybc Works like a charm in the gargle development version. Thanks!



@francisbarton francisbarton commented Nov 14, 2019

I've updated gargle to dev gargle_0.4.0.9002 but I'm still getting these encoding problems with googlesheets4. Pretty sure Google Sheets and RStudio are both operating with UTF-8 as default encoding so it's strange.
