read_excel on Windows fails to open a file from a UTF-8 encoded path containing special characters #370

Closed
mbeer opened this Issue Jun 22, 2017 · 5 comments

3 participants
mbeer commented Jun 22, 2017

Using readxl 1.0.0 in R 3.4.0 for Windows, I recently came across an unexpected error when trying to open an Excel file containing a non-ASCII character in the file name. Depending on the encoding of the path variable, the import either works or fails:

> library(readxl)

> # Copy an example file to a path including a special character
> my.path <- "Tür.xlsx"

> file.copy(system.file("extdata", "datasets.xlsx", package="readxl"), my.path, overwrite = TRUE)
[1] TRUE

> # Try to open file with path in native encoding
> read_excel(enc2native(my.path))
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>   <chr>
 1          5.1         3.5          1.4         0.2  setosa
 2          4.9         3.0          1.4         0.2  setosa
 3          4.7         3.2          1.3         0.2  setosa
 4          4.6         3.1          1.5         0.2  setosa
 5          5.0         3.6          1.4         0.2  setosa
 6          5.4         3.9          1.7         0.4  setosa
 7          4.6         3.4          1.4         0.3  setosa
 8          5.0         3.4          1.5         0.2  setosa
 9          4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
# ... with 140 more rows

> # Try to open file with path in UTF-8 encoding
> read_excel(enc2utf8(my.path))
Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim,  : 
  Evaluation error: zip file 'Tür.xlsx' cannot be opened.
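Based on the transcript above, one workaround is to convert the path to the native encoding before passing it to `read_excel()`. Here is a minimal sketch of that workaround as a wrapper. `read_with_native_path()` is a hypothetical helper, not part of readxl. It assumes, as in this report, that the file name can be represented in the native encoding at all.

```r
# Hypothetical helper (not part of readxl): convert the path to the native
# encoding, then hand it to a reader function such as readxl::read_excel.
read_with_native_path <- function(path, reader, ...) {
  stopifnot(is.character(path), length(path) == 1)
  native_path <- enc2native(path)  # same conversion as in the working call above
  reader(native_path, ...)         # extra arguments are passed through unchanged
}
```

Usage would be `read_with_native_path(my.path, readxl::read_excel)`. Passing the reader as an argument keeps the sketch independent of readxl. The wrapper does nothing for names that have no native representation, which is the case in the next comment.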

daniel-allington commented Feb 13, 2018

I appear to be experiencing the same issue. However, using enc2native also leads to errors:

fp <- "副本 (2).xlsx"

d <- read_excel(fp)

Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, : Evaluation error: zip file '副本 (2).xlsx' cannot be opened.

d <- read_excel(enc2native(fp))

Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, : Evaluation error: zip file '<U+526F><U+672C> (2).xlsx' cannot be opened.
Member

jennybc commented May 9, 2018

@daniel-allington The result you get after using enc2native() (zip file '<U+526F><U+672C> (2).xlsx' cannot be opened) suggests that this path cannot be represented in your native encoding. I don't know how to solve that, other than ... renaming the file? Somehow capturing the filename from your system? How do you get into this situation exactly? I find it hard to wrap my head around, because it seems like once the file is on your system, there must be some way to successfully represent its name ... like, by definition.
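To test the diagnosis above, you can check whether a file name survives conversion to the native encoding. The sketch below relies on base R's `iconv()`, which returns `NA` for elements it cannot convert when `sub` is left at its default of `NA`. The helper name `representable_in_native()` is hypothetical.

```r
# Hypothetical helper: TRUE where every character of `x` can be converted
# from UTF-8 to the current native encoding, FALSE otherwise (vectorized).
representable_in_native <- function(x) {
  # iconv() yields NA for unconvertible elements because `sub` defaults to NA;
  # to = "" means "the current locale's encoding"
  !is.na(iconv(enc2utf8(x), from = "UTF-8", to = ""))
}
```

On a Windows system whose native code page is Windows-1252, one would expect `representable_in_native("T\u00fcr.xlsx")` to be `TRUE` and `representable_in_native("\u526f\u672c (2).xlsx")` to be `FALSE`. That `FALSE` result would match the `<U+526F><U+672C>` escapes in the error above.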

Related R bug: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17120 (don't be discouraged by the title -- there is Windows content there).

We will soon make some fix re: path encoding (see #477) that enacts something like what @mbeer is doing and will hopefully solve the problem in many (but not all?) real-world situations.


daniel-allington commented May 9, 2018

Member

jennybc commented May 9, 2018

Next time this happens, it would be interesting for you to capture the file name from R, through list.files() or the like. Surely R can see the file ... what does it think the name is? Then use that. I agree this isn't a real solution but I don't know what else to do with an OS like Windows that "supports" such filenames in a weird and half-assed way.
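To follow the suggestion above, you can capture the file name from R itself and reuse that exact string. The sketch below uses `list.files()`. The helper name `find_excel_files()` and the `.xls`/`.xlsx` pattern are illustrative assumptions.

```r
# Hypothetical helper: list Excel files in `dir` exactly as R reports their
# names, so the returned strings can be passed straight to a reader.
find_excel_files <- function(dir = ".") {
  list.files(dir, pattern = "\\.xlsx?$", full.names = TRUE, ignore.case = TRUE)
}
```

For example, `files <- find_excel_files(); print(files)` shows what R thinks the names are. Then `readxl::read_excel(files[1])` reads the first match without retyping the name. This is not guaranteed to help if R itself mangles the name when listing it.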


daniel-allington commented May 9, 2018

jennybc closed this in #477 on May 11, 2018
