New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with reading xls file on Centos #189

Closed
KarinaMarks opened this Issue Jul 6, 2016 · 9 comments

Comments

Projects
None yet
4 participants
@KarinaMarks

KarinaMarks commented Jul 6, 2016

Hello,

I have noticed a problem when using read_excel for xls files on Centos OS (using readxl 0.1.1).
The column names, columns of character objects and blank columns are not read in.

An example of this is when running your test test-missing-values.R, and all the tests for the xls files fail on my OS. Using the read_excel function for the file missing-values.xls, it returns;

     structure(c("NA", "8", "8"), class = "AsIs")
1 NA                                           NA
2  1                                            8
3  1                                            8

Also for the file empty-named-column.xls, it returns;

     structure(c(" 2", "NA", "NA"), class = "AsIs")
1 1                                              2
2 2                                             NA
3 3                                             NA

Similar behaviour is shown for the files utf8-sheets.xls and blanks.xls.

However, for the files dates-1900.xls and dates-1904.xls, the function works correctly, but I believe this is because it has no column names, blank columns or character objects.

This issue is similar to that mentioned in #128 and #180.

> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.3 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] validation.readxl_0.1.1   validationTools_0.2.0-166 png_0.1-7                
 [4] knitr_1.12.3              XML_3.98-1.4              RUnit_0.4.31             
 [7] roxygen2_5.0.1            devtools_1.10.0           readxl_0.1.1             
[10] testthat_0.11.0          

loaded via a namespace (and not attached):
[1] Rcpp_0.12.4   digest_0.6.9  crayon_1.3.1  withr_1.0.1   magrittr_1.5  stringi_1.0-1 stringr_1.0.0
[8] memoise_1.0.0
@scientificbruno

This comment has been minimized.

scientificbruno commented Jul 7, 2016

This sounds like an issue I ran into today as well. I was testing a script on my local OS X workstation (OS X 10.10.5) and it was working fine. I started running into issues when I tried testing the same script on a Linux machine (CentOS 6.6). Similar to @KarinaMarks, columns names and character columns aren't included in the data frame returned by read_excel().

Below is the session info for the Linux machine on which I'm having issues.

> devtools::session_info()
Session info -------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_CA.UTF-8
 tz       <NA>
 date     2016-07-06

Packages -----------------------------------------------------------------------
 package  * version date       source
 devtools   1.12.0  2016-06-24 CRAN (R 3.3.0)
 digest     0.6.9   2016-01-08 CRAN (R 3.2.3)
 memoise    1.0.0   2016-01-29 CRAN (R 3.2.3)
 Rcpp       0.12.5  2016-05-14 CRAN (R 3.3.0)
 readxl   * 0.1.1   2016-03-28 CRAN (R 3.3.0)
 withr      1.0.2   2016-06-20 CRAN (R 3.3.0)
@gergness

This comment has been minimized.

Contributor

gergness commented Nov 1, 2016

I was also hit by this. There are already failing tests, but it must be something specific to CentOS. See:
https://builder.r-hub.io/status/readxl_0.1.1.9000.tar.gz-8d95452225cca69abdf3ed77ac4f8f0a

gergness added a commit to gergness/readxl that referenced this issue Nov 7, 2016

Assume all linux flavors have asprintf
Fixes tidyverse#189 and doesn't seem to have negative impacts on other platforms (I used r-hub to test Debian, Fedora, Ubuntu, CentOS and Windows).

It seems like we might be able to delete the if statement altogether, but this change seemed less likely to cause problems.

@jennybc jennybc added the xls 👵 label Jan 31, 2017

@jennybc

This comment has been minimized.

Member

jennybc commented Feb 4, 2017

I believe recent commits by @jeroenooms have fixed this. Can any of you install current dev version and confirm?

@scientificbruno

This comment has been minimized.

scientificbruno commented Feb 6, 2017

The latest version on master seems to work with the XLS file I was having problems with. Thanks, everyone! 🎉

@jennybc jennybc closed this Feb 17, 2017

@scientificbruno

This comment has been minimized.

scientificbruno commented Feb 22, 2017

@jennybc: Just an FYI, I'm running into an issue again with the latest commit on master (6c76a9b) when I try to load in the Excel spreadsheet. I get the output posted below. I can't be sure if it's exactly the same issue. For now, I'm using a past commit (from when I said it was working, i.e. 3338f92). If you need me to narrow down the commit where the issue arose or create a new GitHub issue, let me know.

> readxl::read_excel(my_path, my_sheet)
Error: Variables must be length 1 or 153.
Problem variables: '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''
@gergness

This comment has been minimized.

Contributor

gergness commented Feb 23, 2017

I think this is a different problem - I just checked on my machine that was affected by the original issue and all tests pass.

Can you share a spreadsheet that gives that error message?

@jennybc

This comment has been minimized.

Member

jennybc commented Feb 24, 2017

I posted, unsuccessfully I suppose, that this is different and, as @gergness said, should go in a new issue with a reprex. However I know many things are currently broken for xls (is that what my_path is here @brunogrande?) and am working on a PR that will un-break lots of things. So I suspect this will self-resolve shortly.

@scientificbruno

This comment has been minimized.

scientificbruno commented Feb 27, 2017

Unfortunately, I cannot share the spreadsheet without modifying it because it contains sensitive patient metadata. That being said, the issue I mentioned in my previous comment is resolved in the latest commit on master (f95e0ae). Since I do have a commit that works and you are currently actively developing this package, I will hold off until the next release to see if any issues remain. If you want, you can ping me to test the release candidate on my Excel spreadsheet. If that doesn't work, I could then spend time creating a derivative Excel spreadsheet that reproduces the same issue and doesn't contain sensitive information.

Thanks for your work on readxl! It's really appreciated. 😄

@jennybc

This comment has been minimized.

Member

jennybc commented Feb 27, 2017

@brunogrande Thanks. I suspected that recent improvements on the xls side would address this. So let's just be happy it's resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment