readxl is not able to read an xls file #619

Closed
smsaladi opened this issue Jun 4, 2020 · 2 comments


smsaladi commented Jun 4, 2020

I have an Excel file generated by an application that reads data off an instrument. read_excel does not seem able to parse the Excel files this application exports; it returns just an empty tibble:

library(readxl)
data <- read_xls("2020-06-02_02-39-48_Quantitation_Summary.xls")
data
#> # A tibble: 0 x 0

In case it's helpful, pandas is able to process it OK, but with a warning:

In [3]: df = pd.read_excel("2020-06-02_02-39-48_Quantitation_Summary.xls")
WARNING *** file size (8461) not 512 + multiple of sector size (512)

In [4]: df.head()
Out[4]:
   Unnamed: 0 Well Fluor  Content Sample        C(t)  SQ
0         NaN  A03  SYBR  Unkn-01    H2O   62.048927 NaN
1         NaN  A04  SYBR  Unkn-05    H2O   68.577469 NaN
2         NaN  A09  SYBR   NTC-09    H2O   60.147350 NaN
3         NaN  A10  SYBR   NTC-13    H2O   85.389522 NaN
4         NaN  B03  SYBR  Unkn-02    CVS  106.360012 NaN
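
The warning above says the file size (8461) is not "512 + multiple of sector size (512)". Here is a minimal sketch of that arithmetic, in case it helps narrow things down. The helper name `sector_remainder` is made up for illustration, and the two 512-byte defaults are taken straight from the warning text rather than from the file itself:

```python
def sector_remainder(file_size, header_size=512, sector_size=512):
    """Return the bytes left over after the header and whole sectors.

    Zero means the size fits the "512 + multiple of sector size" layout
    named in the warning; a non-zero value suggests the file ends with
    a partial sector. Both sizes are assumptions from the warning text.
    """
    return (file_size - header_size) % sector_size


# File size reported in the pandas/xlrd warning for this file.
remainder = sector_remainder(8461)
print(remainder)
```

A non-zero result here would be consistent with the file ending partway through a sector, though this check alone does not show which part of the file is short.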

GitHub doesn't accept .xls attachments, so I zipped it up:
2020-06-02_02-39-48_Quantitation_Summary.xls.zip

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reprex_0.3.0    readxl_1.3.1    cowplot_1.0.0   forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3     purrr_0.3.3    
 [8] readr_1.3.1     tidyr_1.0.0     tibble_2.1.3    ggplot2_3.3.0   tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5 xfun_0.12        haven_2.2.0      lattice_0.20-38  colorspace_1.4-1 vctrs_0.2.1     
 [7] generics_0.0.2   htmltools_0.4.0  yaml_2.2.0       base64enc_0.1-3  rlang_0.4.2      pillar_1.4.3    
[13] withr_2.1.2      glue_1.3.1       DBI_1.1.0        dbplyr_1.4.2     modelr_0.1.5     lifecycle_0.1.0 
[19] munsell_0.5.0    gtable_0.3.0     cellranger_1.1.0 rvest_0.3.5      evaluate_0.14    knitr_1.27      
[25] callr_3.4.0      ps_1.3.0         fansi_0.4.1      broom_0.5.3      Rcpp_1.0.3       clipr_0.7.0     
[31] backports_1.1.5  scales_1.1.0     jsonlite_1.6     fs_1.3.1         hms_0.5.3        digest_0.6.23   
[37] stringi_1.4.5    processx_3.4.1   grid_3.6.1       cli_2.0.1        tools_3.6.1      magrittr_1.5    
[43] whisker_0.4      crayon_1.3.4     pkgconfig_2.0.3  zeallot_0.1.0    xml2_1.2.2       lubridate_1.7.4 
[49] assertthat_0.2.1 rmarkdown_2.0    httr_1.4.1       rstudioapi_0.10  R6_2.4.1         nlme_3.1-143    
[55] compiler_3.6.1

smsaladi commented Jun 6, 2020

Looks like this is an issue with libxls not being able to parse this file. I've opened up an issue there: libxls/libxls#76

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jul 14, 2020
Changes:
 - Allow truncated XLS files #55 #60 #76 tidyverse/readxl#619
 - Fix long-standing "extra column" bug #73
 - Support for RSTRING records (rich-text cells in older BIFF5 files)
   tidyverse/readxl#611

Minimum version updated in bl3 due to header changes.

jennybc commented Jul 29, 2021

As expected, now that I've pulled in the most recent libxls, this works 🎉
Note that this currently applies only to the dev version of readxl.

library(readxl)

data <- read_xls("investigations/2020-06-02_02-39-48_Quantitation_Summary.xls")
data
#> # A tibble: 32 x 6
#>    Well  Fluor Content Sample `C(t)` SQ   
#>    <chr> <chr> <chr>   <chr>   <dbl> <lgl>
#>  1 A03   SYBR  Unkn-01 H2O      62.0 NA   
#>  2 A04   SYBR  Unkn-05 H2O      68.6 NA   
#>  3 A09   SYBR  NTC-09  H2O      60.1 NA   
#>  4 A10   SYBR  NTC-13  H2O      85.4 NA   
#>  5 B03   SYBR  Unkn-02 CVS     106.  NA   
#>  6 B04   SYBR  Unkn-06 CVS      48.7 NA   
#>  7 B09   SYBR  NTC-10  CVS      63.5 NA   
#>  8 B10   SYBR  NTC-14  CVS      82.7 NA   
#>  9 C03   SYBR  Unkn-03 QTIP     69.8 NA   
#> 10 C04   SYBR  Unkn-07 QTIP     73.2 NA   
#> # … with 22 more rows

Created on 2021-07-29 by the reprex package (v2.0.0.9000)

@jennybc jennybc closed this as completed Jul 29, 2021