Problem Loading xls file #483

jameshunterbr · 2018-05-17T21:21:31Z

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.

I tried to import a large xls file from the Los Alamos National Lab's HIV database. The original version did not work. It showed a green sidebar in RStudio's R markdown file, but resulted in 0 observations. However, when I cut it down to 48 cases to make it easier to send to you, it worked! Worked means that lanl_epi was loaded into memory with 1879 cases.

The second command below (48 cases) functioned fine. I am sending both commands in the reprex, and a compressed version of both files. The smaller one has "mod" as the first letters instead of "lanl". The version of readxl I am using is 1.1.0.

library(tidyverse)
library(readxl)
# did not work; 1879 cases
lanl_epi <- read_excel("lanl_ctl_summary.xls", skip = 2,
                       col_types = c(rep("text", 2), rep("numeric", 2), 
                                     rep("text", 5)), 
                       col_names = c("Epitope", "Protein", "HXB2_start", 
                          "HXB2_end", "Subprotein", "DNA_contig", "Subtype",
                          "Species", "HLA"))
# worked; 48 cases
lanl_epi <- read_excel("mod_ctl_summary.xls", skip = 2,
                       col_types = c(rep("text", 2), rep("numeric", 2), 
                                     rep("text", 5)), 
                       col_names = c("Epitope", "Protein", "HXB2_start", 
                          "HXB2_end", "Subprotein", "DNA_contig", "Subtype",
                          "Species", "HLA"))

files for test.zip

jennybc · 2018-05-17T22:36:31Z

Thanks! Is the original xls file available on the web somewhere, i.e. can you point me at a URL?

jameshunterbr · 2018-05-17T22:59:06Z

Here's the page: https://www.hiv.lanl.gov/content/immunology/tables/ctl_summary.html Just click on the XLS button on the top line. Jim ————————————————— James R. Hunter Laboratório de Retrovirologia Disciplina de Infectologia Departamento de Medicina Escola Paulista de Medicina Universidade Federal de São Paulo Cel: (11) 9-5327-5656 Lab 1 Rua Pedro de Toledo 669 6º Andar Fundos 04039-032 São Paulo, SP, BRASIL Fone (11) 5576-4834 —————————————————

…

On May 17, 2018, 19:36 -0300, Jennifer (Jenny) Bryan ***@***.***>, wrote: Thanks! Is the original xls file available on the web somewhere, i.e. can you point me at a URL? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

jennybc · 2018-05-17T23:16:21Z

In future, if the troublesome xls is available online, you can save yourself the pain of attaching and just point us at it. Thanks.

p0bs · 2018-06-01T14:21:27Z

If it helps you, I also have similar trouble with this spreadsheet, although not other spreadsheets in a broadly similar format.

Many thanks for your great work.

jennybc · 2018-12-14T01:25:44Z

Unfortunately the current version of libxls and, therefore, readxl, still can't read this one. I've reported upstream in libxls.

jennybc · 2018-12-14T01:27:42Z

The only readxl workaround I can offer is to open it in Excel and save as .xlsx. Then I can read it fine with readxl.

jennybc · 2018-12-14T02:15:16Z

Thanks to speedy work by @evanmiller, this is now fixed in libxls and therefore in the dev version of readxl.

readxl::read_excel("investigations/ctl_summary.xls")
#> # A tibble: 1,901 x 9
#>    Epitope Protein `HXB2 start` `HXB2 end` Subprotein `HXB2 DNA Conti…
#>    <chr>   <chr>          <dbl>      <dbl> <chr>      <chr>           
#>  1 Data l… <NA>              NA         NA <NA>       <NA>            
#>  2 MGARAS… Gag                1         10 p17(1-10)  790..819        
#>  3 MGARAS… Gag                1         11 p17(1-11)  790..822        
#>  4 ASVLSG… Gag                5         13 p17(5-13)  802..828        
#>  5 ASILRG… Gag                5         15 p17(5-15)  802..834        
#>  6 ASVLSG… Gag                5         18 p17(5-18)  802..843        
#>  7 SVLSGG… Gag                6         15 p17(6-15)  805..834        
#>  8 SVLSGG… Gag                6         18 p17(6-18)  805..843        
#>  9 SVLSGG… Gag                6         19 p17(6-19)  805..846        
#> 10 LSGGEL… Gag                8         18 p17(8-18)  811..843        
#> # … with 1,891 more rows, and 3 more variables: Subtype <chr>,
#> #   Species <chr>, HLA <chr>

^{Created on 2018-12-13 by the reprex package (v0.2.1.9000)}

jennybc mentioned this issue Dec 14, 2018

xls file throws no error, but no cell data retrieved ("Los Alamos sheet") libxls/libxls#28

Closed

jennybc added xls 👵 libxls labels Dec 14, 2018

jennybc closed this as completed Dec 14, 2018

lock bot locked and limited conversation to collaborators Dec 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem Loading xls file #483

Problem Loading xls file #483

jameshunterbr commented May 17, 2018

jennybc commented May 17, 2018

jameshunterbr commented May 17, 2018 via email

jennybc commented May 17, 2018

p0bs commented Jun 1, 2018

jennybc commented Dec 14, 2018

jennybc commented Dec 14, 2018

jennybc commented Dec 14, 2018

Problem Loading xls file #483

Problem Loading xls file #483

Comments

jameshunterbr commented May 17, 2018

jennybc commented May 17, 2018

jameshunterbr commented May 17, 2018 via email

jennybc commented May 17, 2018

p0bs commented Jun 1, 2018

jennybc commented Dec 14, 2018

jennybc commented Dec 14, 2018

jennybc commented Dec 14, 2018