Reading xlsx and xls #208

Closed
msgoussi opened this Issue Oct 10, 2016 · 4 comments

msgoussi commented Oct 10, 2016

Dear Mr. Hadley,
I have a file downloaded from a database, in both Excel 2007 (xlsx) and Excel 2003 (xls) formats.
If I use read_excel on the xlsx file, I get an error:
Error: Couldn't find 'xl/styles.xml' in 'D:/My Documents/DRSD/R/EIU/EIU_Data.xlsx'

I think the sheets are XML files combined into a workbook.

If I use read_excel on the xls file, I get an error:
Error in x[needs_ticks] <- paste0("`", gsub("`", "\\\\`", x[needs_ticks]), :
NAs are not allowed in subscripted assignments

I would really appreciate help with this error, and if you could show me how to read every sheet.

Thanks
xls file
https://drive.google.com/file/d/0B3Z74IvmfSYQQjF4bzhDXy1iTlU/view

xlsx file
https://drive.google.com/file/d/0B3Z74IvmfSYQeWNTemJJUzZ5WkE/view


MichaelChirico commented Oct 22, 2016

@msgoussi I don't have any problem with the first file:

fl = "EIU_Data.xls"
library(readxl)

all(sapply(excel_sheets(fl),
           function(x) 
             is.data.frame(read_excel(fl, sheet = x, na = "n.a."))))
# [1] TRUE

I'm running version: readxl_0.1.1.9000.

Your xlsx file is not shared
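
For the second part of the question (reading every sheet), the check above can be turned into a helper that returns all sheets as a named list. This is only a minimal sketch, assuming readxl is installed; `read_all_sheets` is a hypothetical helper name, not a readxl function.

```r
# Hypothetical helper (not part of readxl): read every sheet of a workbook
# into a named list of data frames, one element per sheet.
read_all_sheets <- function(path, ...) {
  sheets <- readxl::excel_sheets(path)
  # Extra arguments (e.g. na = "n.a.") are passed through to read_excel().
  out <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s, ...))
  names(out) <- sheets
  out
}

# Usage, with the file name from this thread:
# all_sheets <- read_all_sheets("EIU_Data.xls", na = "n.a.")
# names(all_sheets)
```

Each sheet is then available by name, e.g. `all_sheets[["some sheet"]]`, where the sheet name is whatever `excel_sheets()` reports for the file.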


donboyd5 commented Dec 8, 2016

Hi,

I have the same problem when trying to read some xls files from the U.S. Bureau of Economic Analysis. Reproducible example below.

Not shown: xlsx::read.xlsx can read the files.

Don

# reproducible example of problem reading xls files from U.S. Bureau of Economic Analysis
library("readxl")
tfile <- tempfile()
download.file("http://www.bea.gov//national/nipaweb/GetCSV.asp?GetWhat=SS_Data/SectionAll_xls.zip&Section=11", tfile, mode="wb")
unzip(tfile, list=TRUE) # verify that we really have the file
tdir <- tempdir()
unzip(tfile, exdir=tdir)
dir(tdir) # verify that the xls files are there
excel_sheets(paste0(tdir, "/Section1all_xls.xls")) # verify that we can identify the file we want
df <- read_excel(paste0(tdir, "/Section1all_xls.xls"), sheet=2)
df
# Results in this error:
# Error in x[needs_ticks] <- paste0("`", gsub("`", "\\\\`", x[needs_ticks]),  : NAs are not allowed in subscripted assignments

donboyd5 commented Dec 10, 2016

I now see that the above problem is caused by bad column names in the Excel file. It is solved by using col_names=FALSE. It would be great if read_excel could generate a more descriptive warning in a case like this.

Don
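
The `col_names=FALSE` workaround leaves the header text inside the data. One way to recover usable names afterwards is sketched below, assuming the header sits in a single known row. `fill_missing_names`, `read_excel_with_header`, and the `header_row` argument are hypothetical names for illustration, not part of readxl.

```r
# Hypothetical helper: replace missing or empty column names with
# positional placeholders ("X2", "X3", ...) and make duplicates unique.
fill_missing_names <- function(nms) {
  bad <- is.na(nms) | !nzchar(nms)
  nms[bad] <- paste0("X", which(bad))
  make.unique(nms)
}

# Hypothetical wrapper: read without column names, then promote one row
# to the header after repairing any missing names in it.
read_excel_with_header <- function(path, sheet, header_row = 1) {
  raw <- readxl::read_excel(path, sheet = sheet, col_names = FALSE)
  header <- vapply(raw[header_row, ], function(col) as.character(col[[1]]),
                   character(1), USE.NAMES = FALSE)
  # Drop the header row and everything above it.
  body <- raw[-seq_len(header_row), , drop = FALSE]
  names(body) <- fill_missing_names(header)
  body
}

# Usage, with the path from the example above:
# df <- read_excel_with_header(paste0(tdir, "/Section1all_xls.xls"), sheet = 2)
```

Because the header text was read as data, the remaining columns may come back as text and need converting, for example with `type.convert()`.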

hadley added the bug label Jan 3, 2017


Member

jennybc commented Jan 6, 2017

  1. Agree readxl should not return a tibble with NA for 1 or more column names. That is #199.
  2. However, tibble should still be able to print such a thing. I've opened a pull request there: tidyverse/tibble#207.

jennybc closed this Jan 6, 2017
