Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error: Invalid cell: lacks ref attribute -- cannot read certain xlsx files properly unless file is opened and re-saved first #102

Closed
donboyd5 opened this issue May 20, 2015 · 2 comments
Labels
bug an unexpected problem or unintended behavior cells 🔲

Comments

@donboyd5
Copy link

Hi,

When I try to use read_excel to read certain .xlsx files that I have downloaded from the internet, I get "Error: Invalid cell: lacks ref attribute," although xlsx:read.xlsx can read them properly. Furthermore, similar to issue #99, if I open the file manually with Excel and save it, the problem goes away. Here is an example:

> library(readxl)
> library(xlsx)
> library(dplyr)
> 
> url <- "http://mort.soa.org/Export.aspx?Type=xlsx&TableIdentity=1594"
> fname <- "e:/temp/tabl1594.xlsx"
> 
> download.file(url, fname, mode="wb")
trying URL 'http://mort.soa.org/Export.aspx?Type=xlsx&TableIdentity=1594'
Content type 'application/octet-stream' length 9227 bytes
downloaded 9227 bytes

> 
> df1 <- read.xlsx(fname, sheetIndex = 1) # this works
> glimpse(df1)
Observations: 91
Variables:
$ Table.Name.                                            (fctr) Table Identity:, Provider Domain:, Provider Name:, Table Reference:, Content...
$ RP.2000.Mortality.Table.â...Male.Aggregate...Employees (fctr) 1594, soa.org, Susie Lee, Retirement Plans Experience Committee of the Socie...
> 
> df2 <- read_excel(fname) # Error: Invalid cell: lacks ref attribute
Error: Invalid cell: lacks ref attribute
> glimpse(df2)
Error in nrow(tbl) : object 'df2' not found
> 

> # now manually open file with Excel and save it, then retry:
> df3 <- read_excel(fname)
> glimpse(df3)
Observations: 93
Variables:
$ Table Name:                                          (chr) "Table Identity:", "Provider Domain:", "Provider Name:", "Table Reference:", "C...
$ RP-2000 Mortality Table – Male Aggregate - Employees (chr) "1594", "soa.org", "Susie Lee", "Retirement Plans Experience Committee of the S...
> 

Many thanks.

Don

@jennybc jennybc added the bug an unexpected problem or unintended behavior label Jan 20, 2017
@jennybc
Copy link
Member

jennybc commented Jan 20, 2017

I can reproduce this, but can't immediately lump with one of the non-standard xml problems I already know about.

@jennybc
Copy link
Member

jennybc commented Jan 31, 2017

Will be fixed by #240. Related to #163.

@jennybc jennybc closed this as completed Jan 31, 2017
jennybc added a commit that referenced this issue Jan 31, 2017
#240)

* Load cells at xlsx worksheet ingest; handle skipping and/or blank rows; fixes #224

* Actually these *should* be the same

* Make skipping tests more challenging (blank row btwn col names and data, plus another embedded blank row)

* Simplify return of 0x0 tibble for completely empty worksheet

* Mark cells to start reading from at the time of worksheet construction

* Remove vestigial, internal cell printing function

* Be quiet about empty worksheet

* More tests of nothingness

* Inform worksheet about its own name

* Tighten up the NEWs bullet for this PR

* Make some accessor member functions const

* Use skip() to issue note-to-future-self

Expedient place to park a summary of what this entire PR does.

* Improved parsing of sheet geometry for xlsx. (#240, @jennybc).

    - Better handling of leading and embedded blank rows and explicit row skipping. (#224, #194, #178, #156, #101)
    - Worksheets that are completely empty or that contain only column names no longer error, but return a tibble with zero rows. (#222, #144, #65)
    - Location is inferred for cells that do not declare their location (e.g. xlsx written by JMP). (#163, #102)
@lock lock bot locked and limited conversation to collaborators Oct 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior cells 🔲
Projects
None yet
Development

No branches or pull requests

2 participants