New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on reading empty worksheet #222

Closed
msgoussi opened this Issue Dec 28, 2016 · 3 comments

Comments

Projects
None yet
3 participants
@msgoussi

msgoussi commented Dec 28, 2016

Dear Mr.
I am trying to read sheets from xls file

sfile<-"EIU_Data.xls"
sheetlist<-readxl::excel_sheets(sfile)
for (i in 1:length(sheetlist)){
  temp<-readxl::read_excel(sfile,sheet=i,col_names=FALSE)
if (nrow(temp)!=0){
# rest of my own code
}}

However I get this error, since the last 3 sheets are empty.
Error in xls_cols(path, sheet, col_names = col_names, col_types = col_types, :
col_names and col_types must have the same length
Is there a way to solve this problem

EIU_Data.zip

@hadley hadley added the bug label Jan 3, 2017

@jennybc jennybc changed the title from Error in Reading to Error on reading empty worksheet Jan 4, 2017

@jennybc

This comment has been minimized.

Member

jennybc commented Jan 4, 2017

Last 3 worksheets are boilerplate, empty Sheet1, Sheet2, Sheet3. This is about reading empty worksheets. Same outcome but different error message on xlsx.

## https://github.com/hadley/readxl/issues/222
library(readxl)
sfile <- "EIU_Data.xls"
sheetlist <- readxl::excel_sheets(sfile)
head(sheetlist)
#> [1] "DGDP" "GDPP" "RGDP" "GDPD" "EXPD" "IMPD"
tail(sheetlist)
#> [1] "INPX"   "INPY"   "EFIR"   "Sheet1" "Sheet2" "Sheet3"
# for (i in 1:length(sheetlist)){
#   temp<-readxl::read_excel(sfile,sheet=i,col_names=FALSE)
#   if (nrow(temp)!=0){
#     # rest of my own code
#   }}
i <- 1
temp <- read_excel(sfile, sheet=i, col_names=FALSE)
str(temp)
#> Classes 'tbl_df', 'tbl' and 'data.frame':    214 obs. of  39 variables:
#>  $ X1 : chr  "No of countries : 202 " NA "Long name" "Percentage change in real GDP, over previous year." ...
#>  $ X2 : chr  "GDP (% real change pa) - DGDP [YQ]" "%" "Note" NA ...
#>  $ X3 : chr  NA NA "Source" "Derived from IMF, World Economic Outlook; Asian Development Bank" ...
#>  $ X4 : chr  NA NA "Eiu Analyst" "Alice Mummery" ...
#> snip, snip

i <- length(sheetlist)
sheetlist[i]
#> [1] "Sheet3"
(temp <- read_excel(sfile, sheet=i, col_names=FALSE))
#> Error in xls_cols(path, sheet, col_names = col_names, col_types = col_types,  : 
#>  `col_names` and `col_types` must have the same length

## an xlsx workbook with a single empty worksheet named Sheet1
df <- read_excel("empty.xlsx")
#>  Error in read_xlsx_(path, sheet, col_names = col_names, col_types = col_types,  : 
#>    Skipped over all data 
@jennybc

This comment has been minimized.

Member

jennybc commented Jan 7, 2017

Make sure to consider the case of header row but no data row (#144, #65).

@jennybc

This comment has been minimized.

Member

jennybc commented Jan 31, 2017

The xlsx part of this is covered under and fixed by #224, #240.

@jennybc jennybc added the xls 👵 label Jan 31, 2017

@jennybc jennybc closed this in #240 Jan 31, 2017

jennybc added a commit that referenced this issue Jan 31, 2017

Load cells at xlsx worksheet ingest; handle skipping and/or blank row… (
#240)

* Load cells at xlsx worksheet ingest; handle skipping and/or blank rows; fixes #224

* Actually these *should* be the same

* Make skipping tests more challenging (blank row btwn col names and data, plus another embedded blank row)

* Simplify return of 0x0 tibble for completely empty worksheet

* Mark cells to start reading from at the time of worksheet construction

* Remove vestigial, internal cell printing function

* Be quiet about empty worksheet

* More tests of nothingness

* Inform worksheet about its own name

* Tighten up the NEWs bullet for this PR

* Make some accessor member functions const

* Use skip() to issue note-to-future-self

Expedient place to park a summary of what this entire PR does.

* Improved parsing of sheet geometry for xlsx. (#240, @jennybc).

    - Better handling of leading and embedded blank rows and explicit row skipping. (#224, #194, #178, #156, #101)
    - Worksheets that are completely empty or that contain only column names no longer error, but return a tibble with zero rows. (#222, #144, #65)
    - Location is inferred for cells that do not declare their location (e.g. xlsx written by JMP). (#163, #102)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment