New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty rows are placed at the bottom of the data frame #156

Closed
huftis opened this Issue Jan 13, 2016 · 2 comments

Comments

Projects
None yet
3 participants
@huftis

huftis commented Jan 13, 2016

When importing Excel files, empty rows seem to be placed at the bottom of the resulting data frame. Here’s an example file:

http://huftis.org/nedlasting/excel/row-col-test.xlsx

The file contains two similar worksheets/tabs. Both have 8 columns (named A–H) and 8 rows. The diagonal is filled with the numbers 1 to 8 for the first worksheet. The second worksheet is similar, but is missing the number 4 and 7, resulting in row 5 and 8 being empty.

Reading the first worksheet works fine:

> read_excel("row-col-test.xlsx", sheet=1)
   A  B  C  D  E  F  G  H
1  1 NA NA NA NA NA NA NA
2 NA  2 NA NA NA NA NA NA
3 NA NA  3 NA NA NA NA NA
4 NA NA NA  4 NA NA NA NA
5 NA NA NA NA  5 NA NA NA
6 NA NA NA NA NA  6 NA NA
7 NA NA NA NA NA NA  7 NA
8 NA NA NA NA NA NA NA  8

But reading the second worksheet results in the empty rows being removed and two new empty rows placed at the bottom:

> read_excel("row-col-test.xlsx", sheet=2)
   A  B  C  D  E  F  G  H
1  1 NA NA NA NA NA NA NA
2 NA  2 NA NA NA NA NA NA
3 NA NA  3 NA NA NA NA NA
4 NA NA NA NA  5 NA NA NA
5 NA NA NA NA NA  6 NA NA
6 NA NA NA NA NA NA NA  8
7 NA NA NA NA NA NA NA NA
8 NA NA NA NA NA NA NA NA

Expected results:

> read_excel("row-col-test.xlsx", sheet=1)
   A  B  C  D  E  F  G  H
1  1 NA NA NA NA NA NA NA
2 NA  2 NA NA NA NA NA NA
3 NA NA  3 NA NA NA NA NA
4 NA NA NA NA NA NA NA NA
5 NA NA NA NA  5 NA NA NA
6 NA NA NA NA NA  6 NA NA
7 NA NA NA NA NA NA NA NA
8 NA NA NA NA NA NA NA  8
@Nihasa

This comment has been minimized.

Nihasa commented Jun 21, 2016

I'm having this issue as well using data generated from a machines software (tecan machine, maggelan software). It seems for some of my data this happens, and it seems to be because something is written in the cells; opening the file and pasting another empty row into the empty rows stops the rows from being placed at the bottom, but it took a while to figure out that there was a problem in the first place.

Also, reading in the file with the java based xlsx packages read.xlsx2 does not cause this issue.

@jennybc

This comment has been minimized.

Member

jennybc commented Jan 7, 2017

I've replicated what @huftis reports, with the provided example sheet (thanks!). Now covered by #224.

@jennybc jennybc closed this Jan 7, 2017

jennybc added a commit that referenced this issue Jan 31, 2017

Load cells at xlsx worksheet ingest; handle skipping and/or blank row… (
#240)

* Load cells at xlsx worksheet ingest; handle skipping and/or blank rows; fixes #224

* Actually these *should* be the same

* Make skipping tests more challenging (blank row btwn col names and data, plus another embedded blank row)

* Simplify return of 0x0 tibble for completely empty worksheet

* Mark cells to start reading from at the time of worksheet construction

* Remove vestigial, internal cell printing function

* Be quiet about empty worksheet

* More tests of nothingness

* Inform worksheet about its own name

* Tighten up the NEWs bullet for this PR

* Make some accessor member functions const

* Use skip() to issue note-to-future-self

Expedient place to park a summary of what this entire PR does.

* Improved parsing of sheet geometry for xlsx. (#240, @jennybc).

    - Better handling of leading and embedded blank rows and explicit row skipping. (#224, #194, #178, #156, #101)
    - Worksheets that are completely empty or that contain only column names no longer error, but return a tibble with zero rows. (#222, #144, #65)
    - Location is inferred for cells that do not declare their location (e.g. xlsx written by JMP). (#163, #102)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment