Duplicating values for all previously merged cells, for .xlsx formats #220
may god have mercy on my soul
Fatally aborts on merged data
Added test for merged cells!
Can someone try installing this and testing it on the 'tests/testthat/merge_test.xlsx' file? When I install and run it, it works fine, but Travis CI does not seem to replicate my results. EDIT: I've run it on two different versions of OSX and it seems fine. No idea why Travis fails the tests I've made.
readxl is focused on reading rectangular data and the scope does not include dealing with merged cells. For dealing with such headaches, you may want to check out: https://github.com/nacnudus/tidyxl#readme Thanks.
Thanks for the info!
Apologies if this is the wrong place to continue this discussion, but I'm also facing this issue, dealing with data with multiple headers. In the past I've handled it first in Python, but I'd like a tidyverse solution. See https://howisonlab.github.io/datawrangling/Handling_multi_indexes.html#a-tidyverse-solution Could this be helped by a tidyr::fill with direction="right"?
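For readers unsure what a rightward fill over a header row would do, here is a minimal sketch of the idea. It is in Python rather than R, and `fill_right` is a hypothetical helper written for illustration, not a tidyr or readxl function. `None` stands in for the `NA` left behind by a merged header cell, and the header values are made up.

```python
def fill_right(values):
    """Replace each None with the nearest non-None value to its left."""
    filled = []
    last = None  # nothing seen yet, so leading gaps stay None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# A top header row where each merged label survives only in its first column.
header = ["2019", None, None, "2020", None]
filled_header = fill_right(header)
```

A gap at the very start of the row has nothing to its left, so it stays empty rather than being guessed.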
I think this sort of table remains squarely NOT in the target zone for readxl. It is in the target zone for packages like tidyxl and jailbreaker, however. If I were to process this with readxl, which is not crazy, I would use Then I would read the header rows in separately, using You're unlikely to ever see One minor comment on the code at your link: Since you are setting the I'd be pleased if you want to open a new issue requesting merged cell support, i.e. repeating the value instead of filling with NA. I can imagine how that would work.
Currently, when merged cells in the `.xlsx` format become an R object, all individual cells that made up the merged cell are assigned `NA`, except for the top-left-most cell, which retains the original value. Something like the following:
becomes:
I don't believe this is the appropriate behavior: almost all information about where the merged cells were is lost and, at the very least, the result is less intuitive and less aesthetically pleasing.
To me, the preferred behavior should result in:
In this pull request, I've added code that produces that behavior for `.xlsx` files. In order to keep as much of the original code unaltered as possible, the function I added automatically duplicates the attributes and values of the node holding the original value of the merged cell and clones them into the empty cells that were once part of the same merged cell, keeping their original references or whatever they're called (e.g. "B3", "F9", etc.).

If people think it would be better to have this behavior be optional, I can add that in. Additionally, I can add tests if need be.
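To make the intended behavior concrete, here is a rough sketch of the duplication step. It is written in Python for brevity and is not the code in this pull request: `fill_merged`, `parse_ref`, and `col_to_index` are hypothetical names, the sheet is modelled as a plain list of rows with `None` standing in for `NA`, and the merged ranges are assumed to arrive as strings like "A1:B2", which is how `.xlsx` sheets record them in their `mergeCell` elements.

```python
import re


def col_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a zero-based index."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def parse_ref(ref):
    """Split a cell reference such as 'B3' into zero-based (row, col)."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    return int(match.group(2)) - 1, col_to_index(match.group(1))


def fill_merged(grid, merged_ranges):
    """Return a copy of grid with each merged range filled from its top-left cell."""
    filled = [row[:] for row in grid]  # leave the caller's grid untouched
    for rng in merged_ranges:
        start, end = rng.split(":")
        r0, c0 = parse_ref(start)
        r1, c1 = parse_ref(end)
        value = filled[r0][c0]  # only the top-left cell holds the value
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                filled[r][c] = value
    return filled


# What the sheet looks like today: A1:B2 was merged, so only A1 has a value.
grid = [
    ["group", None, "other"],
    [None, None, "x"],
    ["a", "b", "c"],
]
filled = fill_merged(grid, ["A1:B2"])
```

Every cell keeps its own position in the grid and only its value changes, which mirrors the idea of cloning the node while keeping each cell's original reference.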
Also, I was testing/editing the code on GitHub, so I have a lot of commits. I don't know what people's philosophy is about that, but if it's preferable, I can go back and get rid of a lot of the annoying commits I made.