Just re-saving XLSX file without making changes causes cell count mismatch #14
Comments
Looks like it has something to do with merged cells in my case: taking row 1 as an example, there is cell A1, and cells B1 to F1 are merged into one "visual cell".
Indeed, in the original file generated via Apache POI 3.17,
But after resaving with Excel it has
So it looks like Apache POI is not writing out cell spans correctly 😢 Edit: Or phrased more correctly, Apache POI does not seem to make |
See tobyweston/simple-excel#14 for some information. Signed-off-by: Sebastian Schuberth <sebastian.schuberth@here.com>
Awesome spot. Thanks for posting and apologies for not looking sooner. So I'm clear, is this an issue with POI over simple-excel? Anything you think we should do in simple-excel now? Think we can get a (java) test in simple-excel to at least demonstrate the issue? I'm never confident in upgrading POI dependency but that might be worth considering? |
I believe by now this is an issue in POI rather than in simple-excel, yes. The question is, can simple-excel do anything to work around it? Something like "if there is a cell count mismatch and merged cells are involved, do X to double-check", but as I do not know the POI API very well, I have no idea what that X could be. Anyway, it would certainly be interesting to see whether POI 4.0.1 would fix the issue.
From memory, POI was limited with merged cells. I may be wrong though. I'll try to take a look but a JUnit test demonstrating it would be super helpful (wink wink) |
I'll try to come up with a test, @tobyweston. As a small preparation for that, see #15 for some minor fixes. |
Thanks. Perhaps the first thing to look at is a general dependency update in the I'd also like to split the matcher stuff from the general poi wrapper. If no one finds my DSL/wrapper around poi useful, I'd be happy to remove it and leave this as a Hamcrest Matcher library only... any thoughts? |
Actually, my plan was to first try to come up with a test that reproduces my issue and fails initially. And then see what needs to be done to fix the issue; maybe upgrading to POI 4.1.0 would fix it. However, I'm currently running into some strange issue where a test passes that should actually fail. I'll probably create a draft PR to demonstrate the issue and ask for your help there. |
Sorry for the delay, @tobyweston. I've now created PR #16 to demonstrate how a test about merged cells succeeds that I'd expect to fail. Could you please have a look at the PR and comment on it? |
See my comment in #16 (comment) for a potential fix |
…orkbook as they seem equivalent (with the simple-excel version offering a little more around exception handling)
…than a literal comparison of merged cells)
Tracking on a branch |
…. Cell contents aren't used just the cell addresses within the region. Renamed the count based matcher for clarity.
…fined by the MergedRegionsMatcher)
Hmmm, rereading your comments. I've been focusing on including merged regions in the sheet comparisons and realise this probably doesn't address the original issue. I'll carry on (simple-excel will now fail when comparing sheets when two sheets have differing merged regions, regardless of content within those regions). Let me know if that helps / hinders you. Have you tried raising an issue with Apache POI? |
I'm sorry if my comments set you on the wrong track, but differences in the way merged regions are specified really were the only difference I could observe. No, I haven't raised the issue upstream with Apache POI, mainly because I wasn't sure enough whether it really is an upstream issue, and I did not come up with an MWE yet.
No worries, I’ll have a closer look and see what I can do. Does this fix help you though? |
Unfortunately not. I just tried simple-excel 1.2 with Apache POI 4.1.0 / 4.1.1, but I still get
for Excel sheets that are actually the same, but where one has been re-saved with Excel. |
If you want to reproduce this, you can checkout the excel-poi branch of https://github.com/heremaps/oss-review-toolkit and then
See how the test succeeds. Then open |
Thanks,
The tweaks to merged cells are on 1.3, which isn’t released yet. They currently reside in the merged_regions branch. At least we know POI 4.x doesn’t help. I’ll look in more detail at your case and see if I can help before I actually release 1.3. If the merged regions are an orthogonal issue, I may not include that “fix” (as I’m not sure how I feel about it).
The mismatch occurs in
(for the above 6 cells are reported despite the
(for the above 2 cells are reported). Which is essentially the same that I had found out earlier by looking at the unzipped contents of the xlsx files. |
Ah, right, sorry, will try that branch later. |
So I've tried your merged_regions branch and the output is already much more telling:
Note that the merged regions actually are the same, they are just not reported in the same order! That is, in the expected output for some reason |
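A minimal sketch of an order-insensitive comparison of merged regions, in case it helps the discussion. This is not the simple-excel matcher: the class and method names are hypothetical, and regions are modelled as plain range strings such as "B1:F1". With Apache POI, such strings could presumably be derived from `Sheet.getMergedRegions()`, but that wiring is an assumption and is left out so the example runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class MergedRegionOrder {
    /**
     * Hypothetical helper: treats both lists as sets, so the order in which
     * the regions are reported does not matter. Note that duplicates within
     * one list are ignored as a side effect of using sets.
     */
    public static boolean sameRegionsIgnoringOrder(List<String> expected, List<String> actual) {
        return new HashSet<>(expected).equals(new HashSet<>(actual));
    }

    public static void main(String[] args) {
        // Same regions, reported in a different order.
        List<String> expected = Arrays.asList("B1:F1", "A3:C3");
        List<String> actual = Arrays.asList("A3:C3", "B1:F1");
        System.out.println(sameRegionsIgnoringOrder(expected, actual)); // true

        // A genuinely different region still fails the comparison.
        List<String> different = Arrays.asList("A3:C3", "B1:E1");
        System.out.println(sameRegionsIgnoringOrder(expected, different)); // false
    }
}
```

Comparing as sets keeps the check strict about which regions exist while no longer depending on the order in which they are listed.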
Cool easily done! I was also thinking that if we need some custom behaviour, like a workaround to some of the matching, we could enable it with a flag so it doesn’t affect anyone unless they opt in... |
The simple-excel project has not been updated for about a year despite having issues [1] [2], so get rid of it in favor of using kotest matchers directly on the sheet properties. [1] tobyweston/simple-excel#14 [2] tobyweston/simple-excel#19 Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
FYI, I was finally able to solve this by writing my own code which "de-duplicates" cells that are actually part of merged cells before comparing cell contents, see |
Awesome but I feel bad we didn't resolve this in simple-excel. We were close no? I've re-read the trail but am fuzzy about the details. If I ever get some time (unlikely atm) I can remind myself and try and merge the |
TBH, in retrospect I don't believe the Here's an example: Let's assume there's a merged region at B1:D1. In the original workbook generated by Apache POI itself, the cell iterator for the first row would return A1, B1, C1, D1, E1 etc. I.e. it returns the individual cells of a merged region. When re-saving the workbook created by Apache POI in Excel, without doing any modifications, then the cell iterator suddenly returns A1, B1, E1 etc. So cells of a merged region other than the first cell are skipped. That's why I'm changing the cell iterator to always skip cells of a merged region other than the first cell of a merged region. In any case, I'd consider this to be a bug in Apache POI. |
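To illustrate the B1:D1 example above, here is a rough sketch of the "skip all but the first cell of a merged region" idea. It is not the code referred to in this thread: `Region` and `visibleColumns` are hypothetical stand-ins that model a row as a list of zero-based column indices, so the example runs without Apache POI on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergedCellDedup {
    /** Hypothetical stand-in for a merged region; indices are zero-based and inclusive. */
    public static final class Region {
        final int firstRow, lastRow, firstCol, lastCol;

        public Region(int firstRow, int lastRow, int firstCol, int lastCol) {
            this.firstRow = firstRow;
            this.lastRow = lastRow;
            this.firstCol = firstCol;
            this.lastCol = lastCol;
        }

        boolean contains(int row, int col) {
            return row >= firstRow && row <= lastRow && col >= firstCol && col <= lastCol;
        }

        boolean isLeadingCell(int row, int col) {
            return row == firstRow && col == firstCol;
        }
    }

    /**
     * Returns the columns of one row, dropping every cell that lies inside a
     * merged region without being that region's top-left cell.
     */
    public static List<Integer> visibleColumns(int row, List<Integer> columns, List<Region> regions) {
        List<Integer> result = new ArrayList<>();
        for (int col : columns) {
            boolean hidden = false;
            for (Region region : regions) {
                if (region.contains(row, col) && !region.isLeadingCell(row, col)) {
                    hidden = true;
                    break;
                }
            }
            if (!hidden) {
                result.add(col);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Merged region B1:D1, i.e. row 0, columns 1 to 3.
        List<Region> regions = Arrays.asList(new Region(0, 0, 1, 3));

        // Cells as iterated for the POI-generated file: A1, B1, C1, D1, E1.
        List<Integer> generated = Arrays.asList(0, 1, 2, 3, 4);
        // Cells as iterated after re-saving in Excel: A1, B1, E1.
        List<Integer> resaved = Arrays.asList(0, 1, 4);

        System.out.println(visibleColumns(0, generated, regions)); // [0, 1, 4]
        System.out.println(visibleColumns(0, resaved, regions));   // [0, 1, 4]
    }
}
```

After filtering, both variants of the row yield the same cells, so a cell count comparison no longer depends on whether the file was written by POI or re-saved by Excel.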
Sorry for this support-type of question. I have saved an XLSX file as the expected result, and the sameWorkbook() assertion passes when comparing against a programmatically created workbook. However, if I just open and re-save the XLSX file (without making any changes) with Excel on Windows, I get a cell count mismatch. So somehow the number of (non-physical) cells seems to have changed. Any idea why, and how to deal with it? I wanted to update the XLSX file with new test expectations, but now I cannot due to the above issue, which always makes the test fail.