Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't misinterpret colors as dates #559

Merged
merged 4 commits into from Mar 23, 2022
Merged

Conversation

nacnudus
Copy link
Contributor

@nacnudus nacnudus commented Mar 9, 2019

Fixes #388
Closes #692

This caught me again recently.

Excel number formats support colours for, e.g. negative values 0.00;0.00[Red]. Currently the d in Red (and some letters of other colours) is confused with the d in a date format dd/mm/yyyy, so the cell is incorrectly treated as a date cell.

The tests force a warning if this happens, by explicitly setting col_type = "numeric".

The 'lowercase' variant of the test spreadsheet uses red (and similar) rather than Red, because this is allowed by the spec, though Excel itself doesn't create such files.

@jennybc
Copy link
Member

jennybc commented Mar 9, 2019

Thanks for tacking this!

Do we know if similar code is already in use somewhere and is therefore a bit battle-tested?

It would be very reassuring to use a variant of something that is proven to work.

@nacnudus
Copy link
Contributor Author

Hmm, this does deserve more work. I hadn't really set aside time for it so I'll be a bit slow coming back to it. Tidyxl has used this code for a while, but that's hardly battle tested so I'm comparing it with xlrd, which takes a similar approach. There are subtle bugs in both.

@jennybc
Copy link
Member

jennybc commented Mar 10, 2019

It doesn't have to be perfect -- we have a rather large bug at the moment. But it's our bug and it seems to cause very few problems and I would hate to implement a solution that caused a net increase in user pain.

@kcrt
Copy link

kcrt commented Nov 5, 2021

In the Chinese, Japanese, and Korean versions of Excel, the default representation of "number" is "0_);[Red](0)".
So, many files are difficult to load because of this problem.
I hope this problem will be solved someday.

@jennybc jennybc changed the title Don't misinterpret colors as dates (fixes #388) Don't misinterpret colors as dates Mar 23, 2022
Add a test and test file based on a format where the confusion can come via currency, not colour.

Link to issues.

We generally incorporate `-xls` or `-xlsx` into test file names, so the format is obvious even if the file is open in Excel.

Added the actual implicit number formats in comments.
@jennybc jennybc merged commit a31c83c into tidyverse:main Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The 'd' in [red] is interpreted as a date format
3 participants