Handling of leap seconds #110

Open
d70-t opened this issue Feb 28, 2021 · 3 comments
Labels
protocol-extension Protocol extension related issue

Comments

@d70-t

d70-t commented Feb 28, 2021

I just discovered the draft of the upcoming zarr-specs and am delighted to see that an official extension for proper encoding of datetimes is planned. This is great news 🎉 as encoding dates and times unambiguously often seems to be a difficult task.

One thing I came across often during the last year, while working with times from field campaign data (defined according to the CF-Conventions within netCDF files), is leap seconds.
Documents about encoding times in n-d datasets almost always talk about "seconds since 1970-01-01" or something similar. Sometimes the reference to POSIX timestamps is implicit, sometimes it is explicit (as in the current zarr-spec) and sometimes it is absent. But a statement on how to handle leap seconds properly is almost always missing. In most of the code built around those specifications, by contrast, leap seconds are simply ignored, so I assume that is what is meant to be done here as well.

I would like the spec to include some words about how exactly leap seconds are to be handled. This can be as simple as:

Leap seconds are ignored during the calculation of all dates and times. That is, they are treated as if all clocks stand still during the presence of an added leap second or as if the clocks jump by one second across a removed leap second (if that should ever occur).
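
To make the consequence of that wording concrete, here is a minimal Python sketch. The function name is hypothetical and the arithmetic is only my reading of "leap seconds are ignored" (every day has exactly 86400 seconds); it is not taken from any spec. Under that assumption, the label of the inserted leap second and the first second of the following day collapse onto the same count.

```python
from datetime import date

EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def seconds_ignoring_leap_seconds(year, month, day, hour, minute, second):
    """Seconds since 1970-01-01, assuming every day has exactly 86400 seconds.

    Hypothetical helper for illustration only.
    """
    days = date(year, month, day).toordinal() - EPOCH_ORDINAL
    return days * 86400 + hour * 3600 + minute * 60 + second


# The leap second inserted at the end of 2015-06-30 ...
during_leap = seconds_ignoring_leap_seconds(2015, 6, 30, 23, 59, 60)
# ... and the first second of the next day.
after_leap = seconds_ignoring_leap_seconds(2015, 7, 1, 0, 0, 0)

# Both labels map to the same count, so they cannot be told apart afterwards.
print(during_leap, after_leap, during_leap == after_leap)
```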


A downside of this approach is that measured data captured exactly during a leap second cannot be represented using this specification. While this does not happen often, there are definitely many long-running high-resolution sensor systems which do capture data during these times. Enabling the handling of leap seconds for everyone might, however, cause a lot of trouble, as leap-second-aware code is far more complicated than leap-second-unaware code. Thus there should probably be a widely accepted method of declaring whether leap seconds are counted or not.

C++20 takes this route: its chrono standard library defines multiple clocks, each with precisely defined semantics. In particular, system_clock behaves like the classical POSIX clocks, while utc_clock keeps counting during leap seconds and can therefore represent and distinguish times like:

2015-06-30 23:59:60.400 UTC
2015-06-30 23:59:60.800 UTC
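
The difference between the two kinds of count can be sketched in Python (this is not the C++ chrono API; all names are hypothetical). The leap second table is an assumption and deliberately incomplete: it lists only the leap second at the end of 2015-06-30, so the resulting numbers are not true UTC counts and only the behaviour around that one instant is meaningful. Counts are kept in integer milliseconds to avoid floating point rounding.

```python
from datetime import date

EPOCH_ORDINAL = date(1970, 1, 1).toordinal()

# Assumption: deliberately incomplete table, only the leap second inserted at
# the end of 2015-06-30 is listed. A real implementation needs the full list.
LEAP_SECOND_DAYS = [(2015, 6, 30)]


def posix_like_ms(year, month, day, hour, minute, second, millisecond=0):
    """Milliseconds since 1970-01-01 with exactly 86400 s per day."""
    days = date(year, month, day).toordinal() - EPOCH_ORDINAL
    seconds = days * 86400 + hour * 3600 + minute * 60 + second
    return seconds * 1000 + millisecond


def leap_counting_ms(year, month, day, hour, minute, second, millisecond=0):
    """Like posix_like_ms, but keeps counting through the listed leap seconds."""
    # Leap seconds inserted at the end of days strictly before the given day.
    inserted = sum(1 for d in LEAP_SECOND_DAYS if d < (year, month, day))
    base = posix_like_ms(year, month, day, hour, minute, second, millisecond)
    return base + inserted * 1000


a = leap_counting_ms(2015, 6, 30, 23, 59, 60, 400)
b = leap_counting_ms(2015, 6, 30, 23, 59, 60, 800)
c = leap_counting_ms(2015, 7, 1, 0, 0, 0, 400)
print(a < b < c)  # the leap-counting variant keeps all three instants distinct

p_leap = posix_like_ms(2015, 6, 30, 23, 59, 60, 400)
p_next = posix_like_ms(2015, 7, 1, 0, 0, 0, 400)
print(p_leap == p_next)  # the POSIX-like variant cannot distinguish these two
```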

Interesting material and recorded talks on this topic can be found at Howard Hinnant's excellent date library, which is the foundation of the new clock types in C++20.
This particular issue is discussed in the talk about tz.h.


Another thing which might become relevant is the possibility to specify different calendars. These could be common calendars like the Julian or Islamic calendars, but may also be synthetic calendars, e.g. to represent times in a model run. I don't want to discuss these in detail in this issue, but if there were a mechanism to specify how leap seconds are to be treated, it could maybe also be used to specify other calendars.

@rabernat
Contributor

I just wanted to note the overlap with xarray's datetime handling. Because xarray users come from the climate science community, we have a ton of weird calendars we have to support, well beyond the standard ones. Xarray's datetime decoding is essentially outsourced to the cftime library. A common scenario is that we store encoded date data in netcdf / hdf / zarr as integer or float, together with calendar and units attributes. These are decoded with the cftime num2date method.

This works as-is with current versions of zarr. That's because datetime encoding / decoding is handled at a higher level of the stack (xarray), and zarr just has to worry about storing floats or ints. Not sure how this will change with the v3 spec.
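
As a rough illustration of the attribute-based encoding described above, here is a standard-library Python sketch. It is not cftime's implementation and does not call `num2date`; `decode_times` and `UNIT_SECONDS` are hypothetical names, and the sketch assumes units of the form "<unit> since <ISO date>", the standard calendar, and no leap seconds.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table; only a few fixed-length units are supported.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_times(values, units):
    """Decode numbers such as [0, 60] with units 'seconds since 1970-01-01'.

    Assumes the standard calendar and ignores leap seconds entirely.
    """
    unit, _, reference = units.partition(" since ")
    origin = datetime.fromisoformat(reference).replace(tzinfo=timezone.utc)
    step = UNIT_SECONDS[unit]
    return [origin + timedelta(seconds=value * step) for value in values]


# The stored array only holds plain integers; the meaning lives in the units.
decoded = decode_times([0, 86400, 1435708800], "seconds since 1970-01-01")
for moment in decoded:
    print(moment.isoformat())
```

The storage layer only ever sees the integers; everything calendar-related sits in the attributes and in the decoding step.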

@d70-t
Author

d70-t commented Mar 1, 2021

I completely agree. There's a large overlap, and CF-based time handling will work on top of the current zarr variant. If zarr opts not to handle dates and times explicitly and leaves that to the higher level, then this issue doesn't apply.

However, if dates and times end up being handled by zarr, the spec should explicitly mention leap seconds, as that is (to my knowledge) one weak point in the CF-Conventions. ... I should probably take a deeper look into CF to see whether we could introduce some wording which would tighten the conventions in this regard.

@d70-t
Author

d70-t commented Mar 1, 2021

For reference, I discovered that more proper handling of leap seconds has already been under discussion for quite a while in the CF community, in cf-convention/cf-conventions#148. As that issue seems to progress slowly, I've opened another issue about the simpler solution above at cf-convention/cf-conventions#313.
