More clarity about expected lunisolar calendar behavior for large dates #2869

Open
Manishearth opened this issue May 30, 2024 · 2 comments

Manishearth commented May 30, 2024

Prior context: unicode-org/icu4x#4917 in ICU4X, as well as unicode-org/icu4x#4713, unicode-org/icu4x#4904, and some others.

Temporal.PlainDate has a validity range of roughly ±270,000 years around the Unix epoch (from -271821-04-19 to +275760-09-13). This is a very large range, but it makes perfect sense for mathematically defined calendars like the Gregorian calendar: asking which Gregorian day falls 200,000 years in the future has a well-defined answer.
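For a sense of scale, those limits sit about 10^8 days on either side of 1970-01-01. The minimal sketch below confirms the day counts. It uses Howard Hinnant's well-known days-from-civil algorithm for the proleptic Gregorian (ISO) calendar. The function and constant names are our own.

```python
def days_from_civil(year: int, month: int, day: int) -> int:
    """Days since 1970-01-01 for a proleptic Gregorian (ISO) date.

    Adapted from Howard Hinnant's days_from_civil algorithm; Python's floor
    division keeps it valid for negative years as well.
    """
    y = year - 1 if month <= 2 else year            # Jan/Feb count as months of the previous year
    era = y // 400                                   # 400-year cycles of 146,097 days
    yoe = y - era * 400                              # year of era, 0..399
    mp = (month + 9) % 12                            # March = 0, ..., February = 11
    doy = (153 * mp + 2) // 5 + day - 1              # day of the March-based year, 0..365
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy    # day of era, 0..146096
    return era * 146097 + doe - 719468               # shift so that 1970-01-01 is day 0

MIN_PLAIN_DATE = days_from_civil(-271821, 4, 19)
MAX_PLAIN_DATE = days_from_civil(275760, 9, 13)
print(MIN_PLAIN_DATE, MAX_PLAIN_DATE)    # -100000001 100000000
print(round(MAX_PLAIN_DATE / 365.2425))  # 273791 (mean Gregorian years)
```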

However, for lunisolar calendars that depend on astronomical events[1], and to some extent even for solar calendars like the Persian calendar, answering the question "what is $date in $calendar" becomes far murkier. For such calendars, there are three potential sources of answers:

 - "the ground truth": what people actually believe the details of the calendar to be. This is what is printed in almanacs, and it generally extends at most 100 years into the future. When there are potential ambiguities, for example when moonrise occurs extremely close to sunrise, the user community tends to make a call one way or the other.
 - "the space truth": what is actually happening in space, plugged into the definition of the calendar. This can be affected by higher-order characteristics of celestial orbits, as well as by uncertainties that become unpredictable in the very long run.
 - "the math truth": what the algorithms say, and what computers output when they run those algorithms. This is what is actually implementable, but it will diverge from the space truth due to approximations of celestial motion, floating-point error, and unpredictable higher-order astronomical effects.

Because ground truth is unknowable in the long term, there is no right answer for the behavior of such a calendar beyond roughly 100 years into the future. You can make informed guesses, but their accuracy dwindles quickly as time passes. Of course, the usefulness of the question also dwindles: the precise Chinese calendar date exactly 10,000 years from now is of little use beyond idle curiosity.

(Similar considerations apply to the far past: there is little point debating the accuracy of a calendrical calculation for dates before the inception of the calendar.)

Given that Temporal expects implementations to support dates in a very large range, it is probably useful to provide guidance and invariants that implementations should follow when dealing with these issues.

Some questions that could be answered:

  • Should such dates be accepted by calendared Temporal.PlainDate in the first place?
  • For each calendar, which date ranges do we strongly care about? Where do we really want accuracy against ground truth, and where do we want relatively predictable behavior even though no ground truth is directly known?
  • Would it be acceptable for such dates to "fall back" to showing Gregorian when "out of range", similar to how the modern Japanese calendar falls back to Gregorian for pre-Meiji eras?
  • Would it be acceptable for such dates to go through simplified arithmetical calculations that are known not to match the calendar definition but will mostly be fine anyway? (E.g. the Chinese calendar could be approximated by a Metonic cycle with some method for determining which month is the leap month, or even by fixing the leap month to one specific month.)
  • How important is ISO roundtripping for these dates? (Probably extremely important.)
  • How important are calendar internal invariants for such dates? These are invariants that are part of the definition of the calendar. For example:
    • Is it acceptable for an Islamic or Chinese month in the far future to have a number of days other than 29 or 30?
    • Is it acceptable to have an Islamic year with a number of days other than 354 or 355?
  • How important are general calendrical invariants for such dates? These are invariants that deal with generic expectations about how dates and calendars work. For example, should:
    • adding one day always produce the next day in the month (or the first day of the next month)?
    • adding and then subtracting a duration always roundtrip?
    • (@sffc please add others here if you think them important)

(We found that "calendar internal" invariants and "general calendrical" invariants are often in tension when attempting to patch up algorithms to behave nicely for such dates)
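The Metonic simplification from the list above can be sketched in a few lines. This is a hypothetical approximation, not the Chinese calendar. It makes two assumptions. First, month boundaries fall on whole-day floors of a fixed mean lunation, counted from an arbitrary epoch (day 0). Second, leap years follow a fixed 7-in-19 pattern, borrowed here from the Hebrew placement rule. It deliberately ignores which month becomes the leap month. Exact rational arithmetic sidesteps floating-point error, so months have 29 or 30 days in every year by construction.

```python
import math
from fractions import Fraction

# Hypothetical approximation, NOT the real Chinese calendar.
# Assumptions: month n (counted from a hypothetical epoch, day 0) starts on
# floor(n * MEAN_LUNATION), and leap years follow a fixed 19-year Metonic pattern.
MEAN_LUNATION = Fraction("29.530588853")  # approximate mean synodic month, in days

def is_leap_year(year: int) -> bool:
    """7 leap years (13 months) per 19-year cycle, placed as in the Hebrew rule."""
    return (7 * year + 1) % 19 < 7

def months_in_year(year: int) -> int:
    return 13 if is_leap_year(year) else 12

def months_before_year(year: int) -> int:
    """Months elapsed before the first month of `year` (235 months per 19 years).

    Consistent with is_leap_year for all integer years, including year <= 0.
    """
    return (235 * year - 234) // 19

def month_start(month_index: int) -> int:
    """Day number on which month `month_index` begins, using exact rationals."""
    return math.floor(month_index * MEAN_LUNATION)

def days_in_month(year: int, month: int) -> int:
    """Length of the 1-based `month` in `year`; always 29 or 30 by construction."""
    n = months_before_year(year) + month - 1
    return month_start(n + 1) - month_start(n)

# The 13 month lengths of a leap year (year 3 of the cycle):
print([days_in_month(3, m) for m in range(1, months_in_year(3) + 1)])
```

This is the trade-off the list asks about. The calendar-internal invariants and ISO roundtripping come for free. Agreement with the space truth and the ground truth does not.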

cc @hsivonen @anba

Footnotes

  1. All of them except Islamic Tabular and Hebrew. The former follows a fixed 30-year cycle of short (354-day) and long (355-day) years, and the latter is currently considered to follow a purely arithmetical system in which the lunation length is a fixed approximation expressed as an integer number of ḥalakim. This is a case where the ground truth is deliberately defined to ignore the space truth. As a result, the Hebrew calendar will slowly desynchronize from the lunar cycle, but that is ultimately expected and okay. Of course, future adjustments may still happen.
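     Because these two calendars are purely arithmetical, their core rules fit in a few lines. The minimal sketch below uses function names of our own. The tabular Islamic leap-year pattern shown is one used by many implementations. Other tabular variants place the leap years differently.

```python
# Hebrew calendar: the mean lunation is fixed by definition, in halakim
# (1 hour = 1080 halakim), rather than taken from astronomy.
HALAKIM_PER_HOUR = 1080
HALAKIM_PER_DAY = 24 * HALAKIM_PER_HOUR                                # 25,920
LUNATION_HALAKIM = 29 * HALAKIM_PER_DAY + 12 * HALAKIM_PER_HOUR + 793  # 29 d 12 h 793 halakim

def hebrew_is_leap(year: int) -> bool:
    """7 leap years (13 months) per 19-year cycle: years 3, 6, 8, 11, 14, 17, 19."""
    return (7 * year + 1) % 19 < 7

def tabular_islamic_is_leap(year: int) -> bool:
    """11 leap years per 30-year cycle: years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26, 29."""
    return (14 + 11 * year) % 30 < 11

def tabular_islamic_month_length(year: int, month: int) -> int:
    """Odd months have 30 days, even months 29; month 12 gains a day in leap years."""
    if month == 12 and tabular_islamic_is_leap(year):
        return 30
    return 30 if month % 2 == 1 else 29

def tabular_islamic_year_length(year: int) -> int:
    return 355 if tabular_islamic_is_leap(year) else 354

print(LUNATION_HALAKIM / HALAKIM_PER_DAY)  # about 29.530594 days per lunation
```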

sffc (Collaborator) commented May 30, 2024

Temporal gives three ways of representing a particular PlainDate:

  1. ISO: isoYear, isoMonth, and isoDay with a calendar system (this is the one we use internally in the spec)
  2. Codes: era, eraYear, monthCode, and day
  3. Scalars: year, month, and day (this is what ICU4X uses internally)

Being able to convert between all three representations without ambiguity is, I think, the most important invariant. I will call this the equivalence relation.

Temporal also defines the following invariants for the scalar properties:

  • year is a signed integer representing the number of years relative to a calendar-specific epoch
  • The first month in every year has month equal to 1. The last month of every year has month equal to the monthsInYear property.
  • day is a positive integer representing the day of the month.
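The month invariant is what makes the Codes and Scalars representations interconvertible in lunisolar calendars, where a leap month shifts the ordinal month of every month after it. The minimal sketch below shows that conversion. It assumes Temporal's monthCode convention, in which a leap month carries an L suffix after the month it follows (e.g. M05L). The `leap_after` parameter is a hypothetical input that a real calendar would compute. It gives the regular month that the leap month follows in a given year.

```python
from typing import Optional

def month_code(month: int, leap_after: Optional[int]) -> str:
    """Convert the scalar `month` (1-based ordinal) to a Temporal-style monthCode.

    leap_after: the regular month number that this year's leap month follows
    (5 means the leap month sits between M05 and M06), or None in a common year.
    """
    months_in_year = 12 if leap_after is None else 13
    if not 1 <= month <= months_in_year:
        raise ValueError(f"month {month} out of range 1..{months_in_year}")
    if leap_after is None or month <= leap_after:
        return f"M{month:02d}"
    if month == leap_after + 1:
        return f"M{leap_after:02d}L"   # the leap month itself
    return f"M{month - 1:02d}"          # regular months after the leap month shift by one

def ordinal_month(code: str, leap_after: Optional[int]) -> int:
    """Inverse of month_code: monthCode -> scalar `month`."""
    number, is_leap = int(code[1:3]), code.endswith("L")
    if is_leap:
        if number != leap_after:
            raise ValueError(f"{code} does not exist in this year")
        return number + 1
    if leap_after is not None and number > leap_after:
        return number + 1
    return number
```

For example, `month_code(6, 5)` returns `"M05L"` and `ordinal_month("M06", 5)` returns `7`. Roundtripping in both directions is exactly the equivalence relation restricted to months.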

Following from these definitions and the equivalence relation are the following arithmetic invariants, which I coded into ICU4X in unicode-org/icu4x#4904: adding or subtracting 1 day in the ISO representation, in the Codes representation, and in the Scalars representation must all be equivalent operations. One can prove that these invariants must hold for the above definitions to be satisfied.
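The equivalence between adding a day in ISO and adding a day in Scalars can be checked mechanically. Any calendar that provides ISO conversions plus daysInMonth and monthsInYear can be tested this way. The sketch below uses a hypothetical toy calendar with tabular-Islamic-style month and year lengths. It is anchored at an arbitrary ISO epoch, not any real calendar's epoch. Python's proleptic Gregorian `datetime.date` serves as the ISO side. All names are our own.

```python
from datetime import date, timedelta

# Hypothetical toy calendar: 12 months alternating 30/29 days, with month 12
# gaining a day in 11 of every 30 years. EPOCH is an arbitrary assumption that
# anchors day 1 of month 1 of year 1 to an ISO date.
EPOCH = date(2000, 1, 1).toordinal()

def is_leap(year: int) -> bool:
    return (14 + 11 * year) % 30 < 11

def months_in_year(year: int) -> int:
    return 12

def days_in_month(year: int, month: int) -> int:
    if month == 12 and is_leap(year):
        return 30
    return 30 if month % 2 == 1 else 29

def days_before_year(year: int) -> int:
    # 354 days per year plus one per leap year before `year`; floor division
    # keeps this valid for year <= 0.
    return 354 * (year - 1) + (3 + 11 * year) // 30

def to_iso(year: int, month: int, day: int) -> date:
    days_before_month = 29 * (month - 1) + month // 2  # 0, 30, 59, 89, ...
    return date.fromordinal(EPOCH + days_before_year(year) + days_before_month + day - 1)

def from_iso(iso: date) -> tuple[int, int, int]:
    n = iso.toordinal() - EPOCH       # days since 1/1/1 of the toy calendar
    year = 30 * n // 10631 + 1        # estimate from 10,631 days per 30 years
    while days_before_year(year + 1) <= n:
        year += 1
    while days_before_year(year) > n:
        year -= 1
    rest, month = n - days_before_year(year), 1
    while rest >= days_in_month(year, month):
        rest -= days_in_month(year, month)
        month += 1
    return year, month, rest + 1

def add_one_day(year: int, month: int, day: int) -> tuple[int, int, int]:
    """Scalar-level 'add one day', using only daysInMonth and monthsInYear."""
    if day < days_in_month(year, month):
        return year, month, day + 1
    if month < months_in_year(year):
        return year, month + 1, 1
    return year + 1, 1, 1

def check_invariants(start: date, count: int) -> None:
    d = start
    for _ in range(count):
        scalars = from_iso(d)
        assert to_iso(*scalars) == d                                    # ISO <-> Scalars roundtrip
        assert from_iso(d + timedelta(days=1)) == add_one_day(*scalars)  # +1 day agrees
        d += timedelta(days=1)

check_invariants(date(1990, 1, 1), 20_000)
```

The subtraction direction can be checked the same way. A calendar whose `from_iso` skipped or repeated a day would fail the second assertion.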

Calendars that seem not to obey these invariants should be modified to do so. For example, juligreg skips 10 days in October 1582 (Julian October 4 is followed by Gregorian October 15), violating the definition of day. To fix this problem, that particular month should be shortened and the day field renumbered contiguously to close the gap. The offset can and should be restored during formatting.
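Applied to October 1582, the proposed fix might look like the minimal sketch below. It assumes the switch from Julian October 4 to Gregorian October 15, 1582. It uses Python's proleptic Gregorian `datetime.date` as the ISO side, on which Julian October 1, 1582 is October 11. It handles only that one month, and the helper names are hypothetical.

```python
from datetime import date

# Hypothetical helpers for the juligreg October 1582 fix only.
# Julian 1 Oct 1582 falls on proleptic-Gregorian (ISO) 11 Oct 1582, and Julian
# 4 Oct 1582 (ISO 14 Oct) was followed directly by Gregorian 15 Oct 1582.
OCT_1582_START = date(1582, 10, 11)
OCT_1582_LENGTH = 21                 # Julian 1-4 plus Gregorian 15-31

def scalar_day_oct_1582(iso: date) -> int:
    """Contiguous `day` field (1..21) for a date in the shortened October 1582."""
    offset = (iso - OCT_1582_START).days
    if not 0 <= offset < OCT_1582_LENGTH:
        raise ValueError("date is outside juligreg October 1582")
    return offset + 1

def displayed_day_oct_1582(day: int) -> int:
    """Formatting step: restore the historical numbering (days 5..21 shown as 15..31)."""
    if not 1 <= day <= OCT_1582_LENGTH:
        raise ValueError("day out of range for October 1582")
    return day if day <= 4 else day + 10
```

Because the scalar day runs from 1 to 21 without gaps, adding one day to day 4 gives day 5, matching ISO. Only the formatting layer reintroduces the historical jump from 4 to 15.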

For reasons the champions have discussed previously, I think it is wise for Temporal to enforce these invariants. They allow careful developers to write calendar-independent logic: no matter which calendar is in use, certain operations derived from the above invariants are always sound.

khawarizmus (Contributor) commented:

@Manishearth

How important are calendar internal invariants for such dates? These are invariants that are part of the definition of the calendar. For example:
  • Is it acceptable for an Islamic or Chinese month in the far future to have a number of days other than 29 or 30?
  • Is it acceptable to have an Islamic year with a number of days other than 354 or 355?

We have finalized a proposal named the Hijri week calendar (HWC), which is a counterpart of the ISO calendar for Hijri calendars. We have a working Temporal implementation of it.

While doing so, we realised that some Hijri calendars, such as islamic and islamic-rgsa, don't follow the Hijri calendar invariants you mentioned. We consider this a bug that could potentially be fixed in the ICU implementation.

I am mentioning this so that a fix for these calendars can be considered, making them compatible with the HWC, since we are exploring porting the HWC to CLDR.
