How to reduce cross-implementation incompatibility when new IDs are introduced? #6

Open
justingrant opened this issue Mar 23, 2023 · 20 comments

Comments

@justingrant
Collaborator

justingrant commented Mar 23, 2023

While reviewing bug reports on existing ECMAScript canonicalization behavior, I ran across a problematic case: when a new TZDB identifier is introduced, inevitably some environments will get it before others, and the slower-to-update environments will throw an exception when provided with that new ID. @Yqwed in #5 also raised this case.

When environments are evergreen browsers, the delay to get new IDs (and their accompanying Zone rules) is relatively short. But other environments could take months or years to update. For example:

  • Server apps, e.g. that may be running on a 2-year old Node version
  • Libraries that embed TZDB data, which may require a PR to update. iamkun/dayjs#2059 ("typo in zone name Europe/Kyiv ?") is one example.
  • IoT devices, which may require a firmware update to change their TZDB
  • Android devices, which are dependent on the carrier to roll out updates
  • Enterprise environments where IT delays updates for a long time.
  • My family, all of whom seem to wait to update their devices until they stop working without applying updates. :-)
  • Or other slow-to-update cases

How can we help make this problem less bad?

When the new ID is present because of a name change like Kiev=>Kyiv, then there is something we could recommend in the spec: that implementations first introduce the identifier without making it canonical (so that it will be recognized if others send it, but because non-canonical it won't be sent to others), wait a bit (maybe a few months?), and then make the new name canonical?

This would have a weird side effect though, because currently (at least once we fix the 13 outdated canonical identifiers in V8 and WebKit) non-canonical identifiers never become canonical; instead, the transitions only go one way. Is it OK to break that regularly? And even if yes, is it good to delay canonicalizing new renames?

On the other hand, when it's a brand-new identifier representing a new Zone, I'm not sure there's much the spec can do, other than perhaps adding additional metadata to time zones, e.g. an addedDate property to Temporal.TimeZone which could allow applications to decide to hide newly-introduced identifiers from dropdown time zone choosers in UIs. This would add a maintenance burden because this added-date data is not available in TZDB. And it'd complicate the Temporal API. So I'm on the fence about whether the value of such an additional metadata is worth it.

I have also been thinking about adding a Temporal.TimeZone.p.metadata getter which initially could return an object, e.g. { version: '2022g' } and could be extended with additional metadata later like an added date if there's enough customer demand. See #7 for discussion.

Another thing we could do would be to encourage implementations to dynamically update to the latest TZDB more quickly (e.g. dynamically on install instead of being bundled into the distribution) but this would vary behavior for the same version of a browser, so I'm not sure that this would be good.

Regardless, one thing we should do is to encourage (e.g. in MDN docs) ECMAScript developers to anticipate this "unknown but not invalid identifier" case and to defend against it, using clearer error messages when an identifier is not recognized.
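
A minimal sketch of that defensive pattern, assuming only the standard Intl.DateTimeFormat constructor (which throws a RangeError for identifiers the runtime does not know). The helper name checkTimeZoneId and the wording of the message are illustrative, not part of any proposal:

```javascript
// Returns { ok: true, id } when this runtime recognizes the ID, otherwise
// { ok: false, message } with an error that explains the likely cause.
// checkTimeZoneId is a hypothetical helper name used for illustration.
function checkTimeZoneId(id) {
  try {
    // Intl.DateTimeFormat throws a RangeError for IDs the runtime does not know.
    new Intl.DateTimeFormat("en-US", { timeZone: id });
    return { ok: true, id };
  } catch (err) {
    if (!(err instanceof RangeError)) throw err;
    return {
      ok: false,
      message:
        `Time zone "${id}" is not recognized by this runtime. ` +
        "It may be a newer IANA identifier than this runtime's time zone data " +
        "knows about, rather than a typo.",
    };
  }
}
```

A caller could log the message and fall back to a zone it does know, instead of surfacing a bare "Invalid time zone" exception to the user.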

Feedback encouraged on this issue!

@justingrant
Collaborator Author

something we could recommend in the spec: that implementations first introduce the identifier without making it canonical (so that it will be recognized if others send it, but because non-canonical it won't be sent to others), wait a bit (maybe a few months?), and then make the new name canonical?

After seeing other bug reports about this topic, I now think this idea is a good one. The time when a new identifier is released will be a sensitive one, so delaying making a brand-new identifier canonical seems like a good idea so that when the change does happen, most of the Web will already know about the new identifier.

Here are a few example bugs where this approach might help.

Firefox 1796393: Javascript returns problematic timezone, breaking sites
Firefox 1825512: Europe/Kyiv is not a valid IANA timezone identifier
typo in zone name Europe/Kyiv ? - iamkun/dayjs - #2059

@Yqwed

Yqwed commented Apr 19, 2023

I don't think using brand new timezone is reasonable (ignoring emotional part). You never practically need to expose raw Olson ID and who is likely to suffer are your users. I agree with IANA's position that Olson ID is just ID and it could be anything - random string, UUID, a number and so on.

implementations first introduce the identifier without making it canonical (so that it will be recognized if others send it, but because non-canonical it won't be sent to others), wait a bit (maybe a few months?), and then make the new name canonical?

What is the use case of canonicalisation? When will a user of a timezone library need it?

wait a bit (maybe a few months?)

Items you mentioned are unlikely to be updated in that time frame.

Android devices, which are dependent on the carrier to roll out updates

nit: it is up to OEMs, not carriers. Though Android 10+ phones (roughly speaking, it is more complicated) can get updates from Mainline.

@justingrant
Collaborator Author

justingrant commented Apr 27, 2023

Hi @Yqwed, thanks so much for your thoughtful replies. A few follow-up notes and questions are below.

What is the use case of canonicalisation? When will a user of a timezone library need it?

I agree that the use cases for canonicalization are dubious! The current ECMAScript spec uses canonicalization everywhere a time zone identifier is used. There's no way for ECMAScript to return a non-canonical ID back to the caller from any ECMAScript API, including Temporal Stage 3.

A goal of this proposal is to change this to reduce the scope of user-observable canonicalization in ECMAScript. The basic idea is that identifiers should never be canonicalized before being returned back to callers via APIs like Temporal.ZonedDateTime.prototype.toString() or Intl.DateTimeFormat.prototype.resolvedOptions().timeZone.

So if a caller provides Asia/Calcutta as input, they'll get Asia/Calcutta in their output. If a caller provides Asia/Kolkata as input, they'll get Asia/Kolkata in their output. This would help make most ECMAScript programs less vulnerable to IANA changing the canonical ID for a time zone.
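
To see what a given engine does today, a small sketch using only Intl.DateTimeFormat (the helper name echoedTimeZone is made up for illustration). The output is deliberately not shown, because it depends on the engine and its time zone data: an engine that canonicalizes prints the same ID for both inputs, while an engine that preserves the input prints each ID unchanged.

```javascript
// Reports the ID that this engine hands back for a given input ID.
// echoedTimeZone is a hypothetical helper name.
function echoedTimeZone(id) {
  return new Intl.DateTimeFormat("en-US", { timeZone: id }).resolvedOptions().timeZone;
}

for (const id of ["Asia/Calcutta", "Asia/Kolkata"]) {
  console.log(`${id} -> ${echoedTimeZone(id)}`);
}
```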

One thing I'm not sure about is how we should handle cases where the time zone ID comes not from an ECMAScript caller, but from the OS itself. Assume that an OS's IANA ID is, for example, Europe/Kiev but ECMAScript's canonical ID is Europe/Kyiv. If an ECMAScript program calls Temporal.Now.timeZoneId(), which ID do you think should be returned? The OS's ID or the canonical one according to IANA?

The one place where I think that we must expose canonical values to developers is Intl.supportedValuesOf('timeZone'), because the use case for this API is to return one value for each group of aliases, instead of returning both, for example, Asia/Calcutta and Asia/Kolkata. Picking the currently canonical ID seems like the easiest way to choose which ID to return.

But other than that one API, I want to consider removing user-observable canonicalization everywhere else. See https://docs.google.com/presentation/d/1oapwqvqAtauGV5gqpMqWlSfsFE4A38Ylh7aFiMOSIBI/edit#slide=id.g23a7465b127_0_5 for more discussion about this.

I don't think using brand new timezone is reasonable (ignoring emotional part).

Could you explain more what you mean by this? It's IANA, not ECMAScript, that's adding new identifiers in response to geopolitical changes like Kiev being renamed to Kyiv. If IANA introduces a new identifier, we'd have to start using it at some point, right?

The context is that there are many years' worth of complaints about outdated city names being exposed by ECMAScript. So continuing to expose outdated names and angering developers isn't good either. Sadly there's no perfect answer here, only tradeoffs.

Android devices, which are dependent on the carrier to roll out updates

nit: it is up to OEMs, not carriers. Though Android 10+ phones (roughly speaking, it is more complicated) can get updates from Mainline.

This is really helpful info to understand. Thanks for sharing!

wait a bit (maybe a few months?)

Items you mentioned are unlikely to be updated in that time frame.

Makes sense. Thanks for clarifying.

@ljharb
Member

ljharb commented Apr 27, 2023

How could I use Temporal to get from Asia/Calcutta to Asia/Kolkata? Is there some kind of .canonicalID() method or something?

@justingrant
Collaborator Author

My current thinking is that offering a "canonicalize this ID" API is a bad idea, because it would encourage users to take a dependency on a potentially moving target, and make code fragile if the canonical ID changes (like Kiev=>Kyiv did in 2022).

There are essentially three choices:

  1. Always canonicalize, but never change the canonical ID. This is V8's and Safari's current behavior that developers complain about.
  2. Always canonicalize, and update the canonical ID when it changes (like Firefox does), which risks breaking working code when the engine updates.
  3. Stop canonicalizing user inputs (and maybe the OS ID too, per above), so that developers stop caring about which ID is the canonical one because it no longer matters. When developers need to compare two IDs for equality, we'd add (in this proposal) a new TimeZone.p.equals method that would (non-observably) canonicalize the IDs and return true if they match, false if not.

I'm currently thinking that (3) is the least-bad choice. The model I have in mind is like a case-insensitive SQL database column, where "Jordan", "jordan", and "JORDAN" are considered the same when it comes to comparison but the table still stores whatever string the user provided.

There are tradeoffs to this approach, notably that === becomes unreliable to compare IDs, but honestly it's already unreliable (at least in Firefox) because canonical IDs can change. If you compare an ID you stored last year to an ID you store today, they may be different. No matter what, there needs to be some ECMAScript code run to tell you if two IDs represent the same time zone.
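
A rough userland sketch of that "store what was given, compare by a hidden key" model. Everything here is an assumption made for illustration: the class name StoredTimeZone, and especially the tiny hand-written alias table, which stands in for whatever time zone data a real engine would consult. It is not the proposed spec behavior, only the shape of the idea.

```javascript
// Hypothetical, hand-written alias table for illustration only. A real
// implementation would derive this from its time zone data.
const PRIMARY_ID = new Map([
  ["asia/calcutta", "Asia/Kolkata"],
  ["asia/kolkata", "Asia/Kolkata"],
  ["europe/kiev", "Europe/Kyiv"],
  ["europe/kyiv", "Europe/Kyiv"],
]);

class StoredTimeZone {
  #key; // used only for comparison, never returned to the caller

  constructor(id) {
    const primary = PRIMARY_ID.get(id.toLowerCase());
    if (primary === undefined) throw new RangeError(`Unknown time zone: ${id}`);
    this.id = id; // exactly what the caller provided
    this.#key = primary;
  }

  // True when both objects refer to the same zone, whatever their spelling.
  equals(other) {
    return this.#key === other.#key;
  }

  toString() {
    return this.id;
  }
}
```

With this model, new StoredTimeZone("Asia/Calcutta").equals(new StoredTimeZone("Asia/Kolkata")) is true even though the two id strings differ, and neither object ever reveals which spelling is currently canonical.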

It's possible that there may be use cases where a "canonicalize this ID" API is desperately needed, but so far I haven't heard any. Also, such an API could always be added later if needed, so I think it's probably good to ship without it and see how developers adjust to a world where canonicalization matters less before assuming we need that API.

@ljharb
Member

ljharb commented Apr 28, 2023

I like 3 in that I then don't have to care about which is canonical; it would indeed make comparing the strings by === a bad practice (also, since strings are allowed and not just TimeZone instances, i assume a static method that takes two strings would be available, with or without the prototype method)

@justingrant
Collaborator Author

I like 3 in that I then don't have to care about which is canonical; it would indeed make comparing the strings by === a bad practice

Yes, exactly! It's already a bad practice, but most developers probably don't realize that it's bad yet. The changes we're proposing will just make it more explicit that === should be avoided by providing a discoverable alternative.

since strings are allowed and not just TimeZone instances, i assume a static method that takes two strings would be available, with or without the prototype method

We considered a static method here, for exactly the reason you noted, but the existing convention in Temporal is that anything you do with a time zone (like getPossibleInstantsFor, getNextTransition, or getPreviousTransition) requires instantiating a TimeZone instance. So my assumption was that we'd follow the same convention with equals too.

If that becomes a problem, a later proposal could always add a batch of static methods to TimeZone. Sound OK?

@ljharb
Member

ljharb commented Apr 29, 2023

The entire motivation of allowing strings, as i recall, was to avoid the performance penalty of creating an instance, so that seems like a very unfortunate thing to force.

@justingrant
Collaborator Author

I think it's helpful to differentiate two cases:

  1. Cases where requiring objects made it harder for implementations to optimize common operations, for example the month getter or add method of a ZonedDateTime, where a calendar and time zone are used behind the scenes but userland callers don't call TimeZone or Calendar functionality directly.
  2. Cases where developers want to call TimeZone or Calendar functionality directly. In Temporal, almost all functionality lives on the prototype of each type, with the only exceptions being from factory methods and the compare methods used for sorting. So there'd have to be a really strong case to break from that pattern.

The problem with (1) was that implementers couldn't optimize unless they knew that no observable time zone or calendar code had to be run while performing those operations. With object-only calendars and time zones, knowing "is this an unmodified built-in calendar and/or time zone?" was hard and/or brittle. Now, it's easy: if the slot has a string, then it's safe to optimize.

For (2), I'd be hesitant to break Temporal's prototype-focused API for this relatively uncommon case. Especially since every other Temporal type's equals method lives in the prototype.

I guess we could also offer a TimeZone.compare static method that sorted alphabetically by canonical ID, matching the sort order of Intl.supportedValuesOf('timeZone'). That might be slightly faster than a prototype equals. But I'd be inclined to wait until a future proposal to see if that's really needed.

@Yqwed

Yqwed commented May 2, 2023

which ID do you think should be returned? The OS's ID or the canonical one according to IANA?

Isn't it up to factory method? What I mean is that if timezone is X according to the OS and Temporal.Now.timeZoneId() result is different to TimeZone.of("X") that might be surprising and non-obvious.

nit: there is no "IANA canonical" thing.

If IANA introduces a new identifier, we'd have to start using it at some point, right?

Unfortunately I don't have a good answer here. Currently we are planning to use Europe/Kyiv in Android V only (to be released in 2024).

What I've meant is that one needs to be careful with new IDs: you really don't need to expose Olson ID to users and the use of new ID might affect user experience as it will take time for all the server you interact with to update.

So continuing to expose outdated names and angering developers isn't good either

I see it boiling down to "happy developers" vs "happy end users".

Android devices, which are dependent on the carrier to roll out updates

I forgot to mention that carriers are responsible for NITZ signal. They can't ignore timezone changes. So we depend on carriers - it's not phones they need to update, but signals their cell towers send.

@justingrant
Collaborator Author

which ID do you think should be returned? The OS's ID or the canonical one according to IANA?

Isn't it up to factory method? What I mean is that if timezone is X according to the OS and Temporal.Now.timeZoneId() result is different to TimeZone.of("X") that might be surprising and non-obvious.

There are three sources of time zone IDs that are output by ECMAScript:

  1. IDs that originally came from programmer inputs to ECMAScript APIs like Temporal.TimeZone.from (the ECMAScript factory method)
  2. The ID of the OS
  3. An enumeration of IDs (i.e. what you'd use as the value of a UI time-zone picker, where the user-visible text of that picker would be the localized name of the time zone) that comes from the version of the IANA TZDB baked into the ECMAScript implementation. (I know that other platforms like Java offer an out-of-band way to update the IANA TZDB without updating Java, but currently ECMAScript doesn't do this. Instead, ES implementations package the TZDB inside the implementation and it currently cannot be overridden.)

The fundamental change in this proposal is in case (1): when a developer provides an Olson ID as input to an ECMAScript API. Currently, that ID is always canonicalized to ECMAScript's current canonical ID for that time zone. We're proposing to stop doing that. The proposed behavior is to retain the ID that the caller provided: if I pass Asia/Calcutta as input, then I get Asia/Calcutta as output. If I pass Asia/Kolkata as input, then I get Asia/Kolkata as output. The goal is to reduce the impact to programs when canonical IDs change for a time zone.

Case (3) seems like we can't avoid returning the canonical ID back to the caller, because it would be bad to have a UI time zone picker with two identical entries with the same localized name, but one has a value of Asia/Calcutta and the other has a value of Asia/Kolkata. We have to pick a single ID for each set of IDs that correspond to the same time zone, and I'm assuming that the canonical ID would be a better choice to return than some other way to choose the winning ID, e.g. alphabetic sort.
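
As a sketch of that picker use case, assuming a runtime that supports Intl.supportedValuesOf: the function name timeZonePickerEntries and the label format are invented for illustration, and which ID appears for each alias group depends on the engine.

```javascript
// Builds { value, label } entries for a time zone picker. The value is the
// ID returned by the enumeration; the label combines a localized name with
// the ID so that distinct zones sharing a name stay distinguishable.
// timeZonePickerEntries is a hypothetical helper name.
function timeZonePickerEntries(locale = "en-US") {
  return Intl.supportedValuesOf("timeZone").map((id) => {
    const parts = new Intl.DateTimeFormat(locale, {
      timeZone: id,
      timeZoneName: "long",
    }).formatToParts(new Date());
    const name = parts.find((part) => part.type === "timeZoneName");
    return { value: id, label: name ? `${name.value} (${id})` : id };
  });
}
```

Because the enumeration returns one ID per alias group, the picker never shows Asia/Calcutta and Asia/Kolkata as two separate entries.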

For case (2), we're unsure. Should (2) align with (1) and always return the ID that the OS gives us without changing it? Or should (2) align with (3) and always canonicalize what the OS provides before returning it to the caller?

It sounds like your opinion is that aligning (1) and (2) is more important than aligning (2) and (3)? Is that correct?

So continuing to expose outdated names and angering developers isn't good either

I see it boiling down to "happy developers" vs "happy end users".

Yep, agree. Trying to resolve this tradeoff is one of the main problems that this proposal is trying to solve.

One way we're trying to resolve it is by reducing the importance of canonicalization to how ECMAScript programs behave. If ECMAScript doesn't change the input it gets from developers (and from the OS, per above?), then even if canonical IDs change then there's less of a chance that it will break existing code. At least that's the theory. What do you think about this plan?

If IANA introduces a new identifier, we'd have to start using it at some point, right?

Unfortunately I don't have a good answer here. Currently we are planning to use Europe/Kyiv in Android V only (to be released in 2024).

What I've meant is that one needs to be careful with new IDs: you really don't need to expose Olson ID to users and the use of new ID might affect user experience as it will take time for all the server you interact with to update.

Yeah, this is a hard problem without an obvious easy answer. The best idea I've had is similar to what you're doing in Android: wait for some period of time before showing that ID to users.

Specifically, what I'm thinking is that ECMAScript would do this:
a) when new IDs are added, start recognizing them ASAP to accommodate remote systems who send that ID
b) but for renamed IDs (like Kiev=>Kyiv), we'd recognize the ID but not make the new name canonical (meaning not return it in the enumeration case (3) above) until some period of time has elapsed, so that other servers have time to catch up. (It sounds like Android is using 2 years, which seems reasonable to me.)
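
Steps (a) and (b) could be sketched roughly as follows. The rename record, its date, the two-year grace period, and the function names are all placeholders chosen for illustration; none of this data exists in TZDB or in any spec.

```javascript
// Hypothetical rename records. The addedOn date is a placeholder, not an
// authoritative release date.
const RENAMES = [
  { oldId: "Europe/Kiev", newId: "Europe/Kyiv", addedOn: Date.UTC(2022, 7, 1) },
];

// Roughly two years, following the Android timeline mentioned above.
const GRACE_PERIOD_MS = 2 * 365 * 24 * 60 * 60 * 1000;

function findRename(id) {
  return RENAMES.find((r) => r.oldId === id || r.newId === id);
}

// Step (a): both names are accepted as input as soon as the rename is known.
function isKnownRenamedId(id) {
  return findRename(id) !== undefined;
}

// Step (b): the ID to list in an enumeration. The new name is used only
// after the grace period has elapsed; other IDs pass through unchanged.
function enumeratedId(id, now = Date.now()) {
  const rename = findRename(id);
  if (rename === undefined) return id;
  return now - rename.addedOn >= GRACE_PERIOD_MS ? rename.newId : rename.oldId;
}
```

Under these assumptions, a picker built shortly after the rename would still offer Europe/Kiev, while any remote system that already sends Europe/Kyiv is understood immediately.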

I forgot to mention that carriers are responsible for NITZ signal. They can't ignore timezone changes. So we depend on carriers - it's not phones they need to update, but signals their cell towers send.

This is good info, thanks! Does the NITZ signal provide the Olson time zone name? Or just the UTC offset? The specification isn't clear and there are no examples in it. I found this document which suggests that it's just the offset, but I wasn't sure if that varied between carriers.

If it's just an offset, then does Android automatically translate between the NITZ timezone and Olson ID using geo-location? If not, then how is the OS's Olson ID determined after NITZ tells the phone that the time zone has changed?

nit: there is no "IANA canonical" thing.

Yep, what I mean when I say "IANA canonical" is "The canonical ID according to the time zone database that the ECMAScript implementation embeds, which may be different from the IANA TZDB used by the OS."

Note that some ECMAScript implementations like V8 don't use TZDB directly, but instead use CLDR data which roughly follows TZDB if it were built with PACKRATDATA=backzone PACKRATLIST=zone.tab, except that the canonical ID is never changed after it's introduced, so CLDR still lists Asia/Calcutta, Europe/Kiev, and Asia/Saigon as canonical.

So maybe a better way of saying what I was trying to ask would be this: "which ID do you think should be returned? The OS's ID or the canonical one according to ECMAScript's current time zone data?"

@Yqwed

Yqwed commented May 3, 2023

UI time-zone picker

I think timezone picker and enumeration of all the supported timezones are related, but different problems.
(I might unknowingly push the way it is done in Android. We keep per-region list of timezones. It is not yet exposed via APIs though, result file is read by the Settings timezone picker only).

It sounds like your opinion is that aligning (1) and (2) is more important than aligning (2) and (3)? Is that correct?

Ahhh, now I see what you mean. I thought that Intl.supportedValuesOf('timeZone') returns equivalent IDs, like ICU's getEquivalentID.

Before I answer the question: Do I understand correctly that supportedValuesOf('timeZone') will return only one of Asia/Kolkata and Asia/Calcutta?

At least that's the theory. What do you think about this plan?

That sounds reasonable to me.

Does the NITZ signal provide the Olson time zone name

Unfortunately, they don't. NITZ provides offset and DST flag only. Then we try to find matching timezone for that country. Please see our documentation.

Geo-location based timezone detection was recently introduced, and is also documented.

PACKRATDATA=backzone PACKRATLIST=zone.tab

Be careful with backzone file [1]. TBH I thought ICU does not use it at all. Or maybe that's the way we build ICU dat files in Android.

So maybe a better way of saying what I was trying to ask would be this: "which ID do you think should be returned? The OS's ID or the canonical one according to ECMAScript's current time zone data?"

It's tough. Will that ID be communicated back to the OS? Or is that browser's implementation detail?

[1] https://mm.icann.org/pipermail/tz/2023-May/032948.html

@justingrant
Collaborator Author

We keep per-region list of timezones.

This data is well-structured. Very helpful to see. Thanks so much for sharing it. This file seems to have a lot of overlap in its intent and usage as CLDR's timezone.xml. Except the Android one has more useful metadata about how to choose the right alias. I wonder if it might make sense at some point to merge these into a single CLDR-maintained data set? Or to otherwise more closely align CLDR's data with what you're doing here so that there'd be more consistency among various CLDR-using software that Android devices communicate with?

I really like how this file already has Europe/Kyiv added as an alias although it's not canonical yet. This is the same least-worst option I was considering for ECMAScript, where we'd quickly add new IDs to the "recognized" list, but wait some period of time (1-2 years?) before making them canonical if they're renames of an existing ID. It's encouraging that you adopted the same approach too.

What do you do in Android about brand-new IDs that are not renames, like America/Ciudad_Juarez that was recently introduced? Do you add them as separate zones as soon as they're added to TZDB, or do you also wait a few years for other systems to catch up? Or do you add them as aliases of an existing zone, and a few years later split them into their own zone?

Ahhh, now I see what you mean. I thought that Intl.supportedValuesOf('timeZone') returns equivalent IDs, like ICU's getEquivalentID.

Nope. Currently in V8, it returns the results of an ICU enumeration API:

TimeZone::createTimeZoneIDEnumeration(UCAL_ZONE_TYPE_CANONICAL_LOCATION, NULL, NULL, errcode);

In Safari, it returns the same thing but adds UTC. In Firefox, it returns canonical IDs according to FF's build of IANA data, including Etc/* zones that the other engines omit. See tc39/ecma402#778 for my concerns about this variance.

But regardless, only one ID per set of aliases is returned by Intl.supportedValuesOf('timeZone'). The core use case is supplying IDs for the value of a time zone picker UI, and that use case essentially requires having only one ID per set of aliases.

Before I answer the question: Do I understand correctly that supportedValuesOf('timeZone') will return only one of Asia/Kolkata and Asia/Calcutta?

Correct. It will return only one of Asia/Kolkata and Asia/Calcutta. Currently, Firefox (which gets its canonicalization directly from IANA) returns Asia/Kolkata. V8 and Safari return what ICU returns from TimeZone::getCanonicalID, which is Asia/Calcutta. Fixing this divergence is one of the reasons why this proposal exists.

ICU and CLDR are currently investigating whether they can expose an API (likely a new one) that would enable returning Asia/Kolkata in that case.

Be careful with backzone file [1].

Thanks for the pointer. My assumption (may be wrong?) was that we needed to use backzone, otherwise not all IDs in zone.tab would have time zone rules in the build output of TZDB. And also that zones in backzone that do have valid data before 1970 (e.g. Europe/Copenhagen, Europe/Oslo, Atlantic/Reykjavik) would be replaced with other zones' data because TZDB's 2022 releases merged those zones into different Zones like Europe/Berlin or Africa/Abidjan. Maybe these were incorrect assumptions?

In general I think we agree most closely with the philosophy of the global-tz fork maintained by the Joda Time maintainer, and my understanding was that PACKRATDATA=backzone PACKRATLIST=zone.tab should generate the same data as global-tz. But that assumption also might be mistaken?

TBH I thought ICU does not use it at all. Or maybe that's the way we build ICU dat files in Android.

I know that ICU imports data from TZDB via some scripts they run when new TZDB releases come out. I'm not sure exactly how their import process works. I suspect you're correct here, although I am not sure.

One of the work items in this proposal is to come up with normative guidelines for how ECMAScript engines should build TZDB in order to ensure more consistency across engines. Part of these guidelines will probably be recommending the build options of TZDB and/or a particular way of accessing CLDR, if not as a requirement then as an example of one way to obtain time zone data in the recommended way.

So maybe a better way of saying what I was trying to ask would be this: "which ID do you think should be returned? The OS's ID or the canonical one according to ECMAScript's current time zone data?"

It's tough. Will that ID be communicated back to the OS? Or is that browser's implementation detail?

If "communicated back to the OS" means what I think it does, then no. The data flow for Temporal.Now.timeZoneId() is one-way: OS => ECMAScript engine => ECMAScript program calling that API. An ECMAScript program will almost never set the OS's time zone ID; the only (really rare) exception might be a native app or script that is written in ECMAScript and that changes OS settings.

That said, I could also see cases where an ECMAScript program (like a React Native app) might call Temporal.Now.timeZoneId() in some places, but also use native libraries that call native code to fetch the OS ID. So even if ECMAScript is (almost) never used to set the OS's time zone, there's still an opportunity for the variation between the OS and ECMAScript to be noticeable and to cause inconsistency inside of ECMAScript programs.

Also, another case I was thinking of was sysadmins who wanted to lock the OS on a particular older ID (e.g. Europe/Kiev) for backwards-compatibility or political reasons. My assumption is that those sysadmins would not like ECMAScript to ignore their overrides.

I'm not sure about the priority of those cases. Both seem realistic but probably uncommon.

Do you think that it's more important for Temporal.Now.timeZoneId() to align with the behavior of the factory method ("keep the input ID unchanged") or the enumeration method ("always return the current canonical ID")?

@Yqwed

Yqwed commented May 4, 2023

I wonder if it might make sense at some point to merge these into a single CLDR-maintained data set?

There are Android-specific things like defaultTimeZoneBoost (it is more about timezone detection) and shownInPicker. It might be hard to please everyone.

BTW, that file is the one we maintain manually, but actually used one is tzlookup.xml. It is very similar, based on countryzones.txt, but it also has historic timezones (see Argentina, for example). Here is the code which does that.

What do you do in Android about brand-new IDs that are not renames, like America/Ciudad_Juarez that was recently introduced?

We add them to the list as soon as they are released. This situation is different from timezone renaming.

Or do you add them as aliases of an existing zone, and a few years later split them into their own zone?

If a new timezone is introduced (a brand new one, not an alias) it usually means that existing timezones are different from it (in the future, currently, or in the past).

In the America/Ciudad_Juarez case technically you can replace it with America/Denver, as they are identical since the end of 2022, but we can use it only as a temporary solution if a device does not recognize America/Ciudad_Juarez.
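
That temporary substitution might look roughly like this in ECMAScript. The fallback table and the names runtimeKnows and usableTimeZone are hypothetical; the single mapping comes from the example above and, as noted there, is only valid for dates after the end of 2022.

```javascript
// Hypothetical fallback table based on the America/Ciudad_Juarez example.
// The substitute matches only for dates after the end of 2022.
const TEMPORARY_FALLBACKS = new Map([["America/Ciudad_Juarez", "America/Denver"]]);

// True when this runtime accepts the ID.
function runtimeKnows(id) {
  try {
    new Intl.DateTimeFormat("en-US", { timeZone: id });
    return true;
  } catch (err) {
    if (err instanceof RangeError) return false;
    throw err;
  }
}

// Returns the ID to use, and whether a substitute had to be used.
function usableTimeZone(id, fallbacks = TEMPORARY_FALLBACKS) {
  if (runtimeKnows(id)) return { id, substituted: false };
  const fallback = fallbacks.get(id);
  if (fallback !== undefined && runtimeKnows(fallback)) {
    return { id: fallback, substituted: true };
  }
  throw new RangeError(`Time zone "${id}" is not recognized and has no fallback`);
}
```

Keeping the substituted flag lets the caller retain the original ID for storage and use the substitute only for local calculations.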

otherwise not all IDs in zone.tab would have time zone rules in the build output of TZDB

I think set of supported IDs will be the same. Haven't checked though.

And also that zones in backzone that do have valid data before 1970

That's where your definition of valid data diverges from Paul Eggert's :)

my understanding was that PACKRATDATA=backzone PACKRATLIST=zone.tab should generate the same data as global-tz

I haven't checked how exactly output of global-tz is different from the upstream. I don't have strong opinion here and we received no bugreports about weird API behaviour for pre-1970 dates, so...

there's still an opportunity for the variation between the OS and ECMAScript to be noticeable and to cause inconsistency inside of ECMAScript programs

I guess your situation is more complicated than ours: we can make sure that APIs (native (mktime/localtime) / ICU4(C|J)/java.util.TimeZone/java.time.*) are consistent as we control the data they use.

it's more important for Temporal.Now.timeZoneId() to align with the behavior of the factory method ("keep the input ID unchanged")

With such approach if a developer is unhappy about OS's timezone, they can provide their own overrides.

the enumeration method ("always return the current canonical ID")

But in such case Temporal.ZonedDateTime.from will work with timezones which are allegedly unsupported (not listed in Intl.supportedValuesOf('timeZone')). Is that OK?


I am really sorry if what I say is confusing or makes things appear more complicated than they are - I haven't developed timezone APIs so far, I am just responsible for already existing ones.

@justingrant
Collaborator Author

it's more important for Temporal.Now.timeZoneId() to align with the behavior of the factory method ("keep the input ID unchanged")

With such an approach, if a developer is unhappy about the OS's timezone, they can provide their own overrides.

@Yqwed could you explain more about what you mean by "provide their own overrides"? Do you mean overrides inside ECMAScript code by creating custom time zones? Overrides by providing their own TZDB data to the OS and/or to Node? Overrides meaning they can reassign the OS's ID themselves, or their sysadmin can do it? Something else?

I want to make sure that I understand what you had in mind.

the enumeration method ("always return the current canonical ID")

But in that case Temporal.ZonedDateTime.from will work with timezones which are allegedly unsupported (not listed in Intl.supportedValuesOf('timeZone')). Is that OK?

This is already true. Intl.supportedValuesOf('timeZone') only returns canonical time zone IDs, but Temporal.ZonedDateTime.from accepts any valid time zone ID, including deprecated aliases, variations in case like america/los_angeles, etc. IMO, "supportedValuesOf" was somewhat of a misleading name for this API because "supported" implies what inputs are OK, but what the API really does (at least for time zones) is tell the developer what list of IDs to use as the value in a UI dropdown list of time zones. But the name is too late to change! :-(
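
To make the "listed" versus "accepted" distinction concrete, here is a minimal sketch. It uses Intl.DateTimeFormat as the acceptance check only because it is widely available today; the helper names isListed and isAccepted are made up for this example, and the exact contents of the list vary between engines and versions.

```javascript
// Canonical IDs the engine enumerates, e.g. for a picker.
const listed = new Set(Intl.supportedValuesOf('timeZone'));

// Hypothetical helper: is this exact string in the enumerated list?
function isListed(id) {
  return listed.has(id);
}

// Hypothetical helper: does the engine accept this ID as input?
// Intl.DateTimeFormat throws a RangeError for IDs it does not recognize.
function isAccepted(id) {
  try {
    new Intl.DateTimeFormat('en', { timeZone: id });
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

for (const id of ['America/Los_Angeles', 'america/los_angeles', 'Not/A_Zone']) {
  console.log(id, { listed: isListed(id), accepted: isAccepted(id) });
}
```

The lower-case spelling is the interesting row: it should be accepted as input even though it never appears in the enumerated list.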

So the only change I'm wondering about is whether it's OK for Temporal.Now.timeZoneId() to return a value that's not present in Intl.supportedValuesOf('timeZone').

And also that zones in backzone that do have valid data before 1970

That's where your definition of valid data diverges from Paul Eggert's :)

Yeah. My take on the TZDB controversy is that there are two reasonable but irreconcilable positions:

  • There's a lot of old data that's of dubious accuracy, and maintaining all those old IDs and data requires a lot of maintenance burden, so let's just remove them.
  • Some of the old data is quite accurate, esp. for European countries. Also, removing IDs (and worse, merging them across country boundaries) is quite disruptive, so we shouldn't do it.

I'm sympathetic to both sides of the debate. It seems like an appropriate minimal compromise to keep one zone per country even if most intra-country zones are merged. It's unfortunate that this kind of compromise seems to have been overtaken by the more extreme positions above.

The only really strong opinion I have is that merging across country boundaries is really bad, because then a stored timestamp string (with a time zone name) representing a future date might become inaccurate in the future if one of those two countries decides to change their offset or DST policy.

The same potential problem could happen with intra-country merges, but modern time zone policy is almost always done at the country level, so it's relatively rare that an intra-country region would, for example, stop using DST and then change it back later.

Although in countries that do tend to set DST policy at the sub-country level--for example Canada, Argentina, and perhaps Brazil, Russia, and a few others--I'd be inclined to leave at least the existing intra-country zones at the state/province level. For example, I think deprecating America/Montreal might end up being a bad idea, but I think the mergers of zones representing small counties in Indiana seem OK.

I haven't checked exactly how the output of global-tz differs from upstream. I don't have a strong opinion here, and we have received no bug reports about weird API behaviour for pre-1970 dates, so...

Yeah, I don't think the pre-1970 data is really a big deal for computing. Very little software (other than astrology programs?) is written that cares about exact time of day over 50 years ago. AFAICT, per-country zone merges are a much bigger deal for most real software.

there's still an opportunity for the variation between the OS and ECMAScript to be noticeable and to cause inconsistency inside of ECMAScript programs

I guess your situation is more complicated than ours: we can make sure that the APIs (native (mktime/localtime) / ICU4(C|J)/java.util.TimeZone/java.time.*) are consistent, as we control the data they use.

Yep, that's much easier! At the last ECMA TG2 meeting, @Constellation said the same thing about iOS, where native ICU always matches WebKit's ICU. So I guess this is only a problem on desktop platforms. Good to know.

I'm really sorry if what I say is confusing or makes things appear more complicated than they are. I haven't developed timezone APIs so far; I am just responsible for already existing ones.

No worries, your feedback is very helpful! I'm grateful for it.

@Yqwed

Yqwed commented May 12, 2023

"provide their own overrides"? Do you mean overrides inside ECMAScript code by creating custom time zones? Overrides by providing their own TZDB data to the OS and/or to Node? Overrides meaning they can reassign the OS's ID themselves, or their sysadmin can do it? Something else?
I want to make sure that I understand what you had in mind.

I mean that if existing (or planned) APIs allow specifying a timezone ID explicitly, so that developers don't struggle with "this thing implicitly uses the OS's timezone ID", and there are APIs which do not canonicalise (sorry for using this term), then developers can keep their own list of overrides (Europe/Kiev -> Europe/Kyiv and so on). That should be fine, and no admin will stop them from using the IDs they want.
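
A rough sketch of what such an override list could look like in application code. The table contents and the names overrides, runtimeKnows and preferredId are assumptions for illustration, and Intl.DateTimeFormat stands in for "an API that takes an explicit ID".

```javascript
// Hypothetical application-level overrides: deprecated ID -> preferred ID.
const overrides = new Map([
  ['Europe/Kiev', 'Europe/Kyiv'],
  ['Asia/Calcutta', 'Asia/Kolkata'],
]);

// True if this runtime recognizes the ID (a RangeError means it does not).
function runtimeKnows(id) {
  try {
    new Intl.DateTimeFormat('en', { timeZone: id });
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

// Prefer the override, but fall back to the original ID on runtimes
// whose time zone data predates the new name.
function preferredId(id) {
  const override = overrides.get(id);
  return override !== undefined && runtimeKnows(override) ? override : id;
}

console.log(preferredId('Europe/Kiev'));
console.log(preferredId('Asia/Tokyo'));
```

The fallback matters for the slow-to-update environments discussed in this issue: an override to a name the runtime has never heard of would otherwise throw.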

what list of IDs to use as the value in a UI dropdown list of time zones

IDs are geopolitically sensitive, and I am not sure that a specific UI implementation should drive this decision. It can be a dropdown list, but it can also be a map or a "Choose region -> choose from a smaller subset" kind of solution.

I think if a developer wants to implement a timezone picker of any sort, they should use zone1970.tab as a first approximation and adapt it to the market they are focused on.

Intl.supportedValuesOf('timeZone') on Chrome returns America/Kentucky/Monticello.

It also has America/Kralendijk in the returned array, which is an alias of America/Puerto_Rico.

So I guess this is only a problem on desktop platforms. Good to know.

I think Android's Firefox uses system APIs for timezones and Chrome uses built-in ICU. You might also see it beyond desktops :)

@justingrant
Collaborator Author

justingrant commented Jun 28, 2023

Circling back on this discussion, this proposal will recommend that implementations delay 2 years after an existing ID is renamed in TZDB. Here's the relevant spec text:

    <ins class="block">
    <emu-note>
      <p>
        Although the IANA Time Zone Database maintainers strive for stability, in rare cases (averaging less than once per year) a Zone may be replaced by a new Zone.
        For example, in 2022 "*Europe/Kiev*" was deprecated to a Link resolving to a new "*Europe/Kyiv*" Zone.
      </p>
      <p>
        To reduce disruption from renaming changes, ECMAScript implementations are encouraged to initially add the new Zone as a non-primary time zone identifier that resolves to the current primary identifier.
        Then, after a waiting period, implementations are recommended to promote the new Zone to a primary time zone identifier while simultaneously demoting the deprecated name to non-primary.
        The recommended waiting period is two years after the IANA Time Zone Database release containing the changes.
        This delay allows other systems, that ECMAScript programs may interact with, to be updated to recognize the new Zone.
      </p>
      <p>
        A waiting period should only apply when a new Zone is added to replace an existing Zone.
        If an existing Zone and Link are swapped, then no waiting period is necessary.
      </p>
    </emu-note>
    </ins>
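
As a rough illustration of the two-phase rollout described in the note, the sketch below models a single rename. The record shape, the function names, and the release date are assumptions made for this example (the date is a placeholder, not the exact TZDB release date).

```javascript
// Waiting period recommended in the note above.
const WAITING_PERIOD_YEARS = 2;

// Hypothetical rename record. The release date is a placeholder for
// illustration, not the exact date of the TZDB release.
const kyivRename = {
  oldId: 'Europe/Kiev',
  newId: 'Europe/Kyiv',
  tzdbRelease: new Date('2022-08-01T00:00:00Z'),
};

// Both names are accepted as input from the moment the new Zone is added.
function acceptedIds(rename) {
  return [rename.oldId, rename.newId];
}

// The primary identifier switches only after the waiting period has elapsed.
function primaryId(rename, now) {
  const promoteAt = new Date(rename.tzdbRelease);
  promoteAt.setUTCFullYear(promoteAt.getUTCFullYear() + WAITING_PERIOD_YEARS);
  return now >= promoteAt ? rename.newId : rename.oldId;
}

console.log(acceptedIds(kyivRename));
console.log(primaryId(kyivRename, new Date('2023-06-28T00:00:00Z'))); // Europe/Kiev
console.log(primaryId(kyivRename, new Date('2024-09-01T00:00:00Z'))); // Europe/Kyiv
```

The point of the split is that the set of accepted inputs grows immediately, while the ID that gets sent to other systems changes only once those systems have had time to update.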

I'll leave this issue open for a while to allow for additional user and/or implementer feedback on this proposed solution.

For the other issues that @Yqwed raised in this thread, I agree that ECMAScript implementations should be more consistent, but after discussions with multiple implementers, I now believe that this consistency work will happen outside the scope of the ECMAScript spec. A good first step is for CLDR and ICU to expose the latest IANA canonical IDs. See https://unicode-org.atlassian.net/browse/CLDR-14453 for latest status of that.

When that's available, I think it makes sense to lobby V8 and JSC to start using the canonical IDs that CLDR provides. That will be a good time to encourage implementations to be more consistent and to rip out whatever special cases they may have added.

@bergus
Contributor

bergus commented Oct 6, 2023

It's possible that there may be use cases where a "canonicalize this ID" API is desperately needed, but so far I haven't heard any.
[…]
Intl.supportedValuesOf('timeZone') only returns canonical time zone IDs, […] what the API really does (at least for time zones) is tell the developer what list of IDs to use as the value in a UI dropdown list of time zones.

Consider such a selection element:

<Picker
  value={database.getTimezoneId()} // or undefined, if not existing yet
  defaultValue={Temporal.Now.timeZoneId()} // or new Intl.DateTimeFormat().resolvedOptions().timeZone?
  data={Intl.supportedValuesOf('timeZone')}
  onChange={database.putTimezoneId}
/>

I see conflicting requirements:

  • the dropdown options should display the recent canonical time zone IDs
  • the dropdown should show one of them as the active value, even if a non-canonical time zone ID is returned from the database (due to manual data entry, or because it was a canonical ID at the point in time when it was stored).

So yes, I think that canonicalisation is needed. One could already use

  value={Intl.supportedValuesOf('timeZone').find(TimeZone.prototype.equals.bind(TimeZone.from(database.getTimezoneId())))}

but that's cumbersome. I'd really prefer to have a method/getter for this on the TimeZone object itself, such as

  value={TimeZone.from(database.getTimezoneId()).canonicalId} // or .canonical().id

Of course, changing canonical IDs is not good. Exposing .canonicalId might lead developers to write database.putTimezoneId(TimeZone.from(input).canonicalId) under the assumption that this will always produce the same output for the same input, and to skip canonicalising again when reading from the database. This incompatibility becomes a problem every time developers try to output or send a canonical time zone ID, be it to a system from the past (that may not yet have been updated) or to the future (via storage). For that kind of usage, the TimeZone API should also expose a .stableId field (or .stable().id), which hopefully makes developers realise that canonical IDs are not necessarily stable.

Is that what the statement "exposing canonical identifiers has been a source of grief in every software platform" is about? I would nonetheless disagree that this kind of API should be deferred to a later proposal. People will need, and find, ugly workarounds to achieve canonicalisation.
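
For illustration, one workaround of this kind that can be written with existing APIs is to ask Intl.DateTimeFormat which ID it resolves to. This is only a sketch: the function name resolvedTimeZoneId is made up, and whether the result is a canonicalized ID or just the case-normalized input depends on the engine and its version.

```javascript
// Workaround sketch: read back the time zone ID that Intl.DateTimeFormat
// resolved. Treat the result as engine-specific, not as a stable value:
// some engines canonicalize here, others only normalize the letter case.
function resolvedTimeZoneId(id) {
  return new Intl.DateTimeFormat('en', { timeZone: id }).resolvedOptions().timeZone;
}

console.log(resolvedTimeZoneId('asia/calcutta'));
```

Because the output can differ between engines and change with data updates, storing it has exactly the stability problem described above.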

The only really strong opinion I have is that merging across country boundaries is really bad, because then a stored timestamp string (with a time zone name) representing a future date might become inaccurate in the future if one of those two countries decides to change their offset or DST policy.

I could not agree more. Merging timezones requires a prediction that the two time zones will never diverge. This should not be based only on the fact that they haven't diverged so far: predictions are difficult, especially when they concern the future. This is where geopolitics comes into play, not a favourite topic in technical discussions. But with isolationism on the rise, it's unfortunately not an unlikely scenario. For example, when the European Union discussed its plans to abandon summer time, various members signalled that they would adopt standard times other than CET.
So I would hope that users can always choose a time zone from their current country, and not become unhappy in the future because it had been canonicalised to a time zone whose rules later diverged from their chosen one.

Edit: Ah, it seems that is solved by the rule a) for "primary time zone identifier" that is proposed in tc39/ecma402#806. I hope this lands asap and does not wait for Temporal.

@justingrant
Collaborator Author

Thanks for your thoughtful comment!

So yes, I think that canonicalisation is needed. One could already use

  value={Intl.supportedValuesOf('timeZone').find(TimeZone.prototype.equals.bind(TimeZone.from(database.getTimezoneId())))}

but that's cumbersome.

The core issue here is that equals not === should always be used to compare two time zone IDs for equality. As long as users follow this guideline, they'll end up with a correct result. But a consequence of this guideline is that comparisons will be somewhat less ergonomic. Another thing I don't like about exposing canonicalId would be that it'd blur this guideline, because then sometimes comparing ids with === would be OK, and sometimes it wouldn't. I think it's more consistent (and hence easier to enforce via code reviews, ESLint rules, etc.) if the rule is simply "don't use ===".

Also, although it's less ergonomic than using ===, it's not necessarily that bad, esp. if you assign local variables to avoid running Intl or Temporal operations repeatedly.

const ids = Intl.supportedValuesOf('timeZone');
const storedId = database.getTimezoneId(); // undefined if nothing is stored yet
const timeZoneFromDatabase = storedId && Temporal.TimeZone.from(storedId);
const value = timeZoneFromDatabase && ids.find(id => timeZoneFromDatabase.equals(id));

if (timeZoneFromDatabase && !value) {
  // handle the "unknown time zone" case
}

<Picker
  value={ value }
  defaultValue={ timeZoneFromDatabase ?? Temporal.Now.timeZoneId() }
  data={ ids }
  onChange={ database.putTimezoneId }
/>

Is that what the statement "exposing canonical identifiers has been a source of grief in every software platform" is about?

Yes, although I probably should have been clearer that the biggest problem is auto-canonicalization, where an input ID is canonicalized before storing it. So you can use Asia/Calcutta as input and get Asia/Kolkata back, or vice versa. This is really problematic because it means that the behavior of working code will change based on updates to the IANA Time Zone Database. It's especially bad for Java. Unlike in ECMAScript (before this proposal), where engines control which IANA Time Zone Database data is used, in Java users can update TZDB on their own. This means that if TZDB makes a controversial canonicalization decision (like resolving Europe/Amsterdam to Europe/Berlin), there's nothing that a Java program can do about it. This is why, for example, some folks in the Java community decided to create the global-tz fork of TZDB.

Simply exposing canonicalization is not as bad as if it's always done, but it can still cause problems where users think that === is a safe way to compare time zone IDs.

that is solved by the rule a) for "primary time zone identifier" that is proposed in tc39/ecma402#806. I hope this lands asap and does not wait for Temporal.

AFAIK, current ECMAScript engines only allow one ID per country, with the one exception of Europe/Bratislava that we're trying to fix. The reason for this proposed change is to make the ECMAScript spec match the reality of what engines are already doing.

@bergus
Contributor

bergus commented Oct 7, 2023

Oh I totally agree that .equals() should be preferred over .id == .id, and that the (non-canonicalised) .id should be used for transporting time zones as data without mangling. Canonicalisation in the new TimeZone constructor would be the nightmare you describe for Java.
But there are cases where you cannot avoid the === operator, e.g. in the <Picker> example where it does compare the value with the given data by equality to decide which of the options should be in the selected state. Another example might be the construction of a Map with one value per canonical time zone, i.e. where you need to use the canonical time zone ID as the key.

It's not necessarily that bad, esp. if you assign local variables to avoid running Intl or Temporal options repeatedly.

I think it is bad from a performance perspective if I have to iterate through the array of canonical identifiers. (I don't mind readability, code can always be written in different ways to suit different preferences). Sure, the array is not super large, but it is still inefficient, and I think it's a shame to have to do this when the runtime already has the mapping available internally. That is what I take issue with.
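
One way to limit that cost without a built-in canonical ID would be to cache the result of the scan per input string. The sketch below assumes a caller-supplied comparator sameZone(a, b) (hypothetical; in practice it could wrap the proposed equals method), and the alias table exists only so the example runs on its own.

```javascript
// Builds a cached lookup from any input ID to the matching listed ID.
// `sameZone(a, b)` is a hypothetical comparator supplied by the caller.
function makeListedIdLookup(listedIds, sameZone) {
  const cache = new Map();
  return function lookup(id) {
    if (!cache.has(id)) {
      // The linear scan happens at most once per distinct input string.
      cache.set(id, listedIds.find((listed) => sameZone(listed, id)));
    }
    return cache.get(id);
  };
}

// Stub comparator for demonstration only: a tiny hard-coded alias table.
const aliases = new Map([['Asia/Calcutta', 'Asia/Kolkata']]);
const resolve = (id) => aliases.get(id) ?? id;
const sameZone = (a, b) => resolve(a) === resolve(b);

const lookup = makeListedIdLookup(['Asia/Kolkata', 'Europe/Berlin'], sameZone);
console.log(lookup('Asia/Calcutta')); // Asia/Kolkata
console.log(lookup('Mars/Olympus'));  // undefined
```

This does not remove the scan, it only amortizes it, so it is a mitigation rather than an answer to the objection that the runtime already holds the mapping internally.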

Similarly, it would be great to have .stableId available for the backcompat case where you have to send data to legacy systems that have not yet received the update with the latest time zone database, and would not be able to understand the new ids. (Sure, this is impossible to solve for completely new time zones, but at least covers renamings - which are rare but all the more annoying to deal with).

But back to ergonomics and making the correct usage easier to use than the broken one. Having a .canonicalId/.stableId property would make enforcing the rule "never compare .id" much easier than the alternative .canonical()/.stable() method returning another (or the same) TimeZone. People would write working code like a.canonical().id === b.canonical().id and fail to understand the risks of establishing this pattern (especially when .canonical() is called not within the same expression), and would complain about the linter asking them to use .equals().
I'm open for bikeshedding better names - for more precise semantics or awkwardness of use so that .equals() seems the easier choice. Although in my personal opinion, a.canonicalId === b.canonicalId is not much worse than a.equals(b) and does not need to be discouraged. The rule should be "don't use === with .id", not "don't use === with timezones at all", wouldn't that be just as enforceable/lintable?

Some other ideas that would make .equals() easier to use than the alternative, but still offer canonicalisation/stability in the API:

  • provide static methods, i.e. TimeZone.canonicalFor(someTzId) or TimeZone.from(…, {canonical: true})
  • make it a symbol property: new TimeZone(someTzId)[Symbol.canonicalTimeZoneId]

Btw, I just noticed custom timezones. To make .equals() play nicely with those, a .canonicalId (symbol) property or .canonical() method on TimeZone instances would be much preferable over a static solution. Then .equals() could use that property/method instead of using its internal resolution mechanism which knows only valid IANA tzdb identifiers, and a custom object could have an overridden property/method. A use case would be polyfilling "new" time zones on platforms that do not yet support them - you could create const kyev = Object.assign(new TimeZone("Europe/Kiev"), {id: "Europe/Kyiv"}) and it would work out of the box (and not just kyev.equals(kiev), which could easily have been overridden without support for this, but also kiev.equals(kyev)!). Otherwise one would have to monkeypatch TimeZone.prototype.equals to make it recognise the custom time zone and their canonicalisations.
