Normative: specify time zone ID requirements to reduce divergence between engines #877
Thanks for putting this together @justingrant! We can discuss this at the next TG2 meeting. In the meantime, I encourage the listed reviewers to take a look.
It's fine to submit tests to test262 that no engine can pass yet, as long as they are correct according to the current snapshot of ECMA-262 or 402 or a Stage 3 proposal.
I don't feel qualified to review the details — I mostly haven't followed those discussions. At a general level, this all looks very reasonable.
Mostly editorial comments, but I am not comfortable with making the rename waiting period mandatory.
I assume you mean to say "e.g. PST8PDT=>America/Los_Angeles"?
I just pushed a new commit that includes what I think resolves all review feedback. @gibson042 @sffc (and anyone else who's interested) do you want to re-review?
10 editorial suggestions and one request for better specification.
The algorithm looks good, but I'd like to use more precise phrasing than «the first, space-delimited column in backzoneLinkLine after…».
spec/locales-currencies-tz.html
Outdated
1. Let _backzoneLinkLines_ be the subset of lines in the file <code>backzone</code> of the IANA Time Zone Database that start with either *"Link"* or *"#PACKRATLIST zone.tab Link"*.
1. Assert: Exactly one line in _backzoneLinkLines_ contains _identifier_ in the second, space-delimited column after *"Link"* or *"#PACKRATLIST zone.tab Link"*.
1. Let _backzoneLinkLine_ be the line in _backzoneLinkLines_ that contains _identifier_ in the second, space-delimited column after *"Link"* or *"#PACKRATLIST zone.tab Link"*.
1. Set _primary_ to the contents of the first, space-delimited column in _backzoneLinkLine_ after *"Link"* or *"#PACKRATLIST zone.tab Link"*.
I've wanted a StringSplit operation for a while (IsWellFormedUnitIdentifier is practically begging for it), but in the absence of such I'd like to be crystal-clear in this kind of algorithm.
1. Let _backzone_ be *undefined*.
1. Let _backzoneLinkLines_ be the List of lines in the file <code>backzone</code> of the IANA Time Zone Database that start with either *"Link "* or *"#PACKRATLIST zone.tab Link "*.
1. For each element _line_ of _backzoneLinkLines_, do
  1. If _line_ starts with *"#PACKRATLIST zone.tab "*, set _line_ to the substring of _line_ from 22.
  1. Assert: _line_ starts with *"Link "*.
  1. Let _i_ be 5.
  1. Let _j_ be StringIndexOf(_line_, *" "*, _i_).
  1. Assert: _j_ is not ~not-found~ and _j_ > _i_.
  1. Let _k_ be StringIndexOf(_line_, *" "*, _j_ + 1).
  1. If _k_ is ~not-found~, set _k_ to the length of _line_.
  1. Assert: _k_ > _j_ + 1.
  1. Let _alias_ be the substring of _line_ from _j_ + 1 to _k_.
  1. If _alias_ is _identifier_, then
    1. Assert: _backzone_ is *undefined*.
    1. Set _backzone_ to the substring of _line_ from _i_ to _j_.
1. Assert: _backzone_ is not *undefined*.
1. Set _primary_ to _backzone_.
A worked example of e.g. Pacific/Truk against current tzdata in an editorial note would also be helpful.
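For readers who find the index arithmetic hard to follow, here is a rough JavaScript sketch of the suggested steps. The function name `findBackzoneTarget` and the second sample line are hypothetical, and the sketch assumes single-space-delimited Link lines, as the quoted spec text does.

```javascript
// Hypothetical helper mirroring the suggested StringIndexOf-based steps.
// Assumption: Link lines are delimited by single spaces, as in the quoted spec text.
const PACKRAT_PREFIX = "#PACKRATLIST zone.tab "; // 22 code units long

function findBackzoneTarget(backzoneLinkLines, identifier) {
  let backzone; // stays undefined until a matching alias is found
  for (let line of backzoneLinkLines) {
    if (line.startsWith(PACKRAT_PREFIX)) line = line.slice(PACKRAT_PREFIX.length);
    if (!line.startsWith("Link ")) throw new Error("expected a Link line");
    const i = 5; // index just past "Link "
    const j = line.indexOf(" ", i); // end of the first column
    if (j <= i) throw new Error("missing first column"); // also catches j === -1
    let k = line.indexOf(" ", j + 1); // end of the second column
    if (k === -1) k = line.length;
    if (k <= j + 1) throw new Error("missing second column");
    const alias = line.slice(j + 1, k);
    if (alias === identifier) {
      if (backzone !== undefined) throw new Error("more than one Link for " + identifier);
      backzone = line.slice(i, j);
    }
  }
  if (backzone === undefined) throw new Error("no Link found for " + identifier);
  return backzone;
}

// The first line is quoted later in this thread; the second uses made-up names.
const sampleLines = [
  "Link Pacific/Chuuk Pacific/Truk",
  "#PACKRATLIST zone.tab Link Example/Target Example/Alias",
];
console.log(findBackzoneTarget(sampleLines, "Pacific/Truk")); // "Pacific/Chuuk"
```

The thrown errors stand in for the spec's Assert steps, which have no runtime behavior in a conforming implementation.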
Oh wow that's hard to read. I wonder if it'd be better to introduce a string split AO into 402 in this PR and use it in this PR, then it could be re-used elsewhere in 402 later?
Take a look at the commit I just pushed that includes a minimal string-split AO, borrowed from the lower half of the spec text of `String.prototype.split`.

> A worked example of e.g. Pacific/Truk against current tzdata in an editorial note would also be helpful.

When you say "worked example" what do you mean?
Take a look at the commit I just pushed that includes a minimal string-split AO, borrowed from the lower half of the spec text of `String.prototype.split`.
And while we're at it, I refactored `IsWellFormedUnitIdentifier` to use the new AO.
> When you say "worked example" what do you mean?

A note explaining something like this:
This algorithm attempts to resolve Links to primary time zone identifiers without crossing the boundaries of ISO 3166-1 Alpha-2 country codes, using data from files `zone.tab` and `backzone` of the IANA Time Zone Database as necessary.
For example, if "Pacific/Truk" (in country code "FM") is a Link to "Pacific/Port_Moresby" (in country code "PG") with default configuration, then `zone.tab` will be checked for lines corresponding with country code "FM". If there is only one such line, then the Zone of that line will be treated as the primary time zone identifier associated with "Pacific/Truk". But otherwise, that data will be taken from the unique Link in `backzone` with source name "Pacific/Truk" (e.g., line "Link Pacific/Chuuk Pacific/Truk" will result in primary time zone identifier "Pacific/Chuuk").
(or substitute any other example that requires use of `backzone`)
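To make the note above concrete, here is a hedged JavaScript sketch of the two-stage lookup it describes. The helper `resolveLinkWithinCountry` is hypothetical, the sample rows are illustrative rather than copied from a tzdata release (coordinates are placeholders, and "XX" and the "Example/*" names are made up), and the backzone lines are assumed to be single-space-delimited as in the spec text under review.

```javascript
// Hypothetical helper sketching the editorial note; not the spec algorithm itself.
function resolveLinkWithinCountry(identifier, countryCode, zoneTabLines, backzoneLinkLines) {
  // zone.tab rows are tab-separated: country code, coordinates, TZ name, optional comments.
  const zonesInCountry = zoneTabLines
    .filter((row) => row !== "" && !row.startsWith("#"))
    .map((row) => row.split("\t"))
    .filter((cols) => cols[0] === countryCode)
    .map((cols) => cols[2]);
  // Exactly one zone.tab line for the country: its Zone is the primary identifier.
  if (zonesInCountry.length === 1) return zonesInCountry[0];

  // Otherwise use the unique backzone Link whose link name is `identifier`.
  const targets = backzoneLinkLines
    .filter((row) => row.includes("Link "))
    .map((row) => row.slice(row.indexOf("Link ")).split(" "))
    .filter((fields) => fields[2] === identifier)
    .map((fields) => fields[1]);
  if (targets.length !== 1) throw new Error("expected exactly one backzone Link for " + identifier);
  return targets[0];
}

// Illustrative data only (see the assumptions in the lead-in).
const zoneTabSample = [
  "# country\tcoordinates\tTZ\tcomments",
  "FM\t+0000+00000\tPacific/Chuuk",
  "FM\t+0000+00000\tPacific/Kosrae",
  "XX\t+0000+00000\tExample/Only",
];
const backzoneSample = ["Link Pacific/Chuuk Pacific/Truk"];

// Two "FM" rows, so the backzone Link decides.
console.log(resolveLinkWithinCountry("Pacific/Truk", "FM", zoneTabSample, backzoneSample)); // "Pacific/Chuuk"
// One "XX" row, so zone.tab alone decides.
console.log(resolveLinkWithinCountry("Example/Old", "XX", zoneTabSample, backzoneSample)); // "Example/Only"
```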
OK, I added an editorial note in the latest commit. Let me know what you think.
Looking good! Some final tweaks to StringSplitToList and its use, then I think this is ready.
spec/locales-currencies-tz.html
Outdated
<h1>
  StringSplitToList (
    _S_: a String,
    _separator_: a String that is not the empty String,
    _limit_: a mathematical value in the range 1 to 2<sup>32</sup> - 1
  ): a List of Strings
</h1>
<dl class="header">
  <dt>description</dt>
  <dd>
    The returned List contains substrings of _S_.
    These substrings are determined by searching from left to right for occurrences of _separator_; these occurrences are not part of any String in the returned List, but serve to divide _S_ into substrings.
    The output List will contain no more than _limit_ elements; any additional separators and/or substrings present in _S_ will be ignored.
  </dd>
</dl>
<emu-alg>
  1. If _S_ is the empty String, return « ».
  1. Let _separatorLength_ be the length of _separator_.
  1. Assert: _separatorLength_ is not 0.
  1. Let _substrings_ be a new empty List.
  1. Let _i_ be 0.
  1. Let _j_ be StringIndexOf(_S_, _separator_, 0).
  1. Repeat, while _j_ ≠ -1,
    1. Let _T_ be the substring of _S_ from _i_ to _j_.
    1. Append _T_ to _substrings_.
    1. If the number of elements in _substrings_ is _limit_, return _substrings_.
    1. Set _i_ to _j_ + _separatorLength_.
    1. Set _j_ to StringIndexOf(_S_, _separator_, _i_).
  1. Let _T_ be the substring of _S_ from _i_.
  1. Append _T_ to _substrings_.
  1. Return _substrings_.
</emu-alg>
Thanks; I love seeing this! But let's keep it super narrow for now to support evolution of e.g. limit (which really should limit the search for separators rather than truncating the results) and empty string handling.
<h1>
  StringSplitToList (
    _S_: a String,
    _separator_: a String,
  ): a List of Strings
</h1>
<dl class="header">
  <dt>description</dt>
  <dd>
    The returned List contains all disjoint substrings of _S_ that do not contain _separator_ but are immediately preceded and/or immediately followed by an occurrence of _separator_. Each such <emu-not-ref>substring</emu-not-ref> will be the empty String in between adjacent occurrences of _separator_, before a _separator_ at the very start of _S_, or after a _separator_ at the very end of _S_, but otherwise will not be empty.
  </dd>
</dl>
<emu-alg>
  1. Assert: _S_ is not the empty String.
  1. Assert: _separator_ is not the empty String.
  1. Let _separatorLength_ be the length of _separator_.
  1. Let _substrings_ be a new empty List.
  1. Let _i_ be 0.
  1. Let _j_ be StringIndexOf(_S_, _separator_, 0).
  1. Repeat, while _j_ is not ~not-found~,
    1. Let _T_ be the substring of _S_ from _i_ to _j_.
    1. Append _T_ to _substrings_.
    1. Set _i_ to _j_ + _separatorLength_.
    1. Set _j_ to StringIndexOf(_S_, _separator_, _i_).
  1. Let _T_ be the substring of _S_ from _i_.
  1. Append _T_ to _substrings_.
  1. Return _substrings_.
</emu-alg>
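As a sanity check on the narrowed operation, the steps translate almost line for line into JavaScript. This is only a sketch: the name `stringSplitToList` is a stand-in for the abstract operation, and the thrown errors approximate the two Assert steps.

```javascript
// Sketch of the narrowed StringSplitToList steps (no limit parameter).
function stringSplitToList(S, separator) {
  if (S === "") throw new Error("S must not be the empty String");
  if (separator === "") throw new Error("separator must not be the empty String");
  const separatorLength = separator.length;
  const substrings = [];
  let i = 0;
  let j = S.indexOf(separator, 0); // -1 plays the role of ~not-found~
  while (j !== -1) {
    substrings.push(S.slice(i, j));
    i = j + separatorLength;
    j = S.indexOf(separator, i);
  }
  substrings.push(S.slice(i)); // the remainder after the last separator
  return substrings;
}

console.log(stringSplitToList("Link Pacific/Chuuk Pacific/Truk", " "));
// ["Link", "Pacific/Chuuk", "Pacific/Truk"]
console.log(stringSplitToList(" a  b ", " "));
// ["", "a", "", "b", ""]: empty strings appear at the edges and between adjacent separators
```

For non-empty inputs this should agree with the built-in `String.prototype.split` called with a string separator and no limit.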
> But let's keep it super narrow for now to support evolution of e.g. limit (which really should limit the search for separators rather than truncating the results) and empty string handling.

I'm happy to remove limit, but could you explain more about "should limit the search for separators rather than truncating the results"? Do you mean that limit should cause the last substring to contain any remaining content in the string, even if there are separators in it?
I'm asking because the two cases in the spec so far both perform well using truncation because both of those cases only care about the first few substrings.
Also, if we wanted to promote this AO to 262, `String.prototype.split` uses truncation so a "rest" result wouldn't work for that case.
> Do you mean that limit should cause the last substring to contain any remaining content in the string, even if there are separators in it?

Yes, that is my position and IIRC there is regret on the committee that `String.prototype.split` itself does not work like that (as e.g. Python's `str.split` does).

> the two cases in the spec so far both perform well using truncation because both of those cases only care about the first few substrings.

I agree, but am not confident that will remain the case and don't want to paint this into a corner. Anything that can be specified with truncation can also be specified without it, and implementations are not obligated to perform unobservable steps (which can even be reiterated in a note if we're concerned about it).

> Also, if we wanted to promote this AO to 262, `String.prototype.split` uses truncation so a "rest" result wouldn't work for that case.

No, it would be fine as e.g.
1. Let _substrings_ be StringSplitToList(_S_, _R_, _lim_).
1. If the number of elements in _substrings_ > _lim_, remove the last element from _substrings_.
1. Assert: the number of elements in _substrings_ ≤ _lim_.
1. Return CreateArrayFromList(_substrings_).
or even (with a new maxLength parameter for CreateArrayFromList)
1. Let _substrings_ be StringSplitToList(_S_, _R_, _lim_).
1. Return CreateArrayFromList(_substrings_, _lim_).
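The difference between the two limit semantics can be illustrated in JavaScript. In this sketch, `splitKeepingRest` and its `maxSplits` parameter are hypothetical; they model one reading of "limit the search for separators", in which at most `maxSplits` separators are consumed and the final element keeps the remainder. The sample line is also made up.

```javascript
// Hypothetical "rest"-style split: stop searching after maxSplits separators.
function splitKeepingRest(S, separator, maxSplits) {
  const substrings = [];
  let i = 0;
  let j = S.indexOf(separator, 0);
  while (j !== -1 && substrings.length < maxSplits) {
    substrings.push(S.slice(i, j));
    i = j + separator.length;
    j = S.indexOf(separator, i);
  }
  substrings.push(S.slice(i)); // remainder, separators included
  return substrings;
}

const demoLine = "Link Pacific/Chuuk Pacific/Truk # comment";

// Built-in truncation: everything past the first two pieces is discarded.
const truncated = demoLine.split(" ", 2);
// ["Link", "Pacific/Chuuk"]

// Rest semantics: the third element keeps the unsplit tail.
const withRest = splitKeepingRest(demoLine, " ", 2);
// ["Link", "Pacific/Chuuk", "Pacific/Truk # comment"]

// Truncation can be recovered from the rest-style result by dropping the extra element,
// which is the shape of the first suggested wrapper above.
const recovered = withRest.length > 2 ? withRest.slice(0, 2) : withRest;
console.log(truncated, withRest, recovered);
```

Under this reading, the rest-style result carries strictly more information than the truncated one, which is why truncating callers could be layered on top of it.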
OK thanks for the explanation, I'll remove the limit parameter. Note that the assertion that S isn't empty means that some callsites like `IsWellFormedUnitIdentifier` need to add steps to deal with the empty string case. Is that OK?
These changes are in the latest commit. I updated IsWellFormedUnitIdentifier to handle the empty-string case before calling StringSplitToList.
spec/locales-currencies-tz.html
Outdated
1. Let _backzone_ be *undefined*.
1. Let _backzoneLinkLines_ be the List of lines in the file <code>backzone</code> of the IANA Time Zone Database that start with either *"Link "* or *"#PACKRATLIST zone.tab Link "*.
1. For each element _line_ of _backzoneLinkLines_, do
  1. If _line_ starts with *"#PACKRATLIST zone.tab "*, set _line_ to the substring of _line_ from 22.
  1. Assert: _line_ starts with *"Link "*.
  1. Set _line_ to the substring of _line_ from 5.
  1. Let _backzoneAndLink_ be StringSplitToList(_line_, *" "*, 2).
  1. Assert: _backzoneAndLink_ has exactly two elements.
  1. If _backzoneAndLink_[1] is _identifier_, then
    1. Assert: _backzone_ is *undefined*.
    1. Set _primary_ to _backzoneAndLink_[0].
1. Assert: _backzone_ is not *undefined*.
1. Let _backzone_ be *undefined*.
1. Let _backzoneLinkLines_ be the List of lines in the file <code>backzone</code> of the IANA Time Zone Database that start with either *"Link "* or *"#PACKRATLIST zone.tab Link "*.
1. For each element _line_ of _backzoneLinkLines_, do
  1. Let _i_ be StringIndexOf(_line_, *"Link"*, 0).
  1. Set _line_ to the substring of _line_ from _i_.
  1. Let _fields_ be StringSplitToList(_line_, *" "*).
  1. Assert: _fields_ has at least three elements, _fields_[0] is *"Link"*, _fields_[1] is not the empty String, and _fields_[2] is not the empty String.
  1. If _fields_[2] is _identifier_, then
    1. Assert: _backzone_ is *undefined*.
    1. Set _backzone_ to _fields_[1].
1. Assert: _backzone_ is not *undefined*.
1. Set _primary_ to _backzone_.
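For comparison with the earlier index-based version, the split-based suggestion can be sketched in JavaScript as follows. The function name `primaryFromBackzone` and the second sample line are hypothetical, the built-in `split` stands in for StringSplitToList, and single-space delimiters are assumed as in the spec text.

```javascript
// Hypothetical helper following the split-based suggestion.
// The built-in split (no limit) stands in for StringSplitToList.
function primaryFromBackzone(backzoneLinkLines, identifier) {
  let backzone; // undefined until a Link naming `identifier` is found
  for (const rawLine of backzoneLinkLines) {
    // Skip any "#PACKRATLIST zone.tab " prefix by starting at "Link".
    const line = rawLine.slice(rawLine.indexOf("Link"));
    const fields = line.split(" ");
    if (fields.length < 3 || fields[0] !== "Link" || fields[1] === "" || fields[2] === "") {
      throw new Error("malformed Link line: " + rawLine);
    }
    if (fields[2] === identifier) {
      if (backzone !== undefined) throw new Error("more than one Link for " + identifier);
      backzone = fields[1];
    }
  }
  if (backzone === undefined) throw new Error("no Link found for " + identifier);
  return backzone;
}

// The first line is quoted earlier in this thread; the second uses made-up names.
const linkLines = [
  "Link Pacific/Chuuk Pacific/Truk",
  "#PACKRATLIST zone.tab Link Example/Target Example/Alias",
];
console.log(primaryFromBackzone(linkLines, "Example/Alias")); // "Example/Target"
```

Because the result is stored in `backzone` rather than assigned directly to the primary identifier, the final "not undefined" check is meaningful, which is the point of the suggested change.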
Thanks for catching the bug! I updated the spec text to take some of this suggestion but not all of it. Two things I didn't move over:
- The `_fields_[0] is *"Link"*` assertion was redundant with the earlier StringIndexOf call (or would have been if the StringIndexOf call included the terminating space separator). So I started the StringSplitToList after `"Link "`.
- I liked the clarity of `_backzoneAndLink_` as the local variable name instead of the more generic `_fields_`. This algorithm is pretty complex so I think any additional documentation is helpful here.

This is in the latest commit. Let me know if this is OK!
If you want to update StringSplitToList to support empty-string S (with result in alignment with String.prototype.split, e.g. « "" »), I would not be opposed. But dealing with it at call sites for now seems fine to me.
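To illustrate the two options mentioned here, the sketch below shows what the built-in `split` returns for an empty string and what a call-site guard might look like. The helper `hasPerShape` is hypothetical and deliberately much simpler than `IsWellFormedUnitIdentifier`: it only checks the "numerator-per-denominator" shape and does not consult the list of sanctioned units.

```javascript
// Built-in behavior: splitting the empty string on a non-empty separator yields [""].
const emptySplit = "".split("-per-");
console.log(emptySplit); // [""]

// Hypothetical call-site guard: decide the empty-string case before splitting.
function hasPerShape(unitIdentifier) {
  if (unitIdentifier === "") return false; // explicit decision made by the caller
  const parts = unitIdentifier.split("-per-"); // stand-in for StringSplitToList
  return parts.length === 2 && parts[0] !== "" && parts[1] !== "";
}

console.log(hasPerShape("kilometer-per-hour")); // true
console.log(hasPerShape("-per-hour")); // false
console.log(hasPerShape("")); // false
```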
This PR resolves tc39#825 by adding spec text that more clearly defines how ECMA-402 implementations should decide which IANA time zone IDs should be primary vs. non-primary. This PR implements "Option C" in tc39#825 by deterministically defining ECMAScript's exceptions from the IANA Time Zone Database's defaults, and then pointing implementers at ICU as a convenient implementation of those exceptions. This PR also accommodates web reality by aligning the 402 spec text with the existing behavior of ICU and CLDR while providing deterministic rules that can guide future changes in CLDR data. Finally, this PR introduces a new StringSplitToList abstract operation and uses it to simplify AvailableNamedTimeZoneIdentifiers and IsWellFormedUnitIdentifier.
Nah, I think it's fine as-is, and I actually think it's kinda good to force callsites to decide how they want to handle the empty-string case because it's not always intuitive how to handle that case. Anyway, I adopted both of your last suggestions, rebased, and squashed. I think this PR should be ready to merge.
Actually, I assume we want Test262 tests for this new text. I'll start working on those. And of course it can't be merged until after it's approved at the next TC39 meeting in a few weeks.
This proposed change resolves #825 by adding normative spec text to clarify how ECMA-402 implementations should decide which IANA time zone IDs should be primary vs. non-primary. This will enable more consistency between ECMAScript implementations and prevent future divergence.
This PR also accommodates web reality by aligning ECMA-402 with CLDR and ICU. This should make it easier for all ECMAScript engines to comply with the spec while still being able to use ICU.
This PR is stacked on #876, so please ignore the first commit when reviewing this PR.
Note that the problem of out-of-date primary IDs for renamed Zones (like Asia/Calcutta) is out of scope for this PR, because the spec already requires current IDs, and there's already a plan to fix it that requires no spec changes.
Summary of proposed changes
This PR implements "Option C" in #825 by deterministically defining ECMAScript's exceptions from the IANA Time Zone Database's defaults, and then pointing implementers at ICU as a convenient implementation of those exceptions.
We'll start with a baseline of IANA's Zone and Link names and specify a few exceptions:
This PR also makes two other smaller text changes that we expect to have zero impact on current engines:
Per-engine changes required
Implementing the changes in this PR will impact JS engines differently, given the current divergence between engines:
For V8 (cc @FrankYFTang) and JSC (cc @Constellation), requirements 1-3 above are already how these engines behave, and (4) should be simple to implement. Note that this PR doesn't affect the plan to fix out-of-date canonicalizations like Asia/Calcutta and Europe/Kiev. This plan is unchanged: as part of landing Temporal Stage 4, switch to use ICU's new `icu::TimeZone::getIanaID()`, which returns the latest IANA IDs instead of out-of-date canonical IDs like Asia/Calcutta.
For SpiderMonkey (cc @anba), more changes are needed because currently SpiderMonkey conforms to the spec which requires using `backward` in TZDB to determine canonicalization. SM could use `icu::TimeZone::getIanaID()` to implement (2) and (3) above, or could implement the same behavior by reading CLDR data or IANA data directly. Also, this PR will reduce SM's differences in `Intl.supportedValuesOf('timeZone')` vs. V8/JSC.
Testing
Test262 changes will be needed to validate these normative changes, but I'm not sure how we can run those tests except using the Temporal polyfill. @ptomato I'll be looking for your advice (and perhaps help writing tests!) on this point.
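As a rough illustration of the observable behavior such tests would examine, the snippet below probes how the current engine resolves a few identifiers. The helper `resolvedTimeZone` and the probe list are assumptions for this sketch; the output is intentionally not shown because it differs between engines and ICU versions, which is the divergence this PR is meant to reduce.

```javascript
// Hypothetical probe: report how this engine resolves a time zone identifier.
function resolvedTimeZone(id) {
  return new Intl.DateTimeFormat("en", { timeZone: id }).resolvedOptions().timeZone;
}

// Identifiers mentioned in this PR plus their renamed counterparts.
const probes = ["Asia/Calcutta", "Asia/Kolkata", "Europe/Kiev", "Europe/Kyiv"];
for (const id of probes) {
  try {
    console.log(id, "->", resolvedTimeZone(id));
  } catch (e) {
    // Engines with older time zone data may reject newer names with a RangeError.
    console.log(id, "-> not supported:", e.name);
  }
}

// Intl.supportedValuesOf may be missing in older runtimes, so guard the call.
const supportedZones =
  typeof Intl.supportedValuesOf === "function" ? Intl.supportedValuesOf("timeZone") : [];
console.log("supported time zone IDs:", supportedZones.length);
```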
Feedback requested
Feedback is welcome on any part of this proposal, but I'm most interested in making sure that the spec text actually accomplishes what the summary above claims that it does.