This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

Include non-canonical names in the Available operations, then filter them out #49

Closed
wants to merge 2 commits into from

Conversation

ptomato
Contributor

@ptomato ptomato commented Jul 16, 2022

The Available abstract operations (e.g. AvailableCalendars) should return all possible aliases, so that other places in the spec (e.g. the Temporal.Calendar constructor) can use them to determine whether a given input value is valid. This input value can subsequently be canonicalized by another abstract operation (e.g. CanonicalizeCalendar).

In Intl.supportedValuesOf(), on the other hand, we should not return all possible aliases, so we filter them out using a Canonicalize operation before returning the list of Available codes as an array to the caller.
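The split described above could be sketched roughly as follows. This is an illustration only, not spec text: the data tables are a tiny hypothetical subset, and the function names merely mirror the abstract operations for readability.

```typescript
// Hypothetical subset of calendar data; a real implementation would use CLDR.
const CALENDAR_ALIASES: Record<string, string> = {
  "ethiopic-amete-alem": "ethioaa",
  islamicc: "islamic-civil",
};
const CANONICAL_CALENDARS: readonly string[] = [
  "ethioaa",
  "gregory",
  "islamic-civil",
  "iso8601",
];

// Mirrors AvailableCalendars as proposed here: canonical names plus aliases.
function availableCalendars(): string[] {
  return CANONICAL_CALENDARS.concat(Object.keys(CALENDAR_ALIASES));
}

// Mirrors CanonicalizeCalendar: lowercase, then resolve any alias.
function canonicalizeCalendar(id: string): string {
  const lower = id.toLowerCase();
  return Object.prototype.hasOwnProperty.call(CALENDAR_ALIASES, lower)
    ? CALENDAR_ALIASES[lower]
    : lower;
}

// What Intl.supportedValuesOf("calendar") would expose: aliases filtered out,
// result sorted in code unit order.
function supportedCalendarValues(): string[] {
  return availableCalendars()
    .filter((id) => canonicalizeCalendar(id) === id)
    .sort();
}

// What a consumer such as a constructor could use to validate its input.
function isValidCalendar(input: string): boolean {
  return availableCalendars().indexOf(input.toLowerCase()) !== -1;
}
```

With this shape, `isValidCalendar("islamicc")` accepts the alias, while `supportedCalendarValues()` lists only canonical identifiers.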

Not all of the kinds of codes here have aliases, and not even all of them have a concept of "canonical". A quick investigation shows:

  • Calendar: aliased; case-regularized; limited values "available" but any well-formed value accepted, unknown values coerced to the locale's default

  • Collation: not aliased; case-regularized; limited values "available" but any well-formed value accepted, unknown values coerced to the locale's default

  • Currency: not sure if it is aliased because I don't have a copy of ISO 4217; case-regularized; limited values "available" but any well-formed value accepted and used

  • Numbering system: not aliased; not case-regularized; limited values "available" but any well-formed value accepted, unknown values coerced to the locale's default

  • Time zone: aliased; case-regularized

  • Unit of measurement: not aliased; not case-regularized; limited values "available" but simple combinations of core values also accepted and used

So, I conclude that we need Canonicalize operations for calendars, time zones, and possibly currency units.

An alternative approach would be to write Canonicalize operations for all of the kinds of codes, and have them perform the case-regularization (or for numbering systems and units of measurement they would be no-ops).
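A minimal sketch of that alternative, assuming one hypothetical `canonicalizeCode` helper that dispatches on the kind of code. Only case-regularization is shown; alias resolution and time zones (whose canonical casing needs a data lookup) are left out.

```typescript
// Hypothetical kinds; time zones are omitted because their case-regularization
// depends on time zone name data rather than a simple case mapping.
type CodeKind = "calendar" | "collation" | "currency" | "numberingSystem" | "unit";

function canonicalizeCode(kind: CodeKind, code: string): string {
  switch (kind) {
    case "calendar":
    case "collation":
      // Unicode extension values are regularized to lowercase.
      return code.toLowerCase();
    case "currency":
      // Currency codes are regularized to uppercase.
      return code.toUpperCase();
    case "numberingSystem":
    case "unit":
      // Not case-regularized per the investigation above: a no-op.
      return code;
  }
}
```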

(This pull request is currently based on top of #48, I will rebase it when that is merged)

Closes: #37

@anba
Collaborator

anba commented Jul 20, 2022

Calendar, collations, and numbering systems should probably be canonicalised according to the same language as in ResolveLocale, step 9.i.iii.1:

Let optionsValue be the string optionsValue after performing the algorithm steps to transform Unicode extension values to canonical syntax per Unicode Technical Standard #35 LDML § 3.2.1 Canonical Unicode Locale Identifiers, treating key as ukey and optionsValue as uvalue productions.

(That step occurs twice in ResolveLocale, I guess it's just a copy-paste error?)

@ptomato
Contributor Author

ptomato commented Aug 10, 2022

I think that link might be stale? It seems to point to how to canonicalize the whole locale ID, but doesn't say anything about casing of ukey and uvalue. (In this table they are defined as alphanum alpha and alphanum{3,8} (sep alphanum{3,8})* respectively, with no mention of casing.)

@anba
Collaborator

anba commented Aug 11, 2022

The section contains a link to Annex C. LocaleId Canonicalization, which defines the actual canonicalisation algorithm. I know that these two sections were restructured at some point, but I don't know if the links in ECMA-402 were created before or after that restructuring. (That also means I don't know if we intentionally link to Canonical Unicode Locale Identifiers instead of Annex C. LocaleId Canonicalization.)

@ptomato
Contributor Author

ptomato commented Aug 30, 2022

@anba Is it specifically the step "Put all other subtags into lowercase" in https://unicode.org/reports/tr35/#5-canonicalizing-syntax that you're referring to?

@anba
Collaborator

anba commented Aug 31, 2022

That's the first step when applying canonicalisation, at the end of it the source should be in "canonical syntax". But we also want it to be in "canonical form" (both terms are defined in UTS 35). Annex C. LocaleId Canonicalization, section Processing LocaleIds further defines how to canonicalise so-called alias entries:

  1. Use the bcp47 data to replace keys, types, tfields, and tvalues by their canonical forms. See Section 3.6.4 U Extension Data Files and Section 3.7.1 T Extension Data Files. The matches are in the alias attribute value, while the canonical replacement is in the name attribute value. For example:
    1. Because of the following bcp47 data: <key name="ms"…>…<type name="uksystem" … alias="imperial" … />…</key>
    2. We get the following transformation: en-u-ms-imperial ⇒ en-u-ms-uksystem

This is relevant for "calendar" to canonicalise ethiopic-amete-alem to ethioaa and islamicc to islamic-civil. (The last case doesn't seem to be properly defined? The preferred attribute isn't handled in the UTS 35 canonicalisation algorithm, but that's a CLDR issue and shouldn't concern us right now.)
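As a rough illustration of the two steps discussed here (lowercasing for canonical syntax, then alias replacement for canonical form), applied to a standalone uvalue. The alias table is a hypothetical excerpt containing only examples mentioned in this thread, and `canonicalizeUValue` is not a real API.

```typescript
// Hypothetical excerpt of bcp47 alias data, keyed by Unicode extension key,
// mapping each alias to its canonical name.
const BCP47_TYPE_ALIASES: Record<string, Record<string, string>> = {
  ms: { imperial: "uksystem" },
  ca: { "ethiopic-amete-alem": "ethioaa" },
};

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function canonicalizeUValue(ukey: string, uvalue: string): string {
  // Step 1 (canonical syntax): put the subtags into lowercase.
  const key = ukey.toLowerCase();
  const value = uvalue.toLowerCase();
  // Step 2 (canonical form): replace an alias by its canonical name, if any.
  if (hasOwn(BCP47_TYPE_ALIASES, key) && hasOwn(BCP47_TYPE_ALIASES[key], value)) {
    return BCP47_TYPE_ALIASES[key][value];
  }
  return value;
}
```

For example, `canonicalizeUValue("ms", "imperial")` yields `"uksystem"`, matching the en-u-ms-imperial transformation quoted above.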

@ptomato ptomato force-pushed the ptomato/37-available-canonical branch from d70c303 to a97f16e Compare September 8, 2022 22:25
@ptomato
Contributor Author

ptomato commented Sep 8, 2022

OK, I've made an attempt to address this comment; a couple things to note:

  • I didn't use the suggested language from ResolveLocale because as far as I can tell, it applies to whole language IDs, e.g. en-u-co-trad. There isn't really anything applicable in that algorithm that can be referred to directly; the relevant bits are "Put all other subtags into lowercase" in "Canonicalizing Syntax" in Annex C, and "Use the bcp47 data to replace keys, types, tfields, and tvalues by their canonical forms" in "Processing LocaleIDs". It seems more unambiguous to me to just say directly what needs to be done to the string, but we can also refer to Annex C instead, or do both.
  • I've found that the web reality is that Intl.Collator doesn't canonicalize collator types, e.g. new Intl.Collator('en', {collation: 'traditional'}) throws instead of unaliasing to trad. This is possibly because all of the currently defined aliases for collation types are >8 characters long.
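The collation behaviour can be checked with a small probe like the one below. `collationIsRejected` is a hypothetical helper name; it assumes an engine that validates the `collation` option against the 3 to 8 character subtag syntax, as the length explanation above suggests.

```typescript
// Returns true if constructing a collator with this collation type throws a
// RangeError, false if construction succeeds. Other errors are rethrown.
function collationIsRejected(collation: string): boolean {
  // "sort" is the default usage; it is spelled out only so the options object
  // type-checks on older TypeScript lib typings that lack `collation`.
  const options = { usage: "sort" as const, collation };
  try {
    new Intl.Collator("en", options);
    return false;
  } catch (e) {
    if (e instanceof RangeError) {
      return true;
    }
    throw e;
  }
}

console.log(collationIsRejected("traditional")); // 11 characters: too long for one subtag
console.log(collationIsRejected("trad")); // well-formed
```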

@anba
Collaborator

anba commented Sep 9, 2022

We should use the same language to canonicalize Unicode extension values everywhere. So if ResolveLocale applies it on standalone uvalue productions, we should use the same language here. If there's a problem with the definition in ResolveLocale, we should fix it in ECMA-402.

traditional can only be used in Unicode CLDR locale identifiers with the §3.8.1 Old Locale Extension Syntax, whereas we only support Unicode BCP 47 locale identifiers (see §3.3 BCP 47 Conformance). The canonicalisation algorithm supports both syntaxes, therefore things like traditional crop up. (For example es@collation=traditional is valid old-style syntax; we only support es-u-co-trad.)

The Available abstract operations (e.g. AvailableCalendars) should return
all possible aliases, so that other places in the spec (e.g. the
Temporal.Calendar constructor) can use them to determine whether a given
input value is valid. This input value can subsequently be canonicalized
by another abstract operation (e.g. CanonicalizeCalendar).

In Intl.supportedValuesOf(), on the other hand, we should _not_ return all
possible aliases, so we filter them out using a Canonicalize operation
before returning the list of Available codes as an array to the caller.

Not all of the kinds of codes here have aliases, and not even all of them
have a concept of "canonical". A quick investigation shows:

- Calendar: aliased; case-regularized; limited values "available" but any
  well-formed value accepted, unknown values coerced to the locale's
  default

- Collation: aliased, but web reality is that aliases are ignored;
  case-regularized; limited values "available" but any well-formed value
  accepted, unknown values coerced to the locale's default

- Currency: not sure if it is aliased because I don't have a copy of
  ISO 4217; case-regularized; limited values "available" but any
  well-formed value accepted and used

- Numbering system: not currently aliased; not case-regularized; limited
  values "available" but any well-formed value accepted, unknown values
  coerced to the locale's default

- Time zone: aliased; case-regularized

- Unit of measurement: not aliased; not case-regularized; limited values
  "available" but simple combinations of core values also accepted and
  used

So, I conclude that we need Canonicalize operations for calendars, time
zones, collation types, numbering systems, and possibly currency units.

An alternative approach would be to write Canonicalize operations for all
of the kinds of codes, and have them perform the case-regularization (and
for units of measurement they would be no-ops).

Closes: tc39#37
@ptomato ptomato force-pushed the ptomato/37-available-canonical branch from a97f16e to ee5047b Compare September 9, 2022 22:19
@ptomato
Contributor Author

ptomato commented Sep 9, 2022

OK.

@ptomato
Contributor Author

ptomato commented Sep 26, 2022

Any other reviews on this one?

@FrankYFTang
Collaborator

other places in the spec (e.g. the Temporal.Calendar constructor) can use them to determine whether a given input value is valid.

This is beyond the scope of what this proposal aims to resolve. I am fine with you or anyone else proposing an ECMA402 PR to achieve that, but I am very much against looping that beyond-scope objective into this proposal.

This proposal has already reached Stage 3 and, as its champion, I am not comfortable enlarging its scope for that reason. I would rather we push this proposal to Stage 4, merge it into ECMA402, and then you or someone else could propose an ECMA402 PR to achieve that.

@ptomato
Contributor Author

ptomato commented Sep 28, 2022

That's disappointing to hear, since it would mean that I'll have to waste a lot of time later rewriting the work you're doing here, so that AvailableCalendars and AvailableTimeZones become useful for Temporal, when we could just unify this work now. I'd ask you to reconsider, please.

@ljharb
Member

ljharb commented Sep 28, 2022

I’m confused why an editorial-only change would be considered beyond the scope of a proposal.

@FrankYFTang
Collaborator

FrankYFTang commented Sep 29, 2022

I’m confused why an editorial-only change would be considered beyond the scope of a proposal.

1. I do NOT believe this is "an editorial-only change".
2. Look at this PR; it is a very big change. I totally understand why he wants to make such a change, and I am not opposed to him making it in ECMA402. I am opposed to channelling it through this Stage 3 proposal. This proposal is almost ready to merge into ECMA402 if we can advance to Stage 4 in Nov. I would rather we go that route: merge this proposal into ECMA402 at Stage 4, and then he proposes an enhancement to ECMA402 as an independent PR. This PR changes InitializeCollator of Intl.Collator and CanonicalCodeForDisplayNames of Intl.DisplayNames, which are not what this proposal aims to address.
3. Why should we include something that needs to be filtered out in the AvailableXXX AOs if we should not return it? If they should not be returned, they should not be available. I do not think we have any agreed requirement or demand to support non-canonical ids. So changing the spec to support non-canonical ids is a new feature, which I did not intend to loop into this Stage 3 proposal because it was not something I received agreement on when we moved this proposal to Stage 3.

@FrankYFTang
Collaborator

AvailableCalendars and AvailableTimeZones become useful for Temporal, when we could just unify this work now. I'd ask you to reconsider, please.

I am not opposed to you proposing changes to AvailableCalendars and AvailableTimeZones after they get into ECMA402. I am against putting that part of the change into this proposal, which has been at Stage 3 for a while and has already shipped in chrome m99, Safari 15.4 and Firefox 93. You are proposing a very big change to the spec text right before we could merge into ECMA402 at Stage 4, and that is why I am opposed. I am not opposing the spec change you made; I am opposing the process/channel you suggest for moving that change into ECMA402.

@FrankYFTang
Collaborator

I’m confused why an editorial-only change would be considered beyond the scope of a proposal.

If his change is truly editorial-only (which I do not believe), then he can simply propose an ECMA402 editorial PR to merge it in now or after we merge this proposal into ECMA402, right? Then why rush now?

@ptomato
Contributor Author

ptomato commented Sep 29, 2022

Here's my understanding of the situation:

Is my understanding wrong? If you would like to have a call to discuss this, or discuss it in the TG2 meeting, let me know.

@FrankYFTang
Collaborator

Here are the differences:

You are assuming the "adding non-canonical id and filter out later" approach is acceptable and will reach consensus. In that case, your path is clear as you stated and will save people's time.
I am assuming the "adding non-canonical id and filter out later" approach could be controversial and will not be easily resolved. In that case, this path will introduce unnecessary delay for this proposal to advance to Stage 4.

I simply do not want this proposal to be part of the debate over the "adding non-canonical id and filter out later" approach. As an alternative, you can easily define a different AO, CanonicalAndNonCanonicalCalendars(), which returns the union of the results of AvailableCalendars() and ExtraNonCanonicalCalendars() for your purpose. I prefer the "only return canonical ids and add non-canonical ids" approach in the AO, since that will encourage code optimized for performance.
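The alternative suggested in this comment might look roughly like the sketch below. The function names follow the AO names proposed in the comment; the data is a hypothetical subset and none of this is spec text.

```typescript
// Hypothetical data: canonical identifiers only.
const CANONICAL_CALENDAR_IDS: readonly string[] = [
  "ethioaa",
  "gregory",
  "islamic-civil",
  "iso8601",
];

// AvailableCalendars stays canonical-only, so callers such as
// Intl.supportedValuesOf() need no filtering step.
function availableCalendars(): string[] {
  return CANONICAL_CALENDAR_IDS.slice();
}

// Hypothetical separate list of accepted aliases.
function extraNonCanonicalCalendars(): string[] {
  return ["ethiopic-amete-alem", "islamicc"];
}

// Union of both lists, for callers that must accept aliases as input.
function canonicalAndNonCanonicalCalendars(): string[] {
  return availableCalendars().concat(extraNonCanonicalCalendars());
}
```

The trade-off relative to the PR's approach is where the extra work lands: here the alias list is added on demand, whereas the PR includes aliases up front and filters them out when enumerating.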

@ptomato
Contributor Author

ptomato commented Sep 29, 2022

I don't understand what makes you think it would be controversial, if implementations don't have to change anything. Who do you expect will object to this?

Is there a way we could change the way it's expressed so that it has less potential to be controversial?

@FrankYFTang
Collaborator

I don't understand what makes you think it would be controversial, if implementations don't have to change anything. Who do you expect will object to this?

The fact that we have been arguing about this PR for so long here already proves it is controversial, right?

I ASSUME everything COULD BE controversial BY DEFAULT
It may or may not be controversial and there is no evidence to prove either case. I simply want to tighten change control on this Stage 3 proposal so it can be moved to Stage 4 with minimum risk. The amount of change in this PR (by the nature of the number of characters changed in the spec) simply dramatically increases the risk. I simply do not want this proposal to take that unnecessary risk.

@ptomato
Contributor Author

ptomato commented Sep 29, 2022

I'm really annoyed that you insist no compromise is possible on this. Especially since over the last 1.5 years I've spent a considerable amount of my time adjusting the Temporal spec in Stage 3 to address concerns from you, working to find solutions that make everyone as happy as possible, including things that I felt were out of scope or introduced risk of delay.

This is just a waste of my time, so I'll close this PR.

At least, please revert the rename you just did from AvailableXXX to AvailableCanonicalXXX. It makes the situation worse because it presumes that the simpler change will never make it into ECMA-402.

@ptomato ptomato closed this Sep 29, 2022
@ljharb
Member

ljharb commented Sep 29, 2022

For what it's worth, I find it far more controversial, and more likely to block this proposal hitting stage 3, to reject this change than to accept it ¯\_(ツ)_/¯

@ptomato
Contributor Author

ptomato commented Sep 29, 2022

Just to be clear - I would absolutely not block Intl Enumeration on those grounds. As annoyed as I am about this, blocking in retaliation would drag everybody down.

@ljharb
Member

ljharb commented Sep 29, 2022

Oh, same, I'm just clarifying that inaction can often be more controversial than action, not less.

@FrankYFTang
Collaborator

I find it far more controversial, and more likely to block this proposal hitting stage 3, to reject this change than to accept it ¯_(ツ)_/¯

Fact: This proposal is already in Stage 3 since July 2021.
Therefore, "to block this proposal hitting stage 3" as of Sept 2022, for whatever reason, you would first need to build a Time Machine to travel back 14 months in time, which would be much more controversial for sure.

@ljharb
Member

ljharb commented Sep 29, 2022

haha ok fair point, then i'm unclear on your resistance since it's only stage 4 you need to secure.

@FrankYFTang
Collaborator

inaction can often be more controversial than action, not less.

Agreed. I proposed #43 on Dec 2, 2021 to address #37, and reviewers on that PR not providing feedback for 10 months probably qualifies as inaction.

@FrankYFTang
Collaborator

haha ok fair point, then i'm unclear on your resistance since it's only stage 4 you need to secure.

Fact: chrome m99, Safari 15.4 and Firefox 93 already ship with this proposal.
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/supportedValuesOf#browser_compatibility

Here is what I would like to see:

  1. Advance this proposal to Stage 4 at TC39 in Nov 2022
  2. Allow ECMA402 editors to merge the Stage 4 PR (of this proposal) into ECMA402 so it will be part of the 2023 edition of ECMA402.
  3. After 1) and 2), allow whoever needs to depend on it to propose an ECMA402 PR


Successfully merging this pull request may close these issues.

AvailableCalendars, AvailableCollations, and AvailableNumberingSystems should return canonical identifiers
4 participants