Change the CanonicalizeLanguageTag operation so that it removes duplicate attributes/keywords in Unicode locale extension sequences just as Intl.Locale does #83

jswalden · 2019-11-18T22:25:11Z

This fixes #82.

anba · 2019-11-19T17:18:00Z

LGTM!

It looks like V8 is already removing duplicate keywords, so there shouldn't be any web-compat problems:

d8> Intl.getCanonicalLocales("de-u-ca-gregory-ca-islamicc")
["de-u-ca-gregory"]

And duplicate attributes don't seem to be supported in V8, which is a V8 bug, but also means we don't need to worry about web-compat issues.

d8> Intl.getCanonicalLocales("de-u-attr-attr")
(d8):1: RangeError: Invalid language tag: de-u-attr-attr
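
To make the behavior discussed above concrete, here is a minimal sketch of the deduplication step. It is not the spec text: `removeDuplicateUnicodeExtensionParts` is a hypothetical helper name, it assumes a structurally valid tag, and it keeps the first occurrence of each attribute and keyword key, which matches the d8 output for the duplicate keyword shown above.

```javascript
// Hypothetical helper (not from the spec): removes repeated attributes and
// repeated keyword keys from the "-u-" extension of a well-formed tag,
// keeping the first occurrence of each.
function removeDuplicateUnicodeExtensionParts(tag) {
  const subtags = tag.toLowerCase().split("-");

  // Locate the "u" singleton, stopping at a private-use "x" sequence.
  let start = -1;
  for (let i = 1; i < subtags.length; i++) {
    if (subtags[i].length !== 1) continue;
    if (subtags[i] === "x") break;
    if (subtags[i] === "u") { start = i; break; }
  }
  if (start === -1) return subtags.join("-");

  // The extension runs until the next singleton or the end of the tag.
  let end = subtags.length;
  for (let j = start + 1; j < subtags.length; j++) {
    if (subtags[j].length === 1) { end = j; break; }
  }

  const attributes = [];
  const keywords = [];        // each entry is [key, ...types]
  const seenKeys = new Set();
  let current = null;         // null while still reading attributes
  for (const subtag of subtags.slice(start + 1, end)) {
    if (subtag.length === 2) {
      // A two-character subtag starts a keyword.
      if (seenKeys.has(subtag)) {
        current = [];         // scratch list: this duplicate is discarded
      } else {
        seenKeys.add(subtag);
        current = [subtag];
        keywords.push(current);
      }
    } else if (current === null) {
      if (!attributes.includes(subtag)) attributes.push(subtag);
    } else {
      current.push(subtag);
    }
  }

  return [
    ...subtags.slice(0, start), "u", ...attributes,
    ...keywords.flat(), ...subtags.slice(end),
  ].join("-");
}

console.log(removeDuplicateUnicodeExtensionParts("de-u-ca-gregory-ca-islamicc"));
// prints "de-u-ca-gregory"
```

Sorting and removal of "true" values are deliberately left out of this sketch; it only covers duplicates.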

zbraniecki · 2019-12-12T20:25:18Z

This doesn't cover several steps in https://www.unicode.org/reports/tr35/tr35.html#Canonical_Unicode_Locale_Identifiers:

- Sorting
- Removal of "true"

Is that OK?

jswalden · 2019-12-12T22:54:10Z

This is one step of two. This only deals with duplications. Sorting and removal of true I was going to do in a second PR. I'll get that second one going right pronto.
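
As an illustration of the two remaining steps mentioned here, the following is a rough sketch of sorting and of dropping a "true" value within a single Unicode extension. The helper name `sortAndTrimUnicodeExtension` is made up for this example, the input is assumed to be a well-formed extension string starting with `u`, and sorting attributes alphabetically alongside keywords is my reading of UTS 35 rather than something stated in this thread.

```javascript
// Hypothetical helper (not spec text): takes one Unicode extension such as
// "u-kn-true-ca-gregory", sorts attributes and keywords, and drops a keyword
// value that is exactly "true".
function sortAndTrimUnicodeExtension(extension) {
  const [singleton, ...body] = extension.toLowerCase().split("-");
  if (singleton !== "u") {
    throw new RangeError("not a Unicode locale extension: " + extension);
  }

  const attributes = [];
  const keywords = [];
  for (const subtag of body) {
    if (subtag.length === 2) {
      keywords.push({ key: subtag, types: [] });   // a key starts a keyword
    } else if (keywords.length === 0) {
      attributes.push(subtag);                     // before any key: attribute
    } else {
      keywords[keywords.length - 1].types.push(subtag);
    }
  }

  // Assumption: attributes sort alphabetically, keywords sort by key.
  attributes.sort();
  keywords.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

  const parts = ["u", ...attributes];
  for (const { key, types } of keywords) {
    parts.push(key);
    // A value of exactly "true" is implied by the bare key, so omit it.
    if (types.join("-") !== "true") parts.push(...types);
  }
  return parts.join("-");
}

console.log(sortAndTrimUnicodeExtension("u-kn-true-ca-gregory"));
// prints "u-ca-gregory-kn"
```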

jswalden · 2019-12-13T05:09:28Z

Okay, I added a second commit here that additionally makes CanonicalizeLanguageTag transform its input to canonical form. Moreover, for greater clarity that Intl.Locale always canonicalizes consistent with CanonicalizeLanguageTag, I made it so that if you follow through the algorithm, every possible result-string shoved into the returned locale.[[Locale]] has just passed through CanonicalizeLanguageTag:

ApplyOptionsToTag always returns something passed through CanonicalizeLanguageTag, either step 9 if the options-object doesn't provide language/script/region overrides, or step 13 otherwise;
the helpfully-renamed InsertUnicodeExtensionAndCanonicalize now obviously returns a CanonicalizeLanguageTag result;
ApplyUnicodeExtensionToTag is passed the result of ApplyOptionsToTag, so its tag argument must be a CanonicalizeLanguageTag result;
ApplyUnicodeExtensionToTag returns either tag (a CanonicalizeLanguageTag result) unaltered, or InsertUnicodeExtensionAndCanonicalize(tag) (also a CanonicalizeLanguageTag result);
so therefore locale.[[Locale]] in the return of Intl.Locale is a CanonicalizeLanguageTag result.
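
One way to spot-check the invariant argued above from script is sketched below. It assumes an engine that implements both `Intl.Locale` and `Intl.getCanonicalLocales` with the semantics of this PR; `localeIsCanonical` is a hypothetical helper name, and a passing check is only evidence for the sampled inputs, not a proof.

```javascript
// Hypothetical helper: returns true when the tag stored by Intl.Locale is
// left unchanged by a further pass through Intl.getCanonicalLocales.
function localeIsCanonical(tag, options) {
  const stored = new Intl.Locale(tag, options).toString();
  return Intl.getCanonicalLocales(stored)[0] === stored;
}

// Sample inputs covering the paths described above: no options, Unicode
// extension options, and language/region overrides.
const samples = [
  ["en-u-kn-true-co-phonebk", undefined],
  ["en", { calendar: "gregory", numberingSystem: "latn" }],
  ["en-Latn-US", { language: "fr", region: "CA" }],
];
for (const [tag, options] of samples) {
  console.log(tag, localeIsCanonical(tag, options));
}
```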

zbraniecki

lgtm!

zbraniecki · 2019-12-19T18:25:13Z

@srl295 - can you verify that this change is in line with ICU thinking?

sffc · 2020-01-08T17:27:29Z

This PR is blocking stage advancement, and it looks like the PR is waiting for review from a subject matter expert. Is that correct?

CC @markusicu @FrankYFTang

littledan

This change makes sense to me and seems like it should be an improvement vs previous behavior. I wonder if we should consider "upstreaming" this algorithm into the Unicode document.

srl295 · 2020-01-09T17:52:08Z

let me see if i can get a review

yumaoka · 2020-01-09T19:24:15Z

I think the abstract operation name CanonicalizeLanguageTag is not good if the operation does something beyond BCP 47. "Language tag" is a term used in BCP 47, and its canonicalization is specified by https://tools.ietf.org/html/bcp47#section-4.5

In CLDR, the Unicode BCP 47 locale identifier is defined separately from the BCP 47 language tag, and such canonicalization is in the scope of LDML/u-extension, not the BCP 47 language tag specification.

I think removal of duplicated keys is good and should be clarified in LDML, but this operation is for Unicode locale identifiers. If the abstract operation were named CanonicalizeUnicodeLocaleId, it would make perfect sense.

littledan · 2020-01-09T19:43:26Z

@yumaoka Thanks for the review. It sounds like you're saying that there are editorial cleanups that we should do, but that the semantics are appropriate. Is that accurate?

srl295 · 2020-01-09T19:51:55Z

> @yumaoka Thanks for the review. It sounds like you're saying that there are editorial cleanups that we should do, but that the semantics are appropriate. Is that accurate?

Rename the operation to make it clear that it is not generic to BCP 47 but specific to Unicode locale IDs (which are a strict subset of BCP 47)?

jswalden · 2020-01-24T11:19:20Z

Made the change to rename CanonicalizeLanguageTag to CanonicalizeUnicodeLocaleId. At first I thought maybe this ought to happen after this proposal merges, so as not to bloat the "Modified algorithms" section of this proposal with single-word deletions/replacements, but then I looked and there really aren't that many, so it's not terrible to just do it here. Did so, in a further separate commit for readability.

zbraniecki · 2020-01-24T11:25:30Z

Thank you!

jswalden · 2020-01-24T11:30:26Z

...wait, that's not it. I did a git and forgot to git add the changes, so that third commit doesn't actually make all the name-changes. :-( Sec, I'll fix.

jswalden · 2020-01-24T11:37:08Z

Well, in theory I can fix. I have force-pushed to the same branch in my fork, but it does not appear to be showing up here. I'll open a new PR for it.

zbraniecki · 2020-01-24T11:37:58Z

yeah, since I merged the PR, we need a new PR. Sorry for rushing the merge!

Additionally change the CanonicalizeLanguageTag operation so that it removes duplicate attributes/keywords in Unicode locale extension sequences just as Intl.Locale does. Fixes tc39#82.

b8b8b78

jswalden force-pushed the canonicalize-trim-duplicates branch from 121c725 to b8b8b78 Compare December 12, 2019 19:59

CanonicalizeLanguageTag should convert to canonical form, not merely canonical syntax.

4c5b861

jswalden mentioned this pull request Dec 13, 2019

Update references to match current UTS 35 spec #77

Closed

zbraniecki requested a review from littledan December 17, 2019 21:11

zbraniecki approved these changes Dec 17, 2019

littledan approved these changes Jan 9, 2020

zbraniecki mentioned this pull request Jan 24, 2020

CanonicalizeLanguageTag should remove duplicate attributes/keywords in a Unicode extension, consistent with Intl.Locale #82

Closed

Rename CanonicalizeLanguageTag to CanonicalizeUnicodeLocaleId for a more-precise name.

bcd048d

zbraniecki merged commit 3cf584f into tc39:master Jan 24, 2020

jswalden mentioned this pull request Jan 24, 2020

Rename CanonicalizeLanguageTag to CanonicalizeUnicodeLocaleId #86

Merged
