CanonicalizeLanguageTag should remove duplicate attributes/keywords in a Unicode extension, consistent with Intl.Locale #82
Comments
…removes duplicate attributes/keywords in Unicode locale extension sequences just as Intl.Locale does. Fixes tc39#82.
I can't find anything in TR35 or RFC 6067 that describes the removal of duplicate attributes/keywords. So it doesn't look like there's anything in TR35 we could invoke to perform this operation, and it has to be hand-rolled -- as this patch does for |
@anba You should probably take a look at and comment on this, seeing as you understand this stuff better than everyone else here. :-)
@jswalden 6067 says something about this:
This was intended to allow duplicates to be removed without effect (although I don't think we went on to encourage/permit it).
@aphillips Yeah, the semantics of a tag with duplicates in it are clear enough even if they aren't removed. Just seems to me if we remove them one place -- and we don't really have a choice about it in |
The canonical form is in a specific order. I would just go ahead and remove duplicates: extension tags are already long and unwieldy even without useless cruft in them.
#83 fixes this.
Canonicalization performed by CanonicalizeLanguageTag and that performed by Intl.Locale differ in two intended ways.
CanonicalizeLanguageTag doesn't remove duplicated attributes or keywords, e.g. "en-u-attr-attr" and "en-u-co-dict-co-phonebk" are both considered to be canonical. Intl.Locale does (and almost necessarily must, to integrate keywords in the input tag with keywords specified through the options bag).
CanonicalizeLanguageTag doesn't replace aliased subtags in Unicode locale extension sequences with their preferred forms, e.g. "en-u-ms-imperial" is canonical according to CanonicalizeLanguageTag, but Intl.Locale will transform it to "en-u-ms-uksystem". (This latter behavior doesn't exist in the current spec because of changes to TR35 upstream. See "Update references to match current UTS 35 spec" #77 for dealing with that change.)
On the call last week I had thought the latter TR35 upstream change was something we had accepted, and I didn't understand that the first problem still remained, so I was fine with this proposal moving forward. But the latter change was unintentional (#77 will deal with it), and the first problem is real. We need to fix both of these to move this proposal forward, IMO. :-(
I have a patch that augments this proposal with changes to the existing CanonicalizeLanguageTag algorithm such that duplicate attributes and keywords are removed. I am not sure that this is the most elegant way to implement deduplication. But it gets the job done, and of course implementations will choose whatever approach works best for them in reality. I'll create a PR once I've gotten this issue filed and have an issue number to refer to.
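For illustration only, here is a minimal sketch of what such deduplication could look like in plain JavaScript. It is not the patch's spec text: the helper name dedupeUnicodeExtension is hypothetical, it operates on a single Unicode locale extension sequence (the part starting at "u"), it assumes the input is already well-formed and lowercase, and it assumes the first occurrence of a duplicated attribute or key is the one to keep. It does not reorder anything.

```javascript
// Hypothetical helper (not from the spec or the patch): removes duplicate
// attributes and keywords from one Unicode locale extension sequence,
// e.g. "u-co-dict-co-phonebk". Assumes well-formed, lowercase input and
// keeps the first occurrence of each attribute and each keyword key.
function dedupeUnicodeExtension(extension) {
  const subtags = extension.split("-");
  if (subtags[0] !== "u") {
    throw new RangeError("expected a Unicode locale extension sequence");
  }

  const attributes = [];
  const keywords = []; // each entry is [key, ...typeSubtags]
  const seenKeys = new Set();

  let i = 1;
  // Attributes come first: every subtag before the first 2-character key.
  while (i < subtags.length && subtags[i].length !== 2) {
    if (!attributes.includes(subtags[i])) {
      attributes.push(subtags[i]);
    }
    i++;
  }

  // Keywords: a 2-character key followed by zero or more type subtags.
  while (i < subtags.length) {
    const key = subtags[i++];
    const keyword = [key];
    while (i < subtags.length && subtags[i].length !== 2) {
      keyword.push(subtags[i++]);
    }
    // Drop the whole keyword (key and its type) if the key was already seen.
    if (!seenKeys.has(key)) {
      seenKeys.add(key);
      keywords.push(keyword);
    }
  }

  return ["u", ...attributes, ...keywords.flat()].join("-");
}

console.log(dedupeUnicodeExtension("u-attr-attr"));          // "u-attr"
console.log(dedupeUnicodeExtension("u-co-dict-co-phonebk")); // "u-co-dict"
```

Applied to the extension parts of the two tags from the description above, this turns "en-u-attr-attr" into "en-u-attr" and "en-u-co-dict-co-phonebk" into "en-u-co-dict". Locating the extension inside a full language tag, and sorting attributes and keywords into canonical order, are deliberately left out of the sketch.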