-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normative: Do not allow duplicate variants within the tlang
component of a transformed_extensions
#429
Conversation
f8d58ac
to
dbde17b
Compare
Another one to bring up in the next meeting. |
dbde17b
to
b5ea6fe
Compare
spec/locales-currencies-tz.html
Outdated
<li>does not include duplicate singleton subtags.</li> | ||
<li>_locale_ can be generated from the EBNF grammar for `unicode_locale_id` in <a href="https://unicode.org/reports/tr35/#Unicode_locale_identifier">Unicode Technical Standard #35 LDML § 3.2 Unicode Locale Identifier</a>;</li> | ||
<li>_locale_ does not use any of the backwards compatibility syntax described in <a href="https://unicode.org/reports/tr35/#BCP_47_Conformance">Unicode Technical Standard #35 LDML § 3.3 BCP 47 Conformance</a>;</li> | ||
<li>the `unicode_language_id` within _locale_ does not contain ASCII case-insensitively equivalent `unicode_variant_subtag` subtags; and</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be changed so reader won't misread it as there should be no variant tag at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep staring at this and can't see this meaning in it...but also I wrote it. ^_^ I updated this with different wording that puts "duplicate" closer to the start; let met know if there's still a problem with that.
@macchiati @zbraniecki Do either of you have comments on this structural-validity adjustment, and particularly how this intersects with BCP 47/UTS 35 validity requirements? The no-duplicates restriction at top level is already in specs; the one for |
b5ea6fe
to
7da6855
Compare
+ <li>the `unicode_language_id` within _locale_ does not
contain ASCII case-insensitively equivalent `unicode_variant_subtag`
subtags; and</li>
Rather than repeat "ASCII case-insensitively equivalent" for the
language around duplicates, I think it is cleaner to say in one place
that validity tests are made as if the entire locale (or language) ID
were lowercase. Thus *duplicate* as in "duplicate subtags" always
means ASCII case-insensitively equivalent.
Mark
…On Thu, May 21, 2020 at 4:32 PM Jeff Walden ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In spec/locales-currencies-tz.html
<#429 (comment)>:
> </p>
<ul>
- <li>represents a well-formed "Unicode BCP 47 locale identifier" as specified in <a href="https://unicode.org/reports/tr35/#Unicode_locale_identifier">Unicode Technical Standard 35 section 3.2</a>,</li>
- <li>does not include duplicate variant subtags, and</li>
- <li>does not include duplicate singleton subtags.</li>
+ <li>_locale_ can be generated from the EBNF grammar for `unicode_locale_id` in <a href="https://unicode.org/reports/tr35/#Unicode_locale_identifier">Unicode Technical Standard #35 LDML § 3.2 Unicode Locale Identifier</a>;</li>
+ <li>_locale_ does not use any of the backwards compatibility syntax described in <a href="https://unicode.org/reports/tr35/#BCP_47_Conformance">Unicode Technical Standard #35 LDML § 3.3 BCP 47 Conformance</a>;</li>
+ <li>the `unicode_language_id` within _locale_ does not contain ASCII case-insensitively equivalent `unicode_variant_subtag` subtags; and</li>
I keep staring at this and can't see this meaning in it...but also I wrote
it. ^_^ I updated this with different wording that puts "duplicate" closer
to the start; let met know if there's still a problem with that.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#429 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMCOG3LUVZFACUBGBADRSW2X3ANCNFSM4MQPV6RQ>
.
|
…formed content extension.
7da6855
to
66ba7c4
Compare
Is the de-duplication a canonicalization operation or validity check? I think in ICU4X we treat it as a canonicalization operation in accordance with the "Be liberal in what you accept, and conservative in what you send" principle. I'd suggest doing that for both, language identifier and tlang along with case normalization and sorting. |
It's a bad principle. :-) If you don't impose discipline in what you accept, senders will not self-discipline in what they send.
Duplicates in the language tag are excluded as a validity check -- and have been ever since BCP47. In principle that could be relaxed...but then you may need to drag along the rest of the ecosystem to also accept duplicates. That seems much harder than simply being more restrictive about what we accept, in the much-narrower area of As usual @anba figured this out before both of us. :-) |
I don't agree.
Yes. that makes sense. |
What is the status of this PR? Are we still waiting for feedback? |
I don't think we are any more -- the request for feedback was answered with "don't repeat the ASCII duplication language multiple times", the latest iteration states it only once, so this should be good to go, as far as I can tell. |
@sffc At this point, assuming you agree with my last comment, I think this can be merged. Unless adding test262 data at the same time is prerequisite for doing so, in which case I can file the issue and do the necessary work on it, or there's some other blocking reason I'm not aware of... |
Is it correct to say that this is technically a normative PR but it reflects web reality and therefore is a no-op change from the implementation's point of view? |
@sffc It's a normative change. I don't know if it reflects web reality, but it seems doubtful to me there are users out there who knowingly pass duplicate variants in |
tlang
component of a transformed_extensions
tlang
component of a transformed_extensions
tlang
component of a transformed_extensions
tlang
component of a transformed_extensions
Oh ok, so this is a normative PR. That means we need to do the dance of getting implementations, adding tests, MDN, etc. I opened tc39/test262#2857 for tests. If you're aware of implementation bug numbers, can you add them to the wiki? Thanks. |
FYI, JavaScriptCore rejects duplicate variants and throws RangeError from https://trac.webkit.org/changeset/266039/webkit |
SpiderMonkey bug for this is https://bugzilla.mozilla.org/show_bug.cgi?id=1670165 with patch posted. The test in it ought fairly readily be massaged into a test262 test -- I'll see about doing that, likely tomorrow. |
tc39/test262#2858 adds a test for this to test262. |
ECMA-402 consensus: https://github.com/tc39/ecma402/blob/master/meetings/notes-2021-01-14.md#normative-do-not-allow-duplicate-variants-within-the-tlang-component-of-a-transformed_extensions-429 @jswalden to evaluate whether we need MDN. Tracking issue: tc39/ecma402-mdn#18 |
This achieved TC39 consensus today. |
That is, because
en-emodeng-emodeng
with duplicatedemodeng
variant is not structurally valid, neither oughtde-t-en-emodeng-emodeng
be valid. This fixes one of the issues raised in #330.This patch's wording (and how it talks about restrictions on subtag sequences) is somewhat messy. Suggestions obviously appreciated for how to do this more clearly or concisely.