intl402/Locale/prototype/minimize/removing-likely-subtags-first-adds-likely-subtags.js doesn't match web reality & LDML spec #2628

Open
longlho opened this issue May 19, 2020 · 9 comments

Comments

@longlho
Contributor

longlho commented May 19, 2020

Specifically

und -> und
und-Thai -> und-Thai
und-419 -> und-419
und-150 -> und-150
und-AT -> und-AT
und-CW -> und-CW
und-US -> und
zh-Hant -> zh-Hant
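
A quick way to see what a given engine reports for these tags is to run them through `Intl.Locale.prototype.minimize()`. This is only a probe sketch: the helper name `minimizeAll` is made up for illustration, and the output depends on the ICU/CLDR data bundled with the engine, so no expected values are shown.

```javascript
// Tags from the list above.
const tags = [
  "und", "und-Thai", "und-419", "und-150",
  "und-AT", "und-CW", "und-US", "zh-Hant",
];

// Returns [input, minimized] pairs using the engine's own Intl.Locale.
function minimizeAll(inputs) {
  return inputs.map((tag) => [tag, new Intl.Locale(tag).minimize().toString()]);
}

for (const [input, minimized] of minimizeAll(tags)) {
  console.log(`${input} -> ${minimized}`);
}
```

Running the same script in different engines or engine versions makes it easy to compare their results against the test expectations.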

We caught this while working on our Intl.Locale polyfill. I checked Chrome, and it seems to agree.

@anba
Contributor

anba commented May 19, 2020

See #2598, the linked ICU issues in the test case, and https://unicode-org.atlassian.net/browse/CLDR-13749.

@longlho
Contributor Author

longlho commented May 20, 2020

Ah, thanks for the context, @anba. In that case, everything checks out except for und-150, which should minimize to ru-150 instead of ru, right?

Maximize:
und-150 -> ru-Cyrl-RU -> ru-Cyrl-150 (since region 150 comes from source)

Minimize:
ru -> ru-Cyrl-RU (no match)
ru-150 -> ru-Cyrl-150 (match)
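
The trace above can be reproduced with a toy model of the UTS 35 "Add Likely Subtags" and "Remove Likely Subtags" steps, read literally. This is a rough sketch, not ICU's or any engine's implementation: the table is a two-row stand-in (only the `und-150` row is quoted in this thread, the `ru` row is an assumption), variants and extensions are ignored, and the helper names are invented for illustration.

```javascript
// Toy likely-subtags table. Only the und-150 row is quoted in this thread;
// the "ru" row is an assumption added so the example is self-contained.
const LIKELY = new Map([
  ["und-150", "ru-Cyrl-RU"],
  ["ru", "ru-Cyrl-RU"],
]);

// Split "lang[-Script][-REGION]" into fields; missing fields become "".
function parse(tag) {
  const [language, ...rest] = tag.split("-");
  const script = rest.find((s) => /^[A-Za-z]{4}$/.test(s)) ?? "";
  const region = rest.find((s) => /^([A-Za-z]{2}|\d{3})$/.test(s)) ?? "";
  return { language, script, region };
}

// Join the non-empty fields back into a tag.
function format({ language, script, region }) {
  return [language, script, region].filter(Boolean).join("-");
}

// Literal reading: a source field is kept unless it is a missing
// script/region or the language "und".
function maximize(tag) {
  const src = parse(tag);
  // Lookup order: lang_script_region, lang_region, lang_script, lang.
  const keys = [
    format(src),
    format({ ...src, script: "" }),
    format({ ...src, region: "" }),
    src.language,
  ];
  const hit = keys.find((k) => LIKELY.has(k));
  if (hit === undefined) return tag; // no data: leave the tag unchanged
  const match = parse(LIKELY.get(hit));
  return format({
    language: src.language !== "und" ? src.language : match.language,
    script: src.script || match.script,
    region: src.region || match.region,
  });
}

// Try lang, lang_region, lang_script; return the first trial that
// maximizes back to the same tag, else the maximized tag itself.
function minimize(tag) {
  const max = parse(maximize(tag));
  const maxTag = format(max);
  const trials = [
    max.language,
    format({ ...max, script: "" }),
    format({ ...max, region: "" }),
  ];
  return trials.find((t) => maximize(t) === maxTag) ?? maxTag;
}

console.log(maximize("und-150")); // ru-Cyrl-150
console.log(minimize("und-150")); // ru-150
```

Under this literal reading the region 150 survives maximization, which is why the `ru` trial fails and `ru-150` is the first trial that matches.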

longlho added a commit to formatjs/formatjs that referenced this issue May 20, 2020
@anba
Contributor

anba commented May 20, 2020

Hmm, this may need to be clarified on the CLDR side.

From http://unicode.org/reports/tr35/#Likely_Subtags:

Note that as of CLDR v24, any field present in the 'from' field, is also present in the 'to' field, so an input field will not change in "Add Likely Subtags" operation.

From supplemental/likelySubtags.xml:

<likelySubtag from="und_150" to="ru_Cyrl_RU"/>

The quoted text from UTS 35 doesn't seem to be true when the base language subtag is und.
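
The mismatch can be checked mechanically. The sketch below compares the fields of a `from` attribute with those of its `to` attribute and reports any that do not reappear. It assumes the underscore-separated form used in `likelySubtags.xml`; only the first sample row is quoted in this thread, the second is an illustrative assumption, and `changedFields` is a made-up helper name.

```javascript
// Sample rows in likelySubtags.xml "from"/"to" form. Only the first row is
// quoted in this thread; the second is an illustrative assumption.
const sample = [
  { from: "und_150", to: "ru_Cyrl_RU" },
  { from: "ru", to: "ru_Cyrl_RU" },
];

// Returns the "from" fields that do not reappear in "to".
// A leading "und" is skipped, since it stands for "no language given".
function changedFields(from, to) {
  const toFields = new Set(to.split("_"));
  return from
    .split("_")
    .filter((field, i) => !(i === 0 && field === "und"))
    .filter((field) => !toFields.has(field));
}

for (const { from, to } of sample) {
  console.log(from, "->", to, "changed:", changedFields(from, to));
}
```

For the `und_150` row the check reports the region `150`, which is replaced by `RU` in the `to` field, while the `ru` row reports nothing.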


Based on the results I see for ICU, maybe ICU evaluates the following step:

Let xr = xs if xs is not empty, and xm otherwise.

where empty is defined as:

A subtag is called empty if it is a missing script or region subtag, or it is a base language subtag with the value "und".

as if "empty" were defined as:

A subtag is called empty if it is a missing script or region subtag, or if the base language subtag has the value "und".
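
To make the difference between the two definitions concrete, here is a small sketch that applies the "xr = xs if xs is not empty, and xm otherwise" step to und-150 under each reading. The second predicate models only the hypothesis above and is not confirmed ICU behavior; the function names are invented for illustration.

```javascript
// Fields of the input (xs) and of the matched table entry (xm):
// und-150 matches <likelySubtag from="und_150" to="ru_Cyrl_RU"/>.
const source = { language: "und", script: "", region: "150" };
const match = { language: "ru", script: "Cyrl", region: "RU" };

// Reading A (UTS 35 text): a field is empty if it is a missing
// script/region, or it is the language field with the value "und".
function isEmptyLiteral(field, src) {
  return field === "language" ? src.language === "und" : src[field] === "";
}

// Reading B (hypothesis only): every field counts as empty whenever
// the base language is "und".
function isEmptyUndWide(field, src) {
  return src.language === "und" || src[field] === "";
}

// Apply "xr = xs if xs is not empty, and xm otherwise" to each field.
function combine(src, m, isEmpty) {
  return ["language", "script", "region"]
    .map((f) => (isEmpty(f, src) ? m[f] : src[f]))
    .join("-");
}

console.log(combine(source, match, isEmptyLiteral)); // ru-Cyrl-150
console.log(combine(source, match, isEmptyUndWide)); // ru-Cyrl-RU
```

Only the second reading drops the source region 150, which is the maximized form that would let und-150 minimize to ru.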

@anba
Contributor

anba commented May 21, 2020

cc @FrankYFTang

@longlho
Contributor Author

longlho commented May 21, 2020

Yeah, that's the confusing part. I'm also not quite sure whether the typo explains it.

@longlho
Contributor Author

longlho commented May 25, 2020

OK, so IMO the spec text isn't really correct. I traced through ICU4J (since I don't know C): https://github.com/unicode-org/icu/blob/4231ca5be053a22a1be24eb891817458c97db709/icu4j/main/classes/core/src/com/ibm/icu/util/ULocale.java#L2877-L2899

It looks like maximize takes the language and region from the matched entry if there is a match for language_region. In this case it would turn und-RU into ru-Cyrl-RU, which matches the test result.

if (likelySubtags != null) {
    // Always use the language tag from the
    // maximal string, since it may be more
    // specific than the one provided.
    return createTagString(
            null,
            script,
            null,
            variants,
            likelySubtags);
}

So it's either an implementation issue or a spec issue.

@rwaldron
Contributor

@longlho is there any further action to take in Test262?

@longlho
Contributor Author

longlho commented Aug 21, 2020

I'll file a Jira ticket with Unicode, or maybe @sffc can clarify?

@sffc
Contributor

sffc commented Sep 4, 2020

The language tag canonicalization algorithm in UTS 35 is currently in flux, with an attempt underway to make it clearer and cover more of these edge cases.

@macchiati @FrankYFTang
