BestAvailableLocale operation requires additional clarifications/changes to handle extensions #685

Closed
jedel1043 opened this issue May 31, 2022 · 7 comments · Fixed by #824
Labels
c: locale (Component: locale identifiers), s: in progress (Status: the issue has an active proposal), Small (Smaller change solvable in a Pull Request)

@jedel1043

jedel1043 commented May 31, 2022

Hello! I'm part of the development team of boa, where we are developing an ECMAScript engine with Intl support, and we have a bit of a problem with how the BestAvailableLocale operation is described.

For context, I had a discussion with some of the icu4x folks (boa-dev/boa#2072 (comment)) and we discovered that the BestAvailableLocale operation, as currently described in the latest revision, tries to match invalid Unicode BCP 47 locale identifiers when executing the algorithm described.

I don't think Language Identifiers are the problem per se; I can easily manipulate LanguageIdentifiers using the provided API, and a canonicalized Language Identifier cannot become invalid by following the spec algorithm. The problem is extensions, which can become invalid if the algorithm is followed as stated. For example, given hi-t-en-h0-hybrid, it would try to match hi-t-en-h0, which has an invalid t extension because the h0 key must have a value.
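To make the failure concrete, here is a minimal sketch of the right-to-left string truncation that the operation performs. It assumes the steps are: find the last "-", skip back over a preceding single-character subtag, and cut. The function name `truncation_candidates` is hypothetical and is not taken from the spec or from any library.

```python
def truncation_candidates(locale: str) -> list[str]:
    """List every candidate string the truncation loop would try, in order.

    Assumption: each step cuts at the last "-", and if a single-character
    subtag (a singleton such as "t" or "u") sits right before the cut,
    that singleton is dropped as well.
    """
    candidates = [locale]
    candidate = locale
    while True:
        pos = candidate.rfind("-")
        if pos == -1:
            # No separator left: nothing more to truncate.
            return candidates
        if pos >= 2 and candidate[pos - 2] == "-":
            # Also drop the dangling singleton before the cut.
            pos -= 2
        candidate = candidate[:pos]
        candidates.append(candidate)


for candidate in truncation_candidates("hi-t-en-h0-hybrid"):
    print(candidate)
```

The second candidate, hi-t-en-h0, is the invalid identifier mentioned above: the loop treats the locale as a plain string and has no notion of where a key/value pair begins or ends.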

The team kindly recommended that we treat the input locale as a simple Language Identifier instead, which should work but doesn't match the algorithm as described. The question about extensions therefore remains, and I would like to open this issue as a request for additional clarification on BestAvailableLocale. Some of our questions are:

  • Should we treat locale as a plain Language Identifier?
  • Should we extract the Language Identifier part from locale and try to match with that, then append the removed extensions to the matched locale?
  • Is there another way to describe this algorithm that doesn't generate invalid locales and doesn't treat them as simple string literals?
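For illustration only, the second option could look roughly like the sketch below. It assumes a well-formed Unicode BCP 47 locale identifier, in which the first single-character subtag marks the start of the extensions (or private use) section. The names `split_extensions` and `match_and_reattach` are hypothetical, and nothing here is confirmed as the intended spec behavior.

```python
from typing import Optional


def split_extensions(locale: str) -> tuple[str, str]:
    """Split a locale into (language identifier, extension tail).

    Assumption: language, script, region and variant subtags are never a
    single character, so the first singleton starts the extensions.
    """
    subtags = locale.split("-")
    for i, subtag in enumerate(subtags):
        if len(subtag) == 1:
            return "-".join(subtags[:i]), "-".join(subtags[i:])
    return locale, ""


def match_and_reattach(available: set[str], locale: str) -> Optional[str]:
    """Match on the language identifier only, then append the extensions."""
    base, extensions = split_extensions(locale)
    while base:
        if base in available:
            return f"{base}-{extensions}" if extensions else base
        # Drop the last subtag; yields "" once only one subtag is left.
        base = base.rpartition("-")[0]
    return None


print(match_and_reattach({"hi"}, "hi-IN-t-en-h0-hybrid"))
```

Because the extension tail is carried over untouched, no intermediate candidate can contain a half-truncated extension.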

Thanks!

cc @zbraniecki @sffc

@zbraniecki
Member

> The team kindly recommended us to treat the input locale as a simple Language Identifier instead, which is a bit of a "hack" but it should work.

FWIW, I don't think it's a hack. It is what the operation was meant to be: take a LanguageIdentifier and cut from the right. It's naive (we need data to do better), but okayish. What we didn't anticipate when designing it is that we'd have extensions in play. No CLDR data has extensions to match on, so we can safely just remove them.

@sffc
Contributor

sffc commented Jun 1, 2022

A very basic algorithm is to first clear all extension keywords, which you can do by pulling the LanguageIdentifier out of the Locale, and then removing trailing subtags one by one.
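A minimal sketch of that approach, operating on plain strings rather than on a parsed Locale type: it assumes a well-formed Unicode BCP 47 locale identifier, where the first single-character subtag marks the start of the extensions. The helper names `language_identifier` and `best_available_locale` are hypothetical and do not correspond to any particular library's API.

```python
from typing import Optional


def language_identifier(locale: str) -> str:
    """Return the locale without its extension and private use subtags.

    Assumption: language, script, region and variant subtags are never a
    single character, so everything from the first singleton on is dropped.
    """
    subtags = locale.split("-")
    for i, subtag in enumerate(subtags):
        if len(subtag) == 1:
            return "-".join(subtags[:i])
    return locale


def best_available_locale(available: set[str], locale: str) -> Optional[str]:
    """Strip extensions first, then remove trailing subtags one by one."""
    candidate = language_identifier(locale)
    while candidate:
        if candidate in available:
            return candidate
        # Drop the last subtag; yields "" once only one subtag is left.
        candidate = candidate.rpartition("-")[0]
    return None


print(best_available_locale({"hi", "en"}, "hi-t-en-h0-hybrid"))
print(best_available_locale({"zh", "zh-Hant"}, "zh-Hant-TW-u-nu-latn"))
```

Since extensions are cleared before the loop starts, every candidate it tests is itself a valid language identifier.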

@jedel1043
Author

> > The team kindly recommended us to treat the input locale as a simple Language Identifier instead, which is a bit of a "hack" but it should work.
>
> FWIW, I don't think it's a hack. It is what the operation was meant to be: take a LanguageIdentifier and cut from the right. It's naive (we need data to do better), but okayish. What we didn't anticipate when designing it is that we'd have extensions in play. No CLDR data has extensions to match on, so we can safely just remove them.

Completely agree. I edited the issue to better express this.

@zbraniecki
Member

@sffc should we clarify it in the spec? The current algorithm makes it seem like you should keep cutting subtags off the tail end one after another, including extensions, which, on top of being useless, leads to invalid locale states.

sffc added the labels s: help wanted (Status: help wanted; needs proposal champion), Small (Smaller change solvable in a Pull Request), and c: locale (Component: locale identifiers) on Jun 1, 2022
@sffc
Contributor

sffc commented Jun 1, 2022

Yep; this issue is available for anyone who wants to work on it.

sffc modified the milestones: ES 2022, ES 2023 on Jun 1, 2022
@sffc
Contributor

sffc commented Jun 1, 2022

Possibly related to #213
