Update locale canonicalization to use bcp47 alias data #746

Open
dminor opened this issue May 31, 2021 · 5 comments
Labels
C-locale Component: Locale identifiers, BCP47 good first issue Good for newcomers help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-core Type: Required functionality

Comments

@dminor
Contributor

dminor commented May 31, 2021

In #218, we're adding locale canonicalization based on the CLDR JSON aliases.json data. That data is missing a handful of aliases that are defined in the bcp47 XML data. Once those aliases are added to the JSON data, as tracked by #562, we'll be able to update the locale_canonicalizer to use them as well.

This is blocked on both #218 and #562.

@dminor dminor added T-core Type: Required functionality C-locale Component: Locale identifiers, BCP47 blocked A dependency must be resolved before this is actionable S-medium Size: Less than a week (larger bug fix or enhancement) labels May 31, 2021
@sffc sffc added the help wanted Issue needs an assignee label Jun 4, 2021
@sffc sffc added this to the ICU4X 0.4 milestone Jun 4, 2021
@dminor dminor modified the milestones: ICU4X 0.4, ICU4X 0.5 Aug 26, 2021
@sffc sffc added backlog and removed blocked A dependency must be resolved before this is actionable labels Jan 27, 2022
@sffc sffc removed this from the ICU4X 0.5 milestone Jan 27, 2022
@sffc
Member

sffc commented Jan 27, 2022

@dminor Do you consider this to be a 1.0 blocker? Is it required for spec compliance?

@sffc sffc added this to the ICU4X 1.0 milestone Apr 1, 2022
@sffc sffc added good first issue Good for newcomers and removed backlog labels Apr 1, 2022
@sapriyag sapriyag added the discuss-priority Discuss at the next ICU4X meeting label May 25, 2022
@dminor
Contributor Author

dminor commented May 26, 2022

> @dminor Do you consider this to be a 1.0 blocker? Is it required for spec compliance?

Not fixing this is a bug, but it's a pretty minor one: the handful of missing aliases are very much edge cases. I think we can comfortably fix this post-1.0. I suggest punting it.

@kartva
Member

kartva commented Mar 31, 2024

My understanding so far:

  • This repository contains alias data.
  • I have to interact with icu-datagen in some way to include these files in the locid_transform crate.
    • Presumably by installing the icu-datagen binary tool and committing the generated files to the repository.
  • Create an AliasesV3 struct that includes these new sources of alias data.

@kartva
Member

kartva commented Apr 8, 2024

By running the download-repo-sources tool, I've obtained the calendar.json file, which seems to contain JSON data. The other bcp47 JSON files can presumably be acquired the same way.

Next steps:

  • Write serde-mapping structs in provider/datagen/src/transform/cldr/cldr_serde/bcp47_*.rs (one for each relevant bcp47 file).
  • Create AliasesV3, parse and store the bcp47 alias data in impl From<&cldr_serde::aliases::Resource> for AliasesV3<'_>, then impl DataProvider for AliasesV3.
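
To make the second step concrete, here is a rough, standard-library-only sketch of what "parse and store bcp47 alias data" could boil down to once the JSON has been deserialized. Everything in it is an assumption for illustration: Bcp47Type, ExtensionAliases, and build_calendar_aliases are hypothetical names rather than ICU4X types, the serde parsing is skipped entirely, and the two calendar aliases are hand-written from my recollection of the CLDR calendar data, so they should be checked against the actual calendar.json.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one already-parsed entry of a bcp47 JSON file:
/// a canonical type value together with its deprecated aliases.
struct Bcp47Type {
    name: &'static str,
    aliases: &'static [&'static str],
}

/// Hypothetical lookup table (not the real ICU4X alias data struct).
/// Maps (extension key, deprecated value) to the canonical value.
#[derive(Default)]
struct ExtensionAliases {
    map: HashMap<(String, String), String>,
}

impl ExtensionAliases {
    /// Registers every alias of every type under the given extension key.
    fn add_key(&mut self, key: &str, types: &[Bcp47Type]) {
        for t in types {
            for alias in t.aliases {
                self.map
                    .insert((key.to_string(), alias.to_string()), t.name.to_string());
            }
        }
    }

    /// Returns the canonical value for `value` under `key`.
    /// Input is lowercased first; values without an alias entry
    /// are returned lowercased but otherwise unchanged.
    fn canonicalize(&self, key: &str, value: &str) -> String {
        let value = value.to_ascii_lowercase();
        match self.map.get(&(key.to_ascii_lowercase(), value.clone())) {
            Some(canonical) => canonical.clone(),
            None => value,
        }
    }
}

/// Builds a table for the "ca" (calendar) key from hand-written sample data.
/// Assumption: these two aliases mirror entries in the CLDR calendar data.
fn build_calendar_aliases() -> ExtensionAliases {
    let types = [
        Bcp47Type { name: "gregory", aliases: &["gregorian"] },
        Bcp47Type { name: "islamic-civil", aliases: &["islamicc"] },
        Bcp47Type { name: "buddhist", aliases: &[] },
    ];
    let mut table = ExtensionAliases::default();
    table.add_key("ca", &types);
    table
}
```

With this table, build_calendar_aliases().canonicalize("ca", "gregorian") yields "gregory", and a value with no alias entry passes through unchanged. Whether this data belongs in a new struct or in the existing alias struct is a separate question that the sketch does not try to answer.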

@sffc do you see anything that I might be missing?

@sffc
Member

sffc commented Apr 8, 2024

This sounds right. I'm not sure you need a new AliasesV3, but yes, the general idea of pulling the JSON files in with download-repo-sources and then getting them into a canonicalizer data structure is correct. Thanks!


4 participants