Refine locale filtering in data exporter #2072
Labels
A-design
Area: Architecture or design
C-data-infra
Component: provider, datagen, fallback, adapters
duplicate
This issue or pull request already exists
S-large
Size: A few weeks (larger feature, major refactoring)
T-core
Type: Required functionality
ICU4C's data build tool has a really nice algorithm for selecting locales to be included in the data file. We should implement this algorithm in ICU4X. (Disclaimer: I wrote the ICU4C tool in 2019.)
The algorithm is based on the idea that the space of all locales forms a DAG (almost a tree, but not quite). When the user requests a certain locale, we add the full parent chain for that locale, and we also add all of its children that have data. This balances the need to reduce data size against the i18n quality that comes from including regional variants, which are cheap when their parents are already included.
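To make the selection rule concrete, here is a minimal sketch in Rust. It is not the ICU4C implementation: the names `parent`, `is_descendant`, and `select_locales` are hypothetical, locales are plain strings, and the parent of a locale is assumed to be found by dropping its last subtag (with `und` as the root). Real CLDR inheritance also has explicit parent overrides, which is what makes the graph a DAG rather than a tree; this sketch ignores them.

```rust
use std::collections::BTreeSet;

/// Hypothetical parent function: drop the last subtag, with "und" as the root.
/// Assumption: explicit CLDR parent overrides are ignored in this sketch.
fn parent(locale: &str) -> Option<String> {
    if locale == "und" {
        return None;
    }
    match locale.rfind('-') {
        Some(idx) => Some(locale[..idx].to_string()),
        None => Some("und".to_string()),
    }
}

/// Returns true if `ancestor` appears on the parent chain of `locale`.
fn is_descendant(locale: &str, ancestor: &str) -> bool {
    let mut current = parent(locale);
    while let Some(p) = current {
        if p == ancestor {
            return true;
        }
        current = parent(&p);
    }
    false
}

/// For each requested locale, select the locale itself, its full parent
/// chain, and every available locale (one that has data) below it.
fn select_locales(requested: &[&str], available: &[&str]) -> BTreeSet<String> {
    let mut selected = BTreeSet::new();
    for &req in requested {
        selected.insert(req.to_string());

        // Walk up: add the full parent chain.
        let mut current = parent(req);
        while let Some(p) = current {
            current = parent(&p);
            selected.insert(p);
        }

        // Walk down: add all descendants that have data.
        for &candidate in available {
            if is_descendant(candidate, req) {
                selected.insert(candidate.to_string());
            }
        }
    }
    selected
}
```

With `sr-Latn` requested and data available for `sr`, `sr-Latn`, `sr-Latn-BA`, `sr-Cyrl`, `en`, and `en-GB`, this sketch selects `sr`, `sr-Latn`, `sr-Latn-BA`, and `und`, and leaves out `sr-Cyrl` and the English locales.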
As part of this project, we should also refine the way we handle fully resolved locales. The list of requested locales should become our list of fully resolved locales, and we should pre-compute the fallback data for them and store those pointers in the data provider.
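One possible reading of the pre-computation step is sketched below. It is an illustration only, not an ICU4X API: `fallback_parent` and `precompute_fallback` are hypothetical names, the parent relation is the same simplified "drop the last subtag" assumption, and the "pointer" is modeled as the name of the nearest ancestor that has data for a given key.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical parent function: drop the last subtag, with "und" as the root.
fn fallback_parent(locale: &str) -> Option<String> {
    if locale == "und" {
        return None;
    }
    match locale.rfind('-') {
        Some(idx) => Some(locale[..idx].to_string()),
        None => Some("und".to_string()),
    }
}

/// For each requested (fully resolved) locale, record which locale's data
/// it resolves to: the locale itself if it has data, otherwise the nearest
/// ancestor that does. Locales with no data anywhere on the chain are omitted.
fn precompute_fallback(
    requested: &[&str],
    has_data: &BTreeSet<&str>,
) -> BTreeMap<String, String> {
    let mut table = BTreeMap::new();
    for &req in requested {
        let mut current = Some(req.to_string());
        while let Some(loc) = current {
            if has_data.contains(loc.as_str()) {
                table.insert(req.to_string(), loc);
                break;
            }
            current = fallback_parent(&loc);
        }
    }
    table
}
```

In this sketch the table is built once at export time, so a lookup for a requested locale at runtime would be a single map access rather than a walk up the parent chain.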