Add smarter locale filtering in DataExporter #834

Open
sffc opened this issue Jun 29, 2021 · 2 comments
Labels
C-data-infra Component: provider, datagen, fallback, adapters S-medium Size: Less than a week (larger bug fix or enhancement) T-core Type: Required functionality

Comments

sffc (Member) commented Jun 29, 2021

ICU4C's data build tool has a fairly smart algorithm for deciding which locales to include from an allowlist: it includes each allowlisted locale together with all of its parents and children. For example, including "en-001" results in all of its parents ("en", "root") and children ("en-GB", "en-ZA", ...) being included.

We should implement this as well in ICU4X. It should be done in the LanguageIdentifierFilter trait in icu_provider.

This depends on the resolution to #173, since the locale fallback chain will need to be computed by this filter.
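To make the parents-and-children rule concrete, here is a minimal Rust sketch. It is not ICU4X API: `parent`, `ancestors`, and `expand_allowlist` are hypothetical names, and the hard-coded override table (en-GB and en-ZA → en-001, as in CLDR's parentLocales data) stands in for the real fallback chain that, per #173, this filter would need to compute. Locales without an override fall back by dropping their last subtag, and a bare language falls back to "root".

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical parent lookup: an explicit override table (a tiny stand-in
/// for CLDR parentLocales data), else drop the last subtag; a bare language
/// falls back to "root", and "root" has no parent.
fn parent<'a>(locale: &'a str, overrides: &HashMap<&'a str, &'a str>) -> Option<&'a str> {
    if locale == "root" {
        return None;
    }
    if let Some(&p) = overrides.get(locale) {
        return Some(p);
    }
    match locale.rfind('-') {
        Some(i) => Some(&locale[..i]),
        None => Some("root"),
    }
}

/// Full fallback chain of `locale`, nearest parent first, excluding itself.
fn ancestors<'a>(locale: &'a str, overrides: &HashMap<&'a str, &'a str>) -> Vec<&'a str> {
    let mut chain = Vec::new();
    let mut current = locale;
    while let Some(p) = parent(current, overrides) {
        chain.push(p);
        current = p;
    }
    chain
}

/// Keeps every available locale that is allowlisted, is a parent of an
/// allowlisted locale, or has an allowlisted locale among its parents.
fn expand_allowlist<'a>(
    allowlist: &[&'a str],
    available: &[&'a str],
    overrides: &HashMap<&'a str, &'a str>,
) -> BTreeSet<&'a str> {
    let allowed: BTreeSet<&'a str> = allowlist.iter().copied().collect();
    // Allowlisted locales plus all of their parents, up to root.
    let mut wanted = allowed.clone();
    for &loc in allowlist {
        wanted.extend(ancestors(loc, overrides));
    }
    let mut result = BTreeSet::new();
    for &loc in available {
        // A child is any locale whose fallback chain passes through an allowlisted locale.
        let is_child = ancestors(loc, overrides).iter().any(|a| allowed.contains(a));
        if wanted.contains(loc) || is_child {
            result.insert(loc);
        }
    }
    result
}

fn main() {
    // Hypothetical data: as in CLDR, en-001 is the parent of en-GB and en-ZA.
    let overrides = HashMap::from([("en-GB", "en-001"), ("en-ZA", "en-001")]);
    let available = ["root", "en", "en-001", "en-GB", "en-ZA", "en-US", "fr", "fr-BE"];
    let kept = expand_allowlist(&["en-001"], &available, &overrides);
    println!("{:?}", kept); // {"en", "en-001", "en-GB", "en-ZA", "root"}
}
```

With the allowlist ["en-001"], the sketch keeps root, en, en-001, en-GB, and en-ZA, but drops en-US (whose parent is en, not en-001) as well as fr and fr-BE, matching the example above.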

sffc added the labels T-core (Type: Required functionality), C-data-infra (Component: provider, datagen, fallback, adapters), blocked (A dependency must be resolved before this is actionable), and S-medium (Size: Less than a week; larger bug fix or enhancement) on Jun 29, 2021
sffc added this to the ICU4X 0.5 milestone on Jul 15, 2021
sffc self-assigned this on Jul 15, 2021
mildgravitas added a commit to mildgravitas/icu4x that referenced this issue Dec 7, 2021
The latter uses non-default week data, which is useful for week-of-year/month
tests.

I looked at replacing fr with fr-BE to keep the number of locales
constant, but that fails because the plurals data has no region-specific locales. This should
be fixed by unicode-org#834, which IIUC will also add all regional variants to testdata
anyway.
sffc changed the milestone from ICU4X 0.5 to ICU4X 0.6 on Jan 9, 2022
sffc changed the milestone from ICU4X 0.6 to ICU4X 1.0 (Features) on May 25, 2022
sffc (Member, Author) commented Jun 26, 2022

This does not need to block 1.0 because this is something we can add incrementally.

sffc removed the blocked label (A dependency must be resolved before this is actionable) on Jun 26, 2022
Manishearth (Member) commented Sep 30, 2022

A major component of this would be #2683 (it is unclear what else can be done along this axis).
