Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LocaleFallbackProvider to tutorial examples #3334

Merged
merged 4 commits into from
Aug 29, 2023
Merged

Conversation

sffc
Copy link
Member

@sffc sffc commented Apr 15, 2023

Fixes #3331

CC @autarch

@sffc sffc requested a review from robertbastian April 15, 2023 18:49
@sffc sffc requested a review from a team as a code owner April 15, 2023 18:49
@@ -178,6 +191,11 @@ impl_data_provider!(UnstableProvider);
fn main() {
let unstable_provider = UnstableProvider;

// Wrapping the raw UnstableProvider in a LocaleFallbackProvider enables
// locales to fall back to other locales, like "en-US" to "en".
let unstable_provider = LocaleFallbackProvider::try_new_unstable(unstable_provider)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying this in my code but I get an error like this:

et provider = LocaleFallbackProvider::try_new_unstable(UnstableProvider)?;
    |                    ---------------------------------------- ^^^^^^^^^^^^^^^^ the trait `icu_provider::DataProvider<LocaleFallbackLikelySubtagsV1Marker>` is not implemented for `UnstableProvider`
    |                    |
    |                    required by a bound introduced by this call
    |
    = help: the following other types implement trait `icu_provider::DataProvider<M>`:
              <UnstableProvider as icu_provider::DataProvider<CanonicalDecompositionDataV1Marker>>
              <UnstableProvider as icu_provider::DataProvider<CanonicalDecompositionTablesV1Marker>>
              <UnstableProvider as icu_provider::DataProvider<CollationDataV1Marker>>
              <UnstableProvider as icu_provider::DataProvider<CollationDiacriticsV1Marker>>
              <UnstableProvider as icu_provider::DataProvider<CollationJamoV1Marker>>
              <UnstableProvider as icu_provider::DataProvider<CollationMetadataV1Marker>>
              <UnstableProvider as icu_provider::DataProvider<CollationReorderingV1Marker>>
              <UnstableProvider as icu_provider::DataProvider<CollationSpecialPrimariesV1Marker>>
note: required by a bound in `LocaleFallbackProvider::<P>::try_new_unstable`
   --> /home/autarch/.cargo/registry/src/github.com-1ecc6299db9ec823/icu_provider_adapters-1.2.0/src/fallback/adapter.rs:54:8
    |
54  |     P: DataProvider<LocaleFallbackLikelySubtagsV1Marker>
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `LocaleFallbackProvider::<P>::try_new_unstable`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is because I had to add more keys to the data generated by icu4x-datagen. The workflow around this is pretty frustrating:

  1. Generate all the keys.
  2. Build my program, which takes forever since I generated all the keys.
  3. Generate using --keys-for-bin.
  4. Add a new locale-related bit of code, go to step Transpilation approach #1.

It'd be nice if there was more guidance on which keys are needed for each part of the icu crate's API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, yes, acknowledged that this process is a bit frustrating. It was designed to give more control over the data lifecycle than ICU4C, and it goes very much to the extreme in that direction. For the next 1.3 release, we have #2945 to streamline this for users who don't need all the control.

That said, there are ways you can speed up your development loop. For example, in step 1 and 2, if you generate data with --locales none, it builds a lot faster, and still works with --keys-for-bin.

It'd be nice if there was more guidance on which keys are needed for each part of the icu crate's API.

You can see the list of keys in the signature of the try_new_unstable functions. For example: LineSegmenter::try_new_auto_unstable has the following bounds:

DataProvider<LineBreakDataV1Marker> + DataProvider<LstmForWordLineAutoV1Marker> + DataProvider<GraphemeClusterBreakDataV1Marker>

If you click into those three data markers, it gives you the key in the docs string.

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right location for this. The datagen tutorial should be minimal and focused on generating data and obtaining DataProviders. It uses the same example from intro.md over and over again, only changing the way the provider is constructed (the rest of the examples can be glanced over as it's already familiar).

Fallback is a concept that should be taught before data customisation, as it also applies if the user doesn't bring their own data. You should put it in intro.md, either "5. Using an ICU4X component", or in a new section, and explain it in the text rather than a code comment.

@sffc
Copy link
Member Author

sffc commented Apr 18, 2023

I don't think this is the right location for this. The datagen tutorial should be minimal and focused on generating data and obtaining DataProviders. It uses the same example from intro.md over and over again, only changing the way the provider is constructed (the rest of the examples can be glanced over as it's already familiar).

Fallback is a concept that should be taught before data customisation, as it also applies if the user doesn't bring their own data. You should put it in intro.md, either "5. Using an ICU4X component", or in a new section, and explain it in the text rather than a code comment.

My response is that we should consider a data provider ill-defined unless we are in one of the following three scenarios:

  1. Runtime fallbacking enabled
  2. Compile-time fallbacking enabled (Add smarter locale filtering in DataExporter #834) and the locale is known to be in the subset specified when building the data file (even in this situation, it is likely necessary to include a naive fallback that marshals the Unicode extension keywords)
  3. No APIs in use require a locale parameter (normalization, segmenter)

Therefore, the code currently shown in the tutorial is not code a user should ever write.

@sffc sffc requested a review from Manishearth April 18, 2023 01:25
@robertbastian
Copy link
Member

My response is that we should consider a data provider ill-defined unless we are in one of the following three scenarios

Ok, then document this somewhere? Following this line of reasoning we should deprecate the current provider constructors and have "safer" ones. The two step construct-and-wrap seems too optional for that.

Therefore, the code currently shown in the tutorial is not code a user should ever write.

This also applies to intro.md then. The examples in the datagen tutorial are line for line the example from intro.md with icu_testdata swapped out for other data provider constructors. Now icu_testdata already includes fallback, but users don't know that, there should be a dedicated section explaining this.

@robertbastian
Copy link
Member

My response is that we should consider a data provider ill-defined unless we are in one of the following three scenarios:

  1. Runtime fallbacking enabled
  2. Compile-time fallbacking enabled (Add smarter locale filtering in DataExporter #834) and the locale is known to be in the subset specified when building the data file (even in this situation, it is likely necessary to include a naive fallback that marshals the Unicode extension keywords)
  3. No APIs in use require a locale parameter (normalization, segmenter)

Therefore, the code currently shown in the tutorial is not code a user should ever write.

With singleton keys and #2683 I think we're almost there. Do we have a solution for extension keywords when not using runtime fallback?

We'll update the tutorial with these changes in 1.3 so I think we can close this.

@sffc
Copy link
Member Author

sffc commented Aug 2, 2023

I prefer to merge this because:

  1. Real people are hitting this issue in 1.2 (Confusing error when trying to create Collator for en-US #3331, MissingLocale error when creating Collator from BlobDataProvider #3654); for every report we get, there are 10x as many people who don't report the issue and give up using ICU4X.
  2. Even in 1.3, you should be setting up your fallbacker manually if using blob data
  3. We can use the changes proposed in this PR as a baseline for the 1.3 tutorial updates

Do we have a solution for extension keywords when not using runtime fallback?

I've thought about this a number of times but never made an issue to discuss it. #3764

@sffc sffc requested a review from robertbastian August 2, 2023 15:42
Manishearth
Manishearth previously approved these changes Aug 2, 2023
@robertbastian
Copy link
Member

Still think this concept should be introduced in intro.md.

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@sffc
Copy link
Member Author

sffc commented Aug 26, 2023

I added an explanatory section in the text for why we need the LocaleFallbackProvider.

Now that #2683 is finally landed, it will be the case even in 1.3 that blob and fs providers need this. So, this tutorial change is not obsolete and should still be merged.

Still think this concept should be introduced in intro.md.

No, because intro.md uses icu_testdata::unstable() and in 1.3 it will use compiled data.

@sffc sffc merged commit ae22efa into unicode-org:main Aug 29, 2023
26 checks passed
@sffc sffc deleted the lfp-xmp branch August 29, 2023 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing error when trying to create Collator for en-US
4 participants