Ergonomic API & Data Providers #30

nciric · 2020-04-15T20:23:54Z

Based on pull request #28, I would like to discuss how we can deal with data providers and the end-client API (referred to in the document as the ergonomic API).

I feel that the average developer shouldn't care where the data comes from, but should be aware of the async nature of the request, as long as the project as a whole can set it up for them. Think about Chrome, where the Browser/Renderer processes set up data to be fetched from disk or, if missing, from a service. An ordinary developer wouldn't need to make that decision at every point of interaction with our API.

A similar approach to what @zbraniecki proposed for caching can be applied to data providers. We can have a simple DataProviderCache object that's globally available to all constructors/methods. I don't expect that a single instance of our library will have more than a handful of different providers (if that), so the cache would be fairly small.

An example of DataProviderCache initialization:

data_provider_cache = DataProviderCache()
data_provider_cache.insert('static_data', static_provider[, preference_level_0])
data_provider_cache.insert('aws_data', aws_provider[, preference_level_1])
data_provider_cache.insert('slow_data', slow_provider[, preference_level_2])
...

Preference level was added in case two providers can supply the same data set, but potentially at a higher cost in speed, dollar amount, etc.

Each data provider would know which locales it can handle, and which data it can provide for each. It would also be able to tell whether it already has that data, so that a new fetch is not necessary.

Our ergonomic API in that case would have the shape:
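To make the registry idea concrete, here is a minimal Python sketch. Only `DataProviderCache`, `insert`, and the preference level come from the description above; `StaticProvider`, `supports`, `load`, `resolve`, and the convention that a lower level means "more preferred" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StaticProvider:
    """Hypothetical provider backed by an in-memory dict: {locale: {key: value}}."""
    data: dict

    def supports(self, locale, key):
        # The provider knows which locales and which data it can serve.
        return key in self.data.get(locale, {})

    def load(self, locale, key):
        return self.data[locale][key]


class DataProviderCache:
    """Registry of named providers, each with a preference level (lower wins here)."""

    def __init__(self):
        self._providers = {}  # name -> (preference_level, provider)

    def insert(self, name, provider, preference_level=0):
        self._providers[name] = (preference_level, provider)

    def resolve(self, locale, key, names=None):
        """Return the name of the most preferred provider that has the data, or None."""
        candidates = [
            (level, name)
            for name, (level, provider) in self._providers.items()
            if (names is None or name in names) and provider.supports(locale, key)
        ]
        return min(candidates)[1] if candidates else None


cache = DataProviderCache()
cache.insert("static_data", StaticProvider({"en": {"decimal": "."}}), 0)
cache.insert("slow_data", StaticProvider({"en": {"decimal": "."}, "de": {"decimal": ","}}), 2)
```

With this setup, a request for `en` resolves to `static_data` because it has the lower level, while `de` falls through to `slow_data`, the only provider that supports it.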

Intl.NumberFormat(locale, options)

or if we want to enable developers to enforce specific data sources:

Intl.NumberFormat(locale, options, ['static_data', 'aws_data'])
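As a rough illustration of the two constructor shapes above, the following Python sketch treats the third argument as an optional list of provider names. The `REGISTRY` dict, the `fraction_digits` option, and the toy `format` method are hypothetical; nothing here is an actual ICU4X or ECMA-402 API.

```python
# Hypothetical global registry: provider name -> {locale: decimal separator}.
REGISTRY = {
    "static_data": {"en": "."},
    "aws_data": {"en": ".", "de": ","},
}


class NumberFormat:
    """Toy formatter; only the constructor shape matters here."""

    def __init__(self, locale, options=None, sources=None):
        self.options = options or {}
        # With no explicit sources, every registered provider is eligible.
        names = sources if sources is not None else list(REGISTRY)
        for name in names:
            if locale in REGISTRY[name]:
                self.source = name
                self.decimal = REGISTRY[name][locale]
                break
        else:
            raise LookupError(f"no data for {locale!r} in {names}")

    def format(self, value):
        digits = self.options.get("fraction_digits", 2)
        return f"{value:.{digits}f}".replace(".", self.decimal)


default = NumberFormat("de")                   # caller does not name a source
pinned = NumberFormat("en", {}, ["aws_data"])  # caller enforces a source
```

The first call leaves the choice of provider to the registry; the second restricts it to the named sources and fails if none of them has the data.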


nciric · 2020-04-15T20:24:36Z

Adding @hagbard to the discussion; I know he had some thoughts about this.

sffc · 2020-04-16T07:45:32Z

> I feel that the average developer shouldn't care where the data comes from

The developer should, at some point in time, make some kind of conscious decision about where to load the data from: in-memory, data file, service, operating system, etc. The decision could be made when installing ICU4X, when writing code using ICU4X, or at some other point in the lifecycle.

Locale data is such a fundamental piece of i18n infrastructure that we would be doing a disservice by hiding it under the hood, which is largely what ICU4C and especially ICU4J tend to do.

> but should be aware of the async nature of the request, as long as the project as a whole can set it up for them.

I don't understand what you mean by "as long as the project as a whole can set it up for them".

> Think about Chrome, where the Browser/Renderer processes set up data to be fetched from disk or, if missing, from a service. An ordinary developer wouldn't need to make that decision at every point of interaction with our API.

Just to make sure we're on the same page, is it okay in your opinion if we make the developer "await" objects? That's the point of view I have been taking since I first started discussing this in tc39/ecma402#210.
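A minimal Python sketch of what "awaiting objects" could look like from the caller's side: construction is asynchronous because it loads data, and formatting is synchronous afterwards. The `create` factory, `fetch_locale_data`, and the hard-coded data are assumptions for illustration, not a proposed API.

```python
import asyncio


async def fetch_locale_data(locale):
    """Stand-in for a disk or network fetch; the data is hard-coded."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"en": {"decimal": "."}, "de": {"decimal": ","}}[locale]


class NumberFormat:
    def __init__(self, data):
        self._decimal = data["decimal"]

    @classmethod
    async def create(cls, locale):
        # The only asynchronous step: loading the data for the locale.
        return cls(await fetch_locale_data(locale))

    def format(self, value):
        # Synchronous: the data is already in memory.
        return f"{value:.2f}".replace(".", self._decimal)


async def main():
    formatter = await NumberFormat.create("de")
    return formatter.format(1234.5)


result = asyncio.run(main())
```

In this shape the caller awaits once, when the object is created, and every later `format` call is an ordinary function call.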

> A similar approach to what @zbraniecki proposed for caching can be applied to data providers. We can have a simple DataProviderCache object that's globally available to all constructors/methods. I don't expect that a single instance of our library will have more than a handful of different providers (if that), so the cache would be fairly small.

The word "cache" is misleading here, because, if I understand correctly, it appears that this object doesn't actually cache any data. A more appropriate name would be "registry".

However, I actually see no reason for a registry given a flexible data provider trait. If it's too ugly to pass a data provider into every constructor, a single default data provider can be provided in global state.
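The two options mentioned here, an explicit provider per constructor or a single default in global state, can be sketched in Python as follows. `set_default_provider`, the `provider` keyword, and modelling a provider as a plain callable are all hypothetical choices for this illustration.

```python
_default_provider = None  # module-level default; hypothetical


def set_default_provider(provider):
    """Install the provider used when a constructor is not given one."""
    global _default_provider
    _default_provider = provider


class NumberFormat:
    def __init__(self, locale, options=None, provider=None):
        # An explicitly passed provider wins; otherwise use the global default.
        provider = provider if provider is not None else _default_provider
        if provider is None:
            raise RuntimeError("no data provider given and no default set")
        self._decimal = provider(locale)["decimal"]

    def format(self, value):
        return f"{value:.2f}".replace(".", self._decimal)


def static_provider(locale):
    """A provider is modelled here as a callable: locale -> data dict."""
    return {"en": {"decimal": "."}, "de": {"decimal": ","}}[locale]


set_default_provider(static_provider)
implicit = NumberFormat("de")                                          # uses the default
explicit = NumberFormat("de", provider=lambda locale: {"decimal": "."})  # overrides it
```

The default keeps everyday call sites short, while the explicit argument remains available wherever the source of the data matters.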

> Preference level was added in case two providers can supply the same data set, but potentially at a higher cost in speed, dollar amount, etc.

The mechanics of "preference level" would be handled by a userland forking data provider according to my proposal in data-pipeline.md.

> or if we want to enable developers to enforce specific data sources:
>
> Intl.NumberFormat(locale, options, ['static_data', 'aws_data'])

I would rather have this decision made in a custom userland data provider.
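One way to picture a userland provider that takes over both the "preference level" and the "specific data sources" decisions is a wrapper that tries its children in order. This Python sketch illustrates the general idea only; it is not the design from data-pipeline.md, and `DictProvider`, `ForkingProvider`, and `load` are hypothetical names.

```python
class DictProvider:
    """Hypothetical leaf provider backed by a dict: {locale: data}."""

    def __init__(self, data):
        self._data = data

    def load(self, locale):
        return self._data[locale]  # raises KeyError when the locale is missing


class ForkingProvider:
    """Tries each child in order and returns the first successful result."""

    def __init__(self, *children):
        self._children = children

    def load(self, locale):
        for child in self._children:
            try:
                return child.load(locale)
            except KeyError:
                continue  # this child cannot serve the request; try the next
        raise KeyError(locale)


static_provider = DictProvider({"en": {"decimal": "."}})
slow_provider = DictProvider({"en": {"decimal": "?"}, "de": {"decimal": ","}})

# The order of the children expresses the preference; no registry is involved.
provider = ForkingProvider(static_provider, slow_provider)
```

Because the wrapper exposes the same `load` method as its children, the library only ever sees a single provider, and the ordering policy stays in application code.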

hagbard · 2020-04-16T12:10:59Z

I can't even follow this. Can we at least have a collaborative doc where people write down the straw man ideas for what calling code would look like (no consideration for implementation yet, just the API, maybe just pseudo code for now). This should include enough information to understand the conceptual cost of the call (i.e. not just the call, but surrounding code to handle set-up and error handling if needed).

I can't begin to form a coherent shape of what anyone thinks the API will look like from the caller's perspective, and so can't judge these points at all.

Are async calls being made from setup logic (i.e. get me a segmenter for this locale) or in the business logic (e.g. when formatting a message with a date in, inside a UI thread)?

I'm fine with async in a situation where the user can deal with failures (all async APIs must be allowed to fail). This is onerous to the caller, but if that's in a situation where callers are expected (by proposed best practice) to be reasonably able to handle failure, that's fine.

Async in our own business logic with ICU4X but then somehow hidden from users feels like all sorts of trouble.

David

-- David Beaumont :: Îñţérñåţîöñåļîžåţîờñ Libraries :: Google Google Switzerland GmbH., Brandschenkestrasse 110, CH-8002, Zürich - Switzerland

sffc · 2020-06-23T02:04:33Z

I don't see what is immediately actionable on this issue. We currently have two Markdown files that discuss the subjects of data provider and ergonomic API. I am putting it on the backlog to revisit before v1 to make sure we end up with something consistent with what @nciric wrote in the OP.

sffc · 2022-04-01T17:17:48Z

Here is a doc explaining async data providers:

https://docs.google.com/document/d/1haiE_XsYpyDGNpAKTZWhRwU0TU-OjDURDiZtIVjYkCk/edit

Let's make a full-stack async provider in scope for 1.1.

nciric mentioned this issue Apr 15, 2020

Adding initial writeup of wrapper-layer.md #28

Merged

sffc added the T-docs-tests Type: Code change outside core library label Apr 16, 2020

sffc self-assigned this Apr 17, 2020

sffc added C-process Component: Team processes A-design Area: Architecture or design C-data-infra Component: provider, datagen, fallback, adapters and removed C-process Component: Team processes labels May 7, 2020

sffc added this to the 2020 Q2 milestone Jun 17, 2020

sffc closed this as completed Jun 23, 2020

sffc added backlog labels Jun 23, 2020

sffc removed this from the 2020 Q2 milestone Jun 23, 2020

sffc reopened this Sep 4, 2020

sffc added the question Unresolved questions; type unclear label Apr 3, 2021

sffc mentioned this issue Jul 29, 2021

Add caching DataProvider #919

Open

sffc added this to the ICU4X 1.1 milestone Apr 1, 2022

sffc removed backlog labels Apr 1, 2022
