Clarify "combining options" support in the spec #26

sffc · 2018-11-28T01:13:32Z

The "combining options" feature is allowed to be implementation-dependent, but the spec should at least require that no information is lost when processing it. Investigate how to do this in the data loading section of the spec text.

littledan · 2018-12-05T15:27:15Z

What is allowed to be implementation-dependent about it, and what's the rationale for that?

sffc · 2019-02-27T03:40:20Z

@leobalter

This has to do with the "LocaleData" internal field. This is the part of the spec that is really long and wordy. I made a flow chart to demonstrate the data specified to be in LocaleData:

Once you're out the bottom of the flow chart (into negativePattern, zeroPattern, or positivePattern), then this line from the spec applies:

The value of the aforementioned fields (the sign-dependent pattern fields) must be string values that must contain the substring "{number}" and may contain the substrings "{plusSign}", "{minusSign}", "{compactSymbol}", "{compactName}", "{scientificSeparator}", and "{scientificExponent}"; the values within the "currency" field must also contain one of the following substrings: "{currencyCode}", "{currencySymbol}", "{currencyNarrowSymbol}", or "{currencyName}"; and the values within the "unit" field must also contain one of the following substrings: "{percentSign}", "{unitSymbol}", "{unitNarrowSymbol}", or "{unitName}".

What this means: if a user requests to format a currency like GBP, or a unit like meter, the implementation is required to produce a string that has a placeholder for the currency sign or the unit name.

For the notation, if the implementation doesn't know how to display a certain notation, that's fine as long as the implementation doesn't artificially change the magnitude of the number without providing a corresponding pattern. However, right now, the spec doesn't forbid them from doing that.

FYI, the scale of the number is determined by ComputeExponentForMagnitude, which reads:

    1. Let _notation_ be _numberFormat_.[[Notation]].
    1. If _notation_ is `"standard"`, then
      1. Return 0.
    1. Else If _notation_ is `"scientific"`, then
      1. Return _magnitude_.
    1. Else If _notation_ is `"engineering"`, then
      1. Let _thousands_ be the greatest integer that is not greater than _magnitude_ ÷ 3.
      1. Return _thousands_ × 3.
    1. Else,
      1. Assert: _notation_ is `"compact"`.
      1. Let _exponent_ be an implementation-dependent integer by which to scale a number of the given magnitude in compact notation for the current locale.
      1. Return _exponent_.

Probably what we need to do is:

When scientific notation is requested, require that implementations provide a valid scientific notation string (one containing "{scientificSeparator}" and "{scientificExponent}")
When compact notation is requested, require that if implementations give a scale in ComputeExponentForMagnitude, then they must also provide "{compactSymbol}" or "{compactName}".

Does that make any sense? Thoughts?

leobalter · 2019-02-27T22:45:57Z

@sffc I did not have full time to absorb all of this information today. I have dedicated time tomorrow to look at this and I just wanted to ping you with this update.

Thanks for being descriptive and the flow chart.

leobalter · 2019-02-28T23:38:13Z

I haven't found a full conclusion on the topic but I guess we could reuse the flow chart in the specs for a better visualization? If the image is too big maybe SVG is doable?

What this means: if a user requests to format a currency like GBP, or a unit like meter, the implementation is required to produce a string that has a placeholder for the currency sign or the unit name.

Seems valid.

For the notation, if the implementation doesn't know how to display a certain notation, that's fine as long as the implementation doesn't artificially change the magnitude of the number without providing a corresponding pattern.

I agree.

However, right now, the spec doesn't forbid them from doing that.

When scientific notation is requested, require that implementations provide a valid scientific notation string (one containing "{scientificSeparator}" and "{scientificExponent}")

+1

When compact notation is requested, require that if implementations give a scale in ComputeExponentForMagnitude, then they must also provide "{compactSymbol}" or "{compactName}".

I'm +1 but guess we use just one notation? If we can't reuse Symbol and Name together, "{compactIdentifier}" for both, maybe? I'm not sure if I'm saying the correct thing here.

Back from the top post:

the spec should at least require that no information is lost when processing it. Investigate how to do this in the data loading section of the spec text.

I'm in favor and I appreciate the proposed outcome. Hopefully it finds acceptance for possible compatibility across implementations.

littledan · 2019-03-18T15:48:31Z

Traditionally, in ECMA-402, for things which are in ICU algorithms but not specified in CLDR, we specified the algorithm in ECMA-402 itself. This can be helpful for people who are writing polyfills based on the CLDR data, so they don't have to read ICU source. Would such an approach be possible here?

sffc · 2019-03-19T04:30:49Z

Another problem is that there are locales that might not have a number in the unit or compact notation. For example, Hebrew and Somali and a few other languages conventionally use a singular word in place of the digit 1. For example, the compact notation in Somali for 1000 is "Kun" (not "1 Kun", just "Kun"). So, anything that generates a pattern with a placeholder does not technically work for all locales, based on what ICU does under the hood.

LocaleData is really just a hack right now. I would prefer to see this overhauled, probably not just in Intl.NumberFormat but also across Intl. We should build it with the following ticket in mind:

tc39/ecma402#210

I haven't found a full conclusion on the topic but I guess we could reuse the flow chart in the specs for a better visualization? If the image is too big maybe SVG is doable?

Not sure; is there an accepted way to put images into the spec?

I'm +1 but guess we use just one notation? If we can't reuse Symbol and Name together,
"{compactIdentifier}" for both, maybe? I'm not sure if I'm saying the correct thing here.

I followed the model from currencies, which was to use different placeholder symbols for symbols vs. names.

Traditionally, in ECMA-402, for things which are in ICU algorithms but not specified in CLDR, we specified the algorithm in ECMA-402 itself. This can be helpful for people who are writing polyfills based on the CLDR data, so they don't have to read ICU source. Would such an approach be possible here?

I already put in the spec the algorithm ICU uses for picking the exponent in scientific notation. I left the algorithm for picking the exponent in compact notation abstract. If I wanted to make that algorithm non-abstract, it would solve some of the problems in this thread, but it would mean adding another LocaleData tree specifically for encoding the compact notation.

sffc · 2019-03-19T04:33:28Z

FYI, the algorithm for picking the exponent in compact notation is actually in the CLDR specification, but it requires data in order to execute on it:

https://unicode.org/reports/tr35/tr35-numbers.html#Compact_Number_Formats

sffc · 2019-07-17T00:04:18Z

Related: tc39/test262#2233

…attern As per tc39#26, the pattern hierarchy is `patterns` -> `signDisplay` -> `displayNotation` -> `zero/negative/positionPattern`. Currently in the spec it’s reversed (`patterns` -> `displayNotation` -> `signDisplay`) Reference implementation; https://github.com/formatjs/formatjs/blob/master/packages/intl-unified-numberformat/src/core.ts#L889

longlho · 2019-12-27T03:11:20Z

I'm really hoping this would be addressed before stage-4. The current spec actually doesn't allow us (FormatJS) to write a 100% polyfill because compact patterns are also ILD (which the spec doesn't allow room for in GetNumberFormatPattern).

longlho · 2020-01-03T03:32:07Z

An idea I have is to create another layer in the tree to account for exponent, with fallback key as well.

sffc · 2020-01-08T14:40:18Z

I'm really hoping this would be addressed before stage-4.

This issue is still open and I intend to address it before Stage 4.

The current spec actually doesn't allow us (FormatJS) to write a 100% polyfill because compact patterns are also ILD (which the spec doesn't allow room for in GetNumberFormatPattern).

The result of GetNumberFormatPattern is ILD. Does the spec not allow this?

An idea I have is to create another layer in the tree to account for exponent, with fallback key as well.

I'll look into this.

longlho · 2020-01-08T15:05:45Z

GetNumberFormatPattern result is pretty tight (no ILD). It basically traverses down the tree above.

sffc · 2020-01-08T15:07:15Z

It's ILD in the sense that the tree comes from locale data, which is ILD.

longlho · 2020-01-08T15:10:44Z

Sorry I might have misunderstood this. So compact pattern is in fact ILD already but the issue here is that it's exponent & plural-dependent (e.g the Kun example you mentioned). The current tree right now, in my understanding, doesn't allow room for that.

sffc · 2020-01-08T15:16:55Z

This is hand-waived by the line,

Let compactSymbol be an ILD string representing exponent in short form, which may depend on x in languages having different plural forms. The implementation must be able to provide this string, or else the pattern would not have a "{compactSymbol}" placeholder.

So we basically say that the data is something like {number}{compactSymbol}, and the symbol is the thing that depends on the exponent and plural form. This doesn't directly handle the Kun example. I just don't know if that's a case worth covering in the spec right now. I don't really like this whole data tree, and I think it should be overhauled, but that's a separate issue being tracked in tc39/ecma402#210.

longlho · 2020-01-08T15:24:59Z

So I understand the symbol can be exponent/plural-dependent but we need some wording for the compact pattern to be as well. I believe this is more than the Kun example. Take zh for example: https://github.com/unicode-cldr/cldr-numbers-full/blob/master/main/zh/numbers.json

The pattern for 1000 is just {number} while for others it's {number}{compactSymbol}

A more extreme one would be it where nothing <1M has a symbol in it, or sw where they also come w/ their own negative form.

longlho · 2020-01-08T15:26:41Z

Oh are you saying to just insert {compactSymbol} in the pattern and set them as ''? That would assume them to have the same positions, but will fail in cases where they are not for 2 different exponents.

sffc · 2020-01-08T15:32:09Z

Oh are you saying to just insert {compactSymbol} in the pattern and set them as ''?

As written, yes, basically that's how you're intended to do it. That's something specific that we could clarify in the spec, that {compactSymbol} is allowed to be the empty string, or maybe force it to be the empty string if exponent = 0.

As written, we don't handle cases where the symbol moves around or removes the number field entirely. It's not the only case, though, where the spec doesn't handle all edge cases of rendered localized output. Intl.NumberFormat already has this pre-existing problem in currency formatting where the symbol position could replace the decimal separator (tc39/ecma402#241). So we're not really adding any new problems by not handling some of the compact notation edge cases in the spec.

longlho · 2020-01-08T15:57:17Z

Gotcha thanks for the clarification!

longlho · 2020-01-09T04:54:32Z

Just an update I gave this a try and unfortunately got into an edge case in production w/ nb where 1000 pattern is 0k and 1000000 is 0 mill'.' (extra space) so I think for now as a bandaid I do need to maintain the assumption that compact pattern is also exponent-dependent :(

sffc · 2020-01-09T10:05:36Z

You should include the space in {compactSymbol}. For example, compactSymbol for exponent 3 should be "k", and for exponent 6 should be " mill."

longlho · 2020-01-09T15:07:52Z

Hmm but that would seem to disagree w/ web reality for formatToParts I believe:

sffc · 2020-01-09T15:20:51Z

OK. Great point.

I'll see if there's a way to fix this in the spec without too big of a diff.

sffc · 2020-01-25T00:01:44Z

I believe all the issues in this thread were addressed by #90.

sffc added the editorial Issue involves spec text with no behavior change label Nov 12, 2019

longlho mentioned this issue Dec 24, 2019

[Normative] Fix issue in locating correct pattern in GetNumberFormatP… #82

Closed

sffc mentioned this issue Jan 10, 2020

Refactoring notation pattern into a subpattern dependent on exponent. #90

Merged

sffc added locale-data This issue/PR pertains to locale data and removed editorial Issue involves spec text with no behavior change labels Jan 24, 2020

sffc closed this as completed Jan 25, 2020

longlho mentioned this issue Feb 4, 2020

Spec is vague regarding internal data structure representation tc39/proposal-intl-locale#94

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify "combining options" support in the spec #26

Clarify "combining options" support in the spec #26

sffc commented Nov 28, 2018

littledan commented Dec 5, 2018

sffc commented Feb 27, 2019

leobalter commented Feb 27, 2019

leobalter commented Feb 28, 2019

littledan commented Mar 18, 2019

sffc commented Mar 19, 2019

sffc commented Mar 19, 2019

sffc commented Jul 17, 2019

longlho commented Dec 27, 2019

longlho commented Jan 3, 2020

sffc commented Jan 8, 2020

longlho commented Jan 8, 2020

sffc commented Jan 8, 2020

longlho commented Jan 8, 2020

sffc commented Jan 8, 2020 •

edited

longlho commented Jan 8, 2020 •

edited

longlho commented Jan 8, 2020 •

edited

sffc commented Jan 8, 2020 •

edited

longlho commented Jan 8, 2020

longlho commented Jan 9, 2020

sffc commented Jan 9, 2020 •

edited

longlho commented Jan 9, 2020

sffc commented Jan 9, 2020

sffc commented Jan 25, 2020

Clarify "combining options" support in the spec #26

Clarify "combining options" support in the spec #26

Comments

sffc commented Nov 28, 2018

littledan commented Dec 5, 2018

sffc commented Feb 27, 2019

leobalter commented Feb 27, 2019

leobalter commented Feb 28, 2019

littledan commented Mar 18, 2019

sffc commented Mar 19, 2019

sffc commented Mar 19, 2019

sffc commented Jul 17, 2019

longlho commented Dec 27, 2019

longlho commented Jan 3, 2020

sffc commented Jan 8, 2020

longlho commented Jan 8, 2020

sffc commented Jan 8, 2020

longlho commented Jan 8, 2020

sffc commented Jan 8, 2020 • edited

longlho commented Jan 8, 2020 • edited

longlho commented Jan 8, 2020 • edited

sffc commented Jan 8, 2020 • edited

longlho commented Jan 8, 2020

longlho commented Jan 9, 2020

sffc commented Jan 9, 2020 • edited

longlho commented Jan 9, 2020

sffc commented Jan 9, 2020

sffc commented Jan 25, 2020

sffc commented Jan 8, 2020 •

edited

longlho commented Jan 8, 2020 •

edited

longlho commented Jan 8, 2020 •

edited

sffc commented Jan 8, 2020 •

edited

sffc commented Jan 9, 2020 •

edited