Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Clarify "combining options" support in the spec #26

Closed
sffc opened this issue Nov 28, 2018 · 24 comments
Closed

Clarify "combining options" support in the spec #26

sffc opened this issue Nov 28, 2018 · 24 comments
Labels
locale-data This issue/PR pertains to locale data

Comments

@sffc
Copy link
Collaborator

sffc commented Nov 28, 2018

The "combining options" feature is allowed to be implementation-dependent, but the spec should at least require that no information is lost when processing it. Investigate how to do this in the data loading section of the spec text.

@littledan
Copy link
Member

What is allowed to be implementation-dependent about it, and what's the rationale for that?

@sffc
Copy link
Collaborator Author

sffc commented Feb 27, 2019

@leobalter

This has to do with the "LocaleData" internal field. This is the part of the spec that is really long and wordy. I made a flow chart to demonstrate the data specified to be in LocaleData:

localedata flow

Once you're out the bottom of the flow chart (into negativePattern, zeroPattern, or positivePattern), then this line from the spec applies:

The value of the aforementioned fields (the sign-dependent pattern fields) must be string values that must contain the substring "{number}" and may contain the substrings "{plusSign}", "{minusSign}", "{compactSymbol}", "{compactName}", "{scientificSeparator}", and "{scientificExponent}"; the values within the "currency" field must also contain one of the following substrings: "{currencyCode}", "{currencySymbol}", "{currencyNarrowSymbol}", or "{currencyName}"; and the values within the "unit" field must also contain one of the following substrings: "{percentSign}", "{unitSymbol}", "{unitNarrowSymbol}", or "{unitName}".

What this means: if a user requests to format a currency like GBP, or a unit like meter, the implementation is required to produce a string that has a placeholder for the currency sign or the unit name.

For the notation, if the implementation doesn't know how to display a certain notation, that's fine as long as the implementation doesn't artificially change the magnitude of the number without providing a corresponding pattern. However, right now, the spec doesn't forbid them from doing that.

FYI, the scale of the number is determined by ComputeExponentForMagnitude, which reads:

    1. Let _notation_ be _numberFormat_.[[Notation]].
    1. If _notation_ is `"standard"`, then
      1. Return 0.
    1. Else If _notation_ is `"scientific"`, then
      1. Return _magnitude_.
    1. Else If _notation_ is `"engineering"`, then
      1. Let _thousands_ be the greatest integer that is not greater than _magnitude_ ÷ 3.
      1. Return _thousands_ × 3.
    1. Else,
      1. Assert: _notation_ is `"compact"`.
      1. Let _exponent_ be an implementation-dependent integer by which to scale a number of the given magnitude in compact notation for the current locale.
      1. Return _exponent_.

Probably what we need to do is:

  • When scientific notation is requested, require that implementations provide a valid scientific notation string (one containing "{scientificSeparator}" and "{scientificExponent}")
  • When compact notation is requested, require that if implementations give a scale in ComputeExponentForMagnitude, then they must also provide "{compactSymbol}" or "{compactName}".

Does that make any sense? Thoughts?

@leobalter
Copy link
Member

@sffc I did not have full time to absorb all of this information today. I have dedicated time tomorrow to look at this and I just wanted to ping you with this update.

Thanks for being descriptive and the flow chart.

@leobalter
Copy link
Member

I haven't found a full conclusion on the topic but I guess we could reuse the flow chart in the specs for a better visualization? If the image is too big maybe SVG is doable?

What this means: if a user requests to format a currency like GBP, or a unit like meter, the implementation is required to produce a string that has a placeholder for the currency sign or the unit name.

Seems valid.

For the notation, if the implementation doesn't know how to display a certain notation, that's fine as long as the implementation doesn't artificially change the magnitude of the number without providing a corresponding pattern.

I agree.

However, right now, the spec doesn't forbid them from doing that.

  • When scientific notation is requested, require that implementations provide a valid scientific notation string (one containing "{scientificSeparator}" and "{scientificExponent}")

+1

  • When compact notation is requested, require that if implementations give a scale in ComputeExponentForMagnitude, then they must also provide "{compactSymbol}" or "{compactName}".

I'm +1 but guess we use just one notation? If we can't reuse Symbol and Name together, "{compactIdentifier}" for both, maybe? I'm not sure if I'm saying the correct thing here.


Back from the top post:

the spec should at least require that no information is lost when processing it. Investigate how to do this in the data loading section of the spec text.

I'm in favor and I appreciate the proposed outcome. Hopefully it finds acceptance for possible compatibility across implementations.

@littledan
Copy link
Member

Traditionally, in ECMA-402, for things which are in ICU algorithms but not specified in CLDR, we specified the algorithm in ECMA-402 itself. This can be helpful for people who are writing polyfills based on the CLDR data, so they don't have to read ICU source. Would such an approach be possible here?

@sffc
Copy link
Collaborator Author

sffc commented Mar 19, 2019

Another problem is that there are locales that might not have a number in the unit or compact notation. For example, Hebrew and Somali and a few other languages conventionally use a singular word in place of the digit 1. For example, the compact notation in Somali for 1000 is "Kun" (not "1 Kun", just "Kun"). So, anything that generates a pattern with a placeholder does not technically work for all locales, based on what ICU does under the hood.

LocaleData is really just a hack right now. I would prefer to see this overhauled, probably not just in Intl.NumberFormat but also across Intl. We should build it with the following ticket in mind:

tc39/ecma402#210

I haven't found a full conclusion on the topic but I guess we could reuse the flow chart in the specs for a better visualization? If the image is too big maybe SVG is doable?

Not sure; is there an accepted way to put images into the spec?

I'm +1 but guess we use just one notation? If we can't reuse Symbol and Name together,
"{compactIdentifier}" for both, maybe? I'm not sure if I'm saying the correct thing here.

I followed the model from currencies, which was to use different placeholder symbols for symbols vs. names.

Traditionally, in ECMA-402, for things which are in ICU algorithms but not specified in CLDR, we specified the algorithm in ECMA-402 itself. This can be helpful for people who are writing polyfills based on the CLDR data, so they don't have to read ICU source. Would such an approach be possible here?

I already put in the spec the algorithm ICU uses for picking the exponent in scientific notation. I left the algorithm for picking the exponent in compact notation abstract. If I wanted to make that algorithm non-abstract, it would solve some of the problems in this thread, but it would mean adding another LocaleData tree specifically for encoding the compact notation.

@sffc
Copy link
Collaborator Author

sffc commented Mar 19, 2019

FYI, the algorithm for picking the exponent in compact notation is actually in the CLDR specification, but it requires data in order to execute on it:

https://unicode.org/reports/tr35/tr35-numbers.html#Compact_Number_Formats

@sffc
Copy link
Collaborator Author

sffc commented Jul 17, 2019

Related: tc39/test262#2233

@sffc sffc added the editorial Issue involves spec text with no behavior change label Nov 12, 2019
longlho added a commit to longlho/proposal-unified-intl-numberformat that referenced this issue Dec 24, 2019
…attern

As per tc39#26, the pattern hierarchy is `patterns` -> `signDisplay` -> `displayNotation` -> `zero/negative/positionPattern`. Currently in the spec it’s reversed (`patterns` -> `displayNotation` -> `signDisplay`)

Reference implementation; https://github.com/formatjs/formatjs/blob/master/packages/intl-unified-numberformat/src/core.ts#L889
@longlho
Copy link
Contributor

longlho commented Dec 27, 2019

I'm really hoping this would be addressed before stage-4. The current spec actually doesn't allow us (FormatJS) to write a 100% polyfill because compact patterns are also ILD (which the spec doesn't allow room for in GetNumberFormatPattern).

@longlho
Copy link
Contributor

longlho commented Jan 3, 2020

An idea I have is to create another layer in the tree to account for exponent, with fallback key as well.

@sffc
Copy link
Collaborator Author

sffc commented Jan 8, 2020

I'm really hoping this would be addressed before stage-4.

This issue is still open and I intend to address it before Stage 4.

The current spec actually doesn't allow us (FormatJS) to write a 100% polyfill because compact patterns are also ILD (which the spec doesn't allow room for in GetNumberFormatPattern).

The result of GetNumberFormatPattern is ILD. Does the spec not allow this?

An idea I have is to create another layer in the tree to account for exponent, with fallback key as well.

I'll look into this.

@longlho
Copy link
Contributor

longlho commented Jan 8, 2020

GetNumberFormatPattern result is pretty tight (no ILD). It basically traverses down the tree above.

@sffc
Copy link
Collaborator Author

sffc commented Jan 8, 2020

It's ILD in the sense that the tree comes from locale data, which is ILD.

@longlho
Copy link
Contributor

longlho commented Jan 8, 2020

Sorry I might have misunderstood this. So compact pattern is in fact ILD already but the issue here is that it's exponent & plural-dependent (e.g the Kun example you mentioned). The current tree right now, in my understanding, doesn't allow room for that.

@sffc
Copy link
Collaborator Author

sffc commented Jan 8, 2020

This is hand-waived by the line,

Let compactSymbol be an ILD string representing exponent in short form, which may depend on x in languages having different plural forms. The implementation must be able to provide this string, or else the pattern would not have a "{compactSymbol}" placeholder.

So we basically say that the data is something like {number}{compactSymbol}, and the symbol is the thing that depends on the exponent and plural form. This doesn't directly handle the Kun example. I just don't know if that's a case worth covering in the spec right now. I don't really like this whole data tree, and I think it should be overhauled, but that's a separate issue being tracked in tc39/ecma402#210.

@longlho
Copy link
Contributor

longlho commented Jan 8, 2020

So I understand the symbol can be exponent/plural-dependent but we need some wording for the compact pattern to be as well. I believe this is more than the Kun example. Take zh for example: https://github.com/unicode-cldr/cldr-numbers-full/blob/master/main/zh/numbers.json

The pattern for 1000 is just {number} while for others it's {number}{compactSymbol}

A more extreme one would be it where nothing <1M has a symbol in it, or sw where they also come w/ their own negative form.

@longlho
Copy link
Contributor

longlho commented Jan 8, 2020

Oh are you saying to just insert {compactSymbol} in the pattern and set them as ''? That would assume them to have the same positions, but will fail in cases where they are not for 2 different exponents.

@sffc
Copy link
Collaborator Author

sffc commented Jan 8, 2020

Oh are you saying to just insert {compactSymbol} in the pattern and set them as ''?

As written, yes, basically that's how you're intended to do it. That's something specific that we could clarify in the spec, that {compactSymbol} is allowed to be the empty string, or maybe force it to be the empty string if exponent = 0.

As written, we don't handle cases where the symbol moves around or removes the number field entirely. It's not the only case, though, where the spec doesn't handle all edge cases of rendered localized output. Intl.NumberFormat already has this pre-existing problem in currency formatting where the symbol position could replace the decimal separator (tc39/ecma402#241). So we're not really adding any new problems by not handling some of the compact notation edge cases in the spec.

@longlho
Copy link
Contributor

longlho commented Jan 8, 2020

Gotcha thanks for the clarification!

@longlho
Copy link
Contributor

longlho commented Jan 9, 2020

Just an update I gave this a try and unfortunately got into an edge case in production w/ nb where 1000 pattern is 0k and 1000000 is 0 mill'.' (extra space) so I think for now as a bandaid I do need to maintain the assumption that compact pattern is also exponent-dependent :(

@sffc
Copy link
Collaborator Author

sffc commented Jan 9, 2020

You should include the space in {compactSymbol}. For example, compactSymbol for exponent 3 should be "k", and for exponent 6 should be " mill."

@longlho
Copy link
Contributor

longlho commented Jan 9, 2020

Hmm but that would seem to disagree w/ web reality for formatToParts I believe:

ss

@sffc
Copy link
Collaborator Author

sffc commented Jan 9, 2020

OK. Great point.

I'll see if there's a way to fix this in the spec without too big of a diff.

@sffc sffc added locale-data This issue/PR pertains to locale data and removed editorial Issue involves spec text with no behavior change labels Jan 24, 2020
@sffc
Copy link
Collaborator Author

sffc commented Jan 25, 2020

I believe all the issues in this thread were addressed by #90.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locale-data This issue/PR pertains to locale data
Projects
None yet
Development

No branches or pull requests

4 participants