Double-to-decimal in ICU4X #166

sffc · 2020-07-06T06:03:45Z

In ICU4C, we use a short, fast algorithm when we need 14 or fewer significant digits from a double, and google/double-conversion, which implements Grisu3, if we need more than 14 significant digits.

@zbraniecki pointed to https://github.com/dtolnay/ryu, a Rust library that implements a 2018 algorithm, Ryū, that outperforms Grisu3 and the Rust standard library to_string. I also expect that Ryū is probably smaller in code size than the standard library, but I have not tested this.

First question: does ICU4X depend on Ryū for double-to-decimal (feature-gated), or do we make ICU4X implement FromStr, and let the user pick how they want to do the conversion themselves?

Pro: Directly depending on Ryū might make the library easier to use and more consistent in Rust
Con: Double-to-decimal isn't a core competency of ICU4X, so maybe we should not include it

Second question: how does this affect FFI or WebAssembly? For example, in JavaScript, maybe we want to use Number.prototype.toString so that we don't need to ship any double-to-decimal code in WASM. On the other hand, in C++, where there is no standard library function that does this*, maybe users would want ICU4X to handle this problem on our end.

Third question: do we implement ICU4C's fastpath algorithm in ICU4X?

* There is sprintf, but this requires you to know a fixed number of decimal points. A general double-to-decimal algorithm akin to Number.prototype.toString is not in the standard library, although I believe either LLVM or GNU has a non-standard extension, but I can't find it right now.

The text was updated successfully, but these errors were encountered:

sffc · 2020-07-17T19:26:04Z

Tentative decisions from meeting on 2020-07-17:

~~Expose impl From<f64/f32> for FixedDecimal with a feature flag, calling Ryū under the hood.~~ (Update: see further discussion below.)
Don't implement the ICU4C fastpath algorithm. Users who care about top speed and code size should architect their code to give us integers, not floats.

sffc · 2020-10-14T01:46:45Z

Additional thoughts:

Any API for this conversion should mandatorily take a number of decimal places, since trailing zeros are very important for i18n applications.
It's not really all that difficult to just punt this problem to user land; people can call Ryū directly (or choose their favorite algorithm) and then do string-to-FixedDecimal.

@filmil

filmil · 2020-10-14T16:47:04Z

It's not really all that difficult to just punt this problem to user land; people can call Ryū directly (or choose their favorite algorithm) and then do string-to-FixedDecimal.

I thought about this, and perhaps it is the best way forward to have the user convert a float into a format of their liking. This is not easy for a library author to reuse formatting utilities: when writing an implementation of select, I could not use format! because the format! macro only supports static formatting. But, the user always knows their format, so it would be trivial for them to use format! to convert a float into a representation they want formatted.

tl;dr: perhaps the best option for select() would be to take a formatted string instead of a float. (FixedDecimal) is not an option, as the trait definitions are supposed to not have significant dependencies.

sffc · 2020-10-14T17:24:09Z

The 402 trait accepting a decimal string SGTM.

filmil · 2020-10-14T18:44:08Z

The 402 trait accepting a decimal string SGTM.

I see now that your original comment has a footnote section written in cursive which says essentially the same thing. I missed that part earlier.

sffc · 2020-10-30T17:50:48Z

2020-10-30 discussion:

Add this behind a feature flag; indicates that this is not a trivial conversion (unlike integer and string)
Encourage best practices by having options to perform the several different strategies for double-to-decimal
Use Ryū under the hood

sffc · 2021-05-17T18:21:01Z

@shadaj

This introduces the `FixedDecimal::from_float_ryu` API, which uses Ryu to convert a floating value (`f32` or `f64`, as defined by the impls of `ryu::Float`) into a `FixedDecimal` by first converting the value into a string representation. To avoid introducing unnecessary dependencies, this API and its dependency on Ryu are hidden behind the `ryu_decimal` feature. Fixes unicode-org#166 TODO: - [ ] Figure out if scientific notation is handled correctly

sffc · 2021-06-18T04:50:39Z

Note: We actually already have a FromString impl for FixedDecimal, but it doesn't support the exponent field.

I want to add the exponent field, but I've been thinking that we should preserve the exponent field in the data model of FixedDecimal, such that we can roundtrip "1.2e3" to "1.2e3". We will be able to use this model to power compact decimal notation and scientific notation, and it is also useful as input to plural rules.

We could consider an enumeration of three exponent types: NONE, SCIENTIFIC ("e3"), and COMPACT ("c3").

pub enum Exponent {
    None,
    Scientific(i16),
    Compact(i16),
}

On the other hand, perhaps it's best to keep FixedDecimal as minimal as possible and figure out some other way to keep track of the scientific notation information.

sffc · 2021-07-29T17:19:27Z

Discussion 2021-07-29:

@nordzilla - Implementing this in SpiderMonkey, I think ICU4X should do more to handle this problem.
@zbraniecki - I'm swayed.
@echeran - Could we have a separate function for double-to-decimal instead of baking it in?
@nordzilla - My understanding is that Option 3 means that ICU4X would provide a function to convert a double to a string with additional options like precision, trailing zeros, etc.
@sffc - I think we're talking about double to FixedDecimal.
@sffc - I think Option 3-6 will allow us to provide an API that helps avoid common pitfalls.
@dminor - I think having this build in will make it more ergonomic for everyone.

Consensus: Proceed with Option 3.

sffc · 2021-07-29T18:07:50Z

For background / previous attempt, see #724

sffc · 2021-10-28T19:56:53Z

Now that we've settled on Option 3, we still need to have the discussion on this point:

Any API for this conversion should mandatorily take a number of decimal places, since trailing zeros are very important for i18n applications.

Here's what I think would be a reasonable API. Details explained in the doc comments.

/// Specifies the precision of a floating point value when constructing a FixedDecimal.
///
/// IEEE 754 is a representation of a point on the number line. On the other hand, FixedDecimal
/// specifies not only the point on the number line but also the precision of the number to a
/// specific power of 10. This enum augments a floating-point value with the additional
/// information required by FixedDecimal.
pub enum DoublePrecision {
    /// Specify that the floating point number is integer-valued.
    ///
    /// If the floating point is not actually integer-valued, an error will be returned.
    Integer,

    /// Specify that the floating point number is precise to a specific power of 10.
    /// The number may be rounded or trailing zeros may be added as necessary.
    Magnitude(i16, RoundingMode),

    /// Specify that the floating point number is precise to a specific number of significant digits.
    /// The number may be rounded or trailing zeros may be added as necessary.
    SignificantDigits(u8, RoundingMode),

    /// Specify that the floating point number is precise to the maximum representable by IEEE.
    ///
    /// This results in a FixedDecimal having enough digits to recover the original floating point
    /// value, with no trailing zeros.
    Maximum,
}

pub enum RoundingMode {
    /// Specify that the number should not need to be rounded, or else return an error.
    Unnecessary,

    /// Specify that the number should be truncated.
    Truncate,

    // TODO(#1177): Add more rounding modes.
}

impl FixedDecimal {
    pub fn from_f32(value: f32, precision: DoublePrecision) -> Result<Self, Error> { ... }
    pub fn from_f64(value: f64, precision: DoublePrecision) -> Result<Self, Error> { ... }
}

Manishearth · 2021-10-29T19:43:47Z

So I feel like we already have a "builder pattern" approach to FixedDecimal, and we're adding more such things in #1195: I'd really prefer for these to be handled in a separate step, so for example you can call: from_f32(), and follow that up with a call to .round_digits() or truncate_digits() (and then finally follow it up with padding calls if necessary). In fact this handles my complaint with #1195 better than my suggestion there (to split up padding/truncate functions but have them work mostly the same).

Furthermore, enums like that are tricky to handle around FFI, so we'd need some sort of step-by-step API anyway. I don't mind adding an API that does from_f32_auto_precision() with separate .round_to() and .truncate_to() functions that we can combine into the fancy API, but I think the low level step by step API should exist and should be what we expose over FFI.

sffc · 2021-10-30T02:53:35Z

I understand where you're coming from. But the issue is really twofold:

Theoretical purity: it is undefined to create a FixedDecimal from a Double without the precision, since a Double does not carry precision
Performance: we can, as ICU does, optimize the code path when we have the double and the precision in the same function

sffc · 2021-10-30T03:03:56Z

If we wanted a true "builder" type pattern, then we could consider an intermediate class called something like FixedDecimalFromDoubleBuilder that resolves to a FixedDecimal once the precision is specified. This is actually what ICU does under the hood, although it does so with a boolean flag on the fixed decimal type instead of a separate type.

zbraniecki · 2021-10-30T04:28:42Z

I'm with Shane here. I am not a huge fan of builder patterns in general, but when used, I really prefer to never operate on half baked states. If the only thing you can do with a state is to add precision to it to produce fixed decimal, then this intermediate step is pointless and harmful from the API design perspective.

So, even if you do go for a builder pattern here, I'd suggest that we either allow for precisionless construction of FixedDecimal (with known limitations of that approach and strong recommendation to use a precision-full constructor), or only allow for construction with value+precision provided at the same time.

Manishearth · 2021-10-30T04:45:35Z

So I'll respond to some of the points later when I get the chance, but a thing I want to point out quickly is that while floating point numbers have no inherent decimal precision, ryū's model for them does, so we may need to reevaluate using ryū here if we want this level of control. At the moment the arguments for such an API do not make sense in the context of our double to decimal library being ryū, which is why I prefer the builder design. If we want something else, we should pick a different double-to-decimal routine; maybe write one ourselves.

sffc · 2021-10-30T04:56:02Z

while floating point numbers have no inherent decimal precision, ryū's model for them does

ryū, as well as other algorithms like grisu3, implements an algorithm that returns a decimal string with the fewest number of digits required to unambiguously recover the original decimal. This is different than the concept of "precision" that I'm discussing.

One valid precision strategy is "maximum supported by the floating point type I am using". This is what a vanilla ryū-based conversion performs. But this is not the only precision strategy. I also don't want to give this precision strategy favoritism over other precision strategies.

If we want something else, we should pick a different double-to-decimal routine; maybe write one ourselves.

Most precision strategies need to fall back on a ryū-like algorithm in edge cases anyway.

Manishearth · 2021-10-30T06:39:38Z

@sffc right, I'm saying that ryū is not sufficient for us to support these other options since it has a different strategy here. We need something more, and I'm asking where we get that from, because otherwise this can't be implemented.

I'm trying to find an MVP that supports WearOS' use case and also find what we need to pull in to support the full API here.

MVP wise I think we may be able to use ryū as the Maximum mode with a separate function for truncation and rounding, and we can add a full version later that the Maximum mode function calls.

Manishearth · 2021-11-04T15:21:23Z

Update from the discussion I had with @sffc:

I think what I'd misunderstood here was an implication that ryū could not be used as a base for the rounding/truncating cases because ryū can "shorten" some floats. It seems like that's not central to this; so we should be able to implement Shane's proposed API, with less enumy versions for FFI.

Manishearth · 2021-11-04T18:29:41Z

A hitch: ryū formats scientific notation in its double-to-decimal algorithm, we may need from_str to handle that as well

shadaj · 2021-11-05T00:30:08Z

A hitch: ryū formats scientific notation in its double-to-decimal algorithm, we may need from_str to handle that as well

Yeah, this is why I dropped using the main ryū crate in favor of the fork that just gives us an integer when putting together the PR for this issue.

Manishearth · 2021-11-05T02:59:17Z

Opened #1265 to handle exponents

sffc added question Unresolved questions; type unclear A-design Area: Architecture or design C-numbers Component: Numbers, units, currencies discuss Discuss at a future ICU4X-SC meeting labels Jul 6, 2020

zbraniecki mentioned this issue Jul 6, 2020

FixedDecimal as a standalone crate #167

Closed

sffc self-assigned this Jul 10, 2020

sffc added T-core Type: Required functionality and removed question Unresolved questions; type unclear discuss Discuss at a future ICU4X-SC meeting labels Jul 17, 2020

sffc mentioned this issue Oct 14, 2020

Implement a conversion from f64 to FixedDecimal #340

Closed

sffc added the discuss Discuss at a future ICU4X-SC meeting label Oct 30, 2020

sffc added this to the 2020 Q4 milestone Oct 30, 2020

sffc removed the discuss Discuss at a future ICU4X-SC meeting label Oct 30, 2020

sffc added good first issue Good for newcomers help wanted Issue needs an assignee labels Nov 19, 2020

sffc removed their assignment Nov 19, 2020

sffc added the backlog label Nov 19, 2020

sffc removed this from the 2020 Q4 milestone Nov 19, 2020

sffc removed the help wanted Issue needs an assignee label May 17, 2021

sffc self-assigned this May 17, 2021

shadaj mentioned this issue May 17, 2021

Support creating a FixedDecimal from a float value through Ryu #724

Closed

sffc removed their assignment Jun 10, 2021

nordzilla mentioned this issue Jun 22, 2021

Handling of float inputs for PluralRules and FixedDecimal #191

Closed

nordzilla added the discuss Discuss at a future ICU4X-SC meeting label Jul 12, 2021

sffc added discuss-priority Discuss at the next ICU4X meeting and removed discuss Discuss at a future ICU4X-SC meeting labels Jul 15, 2021

This was referenced Jul 22, 2021

Improve docs in PluralRules and FixedDecimal #886

Merged

Handle [NaN, -NaN, Infinity, -Infinity] for FixedDecimal #862

Open

sffc removed the discuss-priority Discuss at the next ICU4X meeting label Jul 29, 2021

Manishearth self-assigned this Oct 21, 2021

Manishearth added this to the 2021 Q4 0.5 Sprint A milestone Oct 21, 2021

Manishearth mentioned this issue Oct 26, 2021

Add double-to-decimal for FixedDecimal via ryū #1217

Merged

Manishearth modified the milestones: 2021 Q4 0.5 Sprint A, 2021 Q4 0.5 Sprint B Nov 4, 2021

Manishearth mentioned this issue Nov 5, 2021

Handle exponents in FixedDecimal::from_str() #1265

Merged

Manishearth closed this as completed in #1217 Nov 10, 2021

sffc mentioned this issue Mar 23, 2022

Division of concerns and scientific notation in FixedDecimal #1267

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Double-to-decimal in ICU4X #166

Double-to-decimal in ICU4X #166

sffc commented Jul 6, 2020 •

edited

Loading

sffc commented Jul 17, 2020 •

edited

Loading

sffc commented Oct 14, 2020

filmil commented Oct 14, 2020 •

edited

Loading

sffc commented Oct 14, 2020

filmil commented Oct 14, 2020 •

edited

Loading

sffc commented Oct 30, 2020

sffc commented May 17, 2021

sffc commented Jun 18, 2021 •

edited

Loading

sffc commented Jul 29, 2021

sffc commented Jul 29, 2021

sffc commented Oct 28, 2021

Manishearth commented Oct 29, 2021

sffc commented Oct 30, 2021

sffc commented Oct 30, 2021

zbraniecki commented Oct 30, 2021

Manishearth commented Oct 30, 2021

sffc commented Oct 30, 2021 •

edited

Loading

Manishearth commented Oct 30, 2021

Manishearth commented Nov 4, 2021

Manishearth commented Nov 4, 2021

shadaj commented Nov 5, 2021

Manishearth commented Nov 5, 2021

Double-to-decimal in ICU4X #166

Double-to-decimal in ICU4X #166

Comments

sffc commented Jul 6, 2020 • edited Loading

sffc commented Jul 17, 2020 • edited Loading

sffc commented Oct 14, 2020

filmil commented Oct 14, 2020 • edited Loading

sffc commented Oct 14, 2020

filmil commented Oct 14, 2020 • edited Loading

sffc commented Oct 30, 2020

sffc commented May 17, 2021

sffc commented Jun 18, 2021 • edited Loading

sffc commented Jul 29, 2021

sffc commented Jul 29, 2021

sffc commented Oct 28, 2021

Manishearth commented Oct 29, 2021

sffc commented Oct 30, 2021

sffc commented Oct 30, 2021

zbraniecki commented Oct 30, 2021

Manishearth commented Oct 30, 2021

sffc commented Oct 30, 2021 • edited Loading

Manishearth commented Oct 30, 2021

Manishearth commented Nov 4, 2021

Manishearth commented Nov 4, 2021

shadaj commented Nov 5, 2021

Manishearth commented Nov 5, 2021

sffc commented Jul 6, 2020 •

edited

Loading

sffc commented Jul 17, 2020 •

edited

Loading

filmil commented Oct 14, 2020 •

edited

Loading

filmil commented Oct 14, 2020 •

edited

Loading

sffc commented Jun 18, 2021 •

edited

Loading

sffc commented Oct 30, 2021 •

edited

Loading