Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double-to-decimal in ICU4X #166

Closed
sffc opened this issue Jul 6, 2020 · 24 comments · Fixed by #1217
Closed

Double-to-decimal in ICU4X #166

sffc opened this issue Jul 6, 2020 · 24 comments · Fixed by #1217
Assignees
Labels
A-design Area: Architecture or design C-numbers Component: Numbers, units, currencies good first issue Good for newcomers help wanted Issue needs an assignee T-core Type: Required functionality

Comments

@sffc
Copy link
Member

sffc commented Jul 6, 2020

In ICU4C, we use a short, fast algorithm when we need 14 or fewer significant digits from a double, and google/double-conversion, which implements Grisu3, if we need more than 14 significant digits.

@zbraniecki pointed to https://github.com/dtolnay/ryu, a Rust library that implements a 2018 algorithm, Ryū, that outperforms Grisu3 and the Rust standard library to_string. I also expect that Ryū is probably smaller in code size than the standard library, but I have not tested this.

First question: does ICU4X depend on Ryū for double-to-decimal (feature-gated), or do we make ICU4X implement FromStr, and let the user pick how they want to do the conversion themselves?

  • Pro: Directly depending on Ryū might make the library easier to use and more consistent in Rust
  • Con: Double-to-decimal isn't a core competency of ICU4X, so maybe we should not include it

Second question: how does this affect FFI or WebAssembly? For example, in JavaScript, maybe we want to use Number.prototype.toString so that we don't need to ship any double-to-decimal code in WASM. On the other hand, in C++, where there is no standard library function that does this*, maybe users would want ICU4X to handle this problem on our end.

Third question: do we implement ICU4C's fastpath algorithm in ICU4X?

* There is sprintf, but this requires you to know a fixed number of decimal points. A general double-to-decimal algorithm akin to Number.prototype.toString is not in the standard library, although I believe either LLVM or GNU has a non-standard extension, but I can't find it right now.

@sffc sffc added question Unresolved questions; type unclear A-design Area: Architecture or design C-numbers Component: Numbers, units, currencies discuss Discuss at a future ICU4X-SC meeting labels Jul 6, 2020
@sffc sffc self-assigned this Jul 10, 2020
@sffc
Copy link
Member Author

sffc commented Jul 17, 2020

Tentative decisions from meeting on 2020-07-17:

  • Expose impl From<f64/f32> for FixedDecimal with a feature flag, calling Ryū under the hood. (Update: see further discussion below.)
  • Don't implement the ICU4C fastpath algorithm. Users who care about top speed and code size should architect their code to give us integers, not floats.

@sffc sffc added T-core Type: Required functionality and removed question Unresolved questions; type unclear discuss Discuss at a future ICU4X-SC meeting labels Jul 17, 2020
@sffc
Copy link
Member Author

sffc commented Oct 14, 2020

Additional thoughts:

  1. Any API for this conversion should mandatorily take a number of decimal places, since trailing zeros are very important for i18n applications.
  2. It's not really all that difficult to just punt this problem to user land; people can call Ryū directly (or choose their favorite algorithm) and then do string-to-FixedDecimal.

@filmil

@filmil
Copy link
Contributor

filmil commented Oct 14, 2020

  1. It's not really all that difficult to just punt this problem to user land; people can call Ryū directly (or choose their favorite algorithm) and then do string-to-FixedDecimal.

I thought about this, and perhaps it is the best way forward to have the user convert a float into a format of their liking. This is not easy for a library author to reuse formatting utilities: when writing an implementation of select, I could not use format! because the format! macro only supports static formatting. But, the user always knows their format, so it would be trivial for them to use format! to convert a float into a representation they want formatted.

tl;dr: perhaps the best option for select() would be to take a formatted string instead of a float. (FixedDecimal) is not an option, as the trait definitions are supposed to not have significant dependencies.

@sffc
Copy link
Member Author

sffc commented Oct 14, 2020

The 402 trait accepting a decimal string SGTM.

@filmil
Copy link
Contributor

filmil commented Oct 14, 2020

The 402 trait accepting a decimal string SGTM.

I see now that your original comment has a footnote section written in cursive which says essentially the same thing. I missed that part earlier.

@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Oct 30, 2020
@sffc
Copy link
Member Author

sffc commented Oct 30, 2020

2020-10-30 discussion:

  • Add this behind a feature flag; indicates that this is not a trivial conversion (unlike integer and string)
  • Encourage best practices by having options to perform the several different strategies for double-to-decimal
  • Use Ryū under the hood

@sffc sffc added this to the 2020 Q4 milestone Oct 30, 2020
@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Oct 30, 2020
@sffc sffc added good first issue Good for newcomers help wanted Issue needs an assignee labels Nov 19, 2020
@sffc sffc removed their assignment Nov 19, 2020
@sffc sffc added the backlog label Nov 19, 2020
@sffc sffc removed this from the 2020 Q4 milestone Nov 19, 2020
@sffc
Copy link
Member Author

sffc commented May 17, 2021

@shadaj

@sffc sffc removed the help wanted Issue needs an assignee label May 17, 2021
@sffc sffc self-assigned this May 17, 2021
shadaj added a commit to shadaj/icu4x that referenced this issue May 17, 2021
This introduces the `FixedDecimal::from_float_ryu` API, which uses Ryu to convert a floating value (`f32` or `f64`, as defined by the impls of `ryu::Float`) into a `FixedDecimal` by first converting the value into a string representation.

To avoid introducing unnecessary dependencies, this API and its dependency on Ryu are hidden behind the `ryu_decimal` feature.

Fixes unicode-org#166

TODO:
- [ ] Figure out if scientific notation is handled correctly
@sffc sffc removed their assignment Jun 10, 2021
@sffc
Copy link
Member Author

sffc commented Jun 18, 2021

Note: We actually already have a FromString impl for FixedDecimal, but it doesn't support the exponent field.

I want to add the exponent field, but I've been thinking that we should preserve the exponent field in the data model of FixedDecimal, such that we can roundtrip "1.2e3" to "1.2e3". We will be able to use this model to power compact decimal notation and scientific notation, and it is also useful as input to plural rules.

We could consider an enumeration of three exponent types: NONE, SCIENTIFIC ("e3"), and COMPACT ("c3").

pub enum Exponent {
    None,
    Scientific(i16),
    Compact(i16),
}

On the other hand, perhaps it's best to keep FixedDecimal as minimal as possible and figure out some other way to keep track of the scientific notation information.

@nordzilla nordzilla added the discuss Discuss at a future ICU4X-SC meeting label Jul 12, 2021
@sffc sffc added discuss-priority Discuss at the next ICU4X meeting and removed discuss Discuss at a future ICU4X-SC meeting labels Jul 15, 2021
@sffc
Copy link
Member Author

sffc commented Jul 29, 2021

Discussion 2021-07-29:

  • @nordzilla - Implementing this in SpiderMonkey, I think ICU4X should do more to handle this problem.
  • @zbraniecki - I'm swayed.
  • @echeran - Could we have a separate function for double-to-decimal instead of baking it in?
  • @nordzilla - My understanding is that Option 3 means that ICU4X would provide a function to convert a double to a string with additional options like precision, trailing zeros, etc.
  • @sffc - I think we're talking about double to FixedDecimal.
  • @sffc - I think Option 3-6 will allow us to provide an API that helps avoid common pitfalls.
  • @dminor - I think having this build in will make it more ergonomic for everyone.

Consensus: Proceed with Option 3.

@sffc sffc removed the discuss-priority Discuss at the next ICU4X meeting label Jul 29, 2021
@sffc
Copy link
Member Author

sffc commented Jul 29, 2021

For background / previous attempt, see #724

@sffc
Copy link
Member Author

sffc commented Oct 28, 2021

Now that we've settled on Option 3, we still need to have the discussion on this point:

  • Any API for this conversion should mandatorily take a number of decimal places, since trailing zeros are very important for i18n applications.

Here's what I think would be a reasonable API. Details explained in the doc comments.

/// Specifies the precision of a floating point value when constructing a FixedDecimal.
///
/// IEEE 754 is a representation of a point on the number line. On the other hand, FixedDecimal
/// specifies not only the point on the number line but also the precision of the number to a
/// specific power of 10. This enum augments a floating-point value with the additional
/// information required by FixedDecimal.
pub enum DoublePrecision {
    /// Specify that the floating point number is integer-valued.
    ///
    /// If the floating point is not actually integer-valued, an error will be returned.
    Integer,

    /// Specify that the floating point number is precise to a specific power of 10.
    /// The number may be rounded or trailing zeros may be added as necessary.
    Magnitude(i16, RoundingMode),

    /// Specify that the floating point number is precise to a specific number of significant digits.
    /// The number may be rounded or trailing zeros may be added as necessary.
    SignificantDigits(u8, RoundingMode),

    /// Specify that the floating point number is precise to the maximum representable by IEEE.
    ///
    /// This results in a FixedDecimal having enough digits to recover the original floating point
    /// value, with no trailing zeros.
    Maximum,
}

pub enum RoundingMode {
    /// Specify that the number should not need to be rounded, or else return an error.
    Unnecessary,

    /// Specify that the number should be truncated.
    Truncate,

    // TODO(#1177): Add more rounding modes.
}

impl FixedDecimal {
    pub fn from_f32(value: f32, precision: DoublePrecision) -> Result<Self, Error> { ... }
    pub fn from_f64(value: f64, precision: DoublePrecision) -> Result<Self, Error> { ... }
}

@Manishearth
Copy link
Member

So I feel like we already have a "builder pattern" approach to FixedDecimal, and we're adding more such things in #1195: I'd really prefer for these to be handled in a separate step, so for example you can call: from_f32(), and follow that up with a call to .round_digits() or truncate_digits() (and then finally follow it up with padding calls if necessary). In fact this handles my complaint with #1195 better than my suggestion there (to split up padding/truncate functions but have them work mostly the same).

Furthermore, enums like that are tricky to handle around FFI, so we'd need some sort of step-by-step API anyway. I don't mind adding an API that does from_f32_auto_precision() with separate .round_to() and .truncate_to() functions that we can combine into the fancy API, but I think the low level step by step API should exist and should be what we expose over FFI.

@sffc
Copy link
Member Author

sffc commented Oct 30, 2021

I understand where you're coming from. But the issue is really twofold:

  1. Theoretical purity: it is undefined to create a FixedDecimal from a Double without the precision, since a Double does not carry precision
  2. Performance: we can, as ICU does, optimize the code path when we have the double and the precision in the same function

@sffc
Copy link
Member Author

sffc commented Oct 30, 2021

If we wanted a true "builder" type pattern, then we could consider an intermediate class called something like FixedDecimalFromDoubleBuilder that resolves to a FixedDecimal once the precision is specified. This is actually what ICU does under the hood, although it does so with a boolean flag on the fixed decimal type instead of a separate type.

@zbraniecki
Copy link
Member

I'm with Shane here. I am not a huge fan of builder patterns in general, but when used, I really prefer to never operate on half baked states. If the only thing you can do with a state is to add precision to it to produce fixed decimal, then this intermediate step is pointless and harmful from the API design perspective.

So, even if you do go for a builder pattern here, I'd suggest that we either allow for precisionless construction of FixedDecimal (with known limitations of that approach and strong recommendation to use a precision-full constructor), or only allow for construction with value+precision provided at the same time.

@Manishearth
Copy link
Member

So I'll respond to some of the points later when I get the chance, but a thing I want to point out quickly is that while floating point numbers have no inherent decimal precision, ryū's model for them does, so we may need to reevaluate using ryū here if we want this level of control. At the moment the arguments for such an API do not make sense in the context of our double to decimal library being ryū, which is why I prefer the builder design. If we want something else, we should pick a different double-to-decimal routine; maybe write one ourselves.

@sffc
Copy link
Member Author

sffc commented Oct 30, 2021

while floating point numbers have no inherent decimal precision, ryū's model for them does

ryū, as well as other algorithms like grisu3, implements an algorithm that returns a decimal string with the fewest number of digits required to unambiguously recover the original decimal. This is different than the concept of "precision" that I'm discussing.

One valid precision strategy is "maximum supported by the floating point type I am using". This is what a vanilla ryū-based conversion performs. But this is not the only precision strategy. I also don't want to give this precision strategy favoritism over other precision strategies.

If we want something else, we should pick a different double-to-decimal routine; maybe write one ourselves.

Most precision strategies need to fall back on a ryū-like algorithm in edge cases anyway.

@Manishearth
Copy link
Member

@sffc right, I'm saying that ryū is not sufficient for us to support these other options since it has a different strategy here. We need something more, and I'm asking where we get that from, because otherwise this can't be implemented.

I'm trying to find an MVP that supports WearOS' use case and also find what we need to pull in to support the full API here.

MVP wise I think we may be able to use ryū as the Maximum mode with a separate function for truncation and rounding, and we can add a full version later that the Maximum mode function calls.

@Manishearth
Copy link
Member

Update from the discussion I had with @sffc:

I think what I'd misunderstood here was an implication that ryū could not be used as a base for the rounding/truncating cases because ryū can "shorten" some floats. It seems like that's not central to this; so we should be able to implement Shane's proposed API, with less enumy versions for FFI.

@Manishearth
Copy link
Member

A hitch: ryū formats scientific notation in its double-to-decimal algorithm, we may need from_str to handle that as well

@shadaj
Copy link
Contributor

shadaj commented Nov 5, 2021

A hitch: ryū formats scientific notation in its double-to-decimal algorithm, we may need from_str to handle that as well

Yeah, this is why I dropped using the main ryū crate in favor of the fork that just gives us an integer when putting together the PR for this issue.

@Manishearth
Copy link
Member

Opened #1265 to handle exponents

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-design Area: Architecture or design C-numbers Component: Numbers, units, currencies good first issue Good for newcomers help wanted Issue needs an assignee T-core Type: Required functionality
Projects
None yet
6 participants