Double-to-decimal in ICU4X #166
Tentative decisions from meeting on 2020-07-17:
|
Additional thoughts:
|
I thought about this, and perhaps it is the best way forward to have the user convert a float into a format of their liking. This is not easy for a library author to reuse formatting utilities: when writing an implementation of tl;dr: perhaps the best option for select() would be to take a formatted string instead of a float. ( |
The 402 trait accepting a decimal string SGTM. |
I see now that your original comment has a footnote section written in cursive which says essentially the same thing. I missed that part earlier. |
2020-10-30 discussion:
|
This introduces the `FixedDecimal::from_float_ryu` API, which uses Ryu to convert a floating value (`f32` or `f64`, as defined by the impls of `ryu::Float`) into a `FixedDecimal` by first converting the value into a string representation. To avoid introducing unnecessary dependencies, this API and its dependency on Ryu are hidden behind the `ryu_decimal` feature.
Fixes unicode-org#166
TODO:
- [ ] Figure out if scientific notation is handled correctly
Note: We actually already have a FromStr impl for FixedDecimal, but it doesn't support the exponent field. I want to add the exponent field, but I've been thinking that we should preserve the exponent field in the data model of FixedDecimal, such that we can roundtrip "1.2e3" to "1.2e3". We will be able to use this model to power compact decimal notation and scientific notation, and it is also useful as input to plural rules. We could consider an enumeration of three exponent types: NONE, SCIENTIFIC ("e3"), and COMPACT ("c3").
pub enum Exponent {
    None,
    Scientific(i16),
    Compact(i16),
}
On the other hand, perhaps it's best to keep FixedDecimal as minimal as possible and figure out some other way to keep track of the scientific notation information. |
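To make the roundtrip idea concrete, here is a minimal sketch of what preserving the exponent could look like. It is not the FixedDecimal data model: `DecimalWithExponent` and `parse_with_exponent` are hypothetical names invented for illustration, and the mantissa is kept as a plain string to keep the example short.

```rust
use std::fmt;

/// Exponent kinds as proposed in the comment above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Exponent {
    None,
    Scientific(i16),
    Compact(i16),
}

/// Hypothetical stand-in for a decimal that remembers its exponent notation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DecimalWithExponent {
    /// Mantissa digits kept verbatim, e.g. "1.2".
    pub mantissa: String,
    pub exponent: Exponent,
}

/// Parses strings such as "12", "1.2e3", or "1.2c3" (hypothetical helper).
/// A leading '+' in the exponent is accepted but not preserved.
pub fn parse_with_exponent(input: &str) -> Option<DecimalWithExponent> {
    // Split at the first exponent marker, if there is one.
    let (mantissa, exponent) = match input.find(|c: char| c == 'e' || c == 'c') {
        None => (input, Exponent::None),
        Some(idx) => {
            let power: i16 = input[idx + 1..].parse().ok()?;
            let exponent = if input.as_bytes()[idx] == b'e' {
                Exponent::Scientific(power)
            } else {
                Exponent::Compact(power)
            };
            (&input[..idx], exponent)
        }
    };
    // The mantissa must be an optional '-', then digits with at most one '.'.
    let digits = mantissa.strip_prefix('-').unwrap_or(mantissa);
    let valid = digits.chars().all(|c| c.is_ascii_digit() || c == '.')
        && digits.matches('.').count() <= 1
        && digits.chars().any(|c| c.is_ascii_digit());
    if !valid {
        return None;
    }
    Some(DecimalWithExponent {
        mantissa: mantissa.to_string(),
        exponent,
    })
}

impl fmt::Display for DecimalWithExponent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.exponent {
            Exponent::None => write!(f, "{}", self.mantissa),
            Exponent::Scientific(p) => write!(f, "{}e{}", self.mantissa, p),
            Exponent::Compact(p) => write!(f, "{}c{}", self.mantissa, p),
        }
    }
}
```

With this shape, parsing "1.2e3" and printing the result gives back "1.2e3", whereas a model without the exponent field would have to pick between "1200" and some normalized form.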
Discussion 2021-07-29:
Consensus: Proceed with Option 3. |
For background / previous attempt, see #724 |
Now that we've settled on Option 3, we still need to have the discussion on this point:
Here's what I think would be a reasonable API. Details explained in the doc comments.
/// Specifies the precision of a floating point value when constructing a FixedDecimal.
///
/// IEEE 754 is a representation of a point on the number line. On the other hand, FixedDecimal
/// specifies not only the point on the number line but also the precision of the number to a
/// specific power of 10. This enum augments a floating-point value with the additional
/// information required by FixedDecimal.
pub enum DoublePrecision {
    /// Specify that the floating point number is integer-valued.
    ///
    /// If the floating point is not actually integer-valued, an error will be returned.
    Integer,
    /// Specify that the floating point number is precise to a specific power of 10.
    /// The number may be rounded or trailing zeros may be added as necessary.
    Magnitude(i16, RoundingMode),
    /// Specify that the floating point number is precise to a specific number of significant digits.
    /// The number may be rounded or trailing zeros may be added as necessary.
    SignificantDigits(u8, RoundingMode),
    /// Specify that the floating point number is precise to the maximum representable by IEEE.
    ///
    /// This results in a FixedDecimal having enough digits to recover the original floating point
    /// value, with no trailing zeros.
    Maximum,
}
pub enum RoundingMode {
    /// Specify that the number should not need to be rounded, or else return an error.
    Unnecessary,
    /// Specify that the number should be truncated.
    Truncate,
    // TODO(#1177): Add more rounding modes.
}
impl FixedDecimal {
    pub fn from_f32(value: f32, precision: DoublePrecision) -> Result<Self, Error> { ... }
    pub fn from_f64(value: f64, precision: DoublePrecision) -> Result<Self, Error> { ... }
} |
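To illustrate how the variants above differ in behavior, here is a rough sketch that produces a decimal string rather than a FixedDecimal. It rests on several assumptions: `decimal_string` and `PrecisionError` are hypothetical names, `SignificantDigits` is left out, only magnitudes of zero or below are handled, and Rust's standard `to_string` stands in for Ryū as the source of shortest round-trip digits.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoundingMode {
    Unnecessary,
    Truncate,
}

/// Subset of the proposed enum; SignificantDigits is omitted in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DoublePrecision {
    Integer,
    Magnitude(i16, RoundingMode),
    Maximum,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum PrecisionError {
    NotFinite,
    NotInteger,
    RoundingRequired,
    Unsupported,
}

/// Hypothetical helper: renders `value` as a decimal string at the requested precision.
pub fn decimal_string(value: f64, precision: DoublePrecision) -> Result<String, PrecisionError> {
    if !value.is_finite() {
        return Err(PrecisionError::NotFinite);
    }
    // Assumption: std's Display output (shortest round-trip digits, no exponent)
    // stands in for a Ryū-style conversion.
    let shortest = value.to_string();
    match precision {
        DoublePrecision::Maximum => Ok(shortest),
        DoublePrecision::Integer => {
            if value.fract() == 0.0 {
                Ok(shortest)
            } else {
                Err(PrecisionError::NotInteger)
            }
        }
        DoublePrecision::Magnitude(magnitude, mode) => {
            // Simplification: positive magnitudes (tens, hundreds, ...) are not handled.
            if magnitude > 0 {
                return Err(PrecisionError::Unsupported);
            }
            let wanted = magnitude.unsigned_abs() as usize;
            let (int_part, frac_part) = match shortest.split_once('.') {
                Some((int_part, frac_part)) => (int_part, frac_part),
                None => (shortest.as_str(), ""),
            };
            let mut frac = frac_part.to_string();
            if frac.len() > wanted {
                // Digits would be dropped; Unnecessary only allows dropping zeros.
                let drops_nonzero = frac[wanted..].bytes().any(|b| b != b'0');
                if drops_nonzero && mode == RoundingMode::Unnecessary {
                    return Err(PrecisionError::RoundingRequired);
                }
                frac.truncate(wanted);
            } else {
                // Pad with trailing zeros up to the requested magnitude.
                while frac.len() < wanted {
                    frac.push('0');
                }
            }
            if wanted == 0 {
                Ok(int_part.to_string())
            } else {
                Ok(format!("{}.{}", int_part, frac))
            }
        }
    }
}
```

One design choice worth noting: the sketch truncates the shortest decimal digits, not the exact binary value. Under that choice `0.3` truncated at magnitude -1 stays "0.3", even though the exact value of the double is slightly below 0.3.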
So I feel like we already have a "builder pattern" approach to Furthermore, enums like that are tricky to handle around FFI, so we'd need some sort of step-by-step API anyway. I don't mind adding an API that does |
I understand where you're coming from. But the issue is really twofold:
|
If we wanted a true "builder" type pattern, then we could consider an intermediate class called something like |
I'm with Shane here. I am not a huge fan of builder patterns in general, but when used, I really prefer to never operate on half baked states. If the only thing you can do with a state is to add precision to it to produce fixed decimal, then this intermediate step is pointless and harmful from the API design perspective. So, even if you do go for a builder pattern here, I'd suggest that we either allow for precisionless construction of FixedDecimal (with known limitations of that approach and strong recommendation to use a precision-full constructor), or only allow for construction with value+precision provided at the same time. |
So I'll respond to some of the points later when I get the chance, but a thing I want to point out quickly is that while floating point numbers have no inherent decimal precision, ryū's model for them does, so we may need to reevaluate using ryū here if we want this level of control. At the moment the arguments for such an API do not make sense in the context of our double to decimal library being ryū, which is why I prefer the builder design. If we want something else, we should pick a different double-to-decimal routine; maybe write one ourselves. |
ryū, as well as other algorithms like grisu3, implements an algorithm that returns a decimal string with the fewest number of digits required to unambiguously recover the original floating-point value. This is different from the concept of "precision" that I'm discussing. One valid precision strategy is "maximum supported by the floating point type I am using". This is what a vanilla ryū-based conversion performs. But this is not the only precision strategy. I also don't want to give this precision strategy favoritism over other precision strategies.
Most precision strategies need to fall back on a ryū-like algorithm in edge cases anyway. |
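The difference between "shortest digits that round-trip" and "a caller-chosen precision" can be seen with the Rust standard library alone. The sketch below uses std formatting as a stand-in for ryū, and the two function names are hypothetical. Note that the fixed-precision path here uses std's own rounding, which is not one of the rounding modes proposed above.

```rust
/// Shortest decimal string that parses back to the same f64
/// (std's Display for floats, used here in place of ryū).
pub fn shortest_round_trip(value: f64) -> String {
    value.to_string()
}

/// Decimal string with a caller-chosen number of fraction digits,
/// rounded or zero-padded by std's formatter.
pub fn fixed_fraction_digits(value: f64, digits: usize) -> String {
    format!("{:.*}", digits, value)
}
```

For `0.1`, the first function yields "0.1", while asking the second for 3 fraction digits yields "0.100" and asking for 20 exposes digits of the underlying binary value. All three are defensible answers, which is why the precision has to come from the caller.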
@sffc right, I'm saying that ryū is not sufficient for us to support these other options since it has a different strategy here. We need something more, and I'm asking where we get that from, because otherwise this can't be implemented. I'm trying to find an MVP that supports WearOS' use case and also find what we need to pull in to support the full API here. MVP wise I think we may be able to use ryū as the Maximum mode with a separate function for truncation and rounding, and we can add a full version later that the Maximum mode function calls. |
Update from the discussion I had with @sffc: I think what I'd misunderstood here was an implication that ryū could not be used as a base for the rounding/truncating cases because ryū can "shorten" some floats. It seems like that's not central to this; so we should be able to implement Shane's proposed API, with less enumy versions for FFI. |
A hitch: ryū formats scientific notation in its double-to-decimal algorithm, we may need |
Yeah, this is why I dropped using the main ryū crate in favor of the fork that just gives us an integer when putting together the PR for this issue. |
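As a sketch of why an integer-based result is easier to work with than a scientific-notation string: assuming the conversion routine yields a sign, a decimal mantissa, and a power-of-ten exponent (this shape is an assumption, not the actual API of the fork mentioned above), a plain decimal string can be assembled without parsing any "e" notation. The function name `plain_decimal` is hypothetical.

```rust
/// Hypothetical helper: renders `mantissa * 10^exponent` as a plain decimal
/// string, without scientific notation.
pub fn plain_decimal(negative: bool, mantissa: u64, exponent: i32) -> String {
    let digits = mantissa.to_string();
    let mut out = String::new();
    if negative {
        out.push('-');
    }
    if exponent >= 0 {
        // Non-negative exponent: append zeros after the digits.
        out.push_str(&digits);
        if mantissa != 0 {
            out.extend(std::iter::repeat('0').take(exponent as usize));
        }
    } else {
        let shift = exponent.unsigned_abs() as usize;
        if digits.len() > shift {
            // The decimal point falls inside the digit string.
            let split = digits.len() - shift;
            out.push_str(&digits[..split]);
            out.push('.');
            out.push_str(&digits[split..]);
        } else {
            // The decimal point falls before all digits: pad with leading zeros.
            out.push_str("0.");
            out.extend(std::iter::repeat('0').take(shift - digits.len()));
            out.push_str(&digits);
        }
    }
    out
}
```

For example, a mantissa of 12345 with exponent -2 renders as "123.45", and a mantissa of 12 with exponent 3 renders as "12000".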
Opened #1265 to handle exponents |
In ICU4C, we use a short, fast algorithm when we need 14 or fewer significant digits from a double, and google/double-conversion, which implements Grisu3, if we need more than 14 significant digits.
@zbraniecki pointed to https://github.com/dtolnay/ryu, a Rust library that implements a 2018 algorithm, Ryū, that outperforms Grisu3 and the Rust standard library `to_string`. I also expect that Ryū is probably smaller in code size than the standard library, but I have not tested this.
First question: does ICU4X depend on Ryū for double-to-decimal (feature-gated), or do we make ICU4X implement FromStr, and let the user pick how they want to do the conversion themselves?
Second question: how does this affect FFI or WebAssembly? For example, in JavaScript, maybe we want to use Number.prototype.toString so that we don't need to ship any double-to-decimal code in WASM. On the other hand, in C++, where there is no standard library function that does this*, maybe users would want ICU4X to handle this problem on our end.
Third question: do we implement ICU4C's fastpath algorithm in ICU4X?
* There is sprintf, but this requires you to know a fixed number of decimal places. A general double-to-decimal algorithm akin to Number.prototype.toString is not in the standard library. I believe either LLVM or GNU has a non-standard extension, but I can't find it right now.