std.fmt: add ryu floating-point formatting implementation #19229

Merged
merged 6 commits into from Mar 12, 2024

@tiehuis tiehuis commented Mar 9, 2024

This PR replaces the existing errol floating-point formatting algorithm with one based on Ryu.

Ryu is an algorithm for converting IEEE-754 floating-point numbers to decimal strings: https://github.com/ulfjack/ryu

The improvements this PR brings are:

  • Full f80 + f128 formatting
  • More accurate f16 + f32 formatting
  • Complete round-trip support for every float type
  • Generic backend that can print a float of any bit width up to and including 128 bits

See https://github.com/tiehuis/zig-ryu/tree/05927ac704170fe6c98994eb1281ad6f42034e20/src/ryu128 for the accompanying programs (fuzzing and tests) referenced in this PR.

The binaryToDecimal function is the only thing ported from ryu. The 128-bit backend upstream does not provide a fixed-precision formatting mode. I have implemented this from scratch along with rounding (loosely adapted from the existing errol round mechanism).

Closes #1181.
Closes #1299.
Closes #3612.

Behaviour Differences

  • Exponents are no longer zero-padded to two digits, and the sign of a positive exponent is no longer printed.
errol: 1e+02
ryu:   1e2
  • A fractional part of 0 is omitted in full-precision mode.
errol: 2.0e+00
ryu:   2e0
  • Full-precision output is more accurate for every type except f64, since we no longer cast to f64 internally.
# Ryu
3.1234567891011121314151617181920212E0 :f128
3.1234567891011121314E0 :f80
3.1234567891011121314E0 :c_longdouble
3.123456789101112E0 :f64
3.1234567E0 :f32
3.123E0 :f16

# Errol
3.123456789101112e+00 :f128
3.123456789101112e+00 :f80
3.123456789101112e+00 :c_longdouble
3.123456789101112e+00 :f64
3.12345671e+00 :f32
3.123046875e+00 :f16

Additionally, rounding can differ in fixed-precision mode, because the shortest representation that the rounding starts from typically differs between the two implementations.

# bits:         141333
# precision:    3
# std_shortest: 1.98049715e-40
# ryu_shortest: 1.9805e-40
# type:         f32
|
| std_dec: 0.000
| ryu_dec: 0.000
|
| std_exp: 1.980e-40
| ryu_exp: 1.981e-40
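
The exponent changes listed above can be observed through the standard `{e}` format specifier. The following is a minimal sketch, assuming a Zig version that already includes this PR (0.12.0 or later) and that `{e}` is routed to the new backend; the helper name `sci` is hypothetical, and the outputs noted in the comments are the ones reported in this PR rather than independently measured values.

```zig
const std = @import("std");

/// Hypothetical helper: formats `x` in scientific notation via `{e}`.
fn sci(buf: []u8, x: f64) ![]const u8 {
    return try std.fmt.bufPrint(buf, "{e}", .{x});
}

pub fn main() !void {
    var buf: [64]u8 = undefined;

    // The PR lists "1e2" (errol printed "1e+02").
    std.debug.print("{s}\n", .{try sci(&buf, 100.0)});

    // The PR lists "2e0" (errol printed "2.0e+00").
    std.debug.print("{s}\n", .{try sci(&buf, 2.0)});

    // The subnormal f32 from the rounding example (bits = 141333),
    // printed with a fixed precision of 3. The PR lists "1.981e-40".
    const tiny: f32 = @bitCast(@as(u32, 141333));
    const text = try std.fmt.bufPrint(&buf, "{e:.3}", .{tiny});
    std.debug.print("{s}\n", .{text});
}
```

Each result is printed before `buf` is reused, since the returned slice points into that buffer.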

Performance

See https://github.com/tiehuis/zig-ryu/blob/05927ac704170fe6c98994eb1281ad6f42034e20/src/ryu128/perf.zig for the program used to test performance. The ryu implementation has not been optimized in depth.

Errol

 (ryu-128 =) $ zig build-exe perf.zig -O ReleaseFast
 (ryu-128 =) $ ./perf 
perf: type=f64 backend=errol seed=1
263.62ns per trial (1000000 trials) (check 0x2e00419)

Ryu

 (ryu-128 =) $ zig build-exe perf.zig -O ReleaseFast
 (ryu-128 =) $ ./perf
perf: type=f64 backend=ryu seed=1
111.19ns per trial (1000000 trials) (check 0x2e00419)

We see a ~2.4x performance improvement (263.62ns vs. 111.19ns per trial).
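
For readers who want to reproduce a measurement of this kind without the linked perf.zig, here is a rough sketch of a timing loop. It is not the author's benchmark: the `bench` function, the `Result` struct, and the hand-rolled xorshift generator are all assumptions made for illustration, and it only exercises whichever float formatter the installed standard library provides through `{e}`.

```zig
const std = @import("std");

/// Hypothetical result type for this sketch.
const Result = struct { ns_per_trial: f64, check: u64 };

/// Formats `trials` pseudo-random f64 bit patterns and reports the mean time.
fn bench(trials: u64, seed: u64) !Result {
    var state: u64 = seed | 1; // xorshift state must be non-zero
    var check: u64 = 0;
    var buf: [64]u8 = undefined;

    var timer = try std.time.Timer.start();
    var i: u64 = 0;
    while (i < trials) : (i += 1) {
        // One xorshift64 step, written by hand to avoid depending on a PRNG API.
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;

        const x: f64 = @bitCast(state);
        const text = try std.fmt.bufPrint(&buf, "{e}", .{x});
        // Accumulate a checksum so the formatting work cannot be optimized away.
        check +%= @as(u64, text.len);
    }
    const elapsed_ns = timer.read();

    return .{
        .ns_per_trial = @as(f64, @floatFromInt(elapsed_ns)) / @as(f64, @floatFromInt(trials)),
        .check = check,
    };
}

pub fn main() !void {
    const trials: u64 = 1_000_000;
    const r = try bench(trials, 1);
    std.debug.print("{d:.2}ns per trial ({d} trials) (check 0x{x})\n", .{ r.ns_per_trial, trials, r.check });
}
```

Build with `-O ReleaseFast`, as in the runs above, for comparable numbers. The checksum here is a sum of output lengths, so it will not match the `check` values printed by the original program.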

Code Size

See:

Errol

 (ryu-128 =) $ zig build-exe size_errol.zig -O ReleaseFast
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_errol.o | rg fmt | awk '{print $0;s+=$2}END{print s}'
0000000000012800 0000000000001195 t fmt.formatBuf__anon_4282
0000000000017280 0000000000001735 t fmt.errol.u64toa
0000000000011056 0000000000001737 t fmt.errol.errol3
0000000000009280 0000000000001770 t fmt.formatInt__anon_4273
0000000000014272 0000000000003001 t fmt.errol.errolSlow
9438

 (ryu-128 =) $ zig build-exe size_errol.zig -O ReleaseSmall
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_errol.o | rg fmt | awk '{print $0;s+=$2}END{print s}'
0000000000016196 0000000000000067 t fmt.errol.hpMul10
0000000000016121 0000000000000075 t fmt.errol.hpDiv10
0000000000012772 0000000000000435 t fmt.formatBuf__anon_4282
0000000000010756 0000000000000746 t fmt.formatInt__anon_4273
0000000000013535 0000000000000979 t fmt.errol.errolSlow
0000000000011502 0000000000001270 t fmt.errol.errol3
0000000000014514 0000000000001607 t fmt.errol.u64toa
5179

 (ryu-128 =) $ zig build-exe size_errol.zig -O ReleaseSafe
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_errol.o | rg fmt | awk '{print $0;s+=$2}END{print s}'
0000000000111840 0000000000000390 t fmt.bufPrint__anon_6530
0000000000112240 0000000000000592 t fmt.bufPrint__anon_6531
0000000000030800 0000000000001346 t fmt.formatBuf__anon_4461
0000000000182544 0000000000001923 t fmt.errol.u64toa
0000000000028496 0000000000002296 t fmt.errol.errol3
0000000000009632 0000000000002382 t fmt.bufPrint__anon_3229
0000000000012096 0000000000002382 t fmt.bufPrint__anon_3235
0000000000014480 0000000000002398 t fmt.bufPrint__anon_3241
0000000000007216 0000000000002414 t fmt.bufPrint__anon_3223
0000000000177984 0000000000002539 t fmt.errol.errolSlow
0000000000025120 0000000000003362 t fmt.formatInt__anon_4452
22024

Ryu

 (ryu-128 =) $ zig build-exe size_ryu.zig -O ReleaseFast
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_ryu.o | rg ryu | awk '{print $0;s+=$2}END{print s}'
0000000000016320 0000000000000652 t ryu128.mul_128_256_shift
0000000000004056 0000000000000896 r ryu128.GENERIC_POW5_TABLE
0000000000015072 0000000000001220 t ryu128.decimalLength
0000000000006528 0000000000002825 t ryu128.formatScientific
0000000000001208 0000000000002848 r ryu128.GENERIC_POW5_INV_SPLIT
0000000000006184 0000000000002848 r ryu128.GENERIC_POW5_SPLIT
0000000000003056 0000000000003469 t ryu128.binaryToDecimal
0000000000009360 0000000000005558 t ryu128.formatDecimal
20316

 (ryu-128 =) $ zig build-exe size_ryu.zig -O ReleaseSmall
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_ryu.o | rg ryu | awk '{print $0;s+=$2}END{print s}'
0000000000005810 0000000000000047 t ryu128.mulShift
0000000000005972 0000000000000067 t ryu128.copySpecialStr
0000000000006039 0000000000000098 t ryu128.decimalLength
0000000000005857 0000000000000115 t ryu128.multipleOfPowerOf5
0000000000006137 0000000000000165 t ryu128.writeDecimal__anon_3903
0000000000004920 0000000000000380 t ryu128.formatScientific
0000000000005300 0000000000000383 t ryu128.formatDecimal
0000000000006474 0000000000000542 t ryu128.mul_128_256_shift
0000000000003352 0000000000000896 r ryu128.GENERIC_POW5_TABLE
0000000000002815 0000000000002105 t ryu128.binaryToDecimal
0000000000000504 0000000000002848 r ryu128.GENERIC_POW5_INV_SPLIT
0000000000005480 0000000000002848 r ryu128.GENERIC_POW5_SPLIT
10494

 (ryu-128 =) $ zig build-exe size_ryu.zig -O ReleaseSafe
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_ryu.o | rg ryu | awk '{print $0;s+=$2}END{print s}'
0000000000003504 0000000000000079 t size_ryu.RandomGenerator(f32).next
0000000000003584 0000000000000125 t ryu128.format__anon_3138
0000000000003376 0000000000000126 t ryu128.format__anon_3132
0000000000003712 0000000000000129 t ryu128.format__anon_3144
0000000000003232 0000000000000132 t ryu128.format__anon_3126
0000000000019008 0000000000000707 t ryu128.mul_128_256_shift
0000000000014728 0000000000000896 r ryu128.GENERIC_POW5_TABLE
0000000000013840 0000000000001220 t ryu128.decimalLength
0000000000008480 0000000000001330 t ryu128.formatScientific
0000000000009824 0000000000002407 t ryu128.formatDecimal
0000000000011880 0000000000002848 r ryu128.GENERIC_POW5_INV_SPLIT
0000000000016856 0000000000002848 r ryu128.GENERIC_POW5_SPLIT
0000000000004208 0000000000004263 t ryu128.binaryToDecimal
17110

We do see a moderate increase in code size in the ReleaseFast and ReleaseSmall builds (roughly 2x for the symbols listed). The errol figures above are also likely slightly underreported, since not all of its symbols appear under the same namespace.

Note that the new implementation also provides considerably more functionality, and the old one is incorrect in many cases. The 128-bit backend is shared by all floating-point types, so code size does not grow when several float types are formatted.

Testing

I have tested the following:

  • exhaustive round-trip tests of f16 + f32
$ ./fuzz
fuzzing: type=f16 method=exhaustive seed=0
65536 tests completed
$ ./fuzz
fuzzing: type=f32 method=exhaustive seed=0
4294967296 tests completed
  • ~1 trillion round-trips for f64
  • ~10 million round-trips for f128
  • ~100 billion tests of full-precision output against the reference C ryu implementation (f64 scientific)
  • a few million tests stressing arbitrary output precisions and different buffer sizes (no crashes)
  • a few million tests comparing the previous output against the new ryu output, to note the behaviour differences above

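
The exhaustive f16 round-trip check described in the list above can be sketched as follows. This is not the author's fuzz program; it is a minimal illustration that assumes a Zig version containing this PR, uses `{e}` for the shortest scientific output, and uses `std.fmt.parseFloat` as the parser. The function name `countF16RoundTripFailures` is hypothetical.

```zig
const std = @import("std");

/// Counts the f16 bit patterns that do not survive a format/parse round trip.
fn countF16RoundTripFailures() !u32 {
    var failures: u32 = 0;
    var buf: [64]u8 = undefined;

    var i: u32 = 0;
    while (i <= 0xFFFF) : (i += 1) {
        const bits: u16 = @intCast(i);
        const x: f16 = @bitCast(bits);

        const text = try std.fmt.bufPrint(&buf, "{e}", .{x});
        const back = try std.fmt.parseFloat(f16, text);

        if (std.math.isNan(x)) {
            // NaN payloads are not printed, so only require that NaN stays NaN.
            if (!std.math.isNan(back)) failures += 1;
        } else if (@as(u16, @bitCast(back)) != bits) {
            // Comparing bit patterns also distinguishes +0 from -0.
            failures += 1;
        }
    }
    return failures;
}

pub fn main() !void {
    const failures = try countF16RoundTripFailures();
    std.debug.print("f16 round-trip failures: {d} of 65536\n", .{failures});
}
```

The same loop structure extends to f32 by iterating over all `u32` bit patterns, which is the 4294967296-test run shown above; for f64 and f128 only sampled inputs are practical.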
Future Work

  • Implement dedicated f32 + f64 backends. These are much faster, have a smaller code size and, importantly, rely far less on 128-bit integers.

This replaces the errol backend with one based on ryu. Only the 128-bit
backend is implemented. It supports all floating-point types and
does not use floating-point arithmetic to print.

Closes ziglang#1181.
Closes ziglang#1299.
Closes ziglang#3612.
formatFloatScientific(value, options, buf_stream.writer()) catch |err| switch (err) {
error.NoSpaceLeft => unreachable,
const s = ryu128.format(&buf, value, .{ .mode = .scientific, .precision = options.precision }) catch |err| switch (err) {
error.BufferTooSmall => "(float)",
Member Author

I am not a big fan of this. Previously the 512-byte buffer was sufficient for f64. f80 and f128 need a lot more space (~5k bytes), and growing the buffer to that size is impractical imo.
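
The same trade-off appears on the caller's side whenever a fixed buffer is used for decimal output. As a hedged illustration (not code from this PR), the sketch below formats into a caller-supplied buffer with `std.fmt.bufPrint` and falls back to a placeholder when the text does not fit, mirroring the `"(float)"` fallback in the hunk above. The helper name `formatOrPlaceholder` is hypothetical.

```zig
const std = @import("std");

/// Hypothetical helper: decimal-formats `x` into `buf`, or returns a
/// placeholder when `buf` is too small to hold the full output.
fn formatOrPlaceholder(buf: []u8, x: f64) []const u8 {
    // bufPrint can only fail with error.NoSpaceLeft, so any error means
    // the buffer was too small.
    const text: []const u8 = std.fmt.bufPrint(buf, "{d}", .{x}) catch return "(float)";
    return text;
}

pub fn main() void {
    var small: [16]u8 = undefined;
    var large: [512]u8 = undefined;

    // 1e300 needs about 301 characters in decimal mode, so 16 bytes cannot hold it.
    std.debug.print("{s}\n", .{formatOrPlaceholder(&small, 1e300)});

    // A 512-byte buffer is enough for this f64 value.
    std.debug.print("{s}\n", .{formatOrPlaceholder(&large, 1e300)});
}
```

Scientific mode avoids the problem, since its output length is bounded by the number of significant digits rather than by the exponent.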

/// value can be re-parsed back to the same type unambiguously.
///
/// Floats with more than 64 are currently rounded, see https://github.com/ziglang/zig/issues/1181
pub fn formatFloatScientific(
Member Author

These could use a compileError to deprecate. Alternatively, we could provide this as a slim wrapper over the actual ryu implementations (which are not currently exposed).

pub const min_buffer_size = 53;

/// Returns the minimum buffer size needed to print every float of a specific type and format.
pub fn bufferSize(comptime mode: Format, comptime T: type) comptime_int {
Member Author

This needs to be public if we actually want external users to be able to take advantage of this. I originally preferred the direct buffer write approach instead of using a fmt writer.

Typically though, a user can get away with much less and these are only a concern for unbounded full-precision decimal output.

@tiehuis
Member Author

tiehuis commented Mar 9, 2024

I'm leaning towards renaming ryu128.zig to format_float.zig as the inverse of parse_float.zig and hiding ryu as an implementation detail in its naming.

@andrewrk
Member

Awesome.

@andrewrk andrewrk merged commit bd24e66 into ziglang:master Mar 12, 2024
10 checks passed
@andrewrk andrewrk added the "standard library" and "release notes" labels Mar 12, 2024
@tiehuis tiehuis deleted the ryu-128 branch March 12, 2024 02:28
@dimdin dimdin mentioned this pull request Mar 21, 2024
@andrewrk andrewrk added this to the 0.12.0 milestone Apr 18, 2024
TUSF pushed a commit to TUSF/zig that referenced this pull request May 9, 2024
std.fmt: add ryu floating-point formatting implementation