std.fmt: add ryu floating-point formatting implementation #19229

tiehuis · 2024-03-09T07:32:43Z

This PR replaces the existing errol floating point formatting algorithm with one based on Ryu.

Ryu is an algorithm for converting IEEE-754 floating-point numbers to decimal strings: https://github.com/ulfjack/ryu

The improvements this PR brings are:

Full f80 and f128 formatting
More accurate f16 and f32 formatting
Complete round-trip support for every float type
A generic backend that can print any float type of up to 128 bits
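A single 128-bit backend can serve every type because each supported float decodes into a sign, an integer significand and a binary exponent that all fit in 128-bit integers. The sketch below shows only that decoding step. `decode` and `Decoded` are hypothetical names, not the PR's code. It assumes a Zig release of 0.12 or later and leaves out f80, which stores its leading bit explicitly.

```zig
const std = @import("std");

/// Hypothetical decoded form shared by every float width.
const Decoded = struct {
    negative: bool,
    /// Integer significand with the implicit leading bit restored.
    mantissa: u128,
    /// A finite x equals (-1)^negative * mantissa * 2^exponent.
    exponent: i32,
};

/// Hypothetical helper: widen a finite f16, f32, f64 or f128 into one
/// u128-based representation so a single backend can process any of them.
/// f80 (explicit leading bit) would need its own branch; infinities and
/// NaNs are assumed to be filtered out by the caller.
fn decode(comptime T: type, x: T) Decoded {
    const Bits = std.meta.Int(.unsigned, @bitSizeOf(T));
    const mbits = std.math.floatMantissaBits(T);
    const ebits = std.math.floatExponentBits(T);
    const bias = (1 << (ebits - 1)) - 1;

    const raw: Bits = @bitCast(x);
    const frac: u128 = raw & ((@as(Bits, 1) << mbits) - 1);
    const biased: i32 = @intCast((raw >> mbits) & ((1 << ebits) - 1));
    const negative = (raw >> (@bitSizeOf(T) - 1)) != 0;

    if (biased == 0) {
        // Zero or subnormal: no implicit bit, fixed minimum exponent.
        return .{ .negative = negative, .mantissa = frac, .exponent = 1 - bias - mbits };
    }
    return .{
        .negative = negative,
        .mantissa = frac | (@as(u128, 1) << mbits),
        .exponent = biased - bias - mbits,
    };
}
```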

See https://github.com/tiehuis/zig-ryu/tree/05927ac704170fe6c98994eb1281ad6f42034e20/src/ryu128 for the accompanying programs (fuzzers and tests) used to produce the results in this PR.

Only the binaryToDecimal function is ported from ryu. The upstream 128-bit backend does not provide a fixed-precision formatting mode, so I implemented that from scratch, along with rounding (loosely adapted from the existing errol rounding mechanism).

Closes #1181.
Closes #1299.
Closes #3612.

Behaviour Differences

Exponents are no longer zero-padded to two digits, and a positive exponent is no longer printed with a + sign.

errol: 1e+02
ryu:   1e2

A zero fractional part is omitted in full-precision mode.

errol: 2.0e+00
ryu:   2e0
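The new scientific layout can be illustrated with a small formatter that takes already-computed shortest digits and a decimal exponent (as a binaryToDecimal-style step would produce) and writes `d[.ddd]e<exp>`. `writeScientific` is a hypothetical helper for illustration, not the function used in std.

```zig
const std = @import("std");

/// Hypothetical helper: write the shortest digits in the new layout.
/// No '+' on positive exponents, no zero-padding of the exponent, and no
/// ".0" when there is only one significant digit.
fn writeScientific(buf: []u8, digits: []const u8, exp10: i32) ![]u8 {
    std.debug.assert(digits.len > 0);
    if (digits.len == 1) {
        return std.fmt.bufPrint(buf, "{s}e{d}", .{ digits, exp10 });
    }
    // First digit, decimal point, remaining digits, then the exponent.
    return std.fmt.bufPrint(buf, "{s}.{s}e{d}", .{ digits[0..1], digits[1..], exp10 });
}
```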

Full-precision output is more accurate for every type except f64 (whose output is unchanged), since values are no longer cast to f64 internally.

# Ryu
3.1234567891011121314151617181920212E0 :f128
3.1234567891011121314E0 :f80
3.1234567891011121314E0 :c_longdouble
3.123456789101112E0 :f64
3.1234567E0 :f32
3.123E0 :f16

# Errol
3.123456789101112e+00 :f128
3.123456789101112e+00 :f80
3.123456789101112e+00 :c_longdouble
3.123456789101112e+00 :f64
3.12345671e+00 :f32
3.123046875e+00 :f16
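The effect of the old internal cast can be reproduced by formatting the same f32 twice: once natively, and once after widening to f64. The widened value prints the shortest digits for the f64, which are more than the f32 needs, while the native output is shorter and still parses back to the same f32. This sketch assumes Zig 0.12 or later, where `{e}` goes through the ryu-based formatter; `directAndWidened` is a hypothetical name.

```zig
const std = @import("std");

/// Format an f32 natively and after widening to f64 into the
/// caller-provided scratch buffers `a` and `b`.
fn directAndWidened(x: f32, a: []u8, b: []u8) !struct { direct: []u8, widened: []u8 } {
    return .{
        // Shortest digits that identify the f32 value.
        .direct = try std.fmt.bufPrint(a, "{e}", .{x}),
        // Shortest digits that identify the (exactly widened) f64 value.
        .widened = try std.fmt.bufPrint(b, "{e}", .{@as(f64, x)}),
    };
}
```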

Additionally, fixed-precision output can round differently, since rounding starts from the shortest representation, which will typically differ between the two implementations.

# bits:         141333
# precision:    3
# std_shortest: 1.98049715e-40
# ryu_shortest: 1.9805e-40
# type:         f32
|
| std_dec: 0.000
| ryu_dec: 0.000
|
| std_exp: 1.980e-40
| ryu_exp: 1.981e-40
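The difference above appears to be a double-rounding effect. The exact value of bit pattern 141333 is about 1.980497e-40. Rounding the ryu shortest digits `19805` to four significant digits with round-half-up gives `1981`, while rounding errol's longer digit string `198049715` gives `1980`. The hypothetical helper below rounds a digit string that way so the two cases can be compared directly. It is an illustration of the mechanism, not the PR's rounding code.

```zig
const std = @import("std");

/// Hypothetical helper: keep the first `n` digits of `digits` and round
/// half-up based on the next digit only. Returns true when the carry runs
/// out of the leading digit (e.g. "9996" -> "100"), meaning the caller
/// must bump the decimal exponent by one.
fn roundDigits(digits: []const u8, n: usize, out: []u8) bool {
    std.debug.assert(n > 0 and n <= digits.len and out.len >= n);
    @memcpy(out[0..n], digits[0..n]);
    if (n == digits.len or digits[n] < '5') return false;
    // Propagate the carry from the last kept digit leftwards.
    var i: usize = n;
    while (i > 0) {
        i -= 1;
        if (out[i] != '9') {
            out[i] += 1;
            return false;
        }
        out[i] = '0';
    }
    out[0] = '1';
    return true;
}
```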

Performance

See https://github.com/tiehuis/zig-ryu/blob/05927ac704170fe6c98994eb1281ad6f42034e20/src/ryu128/perf.zig for the program used to test performance. The ryu implementation has not been heavily optimized.

Errol

 (ryu-128 =) $ zig build-exe perf.zig -O ReleaseFast
 (ryu-128 =) $ ./perf 
perf: type=f64 backend=errol seed=1
263.62ns per trial (1000000 trials) (check 0x2e00419)

Ryu

 (ryu-128 =) $ zig build-exe perf.zig -O ReleaseFast
 (ryu-128 =) $ ./perf
perf: type=f64 backend=ryu seed=1
111.19ns per trial (1000000 trials) (check 0x2e00419)

We see a ~2.4x performance improvement (263.62 ns vs. 111.19 ns per trial).
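The linked perf.zig is not reproduced here. As a rough, hypothetical sketch of that kind of harness: format random f64 bit patterns with `{e}`, time the loop with `std.time.Timer`, and fold the output into a checksum (similar in spirit to the `check` value above) so the work cannot be optimized away. The inline xorshift generator is an assumption made to avoid depending on the std random API, whose namespace may differ between Zig releases.

```zig
const std = @import("std");

/// Minimal xorshift64 generator (state must be non-zero).
fn xorshift64(state: *u64) u64 {
    var x = state.*;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    state.* = x;
    return x;
}

const Result = struct { ns_per_trial: u64, check: u64 };

/// Hypothetical benchmark: format `trials` random bit patterns as f64
/// (non-finite ones are skipped but still counted as trials).
fn bench(trials: u64, seed: u64) !Result {
    var state: u64 = seed;
    var buf: [64]u8 = undefined;
    var check: u64 = 0;
    var timer = try std.time.Timer.start();
    var i: u64 = 0;
    while (i < trials) : (i += 1) {
        const x: f64 = @bitCast(xorshift64(&state));
        if (!std.math.isFinite(x)) continue;
        const s = try std.fmt.bufPrint(&buf, "{e}", .{x});
        // Fold length and last byte into the checksum.
        check = check *% 31 +% @as(u64, s.len) +% s[s.len - 1];
    }
    const elapsed = timer.read();
    return .{ .ns_per_trial = elapsed / @max(trials, 1), .check = check };
}
```

Comparing two backends this way only makes sense if both produce the same checksum for the same seed, which is presumably why the runs above report identical `check` values.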

Code Size

See:

Errol

 (ryu-128 =) $ zig build-exe size_errol.zig -O ReleaseFast
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_errol.o | rg fmt | awk '{print $0;s+=$2}END{print s}'
0000000000012800 0000000000001195 t fmt.formatBuf__anon_4282
0000000000017280 0000000000001735 t fmt.errol.u64toa
0000000000011056 0000000000001737 t fmt.errol.errol3
0000000000009280 0000000000001770 t fmt.formatInt__anon_4273
0000000000014272 0000000000003001 t fmt.errol.errolSlow
9438

 (ryu-128 =) $ zig build-exe size_errol.zig -O ReleaseSmall
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_errol.o | rg fmt | awk '{print $0;s+=$2}END{print s}'
0000000000016196 0000000000000067 t fmt.errol.hpMul10
0000000000016121 0000000000000075 t fmt.errol.hpDiv10
0000000000012772 0000000000000435 t fmt.formatBuf__anon_4282
0000000000010756 0000000000000746 t fmt.formatInt__anon_4273
0000000000013535 0000000000000979 t fmt.errol.errolSlow
0000000000011502 0000000000001270 t fmt.errol.errol3
0000000000014514 0000000000001607 t fmt.errol.u64toa
5179

 (ryu-128 =) $ zig build-exe size_errol.zig -O ReleaseSafe
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_errol.o | rg fmt | awk '{print $0;s+=$2}END{print s}'
0000000000111840 0000000000000390 t fmt.bufPrint__anon_6530
0000000000112240 0000000000000592 t fmt.bufPrint__anon_6531
0000000000030800 0000000000001346 t fmt.formatBuf__anon_4461
0000000000182544 0000000000001923 t fmt.errol.u64toa
0000000000028496 0000000000002296 t fmt.errol.errol3
0000000000009632 0000000000002382 t fmt.bufPrint__anon_3229
0000000000012096 0000000000002382 t fmt.bufPrint__anon_3235
0000000000014480 0000000000002398 t fmt.bufPrint__anon_3241
0000000000007216 0000000000002414 t fmt.bufPrint__anon_3223
0000000000177984 0000000000002539 t fmt.errol.errolSlow
0000000000025120 0000000000003362 t fmt.formatInt__anon_4452
22024

Ryu

 (ryu-128 =) $ zig build-exe size_ryu.zig -O ReleaseFast
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_ryu.o | rg ryu | awk '{print $0;s+=$2}END{print s}'
0000000000016320 0000000000000652 t ryu128.mul_128_256_shift
0000000000004056 0000000000000896 r ryu128.GENERIC_POW5_TABLE
0000000000015072 0000000000001220 t ryu128.decimalLength
0000000000006528 0000000000002825 t ryu128.formatScientific
0000000000001208 0000000000002848 r ryu128.GENERIC_POW5_INV_SPLIT
0000000000006184 0000000000002848 r ryu128.GENERIC_POW5_SPLIT
0000000000003056 0000000000003469 t ryu128.binaryToDecimal
0000000000009360 0000000000005558 t ryu128.formatDecimal
20316

 (ryu-128 =) $ zig build-exe size_ryu.zig -O ReleaseSmall
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_ryu.o | rg ryu | awk '{print $0;s+=$2}END{print s}'
0000000000005810 0000000000000047 t ryu128.mulShift
0000000000005972 0000000000000067 t ryu128.copySpecialStr
0000000000006039 0000000000000098 t ryu128.decimalLength
0000000000005857 0000000000000115 t ryu128.multipleOfPowerOf5
0000000000006137 0000000000000165 t ryu128.writeDecimal__anon_3903
0000000000004920 0000000000000380 t ryu128.formatScientific
0000000000005300 0000000000000383 t ryu128.formatDecimal
0000000000006474 0000000000000542 t ryu128.mul_128_256_shift
0000000000003352 0000000000000896 r ryu128.GENERIC_POW5_TABLE
0000000000002815 0000000000002105 t ryu128.binaryToDecimal
0000000000000504 0000000000002848 r ryu128.GENERIC_POW5_INV_SPLIT
0000000000005480 0000000000002848 r ryu128.GENERIC_POW5_SPLIT
10494

 (ryu-128 =) $ zig build-exe size_ryu.zig -O ReleaseSafe
 (ryu-128 =) $ nm --print-size --size-sort --radix=d size_ryu.o | rg ryu | awk '{print $0;s+=$2}END{print s}'
0000000000003504 0000000000000079 t size_ryu.RandomGenerator(f32).next
0000000000003584 0000000000000125 t ryu128.format__anon_3138
0000000000003376 0000000000000126 t ryu128.format__anon_3132
0000000000003712 0000000000000129 t ryu128.format__anon_3144
0000000000003232 0000000000000132 t ryu128.format__anon_3126
0000000000019008 0000000000000707 t ryu128.mul_128_256_shift
0000000000014728 0000000000000896 r ryu128.GENERIC_POW5_TABLE
0000000000013840 0000000000001220 t ryu128.decimalLength
0000000000008480 0000000000001330 t ryu128.formatScientific
0000000000009824 0000000000002407 t ryu128.formatDecimal
0000000000011880 0000000000002848 r ryu128.GENERIC_POW5_INV_SPLIT
0000000000016856 0000000000002848 r ryu128.GENERIC_POW5_SPLIT
0000000000004208 0000000000004263 t ryu128.binaryToDecimal
17110

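We do see a moderate increase in code size: roughly 2x in ReleaseFast (9438 to 20316 bytes) and ReleaseSmall (5179 to 10494 bytes), although the ReleaseSafe totals as measured are lower for ryu (22024 to 17110 bytes). About 6.6 KB of each ryu total is the read-only GENERIC_POW5_* tables. The errol totals above are also likely slightly underreported, since not all of its symbols show up under the same namespace.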

Note also that the new implementation provides considerably more functionality, and the old one is incorrect in many cases. Additionally, the 128-bit backend is shared by all floating-point types, so its code size does not grow when many different float types are formatted.
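The totals in the size listings come from piping `nm --print-size --radix=d` through `rg` and summing the second column with `awk`. A hypothetical Zig equivalent of that pipeline, useful for checking the sums, is sketched below; like `rg`, it matches the needle anywhere on the line.

```zig
const std = @import("std");

/// Sum the size column (second field, decimal because of --radix=d) of
/// `nm --print-size` output for every line containing `needle`.
/// Mirrors: rg <needle> | awk '{s+=$2} END {print s}'
fn sumSizes(nm_output: []const u8, needle: []const u8) !u64 {
    var total: u64 = 0;
    var lines = std.mem.tokenizeScalar(u8, nm_output, '\n');
    while (lines.next()) |line| {
        if (std.mem.indexOf(u8, line, needle) == null) continue;
        var fields = std.mem.tokenizeScalar(u8, line, ' ');
        _ = fields.next() orelse continue; // address column
        const size = fields.next() orelse continue; // size column
        total += try std.fmt.parseInt(u64, size, 10);
    }
    return total;
}
```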

Testing

I have tested the following:

Exhaustive round-trip tests of f16 and f32:

$ ./fuzz
fuzzing: type=f16 method=exhaustive seed=0
65536 tests completed

$ ./fuzz
fuzzing: type=f32 method=exhaustive seed=0
4294967296 tests completed

~1 trillion round-trips for f64
~10 million round-trips for f128
~100 billion full-precision comparisons against the reference C ryu implementation (f64, scientific)
A few million tests stressing arbitrary output precisions and different buffer sizes (no crashes)
A few million tests comparing the old errol output against the new ryu output, used to identify the behaviour differences above
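The fuzz program itself lives in the linked zig-ryu repository. A minimal sketch of an exhaustive f16 round-trip check through the public std API, assuming Zig 0.12 or later, formats every finite bit pattern with `{e}` and parses it back with `std.fmt.parseFloat`. Unlike the 65536 count above, it skips the 2048 infinity and NaN patterns. `roundTripAllF16` is a hypothetical name.

```zig
const std = @import("std");

/// Format every finite f16 with `{e}` (shortest representation), parse it
/// back and require the same value. Returns the number of values checked.
fn roundTripAllF16() !u32 {
    var buf: [64]u8 = undefined;
    var checked: u32 = 0;
    var i: u32 = 0;
    while (i <= std.math.maxInt(u16)) : (i += 1) {
        const bits: u16 = @intCast(i);
        const x: f16 = @bitCast(bits);
        if (!std.math.isFinite(x)) continue;
        const s = try std.fmt.bufPrint(&buf, "{e}", .{x});
        const y = try std.fmt.parseFloat(f16, s);
        // Float equality: +0 and -0 compare equal; every other finite
        // value compares equal only if the bits match.
        if (y != x) return error.RoundTripMismatch;
        checked += 1;
    }
    return checked;
}
```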

Future Work

Implement f32 and f64 backends. Notably, these are much faster, have a smaller code size and, importantly, do not rely as heavily on 128-bit integers.
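To illustrate why the narrower backends avoid wide integers: upstream ryu's 32-bit path has a helper named mulShift32 that computes `(m * factor) >> shift` for a 32-bit `m` and 64-bit `factor` using only 64-bit multiplies, by splitting `factor` into halves. The Zig transliteration below is a sketch, not code from this PR. It returns u64 and restricts `shift` to 33..63 so it can be checked against a u128 reference on arbitrary inputs, whereas upstream relies on its tables to keep the result within 32 bits.

```zig
const std = @import("std");

/// (m * factor) >> shift without 128-bit arithmetic. Requires shift > 32.
/// The sum cannot overflow: bits1 <= (2^32 - 1)^2 and (bits0 >> 32) < 2^32.
fn mulShift32(m: u32, factor: u64, shift: u6) u64 {
    std.debug.assert(shift > 32);
    const factor_lo: u32 = @truncate(factor);
    const factor_hi: u32 = @truncate(factor >> 32);
    const bits0 = @as(u64, m) * factor_lo;
    const bits1 = @as(u64, m) * factor_hi;
    // Drop the low 32 bits of the partial product, then finish the shift.
    const sum = (bits0 >> 32) + bits1;
    return sum >> (shift - 32);
}
```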

This replaces the errol backend with one based on ryu. Only the 128-bit backend is implemented. It supports all floating-point types and does not use floating-point arithmetic to print. Closes ziglang#1181. Closes ziglang#1299. Closes ziglang#3612.

tiehuis · 2024-03-09T07:33:47Z

lib/std/fmt.zig

-        formatFloatScientific(value, options, buf_stream.writer()) catch |err| switch (err) {
-            error.NoSpaceLeft => unreachable,
+        const s = ryu128.format(&buf, value, .{ .mode = .scientific, .precision = options.precision }) catch |err| switch (err) {
+            error.BufferTooSmall => "(float)",


I am not a big fan of this. Previously the 512-byte buffer was sufficient for f64, but f80 and f128 need far more space (~5 KB), which in my opinion is impractical to reserve here.
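From the caller's side, the same fallback pattern can be expressed with `std.fmt.bufPrint`, which fails with `error.NoSpaceLeft` when the output does not fit. `formatOrPlaceholder` is a hypothetical helper; the `"(float)"` placeholder mirrors the one in the diff above. Full-precision decimal output of 1e300 needs 301 bytes, so a 32-byte buffer falls back while a 512-byte buffer does not.

```zig
const std = @import("std");

/// Format `x` as full-precision decimal into `buf`, or return a fixed
/// placeholder if the buffer is too small.
fn formatOrPlaceholder(buf: []u8, x: f64) []const u8 {
    return std.fmt.bufPrint(buf, "{d}", .{x}) catch |err| switch (err) {
        error.NoSpaceLeft => "(float)",
    };
}
```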

tiehuis · 2024-03-09T07:34:35Z

lib/std/fmt.zig

-/// value can be re-parsed back to the same type unambiguously.
-///
-/// Floats with more than 64 are currently rounded, see https://github.com/ziglang/zig/issues/1181
-pub fn formatFloatScientific(


These could be deprecated with a @compileError. Alternatively, we could provide them as slim wrappers over the actual ryu implementations (which are not currently exposed).
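A minimal sketch of the @compileError option: the old name stays declared, but any call site that instantiates it fails to compile with a message pointing at the replacement. Because the function is generic (`anytype` parameters), its body is only analyzed when it is actually called, so merely declaring it costs nothing. The parameters are typed `anytype` here to avoid assuming a particular options type; the message text is illustrative.

```zig
const std = @import("std");

/// Hypothetical deprecation shim for the removed function.
pub fn formatFloatScientific(value: anytype, options: anytype, writer: anytype) !void {
    // Reference the parameters so the unused-parameter check passes.
    _ = .{ value, options, writer };
    @compileError("formatFloatScientific is deprecated; use the {e} format specifier instead");
}
```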

tiehuis · 2024-03-09T07:36:23Z

lib/std/fmt/ryu128.zig

+pub const min_buffer_size = 53;
+
+/// Returns the minimum buffer size needed to print every float of a specific type and format.
+pub fn bufferSize(comptime mode: Format, comptime T: type) comptime_int {


This needs to be public if we want external users to be able to take advantage of it. I originally preferred the direct buffer-write approach over using a fmt writer.

In practice, though, users can get away with much smaller buffers; these sizes only matter for unbounded full-precision decimal output.

tiehuis · 2024-03-09T07:40:07Z

I'm leaning towards renaming ryu128.zig to format_float.zig as the inverse of parse_float.zig and hiding ryu as an implementation detail in its naming.

lib/std/fmt/ryu128.zig

andrewrk · 2024-03-12T01:46:16Z

Awesome.

tiehuis added 3 commits March 9, 2024 15:57

std.fmt: add ryu floating-point formatting

c6ad551

std.fmt: add ryu upstream unit tests

04fd113

std.json: update tests to match new floating point formatting

b6695f0

tiehuis requested a review from thejoshwolfe as a code owner March 9, 2024 07:32

replace errol with ryu in CMakeLists.txt

2e60d4d

tiehuis added 2 commits March 9, 2024 22:23

std.fmt: fix std-cases and perform round-trip check in ryu unit tests

da4acf9

wasm/codegen: add "and" + "or" impl for big ints

bb1fe11

andrewrk merged commit bd24e66 into ziglang:master Mar 12, 2024
10 checks passed

andrewrk added the standard library and release notes labels Mar 12, 2024

tiehuis deleted the ryu-128 branch March 12, 2024 02:28

tiehuis mentioned this pull request Mar 12, 2024

std.fmt.formatFloat: implement 32-bit and 64-bit ryu backends #19264

Closed

dimdin mentioned this pull request Mar 21, 2024

wrong type inference #19379

Closed

Vexu mentioned this pull request Mar 23, 2024

std.fmt on comptime_int fails to handle some decimal values #18046

Closed

andrewrk added this to the 0.12.0 milestone Apr 18, 2024

TUSF pushed a commit to TUSF/zig that referenced this pull request May 9, 2024

Merge pull request ziglang#19229 from tiehuis/ryu-128

1a74b4a

std.fmt: add ryu floating-point formatting implementation
