Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Manual Modulo from Puff Algorithm. #120

Closed
RogueForge1 opened this issue Oct 14, 2019 · 1 comment
Closed

Integrate Manual Modulo from Puff Algorithm. #120

RogueForge1 opened this issue Oct 14, 2019 · 1 comment

Comments

@RogueForge1
Copy link

RogueForge1 commented Oct 14, 2019

I am the author of the Fastest Method to Printing Integers and Floating-point numbers, which contains the manual modulo optimization for Ryu. When you do mod 100 div 100, you just did 2 division instructions when you only needed to do one. You're welcome. You are now apart of the Uniprinter (Universal Printer), which I'm aiming to be the world's fastest text processing library that grows from stack to heap with optional dynamic memory, compiles in about 3 seconds, and actually prints Unicode in C++ right.

I'm still looking through the algorithm for optimizations, I just cloned.

I just ran the 128-bit bit shifting tables from Puff and yes, the pattern generalizes. The uint32_t decimalLength(const uint128_t v) function is very slow.

#Bits Max #Bits Max #Bits #Bits Max
1 1.00E+00 33 8.59E+09 65 3.69E+19 97
2 3.00E+00 34 1.72E+10 66 7.38E+19 98
3 7.00E+00 35 3.44E+10 67 1.48E+20 99
4 1.50E+01 36 6.87E+10 68 2.95E+20 100
5 3.10E+01 37 1.37E+11 69 5.90E+20 101
6 6.30E+01 38 2.75E+11 70 1.18E+21 102
7 1.27E+02 39 5.50E+11 71 2.36E+21 103
8 2.55E+02 40 1.10E+12 72 4.72E+21 104
9 5.11E+02 41 2.20E+12 73 9.44E+21 105
10 1.02E+03 42 4.40E+12 74 1.89E+22 106
11 2.05E+03 43 8.80E+12 75 3.78E+22 107
12 4.10E+03 44 1.76E+13 76 7.56E+22 108
13 8.19E+03 45 3.52E+13 77 1.51E+23 109
14 1.64E+04 46 7.04E+13 78 3.02E+23 110
15 3.28E+04 47 1.41E+14 79 6.04E+23 111
16 6.55E+04 48 2.81E+14 80 1.21E+24 112
17 1.31E+05 49 5.63E+14 81 2.42E+24 113
18 2.62E+05 50 1.13E+15 82 4.84E+24 114
19 5.24E+05 51 2.25E+15 83 9.67E+24 115
20 1.05E+06 52 4.50E+15 84 1.93E+25 116
21 2.10E+06 53 9.01E+15 85 3.87E+25 117
22 4.19E+06 54 1.80E+16 86 7.74E+25 118
23 8.39E+06 55 3.60E+16 87 1.55E+26 119
24 1.68E+07 56 7.21E+16 88 3.09E+26 120
25 3.36E+07 57 1.44E+17 89 6.19E+26 121
26 6.71E+07 58 2.88E+17 90 1.24E+27 122
27 1.34E+08 59 5.76E+17 91 2.48E+27 123
28 2.68E+08 60 1.15E+18 92 4.95E+27 124
29 5.37E+08 61 2.31E+18 93 9.90E+27 125
30 1.07E+09 62 4.61E+18 94 1.98E+28 126
31 2.15E+09 63 9.22E+18 95 3.96E+28 127
32 4.29E+09 64 1.84E+19 96 7.92E+28 128

Mod 10 is very slow due to the number of decimals in the result (i.e. the less you have to divide by the faster their newton's method they use for the hardware division) I've benchmarked on 64-bit systems to be faster than mod 10 what is called the MSD Look-Up-Down (MSD-LUD), but I gave up on it when I discovered Puff. The only version I saved prints both ends of the number simultaneously, it is here, but that version of the algorithm has issues and Script2 is down for surgery for at least one more month to update the v1 ASCII Data Spec.

@ulfjack
Copy link
Owner

ulfjack commented Oct 14, 2019

I don't understand what you're talking about. Did you realize that the compiler replaces division by a constant with a multiplication? The decimalLength function is written that way because we expect ~17 digits on average after conversion if the numbers are uniformly randomly distributed.

@ulfjack ulfjack closed this as completed Nov 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants